Digifesto

Category: economics

What happens if we lose the prior for sparse representations?

Noting this nice paper by Giannone et al., “Economic predictions with big data: The illusion of sparsity.” It concludes:

Summing up, strong prior beliefs favouring low-dimensional models appear to be necessary to support sparse representations. In most cases, the idea that the data are informative enough to identify sparse predictive models might be an illusion.

This is refreshing honesty.

In my experience, most disciplinary social sciences have a strong prior bias towards pithy explanatory theses. In a normal social science paper, what you want is a single research question, a single hypothesis. This thesis expresses the narrative of the paper. It’s what makes the paper compelling.

In mathematical model fitting, the term for such a simple hypothesis is a sparse predictive model. These models have relatively few independent variables predicting the dependent variable. In machine learning, this sparsity is often accomplished by a regularization step. While generally well-motivated, regularization for sparsity can be done for reasons that are more aesthetic or that reflect a stronger prior than is warranted.
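To illustrate what that regularization step does, here is a minimal sketch; the simulated data and the penalty strength are my own illustrative assumptions. An L1 (lasso) penalty zeroes out most coefficients, producing a sparse fitted model even though the process that generated the data is dense:

```python
# Minimal sketch: an L1 (lasso) penalty imposes sparsity that ordinary least squares does not.
# The simulated data and penalty strength are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = rng.normal(scale=0.2, size=p)   # a dense "true" model: every predictor matters a little
y = X @ beta + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)     # the L1 penalty expresses a prior favoring sparsity

print("nonzero OLS coefficients:  ", np.sum(np.abs(ols.coef_) > 1e-8))
print("nonzero lasso coefficients:", np.sum(np.abs(lasso.coef_) > 1e-8))
```

The sparsity of the lasso fit comes from the penalty, which is to say from the prior, not necessarily from the data.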

A consequence of this preference for sparsity, in my opinion, is the prevalence of literature arguing over power law versus log normal explanations of heavy-tailed distributions. (See this note on disorganized heavy tail distributions.) A dense log linear regression model will predict a heavy-tailed dependent variable without great error. But it will be unsatisfying from the perspective of scientific explanation.
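As a rough, hedged illustration of that claim (the data-generating process below is my own assumption, not anything from the cited literature): many small effects acting multiplicatively produce a heavy-tailed outcome, and a dense regression on the log scale predicts it well even though no short list of variables carries the explanation.

```python
# Illustrative simulation: many weak predictors, combined multiplicatively, yield a
# heavy-tailed outcome that a dense log-linear regression nonetheless fits well.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5_000, 200
X = rng.normal(size=(n, p))
beta = rng.normal(scale=0.05, size=p)            # dense: every coefficient is small but nonzero
log_y = X @ beta + rng.normal(scale=0.1, size=n)
y = np.exp(log_y)                                # heavy right tail on the raw scale

coef, *_ = np.linalg.lstsq(X, log_y, rcond=None) # dense OLS on the log scale
residual = log_y - X @ coef
print("R^2 on the log scale:", 1 - residual.var() / log_y.var())
print("skewness of y:", ((y - y.mean()) ** 3).mean() / y.std() ** 3)
```

The fit is good, but the “explanation” is spread thinly across two hundred coefficients.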

What seems to be an open question in the social sciences today is whether the culture of social science will change as a result of the robust statistical analysis of new data sets. As I’ve argued elsewhere (Benthall, 2016), if the culture does change, it will mean that narrative explanation will be less highly valued.

References

Benthall, Sebastian. “Philosophy of computational social science.” Cosmos and History: The Journal of Natural and Social Philosophy 12.2 (2016): 13-30.

Giannone, Domenico, Michele Lenza, and Giorgio E. Primiceri. “Economic predictions with big data: The illusion of sparsity.” (2017).


The social value of an actually existing alternative — BLOCKCHAIN BLOCKCHAIN BLOCKCHAIN

When people get excited about something, they will often talk about it in hyperbolic terms. Some people will actually believe what they say, though this seems to drop off with age. The emotionally energetic framing of the point can be both factually wrong and contain a kernel of truth.

This general truth applies to hype about particular technologies. Does it apply to blockchain technologies and cryptocurrencies? Sure it does!

Blockchain boosters have offered utopian or radical visions about what this technology can achieve. We should be skeptical about these visions prima facie precisely in proportion to how utopian and radical they are. But that doesn’t mean that this technology isn’t accomplishing anything new or interesting.

Here is a summary of some dialectics around blockchain technology:

A: “Blockchains allow for fully decentralized, distributed, and anonymous applications. These can operate outside of the control of the law, and that’s exciting because it’s a new frontier of options!”

B1: “Blockchain technology isn’t really decentralized, distributed, or anonymous. It’s centralizing its own power into the hands of the few, and meanwhile traditional institutions have the power to crush it. Their anarchist mentality is naive and short-sighted.”

B2: “Blockchain technology enthusiasts will soon discover that they actually want all the legal institutions they designed their systems to escape. Their anarchist mentality is naive and short-sighted.”

While B1 and B2 are both critical of blockchain technology and see A as naive, it’s important to realize that they believe A is naive for contradictory reasons. B1 is arguing that it does not accomplish what it was purportedly designed to do, which is provide a foundation of distributed, autonomous systems that’s free from internal and external tyranny. B2 is arguing that nobody actually wants to be free of these kinds of tyrannies.

These are conservative attitudes that we would expect from conservative (in the sense of conservation, or “inhibiting change”) voices in society. These are probably demographically different people from person A. And this makes all the difference.

If what differentiates people is their relationship to different kinds of social institutions or capital (in the Bourdieusian sense), then it would be natural for some people to be incumbents in old institutions who would argue for their preservation and others to be willing to “exit” older institutions and join new ones. However imperfect the affordances of blockchain technology may be, they are different affordances than those of other technologies, and so they promise the possibility of new kinds of institutions with an alternative information and communications substrate.

It may well be that the pioneers in the new substrate will find that they have political problems of their own and need to reinvent some of the societal controls that they were escaping. But the difference will be that in the old system, the pioneers were relative outsiders, whereas in the new system, they will be incumbents.

The social value of blockchain technology therefore comes in two waves. The first wave is the value it provides to early adopters who use it instead of other institutions that were failing them. These people have made the choice to invest in something new because the old options were not good enough for them. We can celebrate their successes as people who have invented, quite literally, a new form of social capital and quite possibly a new form of wealth. When a small group of people creates a lot of new wealth, this almost immediately creates a lot of resentment from others who did not get in on it.

But there’s a secondary social value to the creation of actually existing alternative institutions and forms of capital (which are in a sense the same thing). This is the value of competition. The marginal person, who can choose how to invest themselves, can exit from one failing institution to a fresh new one if they believe it’s worth the risk. When an alternative increases the amount of exit potential in society, that increases the competitive pressure on institutions to perform. That should benefit even those with low mobility.

So, in conclusion, blockchain technology is good because it increases institutional competition. At the end of the day that reduces the power of entrenched incumbents to collect rents and gives everybody else more flexibility.

technological determinism and economic determinism

If you are trying to explain society, politics, the history of the world, whatever, it’s a good idea to narrow the scope of what you are talking about to just the most important parts because there is literally only so much you could ever possibly say. Life is short. A principled way of choosing what to focus on is to discuss only those parts that are most significant in the sense that they played the most causally determinative role in the events in question. By widely accepted interventionist theories of causation, what makes something causally determinative of something else is the fact that in a counterfactual world in which the cause was made to be somehow different, the effect would have been different as well.

Since we basically never observe a counterfactual history, this leaves a wide open debate over the general theoretical principles one would use to predict the significance of certain phenomena over others.

One point of view on this is called technological determinism. It is the view that, for a given social phenomenon, what’s really most determinative of it is the technological substrate of it. Engineers-turned-thought-leaders love technological determinism because of course it implies that really the engineers shape society, because they are creating the technology.

Technological determinism is absolutely despised by academic social scientists who have to deal with technology and its role in society. I have a hard time understanding why. Sometimes it is framed as an objection to technologists who are avoiding responsibility for social problems they create because it’s the technology that did it, not them. But such a childish tactic really doesn’t seem to be what’s at stake if you’re critiquing technological determinism. Another way of framing the problem is to say that the way a technology affects society in San Francisco is going to be different from how it affects society in Beijing. Society has its role in a dialectic.

So there is a grand debate of “politics” versus “technology” which reoccurs everywhere. This debate is rather one sided, since it is almost entirely constituted by political scientists or sociologists complaining that the engineers aren’t paying enough attention to politics, seeing how their work has political causes and effects. Meanwhile, engineers-turned-thought-leaders just keep spouting off whatever nonsense comes to their head and they do just fine because, unlike the social scientist critics, engineers-turned-thought-leaders tend to be rich. That’s why they are thought leaders: because their company was wildly successful.

What I find interesting is that economic determinism is never part of this conversation. It seems patently obvious that economics drives both politics and technology. You can be anywhere on the political spectrum and hold this view. Once it was called “dialectical materialism”, and it was the foundation for left-wing politics for generations.

So what has happened? Here are a few possible explanations.

The first explanation is that if you’re an economic determinist, maybe you are smart enough to do something more productive with your time than get into debates about whether technology or politics is more important. You would be doing something more productive, like starting a business to develop a technology that manipulates political opinion to favor the deregulation of your business. Or trying to get a socialist elected so the government will pay off student debts.

A second explanation is… actually, that’s it. That’s the only reason I can think of. Maybe there’s another one?

The Data Processing Inequality and bounded rationality

I have long harbored the hunch that information theory, in the classic Shannon sense, and social theory are deeply linked. It has proven to be very difficult to find an audience for this point of view or an opportunity to work on it seriously. Shannon’s information theory is widely respected in engineering disciplines; many social theorists who are unfamiliar with it are loath to admit that something from engineering should carry essential insights for their own field. Meanwhile, engineers are rarely interested in modeling social systems.

I’ve recently discovered an opportunity to work on this problem through my dissertation work, which is about privacy engineering. Privacy is a subtle social concept but also one that has been rigorously formalized. I’m working on formal privacy theory now and have been reminded of a theorem from information theory: the Data Processing Theorem. What strikes me about this theorem is that it captures a point that comes up again and again in social and political problems, though it’s a point that’s almost never addressed head on.

The Data Processing Inequality (DPI) states that for three random variables X, Y, and Z arranged in a Markov chain X \rightarrow Y \rightarrow Z, we have I(X,Z) \leq I(X,Y), where I stands for mutual information. Mutual information is a measure of how much two random variables carry information about each other. If I(X,Y) = 0, the variables are independent. I(X,Y) \geq 0 always; that’s just a mathematical fact about how it’s defined.

The implications of this for psychology, social theory, and artificial intelligence are I think rather profound. It provides a way of thinking about bounded rationality in a simple and generalizable way–something I’ve been struggling to figure out for a long time.

Suppose that there’s a big world out there, W, and there’s an organism, or a person, or a sociotechnical organization within it, Y. The world is big and complex, which implies that it has a lot of informational entropy, H(W). Through whatever sensory apparatus is available to Y, it acquires some kind of internal sensory state. Because this organism is much smaller than the world, its entropy is much lower. There are many fewer possible states that the organism can be in, relative to the number of states of the world: H(W) \gg H(Y). This in turn bounds the mutual information between the organism and the world: I(W,Y) \leq H(Y).

Now let’s suppose the actions that the organism takes, Z, depend only on its internal state. It is an agent, reacting to its environment. Whatever these actions are, they can only be as calibrated to the world as the agent had the capacity to absorb the world’s information. I.e., I(W,Z) \leq H(Y) \ll H(W). The implication is that the more limited the mental capacity of the organism, the more its actions will be approximately independent of the state of the world that precedes it.
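Here is a minimal numerical sketch of both claims; the state-space sizes and the randomly generated distributions are illustrative assumptions, not a model of any real system. It checks that I(W,Z) \leq I(W,Y) \leq H(Y), however large H(W) is.

```python
# Toy check that I(W,Z) <= I(W,Y) <= H(Y) << H(W) for a Markov chain W -> Y -> Z,
# where W is a "big world", Y a small internal state, and Z the agent's action.
# All distributions are randomly generated for illustration.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    # joint: 2-D array of joint probabilities P(A, B)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())

rng = np.random.default_rng(0)
n_w, n_y, n_z = 256, 4, 16                            # the world has far more states than the organism

p_w = rng.dirichlet(np.ones(n_w))                     # P(W)
p_y_given_w = rng.dirichlet(np.ones(n_y), size=n_w)   # sensing channel P(Y|W)
p_z_given_y = rng.dirichlet(np.ones(n_z), size=n_y)   # action policy P(Z|Y)

joint_wy = p_w[:, None] * p_y_given_w                 # P(W, Y)
joint_wz = joint_wy @ p_z_given_y                     # P(W, Z), using the Markov property

print("H(W)   =", entropy(p_w))
print("H(Y)   =", entropy(joint_wy.sum(axis=0)))
print("I(W,Y) =", mutual_information(joint_wy))
print("I(W,Z) =", mutual_information(joint_wz))       # the DPI guarantees this is <= I(W,Y)
```

However much entropy the world has, the agent’s actions can carry at most H(Y) bits of information about it.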

There are a lot of interesting implications of this for social theory. Here are a few cases that come to mind.

I've written quite a bit here (blog links) and here (arXiv) about Bostrom’s superintelligence argument and why I’m generally not concerned with the prospect of an artificial intelligence taking over the world. My argument is that there are limits to how much an algorithm can improve itself, and these limits put a stop to exponential intelligence explosions. I’ve been criticized on the grounds that I don’t specify what the limits are, and that if the limits are high enough then maybe relative superintelligence is possible. The Data Processing Inequality gives us another tool for estimating the bounds of an intelligence based on the range of physical states it can possibly be in. How calibrated can a hegemonic agent be to the complexity of the world? It depends on the capacity of that agent to absorb information about the world; that can be measured in information entropy.

A related case is a rendering of Scott’s Seeing Like a State arguments. Why is it that “high modernist” governments failed to successfully control society through scientific intervention? One reason is that the complexity of the system they were trying to manage vastly exceeded the complexity of the centralized control mechanisms. Centralized control was very blunt, causing many social problems. Arguably, behavioral targeting and big data centers today equip controlling organizations with more informational capacity (more entropy), but they still get it wrong sometimes, causing privacy violations, because they can’t model the entirety of the messy world we’re in.

The Data Processing Inequality is also helpful for explaining why the world is so messy. There are a lot of different agents in the world, and each one only has so much bandwidth for taking in information. This means that most agents are acting almost independently from each other. The guiding principle of society isn’t signal, it’s noise. That explains why there are so many disorganized heavy tail distributions in social phenomena.

Importantly, if we let the world at any time slice be informed by the actions of many agents acting nearly independently from each other in the slice before, then that increases the entropy of the world. This increases the challenge for any particular agent to develop an effective controlling strategy. For this reason, we would expect the world to get more out of control the more intelligent agents are on average. The popularity of the personal computer perhaps introduced a lot more entropy into the world, distributed in an agent-by-agent way. Moreover, powerful controlling data centers may increase the world’s entropy, rather than reducing it. So even if, for example, Amazon were to try to take over the world, the existence of Baidu would be a major obstacle to its plans.

There are a lot of assumptions built into these informal arguments and I’m not wedded to any of them. But my point here is that information theory provides useful tools for thinking about agents in a complex world. There’s potential for using it for modeling sociotechnical systems and their limitations.

Net neutrality

What do I think of net neutrality?

I think ending it is bad for my personal self-interest. I am, economically, a part of the newer tech economy of software and data. I believe this economy benefits from net neutrality. I also am somebody who loves The Web as a consumer. I’ve grown up with it. It’s shaped my values.

From a broader perspective, I think ending net neutrality will revitalize U.S. telecom and give it leverage over the ‘tech giants’–Google, Facebook, Apple, Amazon—that have been rewarded by net neutrality policies. Telecom is a platform, but it had been turned into a utility platform. Now it can be a full-featured market player. This gives it an opportunity for platform envelopment, moving into the markets of other companies and bundling them in with ISP services.

Since this will introduce competition into a market where the other players are very well established, it could actually be good for consumers, because it breaks up an oligopoly in the services that are most user-facing. On the other hand, since ISPs are monopolists in most places, we could also expect Internet-based service quality to deteriorate in general.

What this might encourage is a proliferation of alternatives to cable ISPs, which would be interesting. Ending net neutrality creates a much larger design space in products that provision network access. Mobile companies are in this space already. So we could see this regulation as a move in favor of the cell phone companies, not just the ISPs. This too could draw surplus away from the big four.

This probably means the end of “The Web”. But we’d already seen the end of “The Web” with the proliferation of apps as a replacement for Internet browsing. IoT provides yet another alternative to “The Web”. I loved the Web as a free, creative place where everyone could make their own website about their cat. It had a great moment. But it’s safe to say that it isn’t what it used to be. In fifteen years it may be that most people no longer visit web sites. They just use connected devices and apps. Ending net neutrality means that the connectivity necessary for these services can be bundled in with the service itself. In the long run, that should be good for consumers and even the possibility of market entry for new firms.

In the long run, I’m not sure “The Web” is that important. Maybe it was a beautiful disruptive moment that will never happen again. Or maybe, if there were many more kinds of alternatives, “The Web” would return to being the quirky, radically free and interesting thing it was before it got so mainstream. Remember when The Web was just The Well (which is still around), and only people who were really curious about it bothered to use it? I don’t, because that was well before my time. But it’s possible that the Internet in its browse-happy form will become something like that again.

I hadn’t really thought about net neutrality very much before, to be honest. Maybe there are some good rebuttals to this argument. I’d love to hear them! But for now, I think I’m willing to give the shuttering of net neutrality a shot.

Enlightening economics reads

Nils Gilman argues that the future of the world is wide open because neoliberalism has been discredited. So what’s the future going to look like?

Given that neoliberalism is for the most part an economic vision, and that competing theories have often also been economic visions (when they have not been political or theological theories), a compelling futurist approach is to look out for new thinking about economics. The three articles below have recently taught me something new about economics:

Dani Rodrik. “Rescuing Economics from Neoliberalism”, Boston Review. (link)

This article makes the case that the association frequently made between economics as a social science and neoliberalism as an ideology is overdrawn. Of course, probably the majority of economists are not neoliberals. Rodrik is defending a view of economics that keeps its options open. I think he overstates the point with the claim, “Good economists know that the correct answer to any question in economics is: it depends.” This is just simply incorrect, if questions have their assumptions bracketed well enough. But since Rodrik’s rhetorical point appears to be that economists should not be dogmatists, he can be forgiven this overstatement.

As an aside, there is something compelling but also dangerous about the view that a social science can provide at best narrowly tailored insights into specific phenomena. These kinds of ‘sciences’ wind up being unaccountable, because the specificity of particular events prevents the repeated testing of the theories that are used to explain them. There is a risk of too much nuance, which is akin to the statistical concept of overfitting.

A different kind of article is:

Seth Ackerman. “The Disruptors” Jacobin. (link)

An interview with J.W. Mason in the smart socialist magazine Jacobin, which had the honor of a shout-out from Matt Levine’s popular “Money Stuff” Bloomberg column (column?). One of the interesting topics it raises is whether or not mutual funds, in which many people invest in a fund that then owns a wide portfolio of stocks, are in a sense socialist and anti-competitive, because shareholders no longer have an interest in seeing competition in the market.

This is original thinking, and the endorsement by Levine is an indication that it’s not a crazy thing to consider, even for seasoned practical economists in the financial sector. My hunch at this point in life is that if you want to understand the economy, you have to understand finance, because financiers are the ones whose job it is to profit from their understanding of the economy. As a corollary, I don’t really understand the economy because I don’t have a great grasp of the financial sector. Maybe one day that will change.

Speaking of expertise being enhanced by having ‘skin in the game’, the third article is:

Nassim Nicholas Taleb. “Inequality and Skin in the Game,” Medium. (link)

I haven’t read a lot of Taleb, though I acknowledge he’s a noteworthy and important thinker. This article confirmed for me the reputation of his style. It was also a strikingly fresh look at the economics of inequality, capturing a few of the important things mainstream opinion overlooks about inequality, namely:

  • Comparing people at different life stages is a mistake when analyzing inequality of a population.
  • A lot of the cause of inequality is randomness (as opposed to fixed population categories), and this kind of inequality is inevitable.

He’s got a theory of what kinds of inequality people resent versus what they tolerate, which is a fine theory. It would be nice to see some empirical validation of it. He writes about the relationship between ergodicity and inequality, which is interesting. He is scornful of Piketty and everyone who was impressed by Piketty’s argument, which comes off as unfriendly.

Much of what Taleb writes about the need to understand the economy through a richer understanding of probability and statistics strikes me as correct. If it is indeed the case that mainstream economics has not caught up to this, there is an opportunity here!

Personal data property rights as privacy solution. Re: Cofone, 2017

I’m working my way through Ignacio Cofone’s “The Dynamic Effect of Information Privacy Law” (2017) (link), which is an economic analysis of privacy. Without doing justice to the full scope of the article, it must be said that it is a thorough discussion of previous information economics literature and a good case for property rights over personal data. In a nutshell, one can say that markets are good for efficient and socially desirable resource allocation, but they are only good at this when there are well crafted property rights to the goods involved. Personal data, like intellectual property, is a tricky case because of the idiosyncrasies of data–it has zero-ish marginal cost, it seems to get more valuable when it’s aggregated, etc. But like intellectual property, we should expect under normal economic rationality assumptions that the more we protect the property rights of those who create personal data, the more they will be incentivized to create it.

I am very warm to this kind of argument because I feel there’s been a dearth of good information economics in my own education, though I have been looking for it! I do believe there are economic laws and that they are relevant for public policy, let alone business strategy.

I have concerns about Cofone’s argument specifically, which are these:

First, I have my doubts that seeing data as a good in any classical economic sense is going to work. Ontologically, data is just too weird for a lot of earlier modeling methods. I have been working on a different way of modeling information flow economics that tries to capture how much of what we’re concerned with are information services, not information goods.

My other concern is that Cofone’s argument gives users/data subjects credit for being rational agents, capable of addressing the risks of privacy and acting accordingly. Hoofnagle and Urban (2014) show that this is empirically not the case. In fact, if you take the average person who is not that concerned about their privacy on-line and start telling them facts about how their data is being used by third-parties, etc., they start to freak out and get a lot more worried about privacy.

This throws a wrench in the argument that stronger personal data property rights would lead to more personal data creation, therefore (I guess it’s implied) more economic growth. People seem willing to create personal data and give it away, despite actual adverse economic incentives, because cat videos are just so damn appealing. Or something. It may generally be the case that economic modeling is used by information businesses but not information policy people because average users are just so unable to act rationally; it really is a domain better suited to behavioral economics and usability research.

I’m still holding out, though. Just because big data subjects are not homo economicus doesn’t mean that an economic analysis of their activity is pointless. It just means we need a more sophisticated economic model, one that takes into account how there are many different classes of user that are differently informed. This kind of economic modeling, and empirically fitting it to data, is within our reach. We have the technology.

References

Cofone, Ignacio N. “The Dynamic Effect of Information Privacy Law.” Minn. JL Sci. & Tech. 18 (2017): 517.

Hoofnagle, Chris Jay, and Jennifer M. Urban. “Alan Westin’s privacy homo economicus.” (2014).

“The Microeconomics of Complex Economies”

I’m dipping into The microeconomics of complex economies: Evolutionary, institutional, neoclassical, and complexity perspectives, by Elsner, Heinrich, and Schwardt, all professors at the University of Bremen.

It is a textbook, as one would teach a class from. It is interesting because it is self-consciously written as a break from neoclassical microeconomics. According to the authors, this break had been a long time coming but the last straw was the 2008 financial crisis. This at last, they claim, showed that neoclassical faith in market equilibrium was leaving something important out.

Meanwhile, “heterodox” economics has been maturing for some time in the economics blogosphere, while complex systems people have been interested in economics since the emergence of the field. What Elsner, Heinrich, and Schwardt appear to be doing with this textbook is providing a template for an undergraduate level course on the subject, legitimizing it as a discipline. They are not alone. They cite Bowles’s Microeconomics as worthy competition.

I have not yet read the chapter of the Elsner, Heinrich, and Schwardt book that covers philosophy of science and its relationship to the validity of economics. From a glance, it looks very well done. But I wanted to note my preliminary opinion on the matter, given my recent interest in Shapiro and Varian’s information economics and their claim to be describing ‘laws of economics’ that provide a reliable guide to business strategy.

In brief, I think Shapiro and Varian are right: they do outline laws of economics that provide a reliable guide to business strategy. This is in fact what neoclassical economics is good for.

What neoclassical economics is not always great at is predicting aggregate market behavior in a complex world. It’s not clear if any theory could ever be good at predicting aggregate market behavior in a complex world. It is likely that if there were one, it would be quickly gamed by investors in a way that would render it invalid.

Given vast information asymmetries, it seems the best one could hope for is a theory of the market being able to assimilate the available information and respond wisely. This is the Hayekian view, and it’s not mainstream. It suffers the difficulty that it is hard to empirically verify that a market has performed optimally, given that no one actor, including the person attempting to verify Hayekian economic claims, has all the information to begin with. Meanwhile, it seems that there is no sound a priori reason to believe this is the case. Epstein and Axtell (1996) have some computational models in which they test when agents capable of trade wind up in an equilibrium with market-clearing prices, and in their models this happens only under very particular and unrealistic conditions.

That said, predicting aggregate market outcomes is a vastly different problem than providing strategic advice to businesses. This is the point where academic critiques of neoclassical economics miss the mark. Since phenomena concerning supply and demand, pricing and elasticity, competition and industrial organization, and so on are part of the lived reality of somebody working in industry, formalizations of these aspects of economic life can be tested and propagated by many more kinds of people than the phenomena of total market performance. The latter is actionable only for a very rare class of policy-maker or financier.

References

Bowles, S. (2009). Microeconomics: behavior, institutions, and evolution. Princeton University Press.

Elsner, W., Heinrich, T., & Schwardt, H. (2014). The microeconomics of complex economies: Evolutionary, institutional, neoclassical, and complexity perspectives. Academic Press.

Epstein, Joshua M., and Robert Axtell. Growing artificial societies: social science from the bottom up. Brookings Institution Press, 1996.

Market segments and clusters of privacy concerns

One result from earlier economic analysis is that in the cases where personal information is being used to judge the economic value of an agent (such as when they are going to be hired, or offered a loan), the market is divided between those that would prefer more personal information to flow (because they are highly qualified, or highly credit-worthy), and those that would rather information not flow.

I am naturally concerned about whether this microeconomic modeling has any sort of empirical validity. However, there is some corroborating evidence in the literature on privacy attitudes. Several surveys (see references) have discovered that people’s privacy attitudes cluster into several groups, those only “marginally concerned”, the “pragmatists”, and the “privacy fundamentalists”. These groups have, respectively, stronger and stronger views on the restriction of their flow of personal information.
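For what it’s worth, here is a hedged sketch of the clustering step such surveys imply; the Likert-scale responses below are simulated under my own assumptions purely to show the method, not drawn from any of the cited studies.

```python
# Illustrative only: cluster simulated privacy-concern survey responses into three groups,
# loosely corresponding to the "marginally concerned", "pragmatists", and "fundamentalists".
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Simulated 1-5 Likert responses to three privacy-concern questions, for three latent groups.
group_params = {"marginal": (1.5, 300), "pragmatist": (3.0, 500), "fundamentalist": (4.5, 200)}
responses = np.vstack([
    np.clip(rng.normal(loc=mean, scale=0.7, size=(n, 3)), 1, 5)
    for mean, n in group_params.values()
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(responses)
for label in range(3):
    members = responses[kmeans.labels_ == label]
    print(f"cluster {label}: n={len(members)}, mean concern={members.mean():.2f}")
```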

It would be natural to suppose that some of the variation in privacy attitudes has to do with expected outcomes of information flow. I.e., if people are worried that their personal information will make them ineligible for a job, they are more likely to be concerned about this information flowing to potential employers.

I need to dig deeper into the literature to see whether factors like income have been shown to be correlated with privacy attitudes.

References

Ackerman, M. S., Cranor, L. F., & Reagle, J. (1999, November). Privacy in e-commerce: examining user scenarios and privacy preferences. In Proceedings of the 1st ACM conference on Electronic commerce (pp. 1-8). ACM.

B. Berendt et al., “Privacy in E-Commerce: Stated Preferences versus Actual Behavior,” Comm. ACM, vol. 48, no. 4, pp. 101-106, 2005.

K.B. Sheehan, “Toward a Typology of Internet Users and Online Privacy Concerns,” The Information Soc., vol. 18, pp. 21-32, 2002.

Economic costs of context collapse

One motivation for my recent studies on information flow economics is that I’m interested in what the economic costs are when information flows across the boundaries of specific markets.

For example, there is a folk theory of why it’s important to have data protection laws in certain domains. Health care, for example. The idea is that it’s essential to have health care providers maintain the confidentiality of their patients because if they didn’t then (a) the patients could face harm due to this information getting into the wrong hands, such as those considering them for employment, and (b) this would disincentivize patients from seeking treatment, which causes them other harms.

In general, a good approximation of general expectations of data privacy is that data should not be used for purposes besides those for which the data subjects have consented. Something like this was encoded in the 1973 Fair Information Practices, for example. A more modern take on this from contextual integrity (Nissenbaum, 2004) argues that privacy is maintained when information flows appropriately with respect to the purposes of its context.

A widely acknowledged phenomenon in social media, context collapse (Marwick and boyd, 2011; Davis and Jurgenson, 2014), is when multiple social contexts in which a person is involved begin to interfere with each other because members of those contexts use the same porous information medium. Awkwardness and sometimes worse can ensue. These are some of the major ways the world has become aware of what a problem the Internet is for privacy.

I’d like to propose that an economic version of context collapse happens when different markets interfere with each other through network-enabled information flow. The bogeyman of Big Brother through Big Data, the company or government that has managed to collect data about everything about you in order to infer everything else about you, has as much to do with the ways information is being used in cross-purposed ways as it has to do with the quantity or scope of data collection.

It would be nice to get a more formal grip on the problem. Since we’ve already used it as an example, let’s try to model the case where health information is disclosed (or not) to a potential employer. We already have the building blocks for this case in our model of expertise markets and our model of labor markets.

There are now two uncertain variables of interest. First, let’s consider a set of available health treatments J, with m = \vert J \vert. Health conditions in society are distributed such that the utility of a random person i receiving treatment j is w_{i,j}. Utility for one treatment is not independent of utility for another, so in general \vec{w}_i \sim W, meaning a person’s utilities for all treatments are sampled from an underlying joint distribution W.

There is also the uncertain variable of how effective somebody will be at a job they are interested in. We’ll say this is distributed according to X, and that a person’s aptitude for the job is x_i \sim X.

We will also say that W and X are not independent from each other. In this model, there are certain health conditions that are disabling with respect to a job, and this has an effect on expected performance.

I must note here that I am not taking any position on whether or not employers should take disabilities into account when hiring people. I don’t even know for sure the consequences of this model yet. You could imagine this scenario taking place in a country which does not have the Americans with Disabilities Act and other legislation that affects situations like this.

As per the models that we are drawing from, let’s suppose that normal people don’t know how much they will benefit from different medical treatments; i doesn’t know \vec{w}_i. They may or may not know x_i (I don’t yet know if this matters). What i does know is their symptoms, y_i \sim Y.

Let’s say person i goes to the doctor, reporting y_i, on the expectation that the doctor will prescribe the treatment \hat{j} that maximizes their welfare:

\hat{j} = \arg \max_{j \in J} E[w_{i,j} \vert y_i]

Now comes the tricky part. Let’s say the doctor is corrupt and willing to sell the medical records of her patients to her patients’ potential employers. By assumption, y_i reveals information about both \vec{w}_i and x_i. We know from our earlier study that information about x_i is indeed valuable to the employer. There must be some price (at least within our neoclassical framework) that the employer is willing to pay the corrupt doctor for information about patient symptoms.

We also know that having potential employers know more about your aptitudes is good for highly qualified applicants and bad for not as qualified applicants. The more information employers know about you, the more likely they will be able to tell if you are worth hiring.

The upshot is that there may be some patients who are more than happy to have their medical records sold off to their potential employers because those particular symptoms are correlated with high job performance. These will be attracted to systems that share their information across medical and employment purposes.

But for those with symptoms correlated with lower job performance, there is now a trickier decision. If doctors may be corrupt, patients may choose not to reveal their symptoms accurately (or at all), because this information might hurt their chances of employment.

A few more wrinkles here. Suppose it’s true that fewer people will go to corrupt doctors because they suspect or know that information will leak to their employers. If there are people who suspect or know that the information that leaks to their employers will reflect on them favorably, that creates a selection effect on who goes to the doctor. This means that the fact that i has gone to the doctor, or not, is a signal employers can use to discriminate between potential applicants. So to some extent the harms of the corrupt doctors fall on the less able even if they opt out of health care. They can’t opt out entirely of the secondary information effects.

We can also add the possibility that not all doctors are corrupt. Only some are. But if it’s unknown which doctors are corrupt, the possibility of corruption still affects the strategies of patients/employees in a similar way, only now in expectation. Just as in the Akerlof market for lemons, a few corrupt doctors ruins the market.
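A quick toy simulation conveys the flavor of this selection effect. Every parameter below (the symptom-aptitude correlation, the reputational cost, the share of corrupt doctors) is an illustrative assumption, not a calibrated version of the model sketched above.

```python
# Toy simulation: symptoms correlate with job aptitude; if medical records can leak to
# employers, patients whose symptoms signal low aptitude stay away from the doctor.
# All parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

aptitude = rng.normal(size=n)                          # x_i
symptoms = 0.6 * aptitude + 0.8 * rng.normal(size=n)   # y_i, correlated with aptitude
treatment_benefit = 1.0                                # value of seeing the doctor
reputational_cost = 5.0                                # assumed harm if a bad signal reaches employers
share_corrupt = 0.3                                    # chance that a record leaks

# Expected cost of a visit: only symptoms signaling below-average aptitude are damaging if leaked.
expected_leak_cost = share_corrupt * reputational_cost * (symptoms < 0)

visits_protected = np.ones(n, dtype=bool)              # with data protection, everyone visits
visits_leaky = treatment_benefit > expected_leak_cost  # without it, some patients opt out

print("share treated with data protection:   ", visits_protected.mean())
print("share treated without data protection:", visits_leaky.mean())
print("mean aptitude of those who still visit:", aptitude[visits_leaky].mean())
```

With these made-up numbers, roughly half the patients opt out, and those who still visit skew toward higher aptitude, so visiting (or not) becomes exactly the secondary signal described above.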

I have not made these arguments mathematically specific. I leave that to a later date. But for now I’d like to draw some tentative conclusions about what mandating the protection of health information, as in HIPAA, means for the welfare outcomes in this model.

If doctors are prohibited from selling information to employers, then the two markets do not interfere with each other. Doctors can solicit symptoms in a way that optimizes benefits to all patients. Employers can make informed choices about potential candidates through an independent process. The latter will serve to select more promising applicants from less promising applicants.

But if doctors can sell health information to employers, several things change.

  • Employers will benefit from information about employee health and offer to pay doctors for the information.
  • Some doctors will discreetly do so.
  • The possibility of corrupt doctors will scare off those patients who are afraid their symptoms will reveal a lack of job aptitude.
  • These patients no longer receive treatment.
  • This reduces the demand for doctors, shrinking the health care market.
  • The most able will continue to see doctors. If their information is shared with employers, they will be more likely to be hired.
  • Employers may take the absence of medical records available for purchase from corrupt doctors as a signal that an applicant is hiding something that would reveal poor aptitude.

In sum, without data protection laws, there are fewer people receiving beneficial treatment and fewer jobs for doctors providing beneficial treatment. Employers are able to make more advantageous decisions, and the most able employees are able to signal their aptitude through the corrupt health care system. Less able employees may wind up being identified anyway through their non-participation in the medical system. If that’s the case, they may wind up returning to doctors for treatment anyway, though they would need to have a way of paying for it besides employment.

That’s what this model says, anyway. The biggest surprise for me is the implication that data protection laws serve the interests of service providers by expanding their customer base. That is a point that is not made enough! Too often, the need for data protection laws is framed entirely in terms of the interests of the consumer. This is perhaps a politically weaker argument, because consumers are not united in their political interest (some consumers would be helped, not harmed, by weaker data protection).

References

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 488-500.

Davis, J. L., & Jurgenson, N. (2014). Context collapse: theorizing context collusions and collisions. Information, Communication & Society, 17(4), 476-485.

Marwick, A. E., & Boyd, D. (2011). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New media & society, 13(1), 114-133.

Nissenbaum, H. (2004). Privacy as contextual integrity. Wash. L. Rev., 79, 119.