Digifesto

Tag: economics of information

For a more ethical Silicon Valley, we need a wiser economics of data

Kara Swisher’s NYT op-ed about the dubious ethics of Silicon Valley and Nitasha Tiku’s WIRED article reviewing books that tell alternative (and perhaps more cynical) stories about the rise of Silicon Valley have generated discussion and buzz among the tech commentariat.

One point of debate is whether the focus should be on “ethics” or on something more substantively defined, such as human rights. Another is whether the emphasis should be on “ethics” or on something more substantively enforced, like laws that impose penalties of up to 4% of global annual turnover, referring of course to the GDPR.

While I’m sympathetic to the European approach (laws enforcing human rights with real teeth), I think there is something naive about it. We have not yet seen whether it’s ever really possible to comply with the GDPR; the law could wind up being a kind of heavy tax on Big Tech companies operating in the EU, but one that doesn’t truly change how people’s data are used. In any case, the broad principles of European privacy are based on individual human dignity, and so they do not take into account the ways that corporations are social structures, i.e. sociotechnical organizations that transcend individual people. The European regulations address the problem of individual privacy while leaving mystified the question of why the current corporate organization of the world’s personal information is what it is. This sets up the fight over ‘technology ethics’ to be a political conflict between different kinds of actors whose positions are defined as much by their social habitus as by their intellectual reasons.

My own (unpopular!) view is that the solution to our problems of technology ethics is going to have to rely on a better adapted technology economics. We often forget today that economics was originally a branch of moral philosophy. Adam Smith wrote The Theory of Moral Sentiments (1759) before An Inquiry into the Nature and Causes of the Wealth of Nations (1776). Since then the main purpose of economics has been to intellectually grasp the major changes to society due to production, trade, markets, and so on in order to better steer policy and business strategy towards more fruitful equilibria. The discipline has a bad reputation among many “critical” scholars due to its role in supporting neoliberal ideology and policies, but it must be noted that this ideology and policy work is not entirely cynical; it was a successful centrist hegemony for some time. Now that it is under threat, partly due to the successes of the big tech companies that benefited under its regime, it’s worth considering what new lessons we have to learn to steer the economy in an improved direction.

The difference between an economic approach to the problems of the tech economy and either an ‘ethics’ or a ‘law’ based approach is that it inherently acknowledges that there are a wide variety of strategic actors co-creating social outcomes. Individual “ethics” will not be able to settle the outcomes of the economy because the outcomes depend on collective and uncoordinated actions. A fundamentally decent person may still do harm to others due to their own bounded rationality; “the road to hell is paved with good intentions”. Meanwhile, regulatory law is not the same as command; it is at best a way of setting the rules of a game that will be played, faithfully or not, by many others. Putting regulations in place without a good sense of how the game will play out differently because of them is just as irresponsible as implementing a sweeping business practice without thinking through the results, if not more so because the relationship between the state and citizens is coercive, not voluntary as the relationship between businesses and customers is.

Perhaps the biggest obstacle to shifting the debate about technology ethics to one about technology economics is that it requires a change in register. It drains the conversation of the pathos which is so instrumental in surfacing it as an important political topic. Sound analysis often ruins parties like this. Nevertheless, it must be done if we are to progress towards a more just solution to the crises technology gives us today.

Economic costs of context collapse

One motivation for my recent studies on information flow economics is that I’m interested in what the economic costs are when information flows across the boundaries of specific markets.

For example, there is a folk theory of why it’s important to have data protection laws in certain domains. Health care, for example. The idea is that it’s essential to have health care providers maintain the confidentiality of their patients because if they didn’t then (a) the patients could face harm due to this information getting into the wrong hands, such as those considering them for employment, and (b) this would disincentivize patients from seeking treatment, which causes them other harms.

In general, a good approximation of general expectations of data privacy is that data should not be used for purposes besides those for which the data subjects have consented. Something like this was encoded in the 1973 Fair Information Practices, for example. A more modern take on this from contextual integrity (Nissenbaum, 2004) argues that privacy is maintained when information flows appropriately with respect to the purposes of its context.

A widely acknowledged phenomenon in social media, context collapse (Marwick and boyd, 2011; Davis and Jurgenson, 2014), is when multiple social contexts in which a person is involved begin to interfere with each other because members of those contexts use the same porous information medium. Awkwardness and sometimes worse can ensue. These are some of the major ways the world has become aware of what a problem the Internet is for privacy.

I’d like to propose that an economic version of context collapse happens when different markets interfere with each other through network-enabled information flow. The bogeyman of Big Brother through Big Data, the company or government that has managed to collect data about everything about you in order to infer everything else about you, has as much to do with the ways information is being used in cross-purposed ways as it has to do with the quantity or scope of data collection.

It would be nice to get a more formal grip on the problem. Since we’ve already used it as an example, let’s try to model the case where health information is disclosed (or not) to a potential employer. We already have the building blocks for this case in our model of expertise markets and our model of labor markets.

There are now two uncertain variables of interest. First, let’s consider a variety of health treatments J such that m = \vert J \vert. Health conditions in society are distributed such that the utility of a random person i receiving a treatment j is w_{i,j}. Utility for one treatment is not independent of utility from another, so in general \vec{w}_i \sim W, meaning a person’s utility for all treatments is sampled from an underlying distribution W.

There is also the uncertain variable of how effective somebody will be at a job they are interested in. We’ll say this is distributed according to X, and that a person’s aptitude for the job is x_i \sim X.

We will also say that W and X are not independent from each other. In this model, there are certain health conditions that are disabling with respect to a job, and this has an effect on expected performance.

I must note here that I am not taking any position on whether or not employers should take disabilities into account when hiring people. I don’t even know for sure the consequences of this model yet. You could imagine this scenario taking place in a country which does not have the Americans with Disabilities Act and other legislation that affects situations like this.

As per the models that we are drawing from, let’s suppose that normal people don’t know how much they will benefit from different medical treatments; i doesn’t know \vec{w}_i. They may or may not know x_i (I don’t yet know if this matters). What i does know is their symptoms, y_i \sim Y.

Let’s say person i goes to the doctor, reporting y_i, on the expectation that the doctor will prescribe them the treatment \hat{j} that maximizes their welfare:

\hat j = arg \max_{j \in J} E[W_j \vert y_i]

Now comes the tricky part. Let’s say the doctor is corrupt and willing to sell the medical records of her patients to her patients’ potential employers. By assumption, y_i reveals information both about \vec{w}_i and x_i. We know from our earlier study that information about x_i is indeed valuable to the employer. There must be some price (at least within our neoclassical framework) that the employer is willing to pay the corrupt doctor for information about patient symptoms.

We also know that having potential employers know more about your aptitudes is good for highly qualified applicants and bad for less qualified applicants. The more information employers have about you, the more likely they will be able to tell whether you are worth hiring.

The upshot is that there may be some patients who are more than happy to have their medical records sold off to their potential employers because their particular symptoms are correlated with high job performance. These patients will be attracted to systems that share their information for both medical and employment purposes.

But for those with symptoms correlated with lower job performance, there is now a trickier decision. If doctors are corrupt, these patients may choose not to reveal their symptoms accurately (or at all), because this information might hurt their chances of employment.

A few more wrinkles here. Suppose it’s true that fewer people go to corrupt doctors because they suspect or know that their information will leak to their employers. If there are people who suspect or know that the information that leaks to their employers will reflect favorably on them, that creates a selection effect on who goes to the doctor. This means that whether or not i has gone to the doctor is itself a signal that employers can use to discriminate between potential applicants. So to some extent the harms of the corrupt doctors fall on the less able even if they opt out of health care. They can’t opt out entirely of the secondary information effects.

We can also add the possibility that not all doctors are corrupt. Only some are. But if it’s unknown which doctors are corrupt, the possibility of corruption still affects the strategies of patients/employees in a similar way, only now in expectation. Just as in the Akerlof market for lemons, a few corrupt doctors ruin the market.
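
To make this selection effect concrete, here is a minimal Monte Carlo sketch in Python (numpy assumed; the Gaussian signal structure, the patient’s decision rule, and all parameter values are illustrative assumptions, not part of the formal model):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(0, 1, n)        # job aptitude, x_i ~ X
y = x + rng.normal(0, 1, n)    # symptoms y_i, correlated with aptitude

p_corrupt = 0.2                # perceived chance any given doctor leaks records
treatment_benefit = 0.5        # utility of receiving treatment (made up)
wage_weight = 5.0              # how strongly a leaked record moves wages (made up)

# A patient seeks care when the treatment benefit outweighs the expected
# employment cost (or adds to the expected gain) of a leaked record.
goes_to_doctor = treatment_benefit + p_corrupt * wage_weight * y > 0

print(f"fraction seeking care: {goes_to_doctor.mean():.2f}")
print(f"mean aptitude, seeking care: {x[goes_to_doctor].mean():+.2f}")
print(f"mean aptitude, opting out:   {x[~goes_to_doctor].mean():+.2f}")

The gap between those two conditional means is the secondary information effect: non-participation itself becomes a signal of aptitude that employers can read.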

I have not made these arguments mathematically specific. I leave that to a later date. But for now I’d like to draw some tentative conclusions about what mandating the protection of health information, as in HIPAA, means for the welfare outcomes in this model.

If doctors are prohibited from selling information to employers, then the two markets do not interfere with each other. Doctors can solicit symptoms in a way that optimizes benefits to all patients. Employers can make informed choices about potential candidates through an independent process. The latter will serve to distinguish more promising applicants from less promising ones.

But if doctors can sell health information to employers, several things change.

  • Employers will benefit from information about employee health and offer to pay doctors for the information.
  • Some doctors will discreetly do so.
  • The possibility of corrupt doctors will scare off those patients who are afraid their symptoms will reveal a lack of job aptitude.
  • These patients no longer receive treatment.
  • This reduces the demand for doctors, shrinking the health care market.
  • The most able will continue to see doctors. If their information is shared with employers, they will be more likely to be hired.
  • Employers may take the absence of medical records available to be bought from corrupt doctors as a signal that an applicant is hiding something that would reveal poor aptitude.

In sum, without data protection laws, there are fewer people receiving beneficial treatment and fewer jobs for doctors providing beneficial treatment. Employers are able to make more advantageous decisions, and the most able employees are able to signal their aptitude through the corrupt health care system. Less able employees may wind up being identified anyway through their non-participation in the medical system. If that’s the case, they may wind up returning to doctors for treatment anyway, though they would need to have a way of paying for it besides employment.

That’s what this model says, anyway. The biggest surprise for me is the implication that data protection laws serve the interests of service providers by expanding their customer base. That is a point that is not made enough! Too often, the need for data protection laws is framed entirely in terms of the interests of the consumer. This is perhaps a politically weaker argument, because consumers are not united in their political interest (some consumers would be helped, not harmed, by weaker data protection).

References

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3), 488-500.

Davis, J. L., & Jurgenson, N. (2014). Context collapse: theorizing context collusions and collisions. Information, Communication & Society, 17(4), 476-485.

Marwick, A. E., & boyd, d. (2011). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society, 13(1), 114-133.

Nissenbaum, H. (2004). Privacy as contextual integrity. Wash. L. Rev., 79, 119.

Credit scores and information economics

The recent Equifax data breach brings up credit scores and their role in the information economy. Credit scoring is a controversial topic in the algorithmic accountability community. Frank Pasquale, for example, writes about it in The Black Box Society. Most of the critical writing on the subject points to how credit scoring might be done in a discriminatory or privacy-invasive way. As interesting as those critiques are from a political and ethical perspective, it’s worth reviewing what credit scores are for in the first place.

Let’s model this as we have done in other cases of information flow economics.

There’s a variable of interest, X: the likelihood that a potential borrower will not default on a loan. Note that any value x sampled from X will lie within the interval [0,1] because it is a probability.

There’s a decision to be made by a bank: whether or not to provide a random borrower a loan.

To keep things very simple, let’s suppose that the bank gets a payoff of 1 if the borrower is given a loan and does not default and gets a payoff of -1 if the borrower gets the loan and defaults. The borrower gets a payoff of 1 if he gets the loan and 0 otherwise. The bank’s strategy is to avoid giving loans that lead to negative expected payoff. (This is a gross oversimplification of, but essentially consistent with, the model of credit used by Blöchlinger and Leippold (2006).)

Given a particular x, the expected utility of the bank is:

x (1) + (1 - x) (-1) = 2x - 1

Given the domain of [0,1], this function ranges from -1 to 1, hitting 0 when x = .5.

We can now consider welfare outcomes under conditions of no information flow, total information flow, and partial information flow.

Suppose the bank has no insight into x besides a prior expectation over X. Then the bank’s expected payoff upon offering the loan is E[2X - 1] = 2E[X] - 1. If this is above zero, the bank will offer the loan and the borrower gets a positive payoff. If it is below zero, the bank will not offer the loan and both the bank and the potential borrower get zero payoff. The outcome depends entirely on the prior probability of loan default, and either rewards all borrowers or none of them depending on that distribution.

If the bank has total insight into x, then the outcomes are different. The bank can reject borrowers for whom x is less than .5 and accept those for whom x is greater than .5. If we see the game as repeated over many borrowers, whose chances of paying off their loan are all sampled from X, then the additional knowledge of the bank creates two classes of potential borrowers: one that gets loans and one that does not. This increases inequality among borrowers.

It also increases the utility of the bank. This is perhaps best illustrated with a simple example. Suppose the distribution X is uniform over the unit interval [0,1]. Then the expected value of the bank’s payoff under complete information is

\int_{.5}^{1} (2x - 1) \, dx = 0.25

which is a significant improvement over the expected payoff of 0 in the uninformed case.
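
This is easy to verify numerically. A minimal sketch in Python (numpy assumed):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1_000_000)    # each borrower's repayment probability, x ~ X

payoff = 2 * x - 1                  # bank's expected payoff per funded loan

uninformed = payoff.mean()                      # lends to everyone (E[2X - 1] = 0)
informed = np.where(x > 0.5, payoff, 0).mean()  # lends only when 2x - 1 > 0

print(f"uninformed expected payoff per borrower: {uninformed:+.3f}")  # ~ 0.000
print(f"informed expected payoff per borrower:   {informed:+.3f}")    # ~ +0.250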

Putting off an analysis of the partial information case for now, suffice it to say that we expect partial information (such as a credit score) to lead to an intermediate result, improving bank profits and differentiating borrowers with respect to the bank’s choice to loan.

What is perhaps most interesting about this analysis is the similarity between it and Posner’s employment market. In both cases, the variable of interest X concerns a person’s prospects for improving the welfare of the principal decision-maker upon their being selected, where selection also implies benefit to the subject. Uncertainty about the prospects leads to equal treatment of prospective persons and reduced benefit to the principal. More information leads to differentiated impact on the prospects and greater benefit to the principal.

References

Blöchlinger, A., & Leippold, M. (2006). Economic benefit of powerful credit scoring. Journal of Banking & Finance, 30(3), 851-873.

Information flow in economics

We have formalized three different cases of information economics: the labor market (following Posner), price differentiation in the sale of information goods (following Shapiro and Varian), and the market for expertise and information services.

What we discovered is that each of these cases has, to some extent, a common form. That form is this:

There is a random variable of interest, x \sim X (that is, a value x sampled from a probability distribution X), that has a direct effect on the welfare outcome of decisions made by agents in the economy. In our cases this was the aptitude of job applicants, consumers’ willingness to pay, and the utility of receiving a range of different expert recommendations, respectively.

In the extreme cases, the agent at the focus of the economic model could act with extreme ignorance of x, or extreme knowledge of it. Generally, the agent’s situation improves the more knowledgeable they are about x. The outcomes for the subjects of X vary more widely.

We also considered the possibility that the agent has access to partial information about X through the observation of a different variable y \sim Y. Upon observation of y, they can make their judgments based on an improved subjective expectation of the unknown variable, P(x \vert y). We assumed that the agent was a Bayesian reasoner and so capable of internalizing evidence according to Bayes’ rule, hence they are able to compute:

P(X \vert Y) \propto P(Y \vert X) P(X)
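
For concreteness, here is what such an update looks like with made-up numbers, for a binary X and a binary signal Y (a sketch; numpy assumed):

import numpy as np

prior = np.array([0.7, 0.3])          # P(X): X ∈ {low, high}
likelihood = np.array([[0.9, 0.1],    # P(Y|X=low):  Y ∈ {neg, pos}
                       [0.2, 0.8]])   # P(Y|X=high)

y_observed = 1                        # suppose we observe Y = pos
posterior = likelihood[:, y_observed] * prior   # P(Y|X) P(X)
posterior /= posterior.sum()                    # normalize by P(Y = pos)

print(posterior)                      # P(X|Y=pos) ≈ [0.226, 0.774]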

However, this depends on two very important assumptions.

The first is that the agent knows the distribution X. This is the prior in their subjective calculation of the Bayesian update. In our models, we have been perhaps sloppy in assuming that this prior probability corresponds to the true probability distribution from which the value x is drawn. We are somewhat safe in this assumption because for the purposes of determining strategy, only subjective probabilities can be taken into account and we can relax the distribution to encode something close to zero knowledge of the outcome if necessary. In more complex models, the difference between agents with different knowledge of X may be more strategically significant, but we aren’t there yet.

The second important assumption is that the agent knows the likelihood function P(Y | X). This is quite a strong assumption, as it implies that the agent knows truly how Y covaries with X, allowing them to “decode” the message y into useful information about x.

It may be best to think of access to and usage of the likelihood function as a rare capability. Indeed, in our model of expertise, the assumption was that the service provider (think doctor) knew more about the relationship between X (appropriate treatment) and Y (observable symptoms) than the consumer (patient) did. In the case of companies that use data science, the idea is that some combination of data and science gives the company an edge over its competitors in knowing the true value of some uncertain property.

What we are discovering is that it’s not just the availability of y that matters, but also the ability to interpret y with respect to the probability of x. Data does not speak for itself.

This incidentally ties in with a point which we have perhaps glossed over too quickly in the present discussion, which is: what is information, really? This may seem like a distraction in a discussion about economics, but it is a question that’s come up in my own idiosyncratic “disciplinary” formation. One of the best intuitive definitions of information is provided by philosopher Fred Dretske (1981; 1983). I once made a presentation of Fred Dretske’s view on information and its relationship to epistemological skepticism and Shannon information theory; you can find this presentation here. But for present purposes I want to call attention to his definition of what it means for a message to carry information, which is:

[A] message carries the information that X is a dingbat, say, if and only if one could learn (come to know) that X is a dingbat from the message.

When I say that one could learn that X was a dingbat from the message, I mean, simply, that the message has whatever reliable connection with dingbats is required to enable a suitably equipped, but otherwise ignorant receiver, to learn from it that X is a dingbat.

This formulation is worth mentioning because it supplies a kind of philosophical validation for our Bayesian formulation of information flow in the economy. We are modeling situations where Y is a signal that is reliably connected with X, such that instantiations of Y carry information about the value of X. We might express this in terms of conditional entropy:

H(X|Y) < H(X)
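
A toy computation makes the inequality concrete. The sketch below (numpy assumed; the joint distribution is made up) computes H(X|Y) via the chain rule H(X|Y) = H(X,Y) - H(Y):

import numpy as np

def entropy(p):
    p = p[p > 0]                      # convention: 0 log 0 = 0
    return -(p * np.log2(p)).sum()

# Joint distribution P(X, Y): rows index X, columns index Y.
joint = np.array([[0.35, 0.05],
                  [0.10, 0.50]])

H_X = entropy(joint.sum(axis=1))                              # marginal entropy H(X)
H_X_given_Y = entropy(joint.flatten()) - entropy(joint.sum(axis=0))

print(f"H(X)   = {H_X:.3f} bits")          # ~ 0.971
print(f"H(X|Y) = {H_X_given_Y:.3f} bits")  # ~ 0.586, strictly smaller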

While this is sufficient for Y to carry information about X, it is not sufficient for any observer of Y to consequently know X. An important part of Dretske's definition is that the receiver must be suitably equipped to make the connection.

In our models, the “suitably equipped” condition is represented as the ability to compute the Bayesian update using a realistic likelihood function P(Y \vert X). This is a difficult demand. A lot of computational statistics has to do with the difficulty of tractably estimating the likelihood function, let alone computing it perfectly.

References

Dretske, F. I. (1983). The epistemology of belief. Synthese, 55(1), 3-19.

Dretske, F. (1981). Knowledge and the Flow of Information. Cambridge, MA: MIT Press.

Economics of expertise and information services

We have now considered two models of how information affects welfare outcomes.

In the first model, inspired by an argument from Richard Posner, there are many producers (employees, in the specific example, but it could just as well be cars, etc.) and a single consumer. When the consumer knows nothing about the quality of the producers, the consumer gets an average-quality producer and the producers split the expected utility of the consumer’s purchase equally. When the consumer is informed, she benefits and so does the highest-quality producer, to the detriment of the other producers.

In the second example, inspired by Shapiro and Varian’s discussion of price differentiation in the sale of information goods, there was a single producer and many consumers. When the producer knows nothing about the “quality” of the consumers–their willingness to pay–the producer charges all consumers a single profit-maximizing price. This price leaves many customers out of reach of the product, and many others getting a consumer surplus because the product is cheap relative to their demand. When the producer is more informed, they make more profit by selling at personalized prices. This lets the previously unreached customers in on the product at a compellingly low price. It also allows the producer to charge higher prices to willing customers; they capture what was once consumer surplus for themselves.

In both these cases, we have assumed that there is only one kind of good in play. It can vary numerically in quality, which is measured in the same units as cost and utility.

In order to bridge from the theory of information goods to a theory of information services, we need to take into account a key feature of information services. Consumers buy information when they don’t know what it is they want, exactly. Producers of information services tailor what they provide to the specific needs of the consumers. This is true for information services like search engines but also for other forms of expertise like physicians’ services, financial advising, and education. It’s notable that these last three domains are subject to data protection laws in the United States (HIPAA, GLBA, and FERPA, respectively), and on-line information services are an area where privacy and data protection are a public concern. By studying the economics of information services and expertise, we may discover what these domains have in common.

Let’s consider just a single consumer and a single producer. The consumer has a utility function \vec{x} \sim X (that is, sampled from a random variable X) specifying the values they get for the consumption of each of m = \vert J \vert products. We’ll denote with x_j the utility awarded to the consumer for the consumption of product j \in J.

The catch is that the consumer does not know X. What they do know is y \sim Y, which is correlated with X in some way that is unknown to them. The consumer tells the producer y, and the producer’s job is to recommend to them the j \in J that will most benefit them. We’ll assume that the producer is interested in maximizing consumer welfare in good faith because, for example, they are trying to promote their professional reputation and this is roughly in proportion to customer satisfaction. (Let’s assume they pass on the costs of providing the product to the consumer.)

As in the other cases, let’s consider first the case where the acting party has no useful information about the particular customer. In this case, the producer has to choose their recommendation \hat j based on their knowledge of the underlying probability distribution X, i.e.:

\hat j = arg \max_{j \in J} E[X_j]

where X_j is the probability distribution over x_j implied by X.

In the other extreme case, the producer has perfect information of the consumer’s utility function. They can pick the truly optimal product:

\hat j = arg \max_{j \in J} x_j

How much better off the consumer is in the second case, as opposed to the first, depends on the specifics of the distribution X. Suppose the X_j are all independent and identically distributed. Then an ignorant producer would be indifferent to the choice of \hat j, leaving the expected outcome for the consumer at E[X_j], whereas the higher the number of products m, the more \max_{j \in J} x_j will approach the maximum value in the support of X_j.
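
This is easy to check numerically. For i.i.d. X_j ~ Uniform[0,1], the expected maximum of m draws is m/(m+1), which approaches 1 as m grows. A sketch (numpy assumed):

import numpy as np

rng = np.random.default_rng(0)
for m in [1, 2, 5, 10, 100]:
    x = rng.uniform(0, 1, (100_000, m))   # consumer utilities for m products
    # Ignorant producer: expected outcome E[X_j] = 0.5 regardless of m.
    # Informed producer: expected outcome E[max_j x_j] = m / (m + 1).
    print(f"m = {m:3d}: E[max_j x_j] ≈ {x.max(axis=1).mean():.3f}")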

In the intermediate cases where the producer knows y which carries partial information about \vec{x}, they can choose:

\hat j = arg \max_{j \in J} E[X_j \vert y] =

arg \max_{j \in J} \sum_{x_j} x_j P(X_j = x_j \vert y) =

arg \max_{j \in J} \sum_{x_j} x_j P(y \vert X_j = x_j) P(X_j = x_j)

(the last step drops the normalizing factor 1/P(y), which is constant in j and so does not affect the arg max)

The precise values of the terms here depend on the distributions X and Y. What we can know in general is that the more informative y is about x_j, the more the likelihood term P(y \vert X_j = x_j) dominates the prior P(X_j = x_j), and the better the condition of the consumer becomes.
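
A simulation sketch can compare the consumer’s welfare under all three regimes. The Gaussian structure below is an assumption chosen so that the posterior mean E[x_j \vert y_j] is proportional to y_j, making the partially informed recommendation simply the arg max over y (numpy assumed; all distributional choices are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, m = 100_000, 5

x = rng.normal(0, 1, (n, m))       # true utilities \vec{x}_i ~ X
y = x + rng.normal(0, 1, (n, m))   # noisy signal about each x_j

ignorant = x[:, 0]                             # all E[X_j] equal: pick any j
partial = x[np.arange(n), y.argmax(axis=1)]    # arg max_j E[X_j | y] = arg max_j y_j here
perfect = x.max(axis=1)                        # arg max_j x_j

for name, v in [("ignorant", ignorant), ("partial", partial), ("perfect", perfect)]:
    print(f"{name:9s} mean consumer utility: {v.mean():+.3f}")

The ordering of the three means (roughly 0, 0.8, and 1.2 here) illustrates the general claim: a more informative signal moves the consumer’s outcome from the prior mean toward the true optimum.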

Note that in this model, it is the likelihood function P(y \vert X_j = x_j) that is the special information that the producer has. Knowledge of how evidence (a search query, a description of symptoms, etc.) is caused by underlying desire or need is the expertise the consumers are seeking out. This begins to tie the economics of information to theories of statistical information.