
On achieving social equality

When evaluating a system, we have a choice of evaluating its internal functions–the inside view–or evaluating its effects situated in a larger context–the outside view.

Decision procedures for sorting people (whether they are embodied by people or performed in concert with mechanical devices–I don’t think this distinction matters here) are just such systems. If I understand correctly, the question of which principles animate antidiscrimination law hinges on this difference between the inside and the outside view.

We can look at a decision-making process and evaluate whether, as a procedure, it achieves its goals of e.g. assigning credit scores without bias against certain groups. Even including the gathering of evidence or data, such a system can in principle be bounded and evaluated by its ability to perform its goals. We do seem to care about the difference between procedural discrimination and procedural nondiscrimination. For example, an overtly racist policy that ignores true talent and opportunity seems worse than a bureaucratic system that is indifferent to external inequality between groups, where that inequality then gets reflected in decisions made according to other factors that are merely correlated with race.

The latter case has been criticized from the outside view. The criticism is captured by the phrase “algorithms can reproduce existing biases”. The supposedly neutral algorithm (which, again, can be either human or machine) is not neutral in its impact, because its considerations of e.g. business interest are indifferent to the conditions outside it. The business is attracted to wealth and opportunity, which are held disproportionately by some part of the population, so the business is attracted to that population.

There is great wisdom in recognizing that institutions that are neutral in the inside view will often reproduce bias in the outside view. But it is incorrect to therefore conflate an institution that is neutral in the inside view with one whose inside view is biased, even though their effects may under some circumstances be the same. When I say it is “incorrect”, I mean that they are in fact different: if the external conditions of a procedurally neutral institution change, it will come to reflect those new conditions. A procedurally biased institution will not reflect those new conditions in the same way.

Empirically it is very hard to tell when an institution is being procedurally neutral, and indeed this is the crux of an enormous amount of political tension today. The first line of defense of an institution accused of bias is to claim that its procedurally neutral operation merely reflects environmental conditions outside its control. This is unconvincing to many politically active people. It seems to me that it is now much more common for institutions to avoid this problem by explicitly declaring their bias. Rather than attempt the seemingly impossible task of defending their rigorous neutrality, it is easier to declare where one stands on the global allocation of resources and adjust one’s procedures accordingly.

I don’t think this is a good thing.

One consequence of evaluating all institutions based on their global, “systemic” impact, as opposed to their procedural neutrality, is that it hollows out the political center. The evidence is in: politics has become more and more polarized. This is inevitable if politics becomes explicitly about maintaining or reallocating resources rather than about building neutrally legitimate institutions. When one party in Congress considers a tax bill that seems designed mainly to enrich its own constituencies at the expense of the other’s, things have gotten out of hand. The idea of a unified notion of ‘good government’ has been all but abandoned.

An alternative is a commitment to procedural neutrality in the inside view of institutions, or at least some institutions. The fact that there are many different institutions that may have different policies is quite relevant here. For while it is commonplace to say that a neutral institution will “reproduce existing biases”, “reproduction” is not a particularly helpful word here. Neither is “bias”. What we can say more precisely is that the operations of a procedurally neutral institution will not change the distribution of resources, even when that distribution is unequal.

But if we do not hold all institutions accountable for correcting the inequality of society, isn’t that the same thing as approving of the status quo, which is so unequal? A thousand times no.

First, there’s the problem that many institutions are not, currently, procedurally neutral. Procedural neutrality is a higher standard than the one many institutions are currently held to. Consider what is widely known about human beings and their implicit biases. One good argument for transferring decision-making authority to machine learning algorithms, even standard ones not augmented for ‘fairness’, is that they will not have the same implicit, inside-view biases as the humans that currently make these decisions.

Second, there’s the fact that responsibility for correcting social inequality can be taken on by some institutions that are dedicated to this task while others are procedurally neutral. For example, one can consistently believe in the importance of a progressive social safety net combined with procedurally neutral credit reporting. Society is complex and perhaps rightly has many different functioning parts; not all the parts have to reflect socially progressive values for the arc of history to bend towards justice.

Third, there is reason to believe that even if all institutions were procedurally neutral, there would eventually be social equality. This has to do with the mathematically bulletproof but often ignored phenomenon of regression towards the mean: when outcomes have a random component, extreme values tend to be followed by values closer to the mean of the distribution. In terms of the allocation of resources in a population, there is some random variation in the way resources flow. When institutions are fair, that variation is not compounded by any systematic tilt, and inequality in resource allocation will settle into an unbiased distribution. While there may continue to be some apparent inequality due to disorganized heavy-tail effects, these will not be biased in a political sense.
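A toy model can make this concrete. Everything below is invented for illustration: each period, everyone retains the same fraction of their holdings and receives a shock drawn from one shared distribution, which is one way of formalizing a procedurally neutral flow of resources.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups start with very unequal average resources.
group_a = rng.normal(loc=100.0, scale=10.0, size=10_000)
group_b = rng.normal(loc=50.0, scale=10.0, size=10_000)

# A procedurally neutral process: the same retention rate and the same
# shock distribution for everyone, regardless of group.
retention = 0.8
for period in range(50):
    group_a = retention * group_a + rng.normal(20.0, 10.0, size=group_a.size)
    group_b = retention * group_b + rng.normal(20.0, 10.0, size=group_b.size)

# Both group means approach the same steady state, 20 / (1 - 0.8) = 100,
# though individual-level spread (the 'heavy tail') persists.
print(group_a.mean(), group_b.mean())
```

The steady state does not depend on where a group started, which is the sense in which a fair process erodes inherited group differences; how fast it does so depends on the retention parameter, i.e. on how strongly existing holdings persist.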

Fourth, there is the problem of political backlash. Whenever political institutions are weak enough to be modified towards what is purported to be a ‘substantive’ or outside-view neutrality, it will always be because some political coalition has attained enough power to swing the pendulum in its favor. The more explicit the coalition is about doing this, the more it will mobilize its enemies to try to swing the pendulum back the other way. The result is war by other means, the outcome of which will never be fair, because in war there are many who wind up dead or injured.

I am arguing for a centrist position on these matters, one that favors procedural neutrality in most institutions. This is not because I don’t care about substantive, “outside view” inequality. On the contrary, it’s because I believe that partisan bickering which explicitly undermines the inside-view neutrality of institutions also undermines substantive equality. Partisan bickering over the scraps within narrow institutional frames is a distraction from, for example, the way the most wealthy avoid taxes while the middle class pays even more. There is a reason why political propaganda that induces partisan divisions is a weapon. Agreement about procedural neutrality is a core part of the civic unity that allows for collective action against the very most abusively powerful.


Notes on fairness and nondiscrimination in machine learning

There has been a lot of work done lately on “fairness in machine learning” and related topics. It cannot be a coincidence that this work has paralleled a rise in political intolerance that is sensitized to issues of gender, race, citizenship, and so on. I more or less stand by my initial reaction to this line of work. But very recently I’ve done a deeper and more responsible dive into this literature and it’s proven to be insightful beyond the narrow problems which it purports to solve. These are some notes on the subject, ordered so as to get to the point.

The subject of whether and to what extent computer systems can enact morally objectionable bias goes back at least as far as Friedman and Nissenbaum’s 1996 article, in which they define “bias” as systematic unfairness. They mean this very generally, not specifically in a political sense (though inclusive of it). Twenty years later, Kleinberg et al. (2016) prove that there are multiple, competing notions of fairness in machine classification which generally cannot all be satisfied at once; they must be traded off against each other. In particular, when base rates differ between population groups (read: race, sex), a classifier whose risk scores are calibrated–a score of p means the same probability of the outcome whatever the group–cannot also have equal false positive and false negative rates across those groups. Equalizing both error rates is what Hardt et al. (2016) call “equalized odds”; their “equal opportunity” is the weaker condition that equalizes only true positive rates. This work was no doubt inspired by a now very famous ProPublica article asserting that a particular kind of commercial recidivism prediction software was “biased against blacks” because it had a higher false positive rate for black defendants than for white defendants. Because decisions about bail and parole are made according to predicted recidivism, this led to cases where a non-recidivist was denied bail because they were black, which sounds unfair to a lot of people, including me.
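A toy simulation shows the trade-off concretely. Everything here is made up (Beta-distributed risks, a 0.5 threshold); the score is perfectly calibrated by construction, because it equals the true individual risk, yet a difference in base rates alone pushes the groups’ error rates apart:

```python
import numpy as np

rng = np.random.default_rng(0)

def error_rates(alpha, beta, n=200_000, threshold=0.5):
    """Simulate a perfectly calibrated score for one group and return
    its false positive and false negative rates at a fixed threshold."""
    risk = rng.beta(alpha, beta, size=n)    # true individual risk
    outcome = rng.random(n) < risk          # e.g. recidivate or not
    predict = risk >= threshold             # the score IS the true risk,
                                            # so it is calibrated
    fpr = np.mean(predict[~outcome])        # non-recidivist flagged high-risk
    fnr = np.mean(~predict[outcome])        # recidivist flagged low-risk
    return fpr, fnr

# Two groups with different base rates of the outcome (~33% vs ~50%).
print("group A (lower base rate):  FPR=%.2f FNR=%.2f" % error_rates(2, 4))
print("group B (higher base rate): FPR=%.2f FNR=%.2f" % error_rates(3, 3))
```

Running this, the higher-base-rate group gets a substantially higher false positive rate at the same threshold–the ProPublica finding in miniature, without any malice in the classifier.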

While I understand that there is a lot of high-quality and well-intentioned research on this subject, I haven’t found anybody who could tell me why the solution to this problem shouldn’t be to stop using predicted recidivism to set bail, as opposed to futzing around with a recidivism prediction algorithm that seems to have been doing its job (Dieterich et al., 2016). Recidivism rates really are correlated with race (Hartney and Vuong, 2009), probably because of centuries of systematic racism. If you are serious about remediating historical inequality, the least you could do is cut black people some slack on bail.

This gets to what for me is the most baffling aspect of this whole research agenda, one that I didn’t have the words for before reading Barocas and Selbst (2016). A point well made by them is that the interpretation of anti-discrimination law, which motivates a lot of this research, is fraught with tensions that complicate its application to data mining.

“Two competing principles have always undergirded anti-discrimination law: nondiscrimination and antisubordination. Nondiscrimination is the narrower of the two, holding that the responsibility of the law is to eliminate the unfairness individuals experience at the hands of decisionmakers’ choices due to membership in certain protected classes. Antisubordination theory, in contrast, holds that the goal of antidiscrimination law is, or at least should be, to eliminate status-based inequality due to membership in those classes, not as a matter of procedure, but substance.” (Barocas and Selbst, 2016)

More specifically, these two principles motivate different interpretations of the two pillars of anti-discrimination law, disparate treatment and disparate impact. I draw on Barocas and Selbst for my understanding of each:

A judgment of disparate treatment requires either formal disparate treatment (across protected groups) of similarly situated people, or an intent to discriminate. Since in a large data mining application protected group membership will be proxied by many other factors, it’s not clear that the ‘formal’ requirement makes much sense here. And since machine learning applications only very rarely have racist intent, that option seems hard to establish as well. While there are interpretations of these criteria that are tougher on decision-makers (e.g. imputing unconscious intent), these seem to be motivated by antisubordination rather than the weaker nondiscrimination principle.

A judgment of disparate impact is perhaps more straightforward, but it can be mitigated in cases of “business necessity”, which (to get to the point) is vague enough to plausibly include optimization in a technical sense. Once again, there is nothing to see here from a nondiscrimination standpoint, though an antisubordinationist would rather that these decision-makers be required to take correcting for historical inequality into account.

I infer from their writing that Barocas and Selbst believe that antisubordination is an important principle for antidiscrimination law. In any case, they maintain that making the case for applying antidiscrimination law to data mining effectively requires a commitment to “substantive remediation”. This is insightful!

Just to put my cards on the table: as much as I may like the idea of substantive remediation in principle, I personally don’t think that every application of antidiscrimination law needs to be animated by it. For many institutions, narrow nondiscrimination seems adequate if not preferable. I’d prefer remediation to occur through other, specific policies, such as more public investment in schools in low-income districts. Perhaps for this reason, I’m not crazy about “fairness in machine learning” as a general technical practice. It seems to be trying to solve a social problem with a technical fix, which, despite being quite technical myself, I don’t always see as a good idea. In most cases you could have a machine learning mechanism based on normal statistical principles (the learning step) and then separately use a decision procedure that achieves your political ends, as sketched below.
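Here is a sketch of that separation (my own illustration: the features, groups, and thresholds are all made up). The learning step is an ordinary scikit-learn classifier; the politics lives entirely in a separate decision step:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# --- Learning step: an ordinary classifier trained on (synthetic)
# historical data, with no fairness constraint baked into the statistics.
X = rng.normal(size=(1000, 3))                 # made-up features
group = rng.integers(0, 2, size=1000)          # a protected attribute
y = (X @ np.array([1.0, -0.5, 0.3]) + 0.8 * group + rng.normal(size=1000)) > 0

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]          # the statistical output

# --- Decision step: a separate, explicitly political policy. These
# hypothetical group-specific thresholds serve a remediation goal;
# the numbers are illustrative, not a recommendation.
thresholds = np.where(group == 0, 0.5, 0.4)
decisions = scores >= thresholds
```

The point of the split is that the statistics stay ‘normal’ and auditable, while the remediation policy is stated in the open where it can be debated; this is roughly the shape of the post-processing approach in Hardt et al. (2016).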

I wish that this research community (and here I mean the qualitative research community surrounding it more than the technical community, which tends to define its terms carefully) would be more careful about the ways it talks about “bias”, because it often encourages a conflation between the statistical or technical senses of bias and the political senses. The latter carry so much political baggage that it can be intimidating to wade in and untangle the two. And it’s important to do this untangling, because while statistical bias can lead to political bias, it can, depending on the circumstances, lead to either “good” or “bad” political bias. But it’s important, for the sake of numeracy (mathematical literacy), to understand that even if a statistically biased process has a politically “good” outcome, it is still, statistically speaking, biased.
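To make the distinction concrete, here is a contrived example of an estimator that is biased in the statistical sense–nonzero mean error for one group–whatever one makes of it politically:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Both groups have identical true risk; the estimator applies a
# hypothetical 'corrective' offset to group 1.
true_risk = rng.uniform(0.0, 1.0, size=n)
group = rng.integers(0, 2, size=n)
estimate = true_risk - 0.1 * (group == 1)

errors = estimate - true_risk
print("mean error, group 0:", errors[group == 0].mean())  # ~ 0.0: unbiased
print("mean error, group 1:", errors[group == 1].mean())  # ~ -0.1: biased
```

Whether the offset is politically “good” or “bad” depends entirely on the circumstances; the statistical judgment that the group-1 estimates are biased does not.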

My sense is that there are interpretations of nondiscrimination law that make it illegal to take certain facts about sensitive properties like race and sex into account when making certain judgments. There are also theorems showing that if you don’t take those sensitive properties into account, you are going to discriminate with respect to them by accident, because those sensitive variables are correlated with nearly anything else you would use to judge people. As a general principle, while being ignorant may sometimes make things better when you are extremely lucky, in general it makes things worse! This should surprise nobody.
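A small simulation (entirely invented variables) shows the mechanism: exclude the sensitive attribute from the model, and a correlated, legitimate-looking feature carries the information anyway:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000

# 'group' is a sensitive attribute the model never sees; 'zip_income'
# is a legitimate-looking feature that happens to correlate with group.
group = rng.integers(0, 2, size=n)
zip_income = rng.normal(loc=50 + 20 * group, scale=10, size=n)
other = rng.normal(size=n)
y = (0.05 * zip_income + other + rng.normal(size=n)) > 4.0

X = np.column_stack([zip_income, other])   # group itself is excluded
model = LogisticRegression().fit(X, y)
pred = model.predict(X)

# Favorable-outcome rates still differ by group, via the proxy.
print("rate, group 0:", pred[group == 0].mean())
print("rate, group 1:", pred[group == 1].mean())
```

The model is blind to the sensitive attribute, yet its decisions are not independent of it; ignorance of the variable is not neutrality with respect to it.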

References

Barocas, Solon, and Andrew D. Selbst. “Big data’s disparate impact.” California Law Review 104 (2016): 671.

Dieterich, William, Christina Mendoza, and Tim Brennan. “COMPAS risk scales: Demonstrating accuracy equity and predictive parity.” Northpointe Inc. (2016).

Friedman, Batya, and Helen Nissenbaum. “Bias in computer systems.” ACM Transactions on Information Systems (TOIS) 14.3 (1996): 330-347.

Hardt, Moritz, Eric Price, and Nati Srebro. “Equality of opportunity in supervised learning.” Advances in Neural Information Processing Systems. 2016.

Hartney, Christopher, and Linh Vuong. “Created equal: Racial and ethnic disparities in the US criminal justice system.” (2009).

Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. “Inherent trade-offs in the fair determination of risk scores.” arXiv preprint arXiv:1609.05807 (2016).