Digifesto

On descent-based discrimination (a reply to Hanna et al. 2020)

In what is likely to be a precedent-setting case, California regulators filed a suit in the federal court on June 30 against Cisco Systems Inc, alleging that the company failed to prevent discrimination, harassment and retaliation against a Dalit engineer, anonymised as “John Doe” in the filing.

The Cisco case bears the burden of making anti-Dalit prejudice legible to American civil rights law as an extreme form of social disability attached to those formerly classified as “Untouchable.” Herein lies its key legal significance. The suit implicitly compares two systems of descent-based discrimination – caste and race – and translates between them to find points of convergence or family resemblance.

A. Rao, link

There is not much I can add to this article about caste-based discrimination in the U.S. In the lawsuit, a team of high-caste South Asians in California is alleged to have discriminated against a Dalit engineer coworker. The work of the lawsuit is to make caste-based discrimination legible to American civil rights law. It, correctly in my view, draws the connection to race.

This illustrative example prompts me to respond to Hanna et al.’s 2020 “Towards a critical race methodology in algorithmic fairness.” This paper by a Google team included a serious, thoughtful consideration of the argument I put forward with my co-author Bruce Haynes in “Racial categories in machine learning”. I like the Hanna et al. paper, think it makes interesting and valid points about the multidimensionality of race, and am grateful for their attention to my work.

I also disagree with some of their characterization of our argument and one of the positions they take. For some time I’ve intended to write a response. Now is a fine time.

First, a quibble: Hanna et al. describe Bruce D. Haynes as a “critical race scholar” and while he may have changed his mind since our writing, at the time he was adamant (in conversation) that he is not a critical race scholar, but that “critical race studies” refers to a specific intellectual project of racial critique that just happens to be really trendy on Twitter. There are lots and lots of other ways to study race critically that are not “critical race studies”. I believe this point was important to Bruce as a matter of scholarly identity. I also feel that it’s an important point because, frankly, I don’t find a lot of “critical race studies” scholarship persuasive and I probably wouldn’t have collaborated as happily with somebody of that persuasion.

So the fact that Hanna et al. explicitly position their analysis in “critical race” methods is a signpost that they are actually trying to accomplish a much more specifically disciplinarily informed project than we were. Sadly, they did not get into the question of how “critical race methodology” differs from other methodologies one might use to study race. That’s too bad, as it supports what I feel is a stifling hegemony that particular discourse has over discussions of race and technology.

The Google team is supportive of the most important contribution of our paper–that racial categories are problematic and that this needs to be addressed in the fairness in AI literature. They then go on to argue against our proposed solution of “using an unsupervised machine learning method to create race-like categories which aim to address ‘historical racial segregation without reproducing the political construction of racial categories’” (their rendering). I will defend our solution here.

Their first claim:

First, it would be a grave error to supplant the existing categories of race with race-like categories inferred by unsupervised learning methods. Despite the risk of reifying the socially constructed idea called race, race does exist in the world, as a way of mental sorting, as a discourse which is adopted, as a social thing which has both structural and ideological components. In other words, although race is social constructed, race still has power. To supplant race with race-like categories for the purposes of measurement sidesteps the problem.

This paragraph does feel very “critical race studies” to me, in that it makes totalizing claims about the work race does in society in a way that precludes the possibility of any concrete or focused intervention. I think they misunderstand our proposal in the following ways:

  • We are not proposing that, at a societal and institutional level, we institute a new, stable system of categories derived from patterns of segregation. We are proposing that, ideally, temporary quasi-racial categories are derived dynamically from data about segregation in a way that destabilizes the social mechanisms that reproduce racial hierarchy, reducing the power of those categories.
  • This is proposed as an intervention to be adopted by specific technical systems, not at the level of hegemonic political discourse. It is a way of formulating an anti-racist racial project by undermining the way categories are maintained.
  • Indeed, the idea is to sidestep the problem, in the sense that it is an elegant way to reduce the harm that the problem does. Sidestepping is, after all, a way of avoiding a danger. In this case, that danger is the reification of race in large-scale digital platforms (for example).

Next, they argue:

Second, supplanting race with race-like categories depends highly on context, namely how race operates within particular systems of inequality and domination. Benthall and Haynes restrict their analysis to that of spatial segregation, which is to be sure, an important and active research area and subject of significant policy discussion (e.g. [76, 99]). However, that metric may appear illegible to analyses pertaining to other racialized institutions, such as the criminal justice system, education, or employment (although one can readily see their connections and interdependencies). The way that race matters or pertains to particular types of structural inequality depends on that context and requires its own modes of operationalization

Here, the Google team takes the anthropological turn and, like many before them, suggests that a general technical proposal is insufficient because it is not sufficiently contextualized. Besides echoing the general problem of the ineffectualness of anthropological methods in technology ethics, they also mischaracterize our paper by saying we restrict our analysis to spatial segregation. This is not true: in the paper we generalize our analysis to social segregation, as on a social network graph. Naturally, we would be (a) interested in and open to other ways of identifying race as a feature of social structure, and (b) willing to tailor the data over which any operationalization technique is applied, where appropriate, to the technical and functional context. At the same time, we are on quite solid ground in saying that race is structural and systemic, defined in a sense at a holistic societal level even as it has ramifications in, and is impacted by, the micro and contextual levels as well. Because we approach the problem from a structural sociological perspective, we can imagine a structural technical solution. This is an advantage of the method over a more anthropological one.

Third:

At the same time we focus on the ontological aspects of race (what is race, how is it constituted and imagined in the world), it is necessary to pay attention to what we do with race and measures which may be interpreted as race. The creation of metrics and indicators which are race-like will still be interpreted as race.

This is a strange criticism given that one of the potential problems with our paper is that the quasi-racial categories we propose are not interpretable. The authors seem to think that our solution involves the institution of new quasi-racial categories at the level of representation or discourse. That’s not what we’ve proposed. We’ve proposed a design for a machine learning system which, we’d hope, would be understood well enough by its engineers to work as an intervention. Indeed, the correlation of the quasi-racial categories with socially recognized racial ones is important if they are to ground fairness interventions; the purpose of our proposed solution is narrowly to allow for these interventions without the reification of the categories.

Enough defense. There is a point the Google team insists on which strikes me as somewhat odd and which signals to me a further weakness of their hyper-contextualized method: its inability to generalize beyond the hermeneutic cycles of “critical race theory”.

Hanna et al. list several (seven) different “dimensions of race” based on different ways race can be ascribed, inferred, or expressed. There is, here, the anthropological concern with the individual body and its multifaceted presentations in the complex social field. But they explicitly reject one of the most fundamental ways in which race operates at a transpersonal and structural level, which is through families and genealogy. This is well-intentioned but ultimately misguided.

Note that we have excluded “racial ancestry” from this table. Genetics, biomedical researchers, and sociologists of science have criticized the use of “race” to describe genetic ancestry within biomedical research [40, 49, 84, 122], while others have criticized the use of direct-to-consumer genetic testing and its implications for racial and ethnic identification [15, 91, 113]

In our paper, we take pains to point out, responsibly, how many aspects of race, such as phenotype, nationality (through citizenship rules), and class signifiers (through inheritance), are connected with ancestry. We, of course, do not mean to equate ancestry with race. Nor, especially, are we saying that there are genetic racialized qualities besides perhaps those associated with phenotype. We are also not saying that direct-to-consumer genetic test data is what institutions should base their inference of quasi-racial categories on. Nothing like that.

However, speaking for myself, I believe that an important aspect of how race functions at a social structural level is how it implicates relations of ancestry. A. Rao perhaps puts the point better: race is a system of inherited privilege, and racial discrimination is more often than not discrimination based on descent.

Understanding this about race allows us to see what race has in common with other systems of categorical inequality, such as the caste system. And here was a large part of the point of offering an algorithmic solution: to suggest a system for identifying inequality that transcends the logic of what is currently recognized within the discourse of “critical race theory” and anticipates forms of inequality and discrimination that have not yet been so politically recognized. This will become increasingly an issue when a pluralistic society (or user base of an on-line platform) interacts with populations whose categorical inequalities have different histories and origins besides the U.S. racial system. Though our paper used African-Americans as a referent group, the scope of our proposal was intentionally much broader.

References

Benthall, S., & Haynes, B. D. (2019, January). Racial categories in machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 289-298).

Hanna, A., Denton, E., Smart, A., & Smith-Loud, J. (2020, January). Towards a critical race methodology in algorithmic fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 501-512).

Antinomianism and purposes as reasons against computational law (Notes on Hildebrandt, Smart Technologies, Sections 7.3-7.4)

Many thanks to Jake Goldenfein for discussing this reading with me and coaching me through interpreting it in preparation for writing this post.

Following up on the discussion of sections 7.1-7.2 of Hildebrandt’s Smart Technologies and the End(s) of Law (2015), this post discusses the next two sections. The main questions left from the last section are:

  • How strong is Hildebrandt’s defense of the Rule of Law, as she explicates it, as worth preserving despite the threats to it that she acknowledges from smart technologies?
  • Is the instrumental power of smart technology (i.e., its predictive function, which for the sake of argument we will accept is more powerful than unassisted human prognostication) somehow a substitute for Law, as in its pragmatist conception?

In sections 7.3-7.4, Hildebrandt discusses the eponymous ends of law. These are not its functions as could be externally and sociologically validated, but rather its internally recognized goals or purposes. And these are not particular goals, such as environmental justice, that we might want particular laws to achieve. Rather, these are abstract goals that the law as an entire ‘regime of veridiction’ aims for. (“Veridiction” means “a statement that is true according to the worldview of a particular subject, rather than objectively true.”) The idea is that the law has a coherent worldview of its own.

Hildebrandt’s description of law is robust and interesting. Law “articulates legal conditions for legal effect.” Legal personhood (a condition) entails certain rights under the law (an effect). These causes-and-effects are articulated in language, and this language does real work. In Austin’s terminology, legal language is performative–it performs things at an institutional and social level. Relatedly, the law is experienced as a lifeworld, or Welt: not a monolithic lifeworld that encompasses all experience, but one of many worlds that we use to navigate reality, a ‘mode of existence’ that ‘affords specific roles, actors and actions while constraining others’. [She uses Latour to make this point, which in my opinion does not help.] It is interesting to compare this view of society with Nissenbaum’s (2009) view of society differentiated into spheres, constituted by actor roles and norms.

In section 7.3.2, Hildebrandt draws on Gustav Radbruch for his theory of law. Consistent with her preceding arguments, she emphasizes that for Radbruch, law is antinomian (a strange term), meaning that it is internally contradictory and unruly with respect to its aims. And there are three such aims that are in tension:

  • Justice. Here, justice is used rather narrowly to mean that equal cases should be treated equally. In other words, the law must be applied justly/fairly across cases. To use her earlier framing, justice/equality implies that legal conditions cause legal effects in a consistent way. In my gloss, I would say this is equivalent to the formality of law, in the sense that the condition-effect rules must address the form of a case, and not treat particular cases differently. More substantively, Hildebrandt argues that Justice breaks down into more specific values: distributive justice, concerning the fair distribution of resources across society, and corrective justice, concerning the righting of wrongs through, e.g., torts.
  • Legal certainty. Legal rules must be binding and consistent, whether or not they achieve justice or purpose. “The certainty of the law requires its positivity; if it cannot be determined what is just, it must be decided what is lawful, and this from a position that is capable of enforcing the decision.” (Radbruch). Certainty about how the law will be applied, whether or not the application of the law is just (which may well be debated), is a good in itself. [A good example of this is law in business, which is famously one of the conditions for the rise of capitalism.]
  • Purpose. Beyond just/equal application of the law across cases and its predictable positivity, the law aims at other purposes such as social welfare, redistribution of income, guarding individual and public security, and so on. None of these purposes is inherent in the law, for Radbruch; but in his conception of law, by its nature it is directed by democratically determined purposes and is instrumental to them. These purposes may flesh out the normative detail that’s missing in a more abstract view of law.

Two moves by Hildebrandt in this section seem particularly substantial to her broader argument and corpus of work.

The first is her emphasis on the contrast between the antinomian conflict among justice, certainty, and purpose, on the one hand, and the principle of legal certainty itself, on the other. Law, at any particular point in time, may fall short of justice or purpose, and must nevertheless be predictably applied. It also needs to be able to evolve towards its higher ends. This, for Hildebrandt, reinforces the essentially ambiguous and linguistic character of law.

[Radbruch] makes it clear that a law that is only focused on legal certainty could not qualify as law. Neither can we expect the law to achieve legal certainty to the full, precisely because it must attend to justice and to purpose. If the attribution of legal effect could be automated, for instance by using a computer program capable of calculating all the relevant circumstances, legal certainty might be achieved. But this can only be done by eliminating the ambiguity that inheres in human language: it would reduce interpretation to mindless application. From Radbruch’s point of view this would fly in the face of the cultural, value-laden mode of existence of the law. It would refute the performative nature of law as an artificial construction that depends on the reiterant attribution of meaning and decision-making by mindful agents.

Hildebrandt, Smart Technologies, p. 149

The other move that seems particular to Hildebrandt is the connection she draws between purpose as one of the three primary ends of law and purpose-binding as a feature of governance. The latter has particular relevance to technology law through its use in data protection, such as in the GDPR (which she addresses elsewhere, in work like Hildebrandt, 2014). The idea here is that purposes do not just imply a positive direction of action; they also restrict activity to only those actions that support the purpose. This allows separate institutions to exist in tension with each other, with a balance of power that is necessary to support diverse and complex functions. Hildebrandt uses a very nice classical mythology reference here:

The wisdom of the principle of purpose binding relates to Odysseus’s encounter with the Sirens. As the story goes, the Sirens lured passing sailors with the enchantment of their seductive voices, causing their ships to crash on the rocky coast. Odysseus wished to hear their song without causing a shipwreck; he wanted to have his cake and eat it too. While he has himself tied to the mast, his men have their ears plugged with beeswax. They are ordered to keep him tied tight, and to refuse any orders he gives to the contrary, while being under the spell of the Sirens as they pass their island. And indeed, though he is lured and would have caused death and destruction if his men had not been so instructed, the ship sails on. This is called self-binding. But it is more than that. There is a division of tasks that prevents him from untying himself. He is forced by others to live by his own rules. This is what purpose binding does for a constitutional democracy.

Hildebrandt, Smart Technologies, p. 156

I think what’s going on here is that Hildebrandt understands that actually getting the GDPR enforced over the whole digital environment is going to require a huge extension of the powers of law over business, organizational, and individual practice. From some corners there is pessimism about the viability of the European data protection approach; Koops (2014), for example, argues that it can’t really be understood or implemented well. Hildebrandt is making a big bet here, essentially saying: purpose-binding on data use is just a natural part of the power of law in general, as a socially performed practice. There’s nothing contingent about purpose-binding in the GDPR; it’s just the most recent manifestation of purpose as an end of law.

Commentary

It’s pretty clear what the agenda of this work is. Hildebrandt is defending the Rule of Law as a social practice of lawyers using admittedly ambiguous natural language over the ‘smart technologies’ that threaten it. This involves both a defense of law as being intrinsically about lawyers using ambiguous natural language, and the power of that law over businesses, etc. For the former, Hildebrandt invokes Radbruch’s view that law is antinomian. For the second point, she connects purpose-binding to purpose as an end of law.

I will continue to play the skeptic here. As is suggested in the quoted passage, if one takes legal certainty seriously, then one could easily argue that software code leads to more certain outcomes than natural language based rulings. Moreover, to the extent that justice is a matter of legal formality–attention to the form of cases, and exclusion from consideration of irrelevant content–then that too weighs in favor of articulating law in formal logic, which is relatively easy to translate into computer code.
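To make the contrast concrete, here is a toy sketch, in Python, of what “articulating legal conditions for legal effect” in code looks like. The rule, its names, and its thresholds are invented for illustration and correspond to no actual statute; the point is only that the legal effect follows mechanically from the formally stated conditions.

```python
# Toy illustration only: a condition-effect rule encoded as a formal function.
# The conditions and thresholds are invented for this example.

from dataclasses import dataclass

@dataclass
class Case:
    applicant_age: int
    is_resident: bool

def eligible_for_benefit(case: Case) -> bool:
    """Legal effect (eligibility) follows mechanically from legal conditions."""
    return case.applicant_age >= 65 and case.is_resident

print(eligible_for_benefit(Case(applicant_age=70, is_resident=True)))   # True
print(eligible_for_benefit(Case(applicant_age=70, is_resident=False)))  # False
```

Whatever else one says about such a rule, its application is certain and uniform across cases of the same form; the ambiguity that Hildebrandt prizes has been engineered out.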

Hildebrandt seems to think that there is something immutable about computer code, in a way that natural language is not. That’s wrong. Software is not built like bridges; software today is written by teams working rapidly to adapt it to many demands (Gürses and Hoboken, 2017). Recognizing this removes one of the major planks of Hildebrandt’s objection to computational law.

It could be argued that “legal certainty” implies a form of algorithmic interpretability: the key question is “certain for whom”. An algorithm that is opaque due to its operational complexity (Burrell, 2016) could, as an implementation of a legal decision, be less predictable to non-specialists than a simpler algorithm. So the tension in a lot of ‘algorithmic accountability’ literature between performance and interpretability would then play directly into the tension, within law, between purpose/instrumentality and certainty-to-citizens.

Overall, the argument here is not compelling yet as a refutation of the idea of law implemented as software code.

As for purpose-binding and the law, I think this may well be the true crux. I wonder if Hildebrandt develops it later in the book. There are not a lot of good computer science models of purpose binding. Tschantz, Datta, and Wing (2012) do a great job mapping out the problem but that research program has not resulted in robust technology for implementation. There may be deep philosophical/mathematical reasons why that is so. This is an angle I’ll be looking out for in further reading.

References

Burrell, Jenna. “How the machine ‘thinks’: Understanding opacity in machine learning algorithms.” Big Data & Society 3.1 (2016): 2053951715622512.

Gürses, Seda, and Joris Van Hoboken. “Privacy after the agile turn.” The Cambridge Handbook of Consumer Privacy. Cambridge Univ. Press, 2017. 1-29.

Hildebrandt, Mireille. “Location Data, Purpose Binding and Contextual Integrity: What’s the Message?.” Protection of Information and the Right to Privacy-A New Equilibrium?. Springer, Cham, 2014. 31-62.

Hildebrandt, Mireille. Smart technologies and the end (s) of law: novel entanglements of law and technology. Edward Elgar Publishing, 2015.

Koops, Bert-Jaap. “The trouble with European data protection law.” International Data Privacy Law 4.4 (2014): 250-261.

Nissenbaum, Helen. Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press, 2009.

Tschantz, Michael Carl, Anupam Datta, and Jeannette M. Wing. “Formalizing and enforcing purpose restrictions in privacy policies.” 2012 IEEE Symposium on Security and Privacy. IEEE, 2012.

Beginning to read “Smart Technologies and the End(s) of Law” (Notes on: Hildebrandt, Smart Technologies, Sections 7.1-7.2)

I’m starting to read Mireille Hildebrandt‘s Smart Technologies and the End(s) of Law (2015) at the recommendation of several friends with shared interests in privacy and the tensions between artificial intelligence and the law. As has been my habit with other substantive books, I intend to blog my notes from reading as I get to it, in sections, in a perhaps too stream-of-consciousness, opinionated, and personally inflected way.

For reasons I will get to later, Hildebrandt’s book is a must-read for me. I’ve decided to start by jumping in on Chapter 7, because (a) I’m familiar enough with technology ethics, AI, and privacy scholarship to think I can skip that and come back as needed, and (b) I’m mainly reading because I’m interested in what a scholar of Hildebrandt’s stature says when she tackles the tricky problem of law’s response to AI head on.

I expect to disagree with Hildebrandt in the end. We occupy different social positions and, as I’ve argued before, people’s positions on various issues of technology policy appear to have a great deal to do with their social position or habitus. However, I know I have a good deal to learn about legal theory, while having enough background in philosophy and social theory to parse through what Hildebrandt has to offer. And based on what I’ve read so far, I expect the contours of the possible positions that she draws out to be totally groundbreaking.

Notes on: Hildebrandt, Smart Technologies, §7.1-7.2

“The third part of this book inquires into the implications of smart technologies and data-driven agency for the law.”

– Hildebrandt, Smart Technologies, p.133

Lots of people write about how artificial intelligence presents an existential threat. Normally, they are talking about how a superintelligence is posing an existential threat to humanity. Hildebrandt is arguing something else: she is arguing that smart technologies may pose an existential threat to the law, or the Rule of Law. That is because the law’s “mode of existence” depends on written text, which is a different technical modality, with different affordances, than smart technology.

My take is that the mode of existence of modern law is deeply dependent upon the printing press and the way it has shaped our world. Especially the binary character of legal rules, the complexity of the legal system and the finality of legal decisions are affordances of — amongst things — the ICI [information and communication infrastructure] of the printing press.

– Hildebrandt, Smart Technologies, p.133

This is just so on point, it’s hard to know what to say. I mean, this is obviously on to something. But what?

To make her argument, Hildebrandt provides a crash course in philosophy of law and legal theory, distinguishing a number of perspectives that braid together into an argument. She discusses several different positions:

  • 7.2.1 Law as an essentially contested concept (Gallie). The concept of “law” [1] denotes something valuable, [2] covers intricate complexities, that makes it [3] inherently ambiguous and [4] necessarily vague. This [5] leads interested parties into contest over conceptions. The contest is [6] anchored in past, agreed upon exemplars of the concept, and [7] the contest itself sustains and develops the concept going forward. This is the seven-point framework of an “essentially contested concept”.
  • 7.2.2 Formal legal positivism. Law as a set of legal rules dictated by a sovereign (as opposed to law as a natural moral order) (Austin). Law as a coherent set of rules, defined by its unity (Kelsen). A distinction between substantive rules and rules about rule-making (Hart).
  • 7.2.3 Hermeneutic conceptions. The practice of law is about the creative interpretation of (e.g.) texts (case law, statutes, etc.) and their application to new cases. The integrity of law (Dworkin) constrains this interpretation, but the projection of legal meaning into the future is part of the activity of legal practice. Judges “do things with words”–they make performative utterances through their actions. Law is not just a system of rules, but a system of meaningful activity.
  • 7.2.3 Pragmatist conceptions (realist legal positivism). As opposed to the formal legal positivism discussed earlier, which sees law as rules, realist legal positivism sees law as a sociological phenomenon. Law is “prophecies of what the courts will do in fact, and nothing more pretentious” (Holmes). Pragmatism, as an epistemology, argues that the meaning of something is its practical effect; this approach could be seen as a constrained version of the hermeneutic concept of law.

To summarize Hildebrandt’s gloss on this material so far: Gallie’s “essentially contested concept” theory does the work of setting the stage for Hildebrandt’s self-aware intervention into the legal debate. Hildebrandt is going to propose a specific concept of the law, and of the Rule of Law. She is doing this well aware that this act of scholarship is itself engaging in the contest.

Punchline

I detect in Hildebrandt’s writing a sympathy or preference for hermeneutic approaches to law. Indeed, by opening with Gallie, she sets up the contest about the concept of law as something internal to the hermeneutic processes of the law. These processes, and this contest, are about texts; the proliferation of texts is due to the role of the printing press in modern law. There is a coherent “integrity” to this concept of law.

The most interesting discussion, in my view, is loaded into what reads like an afterthought: the pragmatist conception of law. Indeed, even at the level of formatting, pragmatism is buried: hermeneutic and pragmatist conceptions of law are combined into one section (7.2.3), whereas Gallie and the formal positivists each get their own section (7.2.1 and 7.2.2).

This is odd, because the resonances between pragmatism and ‘smart technology’ are, in Hildebrandt’s admission, quite deep:

Basically, Holmes argued that law is, in fact, what we expect it to be, because it is this expectation that regulates our actions. Such expectations are grounded in past decisions, but if these were entirely deterministic of future decisions we would not need the law — we could settle for logic and simply calculate the outcome of future decisions. No need for interpretation. Holmes claimed, however, that ‘the life of law has not been logic. It has been experience.’ This correlates with a specific conception of intelligence. As we have seen in Chapter 2 and 3, rule-based artificial intelligence, which tried to solve problems by means of deductive logic, has been superseded by machine learning (ML), based on experience.

– Hildebrandt, Smart Technologies, p.142

Hildebrandt considers this connection between pragmatist legal interpretation and machine learning only to reject it summarily in a single paragraph at the end of the section.

If we translate [a maxim of classical pragmatist epistemology] into statistical forecasts we arrive at judgments resulting from ML. However, neither logic nor statistics can attribute meaning. ML-based court decisions would remove the fundamental ambiguity of human language from the centre stage of the law. As noted above, this ambiguity is connected with the value-laden aspect of the concept of law. It is not a drawback of natural language, but what saves us from acting like mindless agents. My take is that an approach based on statistics would reduce judicial and legislative decisions to administration, and thus collapse the Rule of Law. This is not to say that a number of administrative decisions could not be taken by smart computing systems. It is to confirm that such decisions should be brought under the Rule of Law, notably by making them contestable in a court of law.

– Hildebrandt, Smart Technologies, p.143

This is a clear articulation of Hildebrandt’s agenda (“My take is that…”). It also clearly aligns the practice of law with contest, ambiguity, and interpretation, as opposed to “mindless” activity. Natural language’s ambiguity is a feature, not a bug. Narrow pragmatism, which is aligned with machine learning, is a threat to the Rule of Law.

Some reflections

Before diving into the argument, I have to write a bit about my urgent interest in the book. Though I only heard about it recently, my interests have tracked the subject matter for some time.

For some time I have been interested in the connection between philosophical pragmatism and concerns about AI, which I believe can be traced back to Horkheimer. But I thought nobody was giving the positive case for pragmatism its due. At the end of 2015, totally unaware of “Smart Technologies” (my professors didn’t seem aware of it either…), I decided that I would write my doctoral dissertation defending the bold thesis that yes, we should have AI replace the government. A constitution written in source code. I was going to back the argument up with, among other things, pragmatist legal theory.

I had to drop the argument because I could not find faculty willing to be on the committee for such a dissertation! I have been convinced ever since that this is a line of argument that is actually rather suppressed. I was able to articulate the perspective in a philosophy journal in 2016, but had to abandon the topic.

This was probably good in the long run, since it meant I wrote a dissertation on privacy which addressed many of the themes I was interested in, but in greater depth. In particular, working with Helen Nissenbaum I learned about Hildebrandt’s articles comparing contextual integrity with purpose binding in the GDPR (Hildebrandt, 2013; Hildebrandt, 2014), which at the time my mentors at Berkeley seemed unaware of. I am still working on puzzles having to do with algorithmic implementation or response to the law, and likely will for some time.

Recently, I have been working at a law school and have reengaged with the interdisciplinary research community at venues like FAT*. This has led me, seemingly unavoidably, back to what I believe to be the crux of disciplinary tension today: the rising epistemic dominance of pragmatist computational statistics–“data science”–and its threat to humanistic legal authority, a tension manifested in the clash of the institutions based on each: iconically, “Silicon Valley” (or Seattle) and the European Union. Because of the explicitly normative aspects of humanistic legal authority, it asserts itself again and again as an “ethical” alternative to pragmatist technocratic power. This is the latest manifestation of a very old debate.

Hildebrandt is the first respectable scholar (a category from which I exclude myself) that I’ve encountered to articulate this point. I have to see where she takes the argument.

So far, however, I think her argument begs the question. Implicitly, the “essentially contested” character of law is due to the ambiguity of natural language and the way in which that necessitates contest over the meaning of words. And so we have a professional class of lawyers and scholars that debates the meaning of words. I believe the regulatory power of this class is what Hildebrandt refers to as “the Rule of Law”.

While it’s true that an alternative regulatory mechanism based on statistical prediction would be quite different from this sense of “Rule of Law”, it is not clear from Hildebrandt’s argument, yet, why her version of “Rule of Law” is better. The only hint of an argument is the problem of “mindless agents”. Is she worried about the deskilling of the legal profession, or the reduced need for elite contest over meaning? What is hermeneutics offering society, outside of the bounds of its own discourse?

References

Benthall, S. (2016). Philosophy of computational social science. Cosmos and History: The Journal of Natural and Social Philosophy, 12(2), 13-30.

Sebastian Benthall. Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics. Ph.D. dissertation. Advisors: John Chuang and Deirdre Mulligan. University of California, Berkeley. 2018.

Hildebrandt, Mireille. “Slaves to big data. Or are we?.” (2013).

Hildebrandt, Mireille. “Location Data, Purpose Binding and Contextual Integrity: What’s the Message?.” Protection of Information and the Right to Privacy-A New Equilibrium?. Springer, Cham, 2014. 31-62.

Hildebrandt, Mireille. Smart technologies and the end (s) of law: novel entanglements of law and technology. Edward Elgar Publishing, 2015.

All the problems with our paper, “Racial categories in machine learning”

Bruce Haynes and I were blown away by the reception to our paper, “Racial categories in machine learning“. This was a huge experiment in interdisciplinary collaboration for us. We are excited about the next steps in this line of research.

That includes engaging with criticism. One of our goals was to fuel a conversation in the research community about the operationalization of race. That isn’t a question that can be addressed by any one paper or team of researchers. So one thing we got out of the conference was great critical feedback on potential problems with the approach we proposed.

This post is an attempt to capture those critiques.

Need for participatory design

Khadijah Abdurahman, of Word to RI, issues a subtweeted challenge to us to present our paper to the hood. (RI stands for Roosevelt Island, in New York City, the location of the recently established Cornell Tech campus.)

One striking challenge, raised by Khadijah Abdurahman on Twitter, is that we should be developing peer relationships with the communities we research. I read this as a call for participatory design. It’s true this was not part of the process of the paper. In particular, Ms. Abdurahman points to a part of our abstract that uses jargon from computer science.

There are a lot of ways to respond to this comment. The first is to accept the challenge. I would personally love it if Bruce and I could present our research to folks on Roosevelt Island and get feedback from them.

There are other ways to respond that address the tensions of this comment. One is to point out that in addition to being an accomplished scholar of the sociology of race and how it forms, especially in urban settings, Bruce is a black man who is originally from Harlem. Indeed, Bruce’s family memoir shows his deep and well-researched familiarity with the life of marginalized people of the hood. So a “peer relationship” between an algorithm designer (me) and a member of an affected community (Bruce) is really part of the origin of our work.

Another is to point out that we did not research a particular community. Our paper was not human subjects research; it was about the racial categories that are maintained by the U.S. federal government and which pervade society in a very general way. Indeed, everybody is affected by these categories. When I and others who look like me are ascribed “white”, that is an example of these categories at work. Bruce and I were very aware of how different kinds of people at the conference responded to our work, and how it was an intervention in our own community, which is of course affected by these racial categories.

The last point is that computer science jargon is alienating to basically everybody who is not trained in computer science, whether they live in the hood or not. And the fact is we presented our work at a computer science venue. Personally, I’m in favor of universal education in computational statistics, but that is a tall order. If our work becomes successful, I could see it becoming part of, for example, a statistical demography curriculum that could be of popular interest. But this is early days.

The Quasi-Racial (QR) Categories are Not Interpretable

In our presentation, we introduced some terminology that did not make it into the paper. We named the vectors of segregation derived by our procedure “quasi-racial” (QR) vectors, to denote that we were trying to capture dimensions that were race-like, in that they captured the patterns of historic and ongoing racial injustice, without being the racial categories themselves, which we argued are inherently unfair categories of inequality.

First, we are not wedded to the name “quasi-racial” and are very open to different terminology if anybody has an idea for something better to call them.

More importantly, somebody pointed out that these QR vectors may not be interpretable. Given that the conference is not only about Fairness, but also Accountability and Transparency, this critique is certainly on point.

To be honest, I have not yet done the work of surveying the extensive literature on algorithm interpretability to give a nuanced response. I can give two informal responses. The first is that one assumption of our proposal is that there is something wrong with how race and racial categories are intuitively understood. Normal people’s understanding of race is, of course, ridden with stereotypes, implicit biases, false causal models, and so on. If we proposed an algorithm that was fully “interpretable” according to most people’s understanding of what race is, that algorithm would likely have racist or racially unequal outcomes. That’s precisely the problem that we are trying to get at with our work. In other words, when categories are inherently unfair, interpretability and fairness may be at odds.

The second response is that educating people about how the procedure works and why it is motivated is part of what makes its outcomes interpretable. Teaching people about the history of racial categories, and how those categories are both the cause and effect of segregation in space and society, makes the algorithm interpretable. Teaching people about Principal Component Analysis, the algorithm we employ, is part of what makes the system interpretable. We are trying to drop knowledge; I don’t think we are offering any shortcuts.
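In that educational spirit, here is a minimal sketch, in Python with scikit-learn, of the kind of computation involved. It is illustrative only and not the procedure from the paper: the synthetic data, the feature choices, and the number of components are all placeholder assumptions.

```python
# Illustrative sketch only: derive "quasi-racial" (QR) dimensions from
# segregation-related features using PCA. All data here is synthetic.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical input: rows are spatial or social units (e.g. census tracts),
# columns are segregation-related features (composition shares, isolation
# indices, network clustering measures, etc.).
X = rng.random((500, 12))

X_std = StandardScaler().fit_transform(X)  # put features on comparable scales

pca = PCA(n_components=3)        # the number of QR dimensions is a free choice
qr_vectors = pca.fit_transform(X_std)

# Each row of qr_vectors locates one unit along the derived QR dimensions,
# which are linear combinations of the inputs -- exactly the assumption
# questioned in the criticism discussed in the next section.
print(pca.explained_variance_ratio_)
```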

Principal Component Analysis (PCA) may not be the right technique

An objection from the computer science end of the spectrum was that our proposed use of Principal Component Analysis (PCA) was not well motivated enough. PCA is just one of many dimensionality reduction techniques–why did we choose it in particular? PCA has many assumptions about the input embedded within it, including that the component vectors of interest are linear combinations of the inputs. What if the best QR representation is a non-linear combination of the input variables? And our use of unsupervised learning, as a general criticism, is perhaps lazy, since in order to validate its usefulness we will need to test it with labeled data anyway. We might be better off with a more carefully calibrated and better motivated alternative technique.

These are all fair criticisms. I am personally not satisfied with the technical component of the paper and presentation. I know the rigor of the analysis is not of the standard that would impress a machine learning scholar and can take full responsibility for that. I hope to do better in a future iteration of the work, and welcome any advice on how to do that from colleagues. I’d also be interested to see how more technically skilled computer scientists and formal modelers address the problem of unfair racial categories that we raised in the paper.

I see our main contribution as the raising of this problem of unfair categories, not our particular technical solution to it. As a potential solution, I hope that it’s better than nothing, a step in the right direction, and provocative. I subscribe to the belief that science is an iterative process and look forward to the next cycle of work.

Please feel free to reach out if you have a critique of our work that we’ve missed. We do appreciate all the feedback!

Reading O’Neil’s Weapons of Math Destruction

I probably should have already read Cathy O’Neil’s Weapons of Math Destruction. It was a blockbuster of the tech/algorithmic ethics discussion. It’s written by an accomplished mathematician, which I admire. I’ve also now seen O’Neil perform bluegrass music twice in New York City and think her band is great. At last I’ve found a copy and have started to dig in.

On the other hand, as is probably clear from other blog posts, I have a hard time swallowing a lot of the gloomy political work that puts the role of algorithms in society in such a negative light. I encounter it very frequently, and every time I feel that some misunderstanding must have happened; something seems off.

It’s very clear that O’Neil can’t be accused of mathophobia or of not understanding the complexity of the algorithms at play, which is an easy way to throw doubt on the arguments of some technology critics. Yet perhaps because it’s a popular book and not an academic work of Science and Technology Studies, I haven’t seen its arguments parsed through and analyzed in much depth.

This is a start. These are my notes on the introduction.

O’Neil describes the turning point in her career where she soured on math. After being an academic mathematician for some time, O’Neil went to work as a quantitative analyst for D.E. Shaw. She saw it as an opportunity to work in a global laboratory. But then the 2008 financial crisis made her see things differently.

The crash made it all too clear that mathematics, once my refuge, was not only deeply entangled in the world’s problems but also fueling many of them. The housing crisis, the collapse of major financial institutions, the rise of unemployment–all had been aided and abetted by mathematicians wielding magic formulas. What’s more, thanks to the extraordinary powers that I loved so much, math was able to combine with technology to multiply the chaos and misfortune, adding efficiency and scale to systems I now recognized as flawed.

O’Neil, Weapons of Math Destruction, p.2

As an independent reference on the causes of the 2008 financial crisis, which of course has been a hotly debated and disputed topic, I point to Sassen’s 2017 “Predatory Formations” article. Indeed, the systems that developed the sub-prime mortgage market were complex, opaque, and hard to regulate. Something went seriously wrong there.

But was it mathematics that was the problem? This is where I get hung up. I don’t understand the mindset that would attribute a crisis in the financial system to the use of abstract, logical, rigorous thinking. Consider the fact that there would not have been a financial crisis if there had not been a functional financial services system in the first place. Getting mortgages and paying them off, and the systems that allow this to happen, all require mathematics to function. When these systems operate normally, they are taken for granted. When they suffer a crisis, when the system fails, the mathematics takes the blame. But a system can’t suffer a crisis if it didn’t start working rather well in the first place–otherwise, nobody would depend on it. Meanwhile, the regulatory reaction to the 2008 financial crisis required, of course, more mathematicians working to prevent the same thing from happening again.

So in this case (and I believe others) the question can’t be whether mathematics, but rather which mathematics. It is so sad to me that these two questions get conflated.

O’Neil goes on to describe a case where an algorithm results in a teacher losing her job for not adding enough value to her students one year. An analysis makes a good case that the reason her students’ scores did not go up is that in the previous year their scores had been inflated by teachers cheating the system. This argument was not considered conclusive enough to change the administrative decision.

Do you see the paradox? An algorithm processes a slew of statistics and comes up with a probability that a certain person might be a bad hire, a risky borrower, a terrorist, or a miserable teacher. That probability is distilled into a score, which can turn someone’s life upside down. And yet when the person fights back, “suggestive” countervailing evidence simply won’t cut it. The case must be ironclad. The human victims of WMDs, we’ll see time and again, are held to a far higher standard of evidence than the algorithms themselves.

O’Neil, WMD, p.10

Now this is a fascinating point, and one that I don’t think has been taken up enough in the critical algorithms literature. It resonates with a point that came up earlier: traditional collective human decision-making is often driven by agreement on narratives, whereas automated decisions can be a qualitatively different kind of collective action because they can act on probabilistic judgments.

I have to wonder what O’Neil would argue the solution to this problem is. From her rhetoric, it seems like her recommendation must be to prevent automated decisions from being made on probabilistic judgments. In other words, one could raise the evidentiary standard for algorithms so that it equals the standard that people hold each other to.

That’s an interesting proposal. I’m not sure what the effects of it would be. I expect that the result would be lower expected values of whatever target was being optimized for, since the system would not be able to “take bets” below a certain level of confidence. One wonders if this would be a more or less arbitrary system.
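As a toy illustration of that intuition, and nothing more, the following sketch (in Python, with invented numbers and the charitable assumption that the system’s stated confidence is calibrated) shows how forcing an automated decision-maker to abstain below a confidence threshold lowers the expected value of whatever it is optimizing.

```python
# Purely hypothetical toy model: a decision-maker that may only act when its
# confidence exceeds a threshold. All numbers are invented for illustration.

import numpy as np

rng = np.random.default_rng(42)
confidence = rng.uniform(0.5, 1.0, size=10_000)  # stated confidence per case

def expected_value(threshold: float) -> float:
    """Average payoff per case (+1 if right, -1 if wrong, 0 if it abstains),
    assuming the stated confidence is calibrated."""
    acts = confidence >= threshold
    ev_per_decision = 2 * confidence - 1
    return float(np.where(acts, ev_per_decision, 0.0).mean())

for t in (0.5, 0.7, 0.9):
    print(f"threshold={t}: average payoff per case = {expected_value(t):.3f}")
    # payoff falls as the bar rises, because positive-expected-value bets
    # below the threshold are forfeited
```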

Sadly, in order to evaluate this proposal seriously, one would have to employ mathematics. Which is, in O’Neil’s rhetoric, a form of evil magic. So, perhaps it’s best not to try.

O’Neil attributes the problems of WMDs to the incentives of the data scientists building the systems. Maybe they know that their work affects people, especially the poor, in negative ways. But they don’t care.

But as a rule, the people running the WMD’s don’t dwell on these errors. Their feedback is money, which is also their incentive. Their systems are engineered to gobble up more data fine-tune their analytics so that more money will pour in. Investors, of course, feast on these returns and shower WMD companies with more money.

O’Neil, WMD, p.13

Calling out greed as the problem is effective and true in a lot of cases. I’ve argued myself that the real root of the technology ethics problem is capitalism: the way investors drive what products get made and deployed. This is a worthwhile point to make and one that doesn’t get made enough.

But the logical implications of this argument are off. Suppose it is true that, “as a rule”, the algorithms that do harm are made by people responding to the incentives of private capital. (IF harmful algorithm, THEN private capital created it.) That does not mean that there can’t be good algorithms as well, such as those created in the public sector. In other words, there are algorithms that are not WMDs.

So the insight here has to be that private capital investment corrupts the process of designing algorithms, making them harmful. One could easily make the case that private capital investment corrupts and makes harmful many things that are not algorithmic as well. For example, the historic trans-Atlantic slave trade was a terribly evil manifestation of capitalism. It did not, as far as I know, depend on modern day computer science.

Capitalism here looks to be the root of all evil. The fact that companies are using mathematics is merely incidental. And O’Neil should know that!

Here’s what I find so frustrating about this line of argument. Mathematical literacy is critical for understanding what’s going on with these systems and how to improve society. O’Neil certainly has this literacy. But there are many people who don’t have it. There is a power disparity there which is uncomfortable for everybody. But while O’Neil is admirably raising awareness about how these kinds of technical systems can and do go wrong, the single-minded focus and framing risks giving people the wrong idea that these intellectual tools are always bad or dangerous. That is not a solution to anything, in my view. Ignorance is never more ethical than education. But there is an enormous appetite among ignorant people for being told that it is so.

References

O’Neil, Cathy. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2017.

Sassen, Saskia. “Predatory Formations Dressed in Wall Street Suits and Algorithmic Math.” Science, Technology and Society 22.1 (2017): 6-20.

computational institutions as non-narrative collective action

Nils Gilman recently pointed to a book chapter that confirms the need for “official futures” in capitalist institutions.

Nils indulged me in a brief exchange that helped me better grasp at a bothersome puzzle.

There is a certain class of intellectuals that insist on the primacy of narratives as a mode of human experience. These tend to be, not too surprisingly, writers and other forms of storytellers.

There is a different class of intellectuals that insists on the primacy of statistics. Statistics does not make it easy to tell stories because it is largely about the complexity of hypotheses and our lack of confidence in them.

The narrative/statistic divide could be seen as a divide between academic disciplines. It has often been taken to be, I believe wrongly, the crux of the “technology ethics” debate.

I questioned Nils as to whether his generalization stood up to statistically driven allocation of resources; i.e., those decisions made explicitly on probabilistic judgments. He argued that in the end, management and collective action require consensus around narrative.

In other words, what keeps narratives at the center of human activity is that (a) humans are in the loop, and (b) humans are collectively in the loop.

The idea that communication is necessary for collective action is one I used to put great stock in when studying Habermas. For Habermas, consensus, and especially linguistic consensus, is how humanity moves together. Habermas contrasted this mode of knowledge aimed at consensus and collective action with technical knowledge, which is aimed at efficiency. Habermas envisioned a society ruled by communicative rationality, deliberative democracy; following this line of reasoning, this communicative rationality would need to be a narrative rationality. Even if this rationality is not universal, it might, in Habermas’s later conception of governance, be shared by a responsible elite. Lawyers and a judiciary, for example.

The puzzle that recurs again and again in my work has been the challenge of communicating how technology has become an alternative form of collective action. The claim made by some that technologists are a social “other” makes more sense if one sees them (us) as organizing around non-narrative principles of collective behavior.

It is, I believe, beyond serious dispute that well-constructed, statistically based collective decision-making processes perform better than many alternatives. In the field of future predictions, Philip Tetlock’s work on superforecasting teams, and his prior work on expert political judgment, has long stood as an empirical challenge to the supposed primacy of narrative-based forecasting. This challenge has not been taken up; it remains rather one-sided. One reason for this may be that the rationale for the effectiveness of these techniques rests ultimately in the science of statistics.

It is now common to insist that Artificial Intelligence should be seen as a sociotechnical system and not as a technological artifact. I wholeheartedly agree with this position. However, it is sometimes implied that to understand AI as a sociotechnical system, one must understand it in narrative terms. This is an error; it would imply that the collective actions taken to build an AI system and the technology itself are held together by narrative communication.

But if the whole purpose of building an AI system is to act collectively in a way that is more effective because of its facility with the nuances of probability, then the narrative lens will miss the point. The promise and threat of AI is that it delivers a different, often more effective, form of collective or institution. I’ve suggested that computational institution might be the best way to refer to such a thing.

When *shouldn’t* you build a machine learning system?

Luke Stark raises an interesting question, directed at “ML practitioner”:

As an “ML practitioner” in on this discussion, I’ll have a go at it.

In short, one should not build an ML system for making a class of decisions if there is already a better system for making that decision that does not use ML.

An example of a comparable system that does not use ML would be a team of human beings with spreadsheets, or a team of people employed to judge for themselves.

There are a few reasons why a non-ML system could be superior in performance to an ML system:

  • The people involved could have access to more data, in the course of their lives, in more dimensions of variation, than is accessible by the machine learning system.
  • The people might have a more finely tuned ability to make semantic distinctions, such as in words or images, than an ML system.
  • The problem to be solved could be a “wicked problem” defined over a very high-dimensional space of options, with very irregular outcomes, such that it is not amenable to, for example, linear approximation.
  • The people might be judging an aspect of their own social environment, such that the outcome’s validity is socially procedural (as in the outcome of a vote, or of an auction)
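
Here is the rough comparison sketch promised above. It is a minimal, hypothetical harness rather than a recipe: the column names, the rule standing in for the existing procedure, and the use of accuracy as the metric are all illustrative assumptions of mine. It only captures performance that is measurable on held-out cases; reasons like procedural validity (the last bullet) will not show up in such a metric at all.

```python
# Hypothetical sketch: compare a candidate ML model against the existing
# non-ML decision procedure on held-out cases before deciding to deploy.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def existing_procedure(row):
    """Stand-in for the current non-ML process, e.g. a documented rule a
    human team applies. The column names are purely illustrative."""
    return int(row["income"] > 50000 and row["prior_defaults"] == 0)


def compare(df, feature_cols, label_col="outcome"):
    """df is assumed to be a pandas DataFrame of past decisions with known
    outcomes. Returns held-out accuracy for the ML model and the baseline."""
    train, test = train_test_split(df, test_size=0.3, random_state=0)

    model = LogisticRegression(max_iter=1000)
    model.fit(train[feature_cols], train[label_col])
    ml_preds = model.predict(test[feature_cols])

    baseline_preds = test.apply(existing_procedure, axis=1)

    return {
        "ml_accuracy": accuracy_score(test[label_col], ml_preds),
        "baseline_accuracy": accuracy_score(test[label_col], baseline_preds),
    }
```

If the baseline matches or beats the model, or the gap is too small to justify the added complexity, that is exactly the situation in which the ML system should not be built.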

All of the above are fine reasons not to use an ML system. On the other hand, the term “ML” has been extended, as with “AI”, to include many hybrid human-computer systems, which has led to some confusion. So, for example, crowdsourced labels of images provide useful input data to ML systems. This hybrid system might perform semantic judgments over data at a large scale, at high speed, and at a tolerable rate of accuracy. Does this system count as an ML system? Or is it a form of computational institution that rivals other ways of solving the problem, and just so happens to have a machine learning algorithm as part of its process?
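
As a rough illustration of that hybrid pattern, here is a minimal sketch; the function names and the majority-vote aggregation are my own illustrative choices, not a description of any particular deployed system. Crowd labels are collapsed to a consensus label per item, and a classifier is trained on the consensus so that the human judgments extend to items no annotator ever saw.

```python
# Hypothetical sketch of a hybrid human-computer labeling pipeline:
# crowdsourced labels are aggregated by majority vote, then used to train a
# classifier that extends those human judgments to unlabeled items.
from collections import Counter

import numpy as np
from sklearn.linear_model import LogisticRegression


def aggregate_labels(crowd_labels):
    """crowd_labels maps item_id -> list of labels from different annotators.
    Returns one consensus label per item via simple majority vote."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in crowd_labels.items()}


def train_on_crowd(features, crowd_labels):
    """features maps item_id -> feature vector (e.g. an image embedding).
    Trains a classifier on the consensus labels; the fitted model can then
    label items no human annotator ever saw."""
    consensus = aggregate_labels(crowd_labels)
    items = list(consensus)
    X = np.array([features[i] for i in items])
    y = np.array([consensus[i] for i in items])
    return LogisticRegression(max_iter=1000).fit(X, y)
```

Whether the resulting pipeline is “an ML system” or a computational institution with an ML component is exactly the ambiguity at issue.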

Meanwhile, the research frontier of machine learning is all about trying to solve problems that haven't previously been solved, or solved as well, by alternative kinds of systems. This means there will always be a disconnect between machine learning research, which is trying to expand what it is possible to do with machine learning, and the question of which machine learning systems should, today, be deployed. Sometimes, research is done to develop technology that is not yet mature enough to deploy.

We should expect that a lot of ML research is done on things that should not ultimately be deployed! That’s because until we do the research, we may not understand the problem well enough to know the consequences of deployment. There’s a real sense in which ML research is about understanding the computational contours of a problem, whereas ML industry practice is about addressing the problems customers have with an efficient solution. Often this solution is a hybrid system in which ML only plays a small part; the use of ML here is really about a change in the institutional structure, not so much a part of what service is being delivered.

On the other hand, there have been a lot of cases–search engines and social media being important ones–where the scale of data and the use of ML for processing has allowed for a qualitatively different form of product or service. These are now the big deal companies we are constantly talking about. These are pretty clearly cases of successful ML.

Computational institutions

As the “AI ethics” debate metastasizes in my newsfeed and scholarly circles, I’m struck by the frustrations of technologists and ethicists who seem to be speaking past each other.

While these tensions play out along disciplinary fault-lines, for example, between technologists and science and technology studies (STS), the economic motivations are more often than not below the surface.

I believe this is to some extent a problem of nomenclature, which is, again, a function of the disciplinary rifts involved.

Computer scientists work, generally speaking, on the design and analysis of computational systems. Many see their work as bounded by the demands of the portability and formalizability of technology (see Selbst et al., 2019). That’s their job.

This is endlessly unsatisfying to critics of the social impact of technology. STS scholars will insist on changing the subject to “sociotechnical systems”, a term that means something very general: the assemblage of people and artifacts that are not people. This, fairly, removes focus from the computational system and embeds it in a social environment.

A goal of this kind of work seems to be to hold computational systems, as they are deployed and used socially, accountable. It must be said that once this happens, we are no longer talking about the specialized domain of computer science per se. It is a wonder that STS scholars so often pick fights with computer scientists, when their true beef seems to be with the businesses that use and deploy the technology.

The AI Now Institute has attempted to rebrand the problem by discussing “AI Systems” as, roughly, those sociotechnical systems that use AI. This is on the one hand more specific–AI is a particular kind of technology, and perhaps it has particular political consequences. But their analysis of AI systems quickly overflows into sweeping claims about “the technology industry”, and it’s clear that most of their recommendations have little to do with AI, and indeed are trying, once again, to change the subject from discussion of AI as a technology (a computer science research domain) to a broader set of social and political issues that do, in fact, have their own disciplines where they have been researched for years.

The problem, really, is not that any particular conversation is not happening, or is being excluded, or is being shut down. The problem is that the engineering focused conversation about AI-as-a-technology has grown very large and become an awkward synecdoche for the rise of major corporations like Google, Apple, Amazon, Facebook, and Netflix. As these corporations fund and motivate a lot of research, there’s a question of who is going to get pieces of the big pie of opportunity these companies represent, either in terms of research grants or impact due to regulation, education, etc.

But there are many aspects of these corporations that are addressed by neither the term “sociotechnical system”, which is extremely broad, nor “AI system”, which is just as broad and rarely means what you would expect (that the system uses AI is incidental if not unnecessary; what matters is that it is a company operating in a core social domain primarily through technological user interfaces). Neither term gets at the unit of analysis that is really of interest.

An alternative: “computational institution”. “Computational”, in the sense of computational cognitive science and computational social science: it denotes the essential role of the theory of computation and of statistics in explaining the behavior of the phenomenon being studied. “Institution”, in the sense of institutional economics: the unit is a firm, composed of people, their equipment, and their economic relations to their suppliers and customers. An economic lens would immediately bring into focus “the data heist” and the “role of machines” that Nissenbaum is concerned are being left to the side.

For fairness in machine learning, we need to consider the unfairness of racial categorization

Preprints of papers accepted to the upcoming 2019 Fairness, Accountability, and Transparency (FAT*) conference are floating around Twitter. From the looks of it, many of these papers add a wealth of historical and political context, which I feel is a big improvement.

A noteworthy paper, in this regard, is Hutchinson and Mitchell’s “50 Years of Test (Un)fairness: Lessons for Machine Learning”, which puts recent ‘fairness in machine learning’ work in the context of very analogous debates from the 1960s and ’70s concerning the use of tests that could be biased due to cultural factors.

I like this paper a lot, in part because it is very thorough and in part because it tees up a line of argument that’s dear to me. Hutchinson and Mitchell raise the question of how to properly think about fairness in machine learning when the protected categories invoked by nondiscrimination law are themselves social constructs.

Some work on practically assessing fairness in ML has tackled the problem of using race as a construct. This echoes concerns in the testing literature that stem back to at least 1966: “one stumbles immediately over the scientific difficulty of establishing clear yardsticks by which people can be classified into convenient racial categories” [30]. Recent approaches have used Fitzpatrick skin type or unsupervised clustering to avoid racial categorizations [7, 55]. We note that the testing literature of the 1960s and 1970s frequently uses the phrase “cultural fairness” when referring to parity between blacks and whites.

They conclude that this is one of the areas where there can be a lot more useful work:

This short review of historical connections in fairness suggest several concrete steps forward for future research in ML fairness: Diving more deeply into the question of how subgroups are defined, suggested as early as 1966 [30], including questioning whether subgroups should be treated as discrete categories at all, and how intersectionality can be modeled. This might include, for example, how to quantify fairness along one dimension (e.g., age) conditioned on another dimension (e.g., skin tone), as recent work has begun to address [27, 39].

This is all very cool to read, because this is precisely the topic that Bruce Haynes and I address in our FAT* paper, “Racial categories in machine learning” (arXiv link). The problem we confront in this paper is that the racial categories we are used to using in the United States (White, Black, Asian) originate in the white supremacy that was enshrined into the Constitution when it was formed and perpetuated since then through the legal system (with some countervailing activity during the Civil Rights Movement, for example). This puts “fair machine learning” researchers in a bind: either they can use these categories, which have always been about perpetuating social inequality, or they can ignore the categories and reproduce the patterns of social inequality that prevail in fact because of the history of race.

In the paper, we propose a third option. First, rather than reify racial categories, we propose breaking race down into the kinds of personal features that get inscribed with racial meaning. Phenotype properties like skin type and ocular folds are one such set of features. Another set comprises events that indicate position in social class, such as being arrested or receiving welfare. Another comprises facts about the national and geographic origin of one’s ancestors. These facts about a person are clearly relevant to how racial distinctions are made, but are themselves more granular and multidimensional than race.

The next step is to detect race-like categories by looking at who is segregated from whom. We propose an unsupervised machine learning technique that works with the distribution of the phenotype, class, and ancestry features across spatial tracts (as when considering where people physically live) or across a social network (as when considering people’s professional networks, for example). Principal component analysis can identify which race-like dimensions capture the greatest amounts of spatial and social separation. We hypothesize that these dimensions will encode the ways racial categorization has tangibly shaped the social structure, including both politically recognized forms of discrimination and forms of discrimination that have not yet been surfaced. These dimensions can then be used to classify people in race-like ways as input to fairness interventions in machine learning.
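
As a concrete, though entirely schematic, reading of that procedure, the sketch below runs principal component analysis over a matrix of tract-level feature summaries. The matrix shape, the standardization step, the number of components, and the scikit-learn calls are my own assumptions; the paper does not commit to this exact implementation.

```python
# Hypothetical sketch: recover "race-like" dimensions as the principal
# components of phenotype, class, and ancestry features summarized per
# spatial tract (or per neighborhood of a social network).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def race_like_dimensions(tract_features, n_dims=3):
    """tract_features: array of shape (n_tracts, n_features), each row
    summarizing the distribution of the features among one tract's residents.
    Returns the fitted PCA and each tract's coordinates on the leading
    components, i.e. the dimensions along which tracts are most separated."""
    X = StandardScaler().fit_transform(np.asarray(tract_features, dtype=float))
    pca = PCA(n_components=n_dims)
    coords = pca.fit_transform(X)
    return pca, coords
```

Individuals could then, for example, be scored along these components by applying the same standardization and the fitted loadings to their own feature vectors, and those scores passed to downstream fairness interventions as race-like, rather than reified racial, attributes.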

A key part of our proposal is that race-like classifications depend on the empirical distribution of persons in physical and social space, and so are not fixed. This operationalizes the way that race is socially and politically constructed without reifying the categories in terms that reproduce their white supremacist origins.

I’m quite stoked about this research, though obviously it raises a lot of serious challenges in terms of validation.