Digifesto

Tag: FAT*

Response to Abdurahman

Abdurahman has responded to my response to her tweet about my paper with Bruce Haynes, and invited me to write a rebuttal. While I’m happy to do so–arguing with intellectuals on the internet is probably one of my favorite things to do–it is not easy to rebut somebody with whom you have so little disagreement.

Abdurahman makes a number of points:

  1. Our paper, “Racial categories in machine learning”, omits the social context in which algorithms are enacted.
  2. The paper ignores whether computational thinking “acolytes like [me]” should be in the position of determining civic decisions.
  3. The ontological contributions of African American Vernacular English (AAVE) are not present at the FAT* conference, and this constitutes a hermeneutic injustice. (I may well have misstated this point.)
  4. The positive reception to our paper may be due to its appeal to people with a disingenuous, lazy, or uncommitted racial politics.
  5. “Participatory design” does not capture Abdurahman’s challenge of “peer” design. She has a different and more broadly encompassing set of concerns: “whose language is used, whose viewpoint and values are privileged, whose agency is extended, and who has the right to frame the ‘problem’.”
  6. Our paper misses the point about predictive policing, from the perspective of the people most affected by disparities in policing. Machine learning classification is not the right frame for the problem. The problem is an unjust prison system and, more broadly, the unequal distribution of power that is manifested in the academic discourse itself. “[T]he problem is framed wrongly — it is not just that classification systems are inaccurate or biased, it is who has the power to classify, to determine the repercussions / policies associated thereof and their relation to historical and accumulated injustice?”

I have to say that I am not a stranger to most of this line of thought and have great sympathy for the radical position expressed.

I will continue to defend our paper. Re: point 1, a major contribution of our paper was that it shed light on the political construction of race, especially race in the United States, which is absolutely part of “the social context in which algorithmic decision making is enacted”. Abdurahman must be referring to some other aspect of the social context. One problem we face as academic researchers is that the entire “social context” of algorithmic decision-making is the whole frickin’ world, and conference papers are about 12 pages or so. I thought we did a pretty good job of focusing on one important and neglected aspect of that social context, the political formation of race, which as far as I know has never previously been addressed in a computer science paper. (I’ve written more about this point here).

Re: point 2, it’s true we omit a discussion of the relevance of computational thinking to civic decision-making. That is because the relevance of computational thinking is a safe assumption to make in a publication at that venue. I happen to agree with that assumption, which is why I worked hard to submit a paper to that conference. If I didn’t think computational thinking was relevant, I would probably be doing something else with my time. That said, I think it’s wildly flattering and inaccurate to say that I, personally, have any control over “civic decision-making”. I really don’t, and I’m not sure why you’d think that, except for the erroneous myth that computer science research is, in itself, political power. It isn’t; that’s a lie that the tech companies have told the world.

I am quite aware (re: point 3) that my embodied and social “location” is quite different from Abdurahman’s. For example, unlike Abdurahman, it would be utterly pretentious for me to posture or “front” with AAVE. I simply have no access to its ontological wisdom, and could not be the conduit of it into any discourse, academic or informal. I have and use different resources; I am also limited by my positionality like anybody else. Sorry.

“Woke” white liberals potentially liking our argument? (Re: point 4) Fair. I don’t think that means our argument is bad or that the points aren’t worth making.

Re: point 5: I must be forgiven for not understanding the full depth of Abdurahman’s methodological commitments on the basis of a single tweet. There are a lot of different design methodologies, and their boundaries are disputed. I see now that the label of “participatory design” is not sufficiently critical or radical to capture what she has in mind. I’m pleased to see she is working with Tap Parikh on this, who has a lot of experience with critical/radical HCI methods. I’m personally not an expert on any of this stuff. I do different work.

Re: point 6: My personal opinions about the criminal justice system did not make it into our paper, which again was a focused scientific article trying to make a different point. Our paper was about how racial categories are formed, how they are unfair, and how a computational system designed for fairness might address that problem. I agree that this approach is unlikely to have much meaningful impact on the injustices of the cradle-to-prison system in the United States, the prison-industrial complex, or the like. Based on what I’ve heard so far, the problems there would be best solved by changing the ways judges are trained. I don’t have any say in that, though–I don’t have a law degree.

In general, while I see Abdurahman’s frustrations as valid (of course!), I think it’s ironic and frustrating that she targets our paper as an emblem of the problems with the FAT* conference, with computer science, and with the world at large. First, our paper was not a “typical” FAT* paper; it was a very unusual one, positioned to broaden the scope of what’s discussed there, motivated in part by my own criticisms of the conference the year before. It was also just one paper: there’s tons of other good work at that conference, and the conversation is quite broad. I expect the best solution to the problem is to write and submit different papers. But it may also be that other venues are better for addressing the problems raised.

I’ll conclude that many of the difficulties and misunderstandings that underlie our conversation are a result of a disciplinary collapse that is happening because of academia’s relationship with social media. Language’s meaning depends on its social context, and social media is notoriously a place where contexts collapse. It is totally unreasonable to argue that everybody in the world should be focused on what you think is most important. In general, I think battles over “framing” on the Internet are stupid, and that the fact that these kinds of battles have become so politically prominent is a big part of why our society’s politics are so stupid. The current political emphasis on the symbolic sphere is a distraction from more consequential problems of economic and social structure.

As I’ve noted elsewhere, one reason why I think Haynes’s view of race is refreshing (as opposed to a lot of what passes for “critical race theory” in popular discussion) is that it locates the source of racial inequality in structure–spatial and social segregation–and institutional power–especially, the power of law. In my view, this politically substantive view of race is, if taken seriously, more radical than one based on mere “discourse” or “fairness” and demands a more thorough response. Codifying that response, in computational thinking, was the goal of our paper.

This is a more concrete and specific way of dealing with the power disparities that are at the heart of Abdurahman’s critique. Vague discourse and intimations about “privilege”, “agency”, and “power”, without an account of the specific mechanisms of that power, are weak.

Why STS is not the solution to “tech ethics”

“Tech ethics” are in (1) (2) (3), and a popular refrain at FAT* this year was that sensitivity to social and political context is the solution to the problems of unethical technology. How do we bring this sensitivity to technical design? Using the techniques of Science and Technology Studies (STS), argue variously Dobbe and Ames, as well as Selbst et al. (2019). Value Sensitive Design (VSD) (Friedman and Bainbridge, 2004) is one typical STS-branded technique for bringing this political awareness into the design process. In general, there is broad agreement that computer scientists should be working with social scientists when developing socially impactful technologies.

In this blog post, I argue that STS is not the solution to “tech ethics” that it tries to be.

Encouraging computer scientists to collaborate with social science domain experts is a great idea. My paper with Bruce Haynes (1) (2) (3) is an example of this kind of work. In it, we drew from sociology of race to inform a technical design that addressed the unfairness of racial categories. Significantly, in my view, we did not use STS in our work. Because the social injustices we were addressing were due to broad-reaching social structures and politically constructed categories, we used sociology to elucidate what was at stake and what sorts of interventions would be a good idea.

It is important to recognize that there are many different social sciences dealing with “social and political context”, and that STS, despite its interdisciplinarity, is only one of them. This is easily missed in an interdisciplinary venue in which STS is active, because STS is somewhat activist in asserting its own importance in these venues. STS frequently positions itself as a reminder to blindered technologists that there is a social world out there. “Let me tell you about what you’re missing!” That’s its shtick. Because of this positioning, STS scholars frequently get a seat at the table with scientists and technologists. It’s a powerful position, in a sense.

What STS scholars tend to ignore is how and when other kinds of social scientists involve themselves in the process of technical design. For example, at FAT* this year there were two full tracks of Economic Models. Economic Models. Economics is a well-established social scientific discipline that has tools for understanding how a particular mechanism can have unintended effects when put into a social context. In economics, this is called “mechanism design”. It addresses what Selbst et al. might call the “Ripple Effect Trap”: the fact that a system in context may have effects that are different from the intention of designers. I’ve argued before that a wiser economics is something we need in order to better address technology ethics, especially if we are talking about technology deployed by industry, which is most of it! But despite deep and systematic social scientific analysis of secondary and equilibrium effects at the conference, these peer-reviewed works are not acknowledged by STS interventionists. Why is that?
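(For readers unfamiliar with the term, “mechanism design” studies how the rules of an interaction shape the incentives of the people acting within it. Below is a toy sketch, in Python, of the textbook example: a second-price auction, where the pricing rule is chosen so that bidding one’s true value is the best strategy. It is purely illustrative; the bidders, values, and numbers are invented and have no connection to any system discussed in this post.)

```python
# Toy illustration of mechanism design: the second-price (Vickrey) auction.
# The rule "winner pays the second-highest bid" is designed so that truthful
# bidding is optimal. All values here are made up for illustration.
import random

def second_price_auction(bids):
    """Return (winner_index, price_paid) under the second-price rule."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return order[0], bids[order[1]]  # winner pays the second-highest bid

random.seed(0)
true_value = 10.0  # what winning is actually worth to "our" bidder

# Compare under-bidding, truthful bidding, and over-bidding by simulation.
for my_bid in (6.0, 10.0, 14.0):
    total_utility = 0.0
    for _ in range(10_000):
        others = [random.uniform(0, 20) for _ in range(3)]
        winner, price = second_price_auction([my_bid] + others)
        if winner == 0:
            total_utility += true_value - price
    print(f"bid={my_bid:>5.1f}  average utility={total_utility / 10_000:.3f}")
```

In simulation, the truthful bid comes out best on average, which is the point of choosing that pricing rule: the designed rule, not exhortation, aligns behavior with the intended outcome.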

As usual, quantitative social scientists are completely ignored by STS-inspired critiques of technologists and their ethics. That is too bad, because at the scale at which these technologies are operating (mainly, we are discussing civic- or web-scale automated decision making systems that are inherently about large numbers of people), fuzzier debates about “values” and contextualized impact would surely benefit from quantitative operationalization.

The problem is that STS is, at its heart, a humanistic discipline, a subfield of anthropology. Even when STS does not deny the utility, truth, or value of mathematization or quantification entirely, as a field of research it is methodologically skeptical of such things. In the self-conception of STS, this methodological relativism is part of its ethnographic rigor. This ethnographic relativism is more or less entirely incompatible with formal reasoning, which aspires to universal internal validity. At a moralistic level, it is this aspiration to universal internal validity that so bedevils the STS scholar: the mathematics is inherently distinct from an awareness of the social context, because social context can only be understood in its ethnographic particularity.

This is a false dichotomy. There are other social sciences that address social and political context that do not have the same restrictive assumptions of STS. Some of these are quantitative, but not all of them are. There are qualitative sociologists and political scientists with great insights into social context that are not disciplinarily allergic to the standard practices of engineering. In many ways, these kinds of social sciences are far more compatible with the process of designing technology than STS! For example, the sociology we draw on in our “Racial categories in machine learning” paper is variously: Gramscian racial hegemony theory, structuralist sociology, Bourdieusian theories of social capital, and so on. Significantly, these theories are not based exclusively on ethnographic method. They are based on disciplines that happily mix historical and qualitative scholarship with quantitative research. The object of study is the social world, and part of the purpose of the research is to develop politically useful abstractions from it that generalize and can be measured. This is the form of social science that is compatible with quantitative policy evaluation, the sort of thing you would want to use if, for example, you wanted to understand the impact of an affirmative action policy.

Given the widely acknowledged truism that public sector technology design often encodes and enacts real policy changes (a point made in Deirdre Mulligan’s keynote), it would make sense to understand the effects of these technologies using the methodologies of policy impact evaluation. That would involve enlisting the kinds of social scientific expertise relevant to understanding society at large!
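To make “the methodologies of policy impact evaluation” concrete, here is a minimal sketch of one standard technique from that toolkit, a difference-in-differences estimate of a deployed system’s effect on some outcome. Everything in it (the groups, the outcome, the numbers) is invented for illustration; it shows the kind of analysis I have in mind, not an analysis of any actual system.

```python
# Minimal sketch of a difference-in-differences (DiD) policy evaluation.
# All data below is simulated; in practice these would be real outcome
# measurements collected before and after a system is deployed.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outcome for a group exposed to the new system ("treated")
# and a comparison group ("control"), before and after deployment.
treated_before = rng.normal(10.0, 1.0, size=500)
treated_after  = rng.normal(10.8, 1.0, size=500)
control_before = rng.normal(10.0, 1.0, size=500)
control_after  = rng.normal(10.2, 1.0, size=500)

# DiD: the change in the treated group minus the change in the control
# group, which nets out background trends shared by both groups.
effect = (treated_after.mean() - treated_before.mean()) \
       - (control_after.mean() - control_before.mean())
print(f"Estimated effect of the deployment: {effect:.2f}")
```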

But that is absolutely not what STS has to offer. STS is, at best, offering a humanistic evaluation of the social processes of technology design. The ontology of STS is flat, and its epistemology and ethics are immediate: the design decision comes down to a calculus of “values” of different “stakeholders”. Ironically, this is a picture of social context that often seems to neglect the political and economic context of that context. It is not an escape from empty abstraction. Rather, it insists on moving from clear abstractions to more nebulous ones, “values” like “fairness”, maintaining that if the conversation never ends and the design never gets formalized, ethics has been accomplished.

This has proven, again and again, to be a rhetorically effective position for research scholarship. It is quite popular among “ethics” researchers who are backed by corporate technology companies. That is quite possibly because the form of “ethics” that STS offers, for all of its calls for political sensitivity, is devoid of political substance. It offers an apples-to-apples comparison of “values” without considering the social origins of those values, or the way those values are grounded in political interests that are not merely about “what we think is important in life” but are real contests over resource allocation. The observation by Ames et al. (2011) that people’s values with respect to technology vary with socio-economic class is a terribly relevant, Bourdieusian lesson in how the standpoint of “values sensitivity” may, when taken seriously, run up against the hard realities of political agonism. I don’t believe STS researchers are truly naive about these points; however, in their rhetoric of design intervention, conducted in labs but isolated from the real conditions of technology firms, there is an idealism that can only survive under the self-imposed severity of STS’s own methodological restrictions.

Independent scholars can take up this position and publish daring pieces, winning the moral high ground. But that is not a serious position to take in an industrial setting, or when pursuing generalizable knowledge about the downstream impact of a design on a complex social system. Those empirical questions require different tools, albeit far more unwieldy ones. Complex survey instruments, skilled data analysis, and substantive social theory are needed to arrive at solid conclusions about the ethical impact of technology.

References

Ames, M. G., Go, J., Kaye, J. J., & Spasojevic, M. (2011, March). Understanding technology choices and values through social class. In Proceedings of the ACM 2011 conference on Computer supported cooperative work (pp. 55-64). ACM.

Friedman, B., & Bainbridge, W. S. (2004). Value sensitive design.

Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019, January). Fairness and abstraction in sociotechnical systems. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAT*). ACM.

All the problems with our paper, “Racial categories in machine learning”

Bruce Haynes and I were blown away by the reception to our paper, “Racial categories in machine learning”. This was a huge experiment in interdisciplinary collaboration for us. We are excited about the next steps in this line of research.

That includes engaging with criticism. One of our goals was to fuel a conversation in the research community about the operationalization of race. That isn’t a question that can be addressed by any one paper or team of researchers. So one thing we got out of the conference was great critical feedback on potential problems with the approach we proposed.

This post is an attempt to capture those critiques.

Need for participatory design

Khadijah Abdurahman, of Word to RI, issued a subtweeted challenge to us to present our paper to the hood. (RI stands for Roosevelt Island, in New York City, the location of the recently established Cornell Tech campus.)

One striking challenge, raised by Khadijah Abdurahman on Twitter, is that we should be developing peer relationships with the communities we research. I read this as a call for participatory design. It’s true this was not part of the process of the paper. In particular, Ms. Abdurahman points to a part of our abstract that uses jargon from computer science.

There are a lot of ways to respond to this comment. The first is to accept the challenge. I would personally love it if Bruce and I could present our research to folks on Roosevelt Island and get feedback from them.

There are other ways to respond that address the tensions of this comment. One is to point out that in addition to being an accomplished scholar of the sociology of race and how it forms, especially in urban settings, Bruce is a black man who is originally from Harlem. Indeed, Bruce’s family memoir shows his deep and well-researched familiarity with the life of marginalized people of the hood. So a “peer relationship” between an algorithm designer (me) and a member of an affected community (Bruce) is really part of the origin of our work.

Another is to point out that we did not research a particular community. Our paper was not human subjects research; it was about the racial categories that are maintained by the U.S. federal government and which pervade society in a very general way. Indeed, everybody is affected by these categories. When I and others who look like me are ascribed “white”, that is an example of these categories at work. Bruce and I were very aware of how different kinds of people at the conference responded to our work, and how it was an intervention in our own community, which is of course affected by these racial categories.

The last point is that computer science jargon is alienating to basically everybody who is not trained in computer science, whether they live in the hood or not. And the fact is we presented our work at a computer science venue. Personally, I’m in favor of universal education in computational statistics, but that is a tall order. If our work becomes successful, I could see it becoming part of, for example, a statistical demography curriculum that could be of popular interest. But this is early days.

The Quasi-Racial (QR) Categories are Not Interpretable

In our presentation, we introduced some terminology that did not make it into the paper. We named the vectors of segregation derived by our procedure “quasi-racial” (QR) vectors, to denote that we were trying to capture dimensions that were race-like, in that they captured the patterns of historic and ongoing racial injustice, without being the racial categories themselves, which we argued are inherently unfair categories of inequality.

First, we are not wedded to the name “quasi-racial” and are very open to different terminology if anybody has an idea for something better to call them.

More importantly, somebody pointed out that these QR vectors may not be interpretable. Given that the conference is not only about Fairness, but also Accountability and Transparency, this critique is certainly on point.

To be honest, I have not yet done the work of surveying the extensive literature on algorithm interpretability to get a nuanced response. I can give two informal responses. The first is that one assumption of our proposal is that there is something wrong with how race and racial categories are intuitively understood. Normal people’s understanding of race is, of course, riddled with stereotypes, implicit biases, false causal models, and so on. If we proposed an algorithm that was fully “interpretable” according to most people’s understanding of what race is, that algorithm would likely have racist or racially unequal outcomes. That’s precisely the problem that we are trying to get at with our work. In other words, when categories are inherently unfair, interpretability and fairness may be at odds.

The second response is that educating people about how the procedure works and what motivates it is part of what makes its outcomes interpretable. Teaching people about the history of racial categories, and how those categories are both the cause and effect of segregation in space and society, makes the algorithm interpretable. Teaching people about Principal Component Analysis, the algorithm we employ, is part of what makes the system interpretable. We are trying to drop knowledge; I don’t think we are offering any shortcuts.
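For the concretely minded, here is a minimal sketch of the kind of PCA step being described, run on a hypothetical matrix of segregation-related features. This is not the paper’s actual pipeline or data; the feature matrix below is a random stand-in. The point is only to show where the quasi-racial (QR) dimensions come from and how inspecting the component loadings supports the kind of explanation I mean.

```python
# Illustrative sketch only: deriving "quasi-racial" (QR) dimensions with PCA
# from a hypothetical matrix of segregation-related features. The data here
# is random stand-in data, not anything from the paper.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# One row per unit (e.g., a person or census tract); columns are
# hypothetical segregation-related measurements.
X = rng.normal(size=(1000, 12))

# Standardize so that no single feature dominates the components.
X_std = StandardScaler().fit_transform(X)

# Keep the leading components as the QR dimensions.
pca = PCA(n_components=3)
qr = pca.fit_transform(X_std)            # shape: (1000, 3)

# The loadings show which input features each QR dimension weights,
# which is one concrete handle for explaining the procedure to people.
print(pca.explained_variance_ratio_)
print(np.round(pca.components_, 2))      # components x features
```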

Principal Component Analysis (PCA) may not be the right technique

An objection from the computer science end of the spectrum was that our proposed use of Principal Component Analysis (PCA) was not sufficiently well motivated. PCA is just one of many dimensionality reduction techniques, so why did we choose it in particular? PCA has many assumptions about the input embedded within it, including that the component vectors of interest are linear combinations of the inputs. What if the best QR representation is a non-linear combination of the input variables? And, as a more general criticism, our use of unsupervised learning is perhaps lazy, since in order to validate its usefulness we will need to test it with labeled data anyway. We might be better off with a more carefully calibrated and better motivated alternative technique.

These are all fair criticisms. I am personally not satisfied with the technical component of the paper and presentation. I know the rigor of the analysis is not of the standard that would impress a machine learning scholar and can take full responsibility for that. I hope to do better in a future iteration of the work, and welcome any advice on how to do that from colleagues. I’d also be interested to see how more technically skilled computer scientists and formal modelers address the problem of unfair racial categories that we raised in the paper.
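As one small illustration of the kind of alternative raised above, here is a sketch that swaps the linear PCA step for kernel PCA, a nonlinear dimensionality reduction available in scikit-learn. The data is again a random stand-in, and this is not a claim about which technique is actually right; it only shows that the linearity assumption is easy to relax in a prototype.

```python
# Sketch of relaxing PCA's linearity assumption with kernel PCA.
# Stand-in data only; the kernel choice and parameters are arbitrary here.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(1000, 12)))

linear_qr = PCA(n_components=3).fit_transform(X)

# With an RBF kernel, the derived dimensions are nonlinear functions of the
# inputs, at the cost of making the loadings (and interpretation) harder.
nonlinear_qr = KernelPCA(n_components=3, kernel="rbf", gamma=0.1).fit_transform(X)

print(linear_qr.shape, nonlinear_qr.shape)  # both (1000, 3)
```

A real comparison between the two would, of course, need exactly the labeled validation data that the criticism points to.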

I see our main contribution as the raising of this problem of unfair categories, not our particular technical solution to it. As a potential solution, I hope that it’s better than nothing, a step in the right direction, and provocative. I subscribe to the belief that science is an iterative process and look forward to the next cycle of work.

Please feel free to reach out if you have a critique of our work that we’ve missed. We do appreciate all the feedback!