Digifesto


Why STS is not the solution to “tech ethics”

“Tech ethics” is in (1) (2) (3), and a popular refrain at FAT* this year was that sensitivity to social and political context is the solution to the problems of unethical technology. How do we bring this sensitivity into technical design? By using the techniques of Science and Technology Studies (STS), argue Dobbe and Ames, as well as Selbst et al. (2019). Value Sensitive Design (VSD) (Friedman and Bainbridge, 2004) is one STS technique typically proposed for bringing this political awareness into the design process. In general, there is broad agreement that computer scientists should be working with social scientists when developing socially impactful technologies.

In this blog post, I argue that STS is not the solution to “tech ethics” that it tries to be.

Encouraging computer scientists to collaborate with social science domain experts is a great idea. My paper with Bruce Haynes (1) (2) (3) is an example of this kind of work. In it, we drew from the sociology of race to inform a technical design that addressed the unfairness of racial categories. Significantly, in my view, we did not use STS in our work. Because the social injustices we were addressing are due to far-reaching social structures and politically constructed categories, we used sociology to elucidate what was at stake and what sorts of interventions would be a good idea.

It is important to recognize that there are many different social sciences dealing with “social and political context”, and that STS, despite its interdisciplinarity, is only one of them. This is easily missed in an interdisciplinary venue in which STS is active, because STS is somewhat activist in asserting its own importance in these venues. In a sense, STS frequently positions itself as a reminder to blindered technologists that there is a social world out there. “Let me tell you about what you’re missing!” That’s its shtick. Because of this positioning, STS scholars frequently get a seat at the table with scientists and technologists. It’s a powerful position, in a sense.

What STS scholars tend to ignore is how and when other kinds of social scientists involve themselves in the process of technical design. For example, at FAT* this year there were two full tracks of Economic Models. Economic Models. Economics is a well-established social scientific discipline that has tools for understanding how a particular mechanism can have unintended effects when put into a social context. In economics, this is called “mechanism design”. It addresses what Selbst et al. might call the “Ripple Effect Trap”–the fact that a system in context may have effects that are different from the intentions of its designers. I’ve argued before that wiser economics is something we need to better address technology ethics, especially if we are talking about technology deployed by industry, which is most of it! But despite deep and systematic social scientific analysis of secondary and equilibrium effects at the conference, these peer-reviewed works are not acknowledged by the STS interventionists. Why is that?

As usual, quantitative social scientists are completely ignored by STS-inspired critiques of technologists and their ethics. That is too bad, because at the scale at which these technologies operate (namely, civic- or web-scale automated decision-making systems that are inherently about large numbers of people), fuzzier debates about “values” and contextualized impact would surely benefit from quantitative operationalization.

The problem is that STS is, at its heart, a humanistic discipline, a subfield of anthropology. If and when STS does not deny the utility or truth or value of mathematization or quantification entirely, as a field of research it is methodologically skeptical about such things. In the self-conception of STS, this methodological relativism is part of its ethnographic rigor. This ethnographic relativism is more or less entirely incompatible with formal reasoning, which aspires to universal internal validity. At a moralistic level, it is this aspiration to universal internal validity that is so bedeviling to the STS scholar: the mathematics is inherently distinct from an awareness of the social context, because social context can only be understood in its ethnographic particularity.

This is a false dichotomy. There are other social sciences that address social and political context without the restrictive assumptions of STS. Some of these are quantitative, but not all of them are. There are qualitative sociologists and political scientists with great insights into social context who are not disciplinarily allergic to the standard practices of engineering. In many ways, these kinds of social science are far more compatible with the process of designing technology than STS! For example, the sociology we draw on in our “Racial categories in machine learning” paper is variously Gramscian racial hegemony theory, structuralist sociology, Bourdieusian theories of social capital, and so on. Significantly, these theories are not based exclusively on ethnographic method. They come from disciplines that happily mix historical and qualitative scholarship with quantitative research. The object of study is the social world, and part of the purpose of the research is to develop politically useful abstractions from it that generalize and can be measured. This is the form of social science that is compatible with quantitative policy evaluation, the sort of thing you would want to use if, for example, you wanted to understand the impact of an affirmative action policy.

Given the widely acknowledged truism that public sector technology design often encodes and enacts real policy changes (a point made in Deirdre Mulligan’s keynote), it would make sense to understand the effects of these technologies using the methodologies of policy impact evaluation. That would involve enlisting the kinds of social scientific expertise relevant to understanding society at large!
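For concreteness, here is a minimal sketch of the kind of analysis I mean by policy impact evaluation: a bare-bones difference-in-differences estimate of a rollout’s effect. The data and variable names are entirely hypothetical, invented for illustration; nothing here comes from any particular study or system.

```python
# Bare-bones difference-in-differences sketch of policy impact evaluation.
# All data and variable names are hypothetical, for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),  # group exposed to the new system/policy
    "post": rng.integers(0, 2, n),     # observation taken after the rollout
})
# Simulate an outcome with a true post-rollout treatment effect of 0.5.
df["outcome"] = (0.5 * df["treated"] * df["post"]
                 + 0.2 * df["treated"] + 0.1 * df["post"]
                 + rng.normal(size=n))

# The coefficient on the interaction term is the difference-in-differences
# estimate of the policy's effect on the treated group after rollout.
model = smf.ols("outcome ~ treated * post", data=df).fit()
print(round(model.params["treated:post"], 2))
```

The point is not this particular estimator; it is that the downstream effects of a deployed system are an empirical question, and the social sciences already have workhorse tools for answering it.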

But that is absolutely not what STS has to offer. STS is, at best, offering a humanistic evaluation of the social processes of technology design. The ontology of STS is flat, and its epistemology and ethics are immediate: the design decision comes down to a calculus of “values” of different “stakeholders”. Ironically, this is a picture of social context that often seems to neglect the political and economic context of that context. It is not an escape from empty abstraction. Rather, it insists on moving from clear abstractions to more nebulous ones, “values” like “fairness”, maintaining that if the conversation never ends and the design never gets formalized, ethics has been accomplished.

This has proven, again and again, to be a rhetorically effective position for research scholarship. It is quite popular among “ethics” researchers who are backed by corporate technology companies. That is quite possibly because the form of “ethics” that STS offers, for all of its calls for political sensitivity, is devoid of political substance. It offers an apples-to-apples comparison of “values” without considering the social origins of those values, or the way those values are grounded in political interests that are not merely about “what we think is important in life” but about real contests over resource allocation. The observation by Ames et al. (2011) that people’s values with respect to technology vary with socio-economic class is a terribly relevant, Bourdieusian lesson in how the standpoint of “values sensitivity” may, when taken seriously, run up against the hard realities of political agonism. I don’t believe STS researchers are truly naive about these points; however, in their rhetoric of design intervention, conducted in labs but isolated from the real conditions of technology firms, there is an idealism that can only survive under the self-imposed severity of STS’s own methodological restrictions.

Independent scholars can take up this position and publish daring pieces, winning the moral high ground. But that is not a serious position to take in an industrial setting, or when pursuing generalizable knowledge about the downstream impact of a design on a complex social system. Those empirical questions require different tools, albeit far more unwieldy ones. Complex survey instruments, skilled data analysis, and substantive social theory are needed to arrive at solid conclusions about the ethical impact of technology.

References

Ames, M. G., Go, J., Kaye, J. J., & Spasojevic, M. (2011, March). Understanding technology choices and values through social class. In Proceedings of the ACM 2011 conference on Computer supported cooperative work (pp. 55-64). ACM.

Friedman, B., & Bainbridge, W. S. (2004). Value sensitive design.

Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019, January). Fairness and abstraction in sociotechnical systems. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAT*). ACM.


All the problems with our paper, “Racial categories in machine learning”

Bruce Haynes and I were blown away by the reception to our paper, “Racial categories in machine learning”. This was a huge experiment in interdisciplinary collaboration for us. We are excited about the next steps in this line of research.

That includes engaging with criticism. One of our goals was to fuel a conversation in the research community about the operationalization of race. That isn’t a question that can be addressed by any one paper or team of researchers. So one thing we got out of the conference was great critical feedback on potential problems with the approach we proposed.

This post is an attempt to capture those critiques.

Need for participatory design

Khadijah Abdurahman, of Word to RI, issued a subtweeted challenge to us to present our paper to the hood. (RI stands for Roosevelt Island, in New York City, the location of the recently established Cornell Tech campus.)

One striking challenge, raised by Khadijah Abdurahman on Twitter, is that we should be developing peer relationships with the communities we research. I read this as a call for participatory design. It’s true this was not part of the process of writing the paper. In particular, Ms. Abdurahman pointed to a part of our abstract that uses jargon from computer science.

There are a lot of ways to respond to this comment. The first is to accept the challenge. I would personally love it if Bruce and I could present our research to folks on Roosevelt Island and get feedback from them.

There are other ways to respond that address the tensions of this comment. One is to point out that in addition to being an accomplished scholar of the sociology of race and how it forms, especially in urban settings, Bruce is a black man who is originally from Harlem. Indeed, Bruce’s family memoir shows his deep and well-researched familiarity with the life of marginalized people of the hood. So a “peer relationship” between an algorithm designer (me) and a member of an affected community (Bruce) is really part of the origin of our work.

Another is to point out that we did not research a particular community. Our paper was not human subjects research; it was about the racial categories that are maintained by the U.S. federal government and which pervade society in a very general way. Indeed, everybody is affected by these categories. When I and others who look like me are ascribed “white”, that is an example of these categories at work. Bruce and I were very aware of how different kinds of people at the conference responded to our work, and how it was an intervention in our own community, which is of course affected by these racial categories.

The last point is that computer science jargon is alienating to basically everybody who is not trained in computer science, whether they live in the hood or not. And the fact is we presented our work at a computer science venue. Personally, I’m in favor of universal education in computational statistics, but that is a tall order. If our work becomes successful, I could see it becoming part of, for example, a statistical demography curriculum that could be of popular interest. But these are early days.

The Quasi-Racial (QR) Categories are Not Interpretable

In our presentation, we introduced some terminology that did not make it into the paper. We named the vectors of segregation derived by our procedure “quasi-racial” (QR) vectors, to denote that we were trying to capture dimensions that were race-like, in that they captured the patterns of historic and ongoing racial injustice, without being the racial categories themselves, which we argued are inherently unfair categories of inequality.

First, we are not wedded to the name “quasi-racial” and are very open to different terminology if anybody has an idea for something better to call them.

More importantly, somebody pointed out that these QR vectors may not be interpretable. Given that the conference is not only about Fairness, but also Accountability and Transparency, this critique is certainly on point.

To be honest, I have not yet done the work of surveying the extensive literature on algorithm interpretability to get a nuanced response. I can give two informal responses. The first is that one assumption of our proposal is that there is something wrong with how race and racial categories are intuitively understood. Normal people’s understanding of race is, of course, ridden with stereotypes, implicit biases, false causal models, and so on. If we proposed an algorithm that was fully “interpretable” according to most people’s understanding of what race is, that algorithm would likely have racist or racially unequal outcomes. That’s precisely the problem we are trying to get at with our work. In other words, when categories are inherently unfair, interpretability and fairness may be at odds.

The second response is that educating people about how the procedure works and what motivates it is part of what makes its outcomes interpretable. Teaching people about the history of racial categories, and how those categories are both the cause and the effect of segregation in space and society, makes the algorithm interpretable. Teaching people about Principal Component Analysis, the algorithm we employ, is part of what makes the system interpretable. We are trying to drop knowledge; I don’t think we are offering any shortcuts.
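To give a sense of what that education might look like in practice, here is a minimal sketch of a PCA step with its loadings printed out. This is not the paper’s actual code; the neighborhood-level feature names and the data are hypothetical, invented purely for illustration.

```python
# Minimal sketch: PCA over hypothetical neighborhood-level segregation
# features, printing the component loadings. Not the paper's code; the
# feature names and data are invented for illustration.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = ["residential_isolation", "school_composition",
            "commute_mixing", "loan_denial_rate"]
X = pd.DataFrame(rng.normal(size=(500, len(features))), columns=features)

# Standardize the inputs, then extract the leading components, which would
# play the role of the "quasi-racial" (QR) dimensions in our proposal.
pca = PCA(n_components=2)
qr = pca.fit_transform(StandardScaler().fit_transform(X))

# Each component is a linear combination of the inputs; the loadings show
# how much each feature contributes, which is what one would walk an
# audience through when explaining the derived dimensions.
loadings = pd.DataFrame(pca.components_, columns=features, index=["QR1", "QR2"])
print(loadings.round(2))
print("explained variance:", pca.explained_variance_ratio_.round(2))
```

Walking an audience through a table of loadings like this is the kind of explanation I have in mind, though it admittedly presumes some comfort with linear algebra and statistics.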

Principal Component Analysis (PCA) may not be the right technique

An objection from the computer science end of the spectrum was that our proposed use of Principal Component Analysis (PCA) was not well motivated enough. PCA is just one of many dimensionality reduction techniques–why did we choose it in particular? PCA has many assumptions about the input embedded within it, including that the component vectors of interest are linear combinations of the inputs. What if the best QR representation is a non-linear combination of the input variables? A more general criticism is that our use of unsupervised learning is perhaps lazy, since in order to validate its usefulness we will need to test it against labeled data anyway. We might be better off with a more carefully calibrated and better motivated alternative technique.
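To make the linearity point concrete, here is a small sketch contrasting plain PCA with kernel PCA, one off-the-shelf nonlinear alternative. The data is synthetic, and kernel PCA is just an example of the kind of technique a critic might have in mind; it is not something we evaluated in the paper.

```python
# Sketch of the linearity critique: plain PCA versus kernel PCA, one
# standard nonlinear dimensionality-reduction alternative. Synthetic data;
# the paper itself only proposed plain PCA.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
# Make one feature a nonlinear function of another, the kind of structure
# a purely linear projection can miss.
X[:, 5] = X[:, 0] ** 2 + 0.1 * rng.normal(size=500)

linear_qr = PCA(n_components=2).fit_transform(X)
nonlinear_qr = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)

# Both produce two derived dimensions per row; they differ in whether those
# dimensions are restricted to linear combinations of the input features.
print(linear_qr.shape, nonlinear_qr.shape)
```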

These are all fair criticisms. I am personally not satisfied with the technical component of the paper and presentation. I know the rigor of the analysis is not of a standard that would impress a machine learning scholar, and I take full responsibility for that. I hope to do better in a future iteration of the work, and welcome any advice on how to do that from colleagues. I’d also be interested to see how more technically skilled computer scientists and formal modelers address the problem of unfair racial categories that we raised in the paper.

I see our main contribution as the raising of this problem of unfair categories, not our particular technical solution to it. As a potential solution, I hope that it’s better than nothing, a step in the right direction, and provocative. I subscribe to the belief that science is an iterative process and look forward to the next cycle of work.

Please feel free to reach out if you have a critique of our work that we’ve missed. We do appreciate all the feedback!