Digifesto

Tag: racial projects

All the problems with our paper, “Racial categories in machine learning”

Bruce Haynes and I were blown away by the reception to our paper, “Racial categories in machine learning“. This was a huge experiment in interdisciplinary collaboration for us. We are excited about the next steps in this line of research.

That includes engaging with criticism. One of our goals was to fuel a conversation in the research community about the operationalization of race. That isn’t a question that can be addressed by any one paper or team of researchers. So one thing we got out of the conference was great critical feedback on potential problems with the approach we proposed.

This post is an attempt to capture those critiques.

Need for participatory design

Khadijah Abdurahman, of Word to RI , issues a subtweeted challenge to us to present our paper to the hood. (RI stands for Roosevelt Island, in New York City, the location of the recently established Cornell Tech campus.)

On striking challenge, raised by Khadijah Abdurahman on Twitter, is that we should be developing peer relationships with the communities we research. I read this as a call for participatory design. It’s true this was not part of the process of the paper. In particular, Ms. Abdurahman points to a part of our abstract that uses jargon from computer science.

There are a lot of ways to respond to this comment. The first is to accept the challenge. I would personally love it if Bruce and I could present our research to folks on Roosevelt Island and get feedback from them.

There are other ways to respond that address the tensions of this comment. One is to point out that in addition to being an accomplished scholar of the sociology of race and how it forms, especially in urban settings, Bruce is a black man who is originally from Harlem. Indeed, Bruce’s family memoir shows his deep and well-researched familiarity with the life of marginalized people of the hood. So a “peer relationship” between an algorithm designer (me) and a member of an affected community (Bruce) is really part of the origin of our work.

Another is to point out that we did not research a particular community. Our paper was not human subjects research; it was about the racial categories that are maintained by the Federal U.S. government and which pervade society in a very general way. Indeed, everybody is affected by these categories. When I and others who looks like me are ascribed “white”, that is an example of these categories at work. Bruce and I were very aware of how different kinds of people at the conference responded to our work, and how it was an intervention in our own community, which is of course affected by these racial categories.

The last point is that computer science jargon is alienating to basically everybody who is not trained in computer science, whether they live in the hood or not. And the fact is we presented our work at a computer science venue. Personally, I’m in favor of universal education in computational statistics, but that is a tall order. If our work becomes successful, I could see it becoming part of, for example, a statistical demography curriculum that could be of popular interest. But this is early days.

The Quasi-Racial (QR) Categories are Not Interpretable

In our presentation, we introduced some terminology that did not make it into the paper. We named the vectors of segregation derived by our procedure “quasi-racial” (QR) vectors, to denote that we were trying to capture dimensions that were race-like, in that they captured the patterns of historic and ongoing racial injustice, without being the racial categories themselves, which we argued are inherently unfair categories of inequality.

First, we are not wedded to the name “quasi-racial” and are very open to different terminology if anybody has an idea for something better to call them.

More importantly, somebody pointed out that these QR vectors may not be interpretable. Given that the conference is not only about Fairness, but also Accountability and Transparency, this critique is certainly on point.

To be honest, I have not yet done the work of surveying the extensive literature on algorithm interpretability to get a nuanced response. I can give two informal responses. The first is that one assumption of our proposal is that there is something wrong with how race and racial categories are intuitive understood. Normal people’s understanding of race is, of course, ridden with stereotypes, implicit biases, false causal models, and so on. If we proposed an algorithm that was fully “interpretable” according to most people’s understanding of what race is, that algorithm would likely have racist or racially unequal outcomes. That’s precisely the problem that we are trying to get at with our work. In other words, when categories are inherently unfair, interpretability and fairness may be at odds.

The second response is that educating people about how the procedure works and why its motivated is part of what makes its outcomes interpretable. Teaching people about the history of racial categories, and how those categories are both the cause and effect of segregation in space and society, makes the algorithm interpretable. Teaching people about Principal Component Analysis, the algorithm we employ, is part of what makes the system interpretable. We are trying to drop knowledge; I don’t think we are offering any shortcuts.

Principal Component Analysis (PCA) may not be the right technique

An objection from the computer science end of the spectrum was that our proposed use of Principal Component Analysis (PCA) was not well-motivated enough. PCA is just one of many dimensionality reduction techniques–why did we choose it in particular? PCA has many assumptions about the input embedded within it, including the component vectors of interest are linear combinations of the inputs. What if the best QR representation is a non-linear combination of the input variables? And our use of unsupervised learning, as a general criticism, is perhaps lazy, since in order to validate its usefulness we will need to test it with labeled data anyway. We might be better off with a more carefully calibrated and better motivated alternative technique.

These are all fair criticisms. I am personally not satisfied with the technical component of the paper and presentation. I know the rigor of the analysis is not of the standard that would impress a machine learning scholar and can take full responsibility for that. I hope to do better in a future iteration of the work, and welcome any advice on how to do that from colleagues. I’d also be interested to see how more technically skilled computer scientists and formal modelers address the problem of unfair racial categories that we raised in the paper.

I see our main contribution as the raising of this problem of unfair categories, not our particular technical solution to it. As a potential solution, I hope that it’s better than nothing, a step in the right direction, and provocative. I subscribe to the belief that science is an iterative process and look forward to the next cycle of work.

Please feel free to reach out if you have a critique of our work that we’ve missed. We do appreciate all the feedback!

Advertisements

For fairness in machine learning, we need to consider the unfairness of racial categorization

Pre-prints of papers accepted to this coming 2019 Fairness, Accountability, and Transparency conference are floating around Twitter. From the looks of it, many of these papers add a wealth of historical and political context, which I feel is a big improvement.

A noteworthy paper, in this regard, is Hutchinson and Mitchell’s “50 Years of Test (Un)fairness: Lessons for Machine Learning”, which puts recent ‘fairness in machine learning’ work in the context of very analogous debates from the 60’s and 70’s that concerned the use of testing that could be biased due to cultural factors.

I like this paper a lot, in part because it is very thorough and in part because it tees up a line of argument that’s dear to me. Hutchinson and Mitchell raise the question of how to properly think about fairness in machine learning when the protected categories invoked by nondiscrimination law are themselves social constructs.

Some work on practically assessing fairness in ML has tackled the problem of using race as a construct. This echoes concerns in the testing literature that stem back to at least 1966: “one stumbles immediately over the scientific difficulty of establishing clear yardsticks by which people can be classified into convenient racial categories” [30]. Recent approaches have used Fitzpatrick skin type or unsupervised clustering to avoid racial categorizations [7, 55]. We note that the testing literature of the 1960s and 1970s frequently uses the phrase “cultural fairness” when referring to parity between blacks and whites.

They conclude that this is one of the areas where there can be a lot more useful work:

This short review of historical connections in fairness suggest several concrete steps forward for future research in ML fairness: Diving more deeply into the question of how subgroups are defined, suggested as early as 1966 [30], including questioning whether subgroups should be treated as discrete categories at all, and how intersectionality can be modeled. This might include, for example, how to quantify fairness along one dimension (e.g., age) conditioned on another dimension (e.g., skin tone), as recent work has begun to address [27, 39].

This is all very cool to read, because this is precisely the topic that Bruce Haynes and I address in our FAT* paper, “Racial categories in machine learning” (arXiv link). The problem we confront in this paper is that the racial categories we are used to using in the United States (White, Black, Asian) originate in the white supremacy that was enshrined into the Constitution when it was formed and perpetuated since then through the legal system (with some countervailing activity during the Civil Rights Movement, for example). This puts “fair machine learning” researchers in a bind: either they can use these categories, which have always been about perpetuating social inequality, or they can ignore the categories and reproduce the patterns of social inequality that prevail in fact because of the history of race.

In the paper, we propose a third option. First, rather than reify racial categories, we propose breaking race down into the kinds of personal features that get inscribed with racial meaning. Phenotype properties like skin type and ocular folds are one such set of features. Another set are events that indicate position in social class, such as being arrested or receiving welfare. Another set are facts about the national and geographic origin of ones ancestors. These facts about a person are clearly relevant to how racial distinctions are made, but are themselves more granular and multidimensional than race.

The next step is to detect race-like categories by looking at who is segregated from each other. We propose an unsupervised machine learning technique that works with the distribution of the phenotype, class, and ancestry features across spatial tracts (as in when considering where people physically live) or across a social network (as in when considering people’s professional networks, for example). Principal component analysis can identify what race-like dimensions capture the greatest amounts of spatial and social separation. We hypothesize that these dimensions will encode the ways racial categorization has shaped the social structure in tangible ways; these effects may include both politically recognized forms of discrimination as well as forms of discrimination that have not yet been surfaced. These dimensions can then be used to classify people in race-like ways as input to fairness interventions in machine learning.

A key part of our proposal is that race-like classification depends on the empirical distribution of persons in physical and social space, and so are not fixed. This operationalizes the way that race is socially and politically constructed without reifying the categories in terms that reproduce their white supremacist origins.

I’m quite stoked about this research, though obviously it raises a lot of serious challenges in terms of validation.

“The Theory of Racial Formation”: notes, part 1 (Cha. 4, Omi and Winant, 2014)

Chapter 4 of Omi and Winant (2014) is “The Theory of Racial Formation”. It is where they lay out their theory of race and its formation, synthesizing and improving on theories of race as ethnicity, race as class, and race as nation that they consider earlier in the book.

This rhetorical strategy of presenting the historical development of multiple threads of prior theory before synthesizing them into something new is familiar to me from my work with Helen Nissenbaum on Contextual Integrity. CI is a theory of privacy that advances prior legal and social theories by teasing out their tensions. This seems to be a good way to advance theory through scholarship. It is interesting that the same method of theory building can work in multiple fields. My sense is that what’s going on is that there is an underlying logic to this process which in a less Anglophone world we might call “dialectical”. But I digress.

I have not finished Chapter 4 yet but I wanted to sketch out the outline of it before going into detail. That’s because what Omi and Winant are presenting a way of understanding the mechanisms behind the reproduction of race that are not simplistically “systemic” but rather break it down into discrete operations. This is a helpful contribution; even if the theory is not entirely accurate, its very specificity elevates the discourse.

So, in brief notes:

For Omi and Winant, race is a way of “making up people”; they attribute this phrase to Ian Hacking but do not develop Hacking’s definition. Their reference to a philosopher of science does situate them in a scholarly sense; it is nice that they seem to acknowledge an implicit hierarchy of theory that places philosophy at the foundation. This is correct.

Race-making is a form of othering, of having a group of people identify another group as outsiders. Othering is a basic and perhaps unavoidable human psychological function; their reference for this is powell and Menendian (Apparently, john a. powell being one of these people like danah boyd who decapitalizes their name.)

Race is of course a social construct that is neither a fixed and coherent category nor something that is “unreal”. That is, presumably, why we need a whole book on the dynamic mechanisms that form it. One reason why race is such a dynamic concept is because (a) it is a way of organizing inequality in society, (b) the people on “top” of the hierarchy implied by racial categories enforce/reproduce that category “downwards”, (c) the people on the “bottom” of the hierarchy implied by racial categories also enforce/reproduce a variation of those categories “upwards” as a form of resistance, and so (d) the state of the racial categories at any particular time is a temporary consequence of conflicting “elite” and “street” variations of it.

This presumes that race is fundamentally about inequality. Omi and Winant believe it is. In fact, they think racial categories are a template for all other social categories that are about inequality. This is what they mean by their claim that race is a master category. It’s “a frame used for organizing all manner of political thought”, particularly political thought about liberation struggles.

I’m not convinced by this point. They develop it with a long discussion of intersectionality that is also unconvincing to me. Historically, they point out that sometimes women’s movements have allied with black power movements, and sometimes they haven’t. They want the reader to think this is interesting; as a data scientist, I see randomness and lack of correlation. They make the poignant and true point that “perhaps at the core of intersectionality practice, as well as theory, is the ‘mixed race’ category. Well, how does it come about that people can be ‘mixed’?” They then drop the point with no further discussion.

[Edit: While Omi and Winant do address the issue of what it means to be ‘mixed race’ in more depth later in the book, their treatment of intersectionality remains for me difficult. Race is a system of political categorization; however, racial categories are hereditary in a way that sexual categories are not. That is an important difference in how the categories are formed and maintained, one that is glossed over in O&W’s treatment of the subject, as well as in popular discourse.]

Omi and Winant make an intriguing comment, “In legal theory, the sexual contract and racial contract have often been compared”. I don’t know what this is about but I want to know more.

This is all a kind of preamble to their presentation of theory. They start to provide some definitions:

racial formation
The sociohistorical process by which racial identities are created, lived out, transformed, and destroyed.
racialization
How phenomic-corporeal dimensions of bodies acquire meaning in social life.
racial projects
The co-constitutive ways that racial meanings are translated into social structures and become racially signified.
racism
Not defined. A property of racial projects that Omi and Winant will discuss later.
racial politics
Ways that the politics (of a state?) can handle race, including racial despotism, racial democracy, and racial hegemony.

This is a useful breakdown. More detail in the next post.

population traits, culture traits, and racial projects: a methods challenge #ica18

In a recent paper I’ve been working on with Mark Hannah that he’s presenting this week at the International Communications Association conference, we take on the question of whether and how “big data” can be used to study the culture of a population.

By “big data” we meant, roughly large social media data sets. The pitfalls of using this sort of data for any general study of a population are perhaps best articled by Tufekci (2014). In short: studies based on social media data are often sampling on the dependent variable because they only consider the people representing themselves on social media, though this is only a small portion of the population. To put it another way, the sample suffers from the 1% rule of Internet cultures: for any on-line community, only 1% create content, 10% interact with the content somehow, and the rest lurk. The behavior and attitudes of the lurkers, in addition to any field effects in the “background” of the data (latent variables in the social field of production), are all out of band and so opaque to the analyst.

By “the culture of a population”, we meant something specific: the distribution of values, beliefs, dispositions, and tastes of a particular group of people. The best source we found on this was Marsden and Swingle (1994), and article from a time before the Internet had started to transform academia. Then and perhaps now, the best way to study the distribution of culture across a broad population was a survey. The idea is that you sample the population according to some responsible statistics, you ask them some questions about their values, beliefs, dispositions, and tastes, and you report the results. Viola!

(Given the methodological divergence here, the fact that many people, especially ‘people on the Internet’, now view culture mainly through the lens of other people on the Internet is obviously a huge problem. Most people are not in this sample, and yet we pretend that it is representative because it’s easily available for analysis. Hence, our concept of culture (or cultures) is screwy, reflecting much more than is warranted whatever sorts of cultures are flourishing in a pseudonymous, bot-ridden, commercial attention economy.)

Can we productively combine social media data with surveys methods to get a better method for studying the culture of a population? We think so. We propose the following as a general method framework:

(1) Figure out the population of interest by their stable, independent ‘population traits’ and look for their activity on social media. Sample from this.

(2) Do exploratory data analysis to inductively get content themes and observations about social structure from this data.

(3) Use the inductively generated themes from step (2) to design a survey addressing cultural traits of the population (beliefs, values, dispositions, tastes).

(4) Conduct a stratified sample specifically across social media creators, synthesizers (e.g. people who like, retweet, and respond), and the general population and/or known audience, and distribute the survey.

(5) Extrapolate the results to general conclusions.

(6) Validate the conclusions with other data or not discrepancies for future iterations.

I feel pretty good about this framework as a step forward, except that in the interest of time we had to sidestep what is maybe the most interesting question raised by it, which is: what’s the difference between a population trait and a cultural trait.

Here’s what we were thinking:

Population trait Cultural trait
Location Twitter use (creator, synthesizer, lurker, none)
Age Political views: left, right, center
Permanent unique identifier Attitude towards media
Preferred news source
Pepsi or coke?

One thing to note: we decided that traits about media production and consumption were a subtype of cultural traits. I.e., if you use Twitter, that’s a particular cultural trait that may be correlated with other cultural traits. That makes the problem of sampling on the dependent variable explicit.

But the other thing to note is that there are certain categories that we did not put on this list. Which ones? Gender, race, etc. Why not? Because choosing whether these are population traits or cultural traits opens a big bag of worms that is the subject of active political contest. That discussion was well beyond the scope of the paper!

The dicey thing about this kind of research is that we explicitly designed it to try to avoid investigator bias. That includes the bias of seeing the world through social categories that we might otherwise naturalize of reify. Naturally, though, if we were to actually conduct this method on a sample, such as, I dunno, a sample of Twitter-using academics, we would very quickly discover that certain social categories (men, women, person of color, etc.) were themes people talked about and so would be included as survey items under cultural traits.

That is not terrible. It’s probably safer to do that than to treat them like immutable, independent properties of a person. It does seem to leave something out though. For example, say one were to identify race as a cultural trait and then ask people to identify with a race. Then one takes the results, does a factor analysis, and discovers a factor that combines a racial affinity with media preferences and participation rates. It then identifies the prevalence of this factor in a certain region with a certain age demographic. One might object to this result as a representation of a racial category as entailing certain cultural categories, and leaving out the cultural minority within a racial demographic that wants more representation.

This is upsetting to some people when, for example, Facebook does this and allows advertisers to target things based on “ethnic affinity”. Presumably, Facebook is doing just this kind of factor analysis when they identify these categories.

Arguably, that’s not what this sort of science is for. But the fact that the objection seems pertinent is an informative intuition in its own right.

Maybe the right framework for understanding why this is problematic is Omi and Winant’s racial formation theory (2014). I’m just getting into this theory recently, at the recommendation of Bruce Haynes, who I look up to as an authority on race in America. According to racial projects theory, racial categories are stable because they include both representations of groups of people as having certain qualities and social structures controlling the distribution of resources. So, the white/black divide in the U.S. is both racial stereotypes and segregating urban policy, because the divide is stable because of how the material and cultural factors reinforce each other.

This view is enlightening because it helps explain why hereditary phenotype, representations of people based on hereditary phenotype, requests for people to identify with a race even when this may not make any sense, policies about inheritance and schooling, etc. all are part of the same complex. When we were setting out to develop the method described above, we were trying to correct for a sampling bias in media while testing for the distribution of culture across some objectively determinable population variables. But the objective qualities (such as zip code) are themselves functions of the cultural traits when considered over the course of time. In short, our model, which just tabulates individual differences without looking at temporal mechanisms, is naive.

But it’s a start, if only to an interesting discussion.

References

Marsden, Peter V., and Joseph F. Swingle. “Conceptualizing and measuring culture in surveys: Values, strategies, and symbols.” Poetics 22.4 (1994): 269-289.

Omi, Michael, and Howard Winant. Racial formation in the United States. Routledge, 2014.

Tufekci, Zeynep. “Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls.” ICWSM 14 (2014): 505-514.