Digifesto

Tag: quantitative research methods

population traits, culture traits, and racial projects: a methods challenge #ica18

In a recent paper I’ve been working on with Mark Hannah that he’s presenting this week at the International Communication Association conference, we take on the question of whether and how “big data” can be used to study the culture of a population.

By “big data” we meant, roughly, large social media data sets. The pitfalls of using this sort of data for any general study of a population are perhaps best articulated by Tufekci (2014). In short: studies based on social media data are often sampling on the dependent variable, because they only consider the people representing themselves on social media, who are only a small portion of the population. To put it another way, the sample suffers from the 1% rule of Internet cultures: for any online community, only 1% create content, 10% interact with the content somehow, and the rest lurk. The behavior and attitudes of the lurkers, in addition to any field effects in the “background” of the data (latent variables in the social field of production), are all out of band and so opaque to the analyst.
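To make the sampling problem concrete, here is a toy simulation, not anything from the paper: a hypothetical population split by the 1% rule, with a made-up yes/no cultural trait whose prevalence is assumed to differ by participation role. Estimating the trait’s prevalence from creators alone overstates it badly.

```python
import random


def simulate_population(n, seed=0):
    """Hypothetical population: each person gets a participation role and a
    yes/no cultural trait whose prevalence depends on that role (an assumption)."""
    rng = random.Random(seed)
    # Role shares follow the 1% rule: 1% create, 10% interact, the rest lurk.
    roles = rng.choices(["creator", "interactor", "lurker"], weights=[1, 10, 89], k=n)
    # Invented prevalence of the trait within each role, for illustration only.
    prevalence = {"creator": 0.7, "interactor": 0.5, "lurker": 0.2}
    return [(role, rng.random() < prevalence[role]) for role in roles]


def share_with_trait(people):
    """Fraction of people in the list who have the trait."""
    return sum(trait for _, trait in people) / len(people)


population = simulate_population(100_000)
creators = [p for p in population if p[0] == "creator"]

print(f"creators only:    {share_with_trait(creators):.2f}")
print(f"whole population: {share_with_trait(population):.2f}")
```

The prevalence numbers are invented; the point is only that whenever participation correlates with the trait, the creators-only estimate stays biased no matter how many creators you collect.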

By “the culture of a population”, we meant something specific: the distribution of values, beliefs, dispositions, and tastes of a particular group of people. The best source we found on this was Marsden and Swingle (1994), an article from a time before the Internet had started to transform academia. Then and perhaps now, the best way to study the distribution of culture across a broad population was a survey. The idea is that you sample the population according to some responsible statistics, you ask them some questions about their values, beliefs, dispositions, and tastes, and you report the results. Voilà!

(Given the methodological divergence here, the fact that many people, especially ‘people on the Internet’, now view culture mainly through the lens of other people on the Internet is obviously a huge problem. Most people are not in this sample, and yet we pretend that it is representative because it’s easily available for analysis. Hence, our concept of culture (or cultures) is screwy, reflecting much more than is warranted whatever sorts of cultures are flourishing in a pseudonymous, bot-ridden, commercial attention economy.)

Can we productively combine social media data with survey methods to get a better method for studying the culture of a population? We think so. We propose the following as a general methodological framework:

(1) Figure out the population of interest by their stable, independent ‘population traits’ and look for their activity on social media. Sample from this.

(2) Do exploratory data analysis to inductively get content themes and observations about social structure from this data.

(3) Use the inductively generated themes from step (2) to design a survey addressing cultural traits of the population (beliefs, values, dispositions, tastes).

(4) Draw a stratified sample specifically across social media creators, synthesizers (e.g. people who like, retweet, and respond), and the general population and/or known audience, and distribute the survey.

(5) Extrapolate the results to general conclusions.

(6) Validate the conclusions with other data, or note discrepancies for future iterations.
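Steps (4) and (5) can be sketched roughly as follows. This assumes, hypothetically, that we know the size of each stratum (here split along the 1% rule for a population of 20,000) and that extrapolating means weighting each stratum’s survey mean by its population share, i.e. ordinary post-stratification. The stratum names and the per-stratum results are placeholders, not data from the paper.

```python
import random
from statistics import mean

# Hypothetical sampling frame: respondent ids grouped by stratum.
frame = {
    "creator": [f"c{i}" for i in range(200)],
    "synthesizer": [f"s{i}" for i in range(2_000)],
    "general": [f"g{i}" for i in range(17_800)],
}


def stratified_sample(frame, per_stratum, seed=0):
    """Draw a fixed number of respondents from each stratum without replacement."""
    rng = random.Random(seed)
    return {stratum: rng.sample(ids, per_stratum) for stratum, ids in frame.items()}


def extrapolate(stratum_means, stratum_sizes):
    """Weight each stratum's survey mean by that stratum's share of the population."""
    total = sum(stratum_sizes.values())
    return sum(stratum_means[s] * stratum_sizes[s] / total for s in stratum_means)


sample = stratified_sample(frame, per_stratum=100)
sizes = {stratum: len(ids) for stratum, ids in frame.items()}
# Hypothetical per-stratum survey results (share answering "yes" to some item).
stratum_means = {"creator": 0.7, "synthesizer": 0.5, "general": 0.2}
print(round(extrapolate(stratum_means, sizes), 3))  # 0.235
print(round(mean(stratum_means.values()), 3))       # 0.467, the unweighted average
```

The weighted estimate and the naive average of stratum means diverge sharply, which is the reason to oversample creators and synthesizers but then weight them back down.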

I feel pretty good about this framework as a step forward, except that in the interest of time we had to sidestep what is maybe the most interesting question raised by it, which is: what’s the difference between a population trait and a cultural trait?

Here’s what we were thinking:

Population traits: location; age; permanent unique identifier.

Cultural traits: Twitter use (creator, synthesizer, lurker, none); political views (left, right, center); attitude towards media; preferred news source; Pepsi or Coke?

One thing to note: we decided that traits about media production and consumption were a subtype of cultural traits. I.e., if you use Twitter, that’s a particular cultural trait that may be correlated with other cultural traits. That makes the problem of sampling on the dependent variable explicit.
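One way to keep this split honest in an analysis pipeline, offered as a hypothetical sketch rather than anything we implemented, is to encode it in types: the sampling frame only ever sees population traits, while Twitter use sits alongside the other cultural traits as a survey outcome. All class names, field names, and example values here are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TwitterUse(Enum):
    CREATOR = "creator"
    SYNTHESIZER = "synthesizer"
    LURKER = "lurker"
    NONE = "none"


@dataclass(frozen=True)
class PopulationTraits:
    """Stable, independent traits used to define the sampling frame."""
    person_id: str  # permanent unique identifier
    location: str
    age: int


@dataclass(frozen=True)
class CulturalTraits:
    """Survey outcomes; media use is just one cultural trait among others."""
    twitter_use: TwitterUse
    political_view: str  # "left", "right", or "center"
    attitude_towards_media: str
    preferred_news_source: str
    pepsi_or_coke: str


def in_sampling_frame(p: PopulationTraits, location: str, min_age: int, max_age: int) -> bool:
    """Frame membership depends only on population traits, so the frame
    cannot be defined by Twitter use or any other cultural trait."""
    return p.location == location and min_age <= p.age <= max_age


people = [
    PopulationTraits("a1", "Oakland", 34),
    PopulationTraits("a2", "Fresno", 51),
    PopulationTraits("a3", "Oakland", 17),
]
frame = [p for p in people if in_sampling_frame(p, "Oakland", 18, 65)]
print([p.person_id for p in frame])  # ['a1']
```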

But the other thing to note is that there are certain categories that we did not put on this list. Which ones? Gender, race, etc. Why not? Because choosing whether these are population traits or cultural traits opens a big can of worms that is the subject of active political contest. That discussion was well beyond the scope of the paper!

The dicey thing about this kind of research is that we explicitly designed it to try to avoid investigator bias. That includes the bias of seeing the world through social categories that we might otherwise naturalize or reify. Naturally, though, if we were to actually conduct this method on a sample, such as, I dunno, a sample of Twitter-using academics, we would very quickly discover that certain social categories (men, women, people of color, etc.) were themes people talked about and so would be included as survey items under cultural traits.

That is not terrible. It’s probably safer to do that than to treat them as immutable, independent properties of a person. It does seem to leave something out, though. For example, say one were to identify race as a cultural trait and then ask people to identify with a race. One then takes the results, does a factor analysis, and discovers a factor that combines a racial affinity with media preferences and participation rates. One then identifies the prevalence of this factor in a certain region within a certain age demographic. One might object that this result represents a racial category as entailing certain cultural categories, and that it leaves out the cultural minority within a racial demographic that wants more representation.
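The factor-analysis scenario can be illustrated with synthetic data. The sketch below substitutes principal components (the leading eigenvector of the correlation matrix, found by power iteration) for a proper factor analysis, and the item names are hypothetical. Three items are generated from a shared latent variable and one is pure noise; the leading component bundles the three together regardless of what the items mean.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlations between equal-length lists of numbers."""
    def standardize(xs):
        m = sum(xs) / len(xs)
        sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
        return [(x - m) / sd for x in xs]

    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(matrix, iterations=200):
    """Leading eigenvector of a symmetric matrix, by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


rng = random.Random(1)
n = 1_000
latent = [rng.gauss(0, 1) for _ in range(n)]  # an unobserved common factor
items = {
    "identity_item": [z + rng.gauss(0, 0.7) for z in latent],
    "media_preference": [z + rng.gauss(0, 0.7) for z in latent],
    "participation_rate": [z + rng.gauss(0, 0.7) for z in latent],
    "unrelated_item": [rng.gauss(0, 1) for _ in range(n)],
}
loadings = first_component(correlation_matrix(list(items.values())))
for name, loading in zip(items, loadings):
    print(f"{name:20s} {loading:+.2f}")
```

Nothing in the arithmetic knows that one of the bundled items is an identity question; the bundling comes purely from covariation, which is why the objection bites at the level of interpretation.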

This upsets some people when, for example, Facebook does it and allows advertisers to target ads based on “ethnic affinity”. Presumably, Facebook is doing just this kind of factor analysis when it identifies these categories.

Arguably, that’s not what this sort of science is for. But the fact that the objection seems pertinent is an informative intuition in its own right.

Maybe the right framework for understanding why this is problematic is Omi and Winant’s racial formation theory (2014). I’ve only recently started getting into this theory, at the recommendation of Bruce Haynes, whom I look up to as an authority on race in America. According to the theory’s account of racial projects, racial categories are stable because they combine representations of groups of people as having certain qualities with social structures controlling the distribution of resources. So the white/black divide in the U.S. consists of both racial stereotypes and segregating urban policy, and the divide is stable because the material and cultural factors reinforce each other.

This view is enlightening because it helps explain why hereditary phenotype, representations of people based on hereditary phenotype, requests for people to identify with a race even when this may not make any sense, policies about inheritance and schooling, etc. all are part of the same complex. When we were setting out to develop the method described above, we were trying to correct for a sampling bias in media while testing for the distribution of culture across some objectively determinable population variables. But the objective qualities (such as zip code) are themselves functions of the cultural traits when considered over the course of time. In short, our model, which just tabulates individual differences without looking at temporal mechanisms, is naive.

But it’s a start, if only of an interesting discussion.

References

Marsden, Peter V., and Joseph F. Swingle. “Conceptualizing and measuring culture in surveys: Values, strategies, and symbols.” Poetics 22.4 (1994): 269-289.

Omi, Michael, and Howard Winant. Racial formation in the United States. Routledge, 2014.

Tufekci, Zeynep. “Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls.” ICWSM 14 (2014): 505-514.


Values, norms, and beliefs: units of analysis in research on culture

Much of the contemporary critical discussion about technology in society and ethical design hinges on the term “values”. Privacy is one such value, according to Mulligan, Koopman, and Doty (2016), drawing on Westin and Post. Contextual Integrity (Nissenbaum, 2009) argues that privacy is a function of norms, and that norms get their legitimacy from, among other sources, societal values. The Data and Society Research Institute lists “values” as one of the cross-cutting themes of its research. Richmond Wong (2017) has been working on eliciting values reflections as a tool in privacy by design. And so on.

As much as ‘values’ get emphasis in this literary corner, I have been unsatisfied with how these literatures represent values as either sociological or philosophical phenomena. How are values distributed in society? Are they stable under different methods of measurement? Do they really have ethical entailments, or are they just a kind of emotive expression?

For only distantly related reasons, I’ve been looking into the literature on quantitative measurement of culture. I’m doing a bit of a literature review and need your recommendations! But an early hit is Marsden and Swingle’s “Conceptualizing and measuring culture in surveys: Values, strategies, and symbols” (1994), which is a straightforward social science methods piece apparently written before either rejections of positivism or Internet-based research became so destructively fashionable.

A useful passage comes early:

To frame our discussion of the content of the culture module, we have drawn on distinctions made in Peterson’s (1979: 137-138) review of cultural research in sociology. Peterson observes that sociological work published in the late 1940s and 1950s treated values – conceptualizations of desirable end-states – and the behavioral norms they specify as the principal explanatory elements of culture. Talcott Parsons (1951) figured prominently in this school of thought, and more recent survey studies of culture and cultural change in both the United States (Rokeach, 1973) and Europe (Inglehart, 1977) continue the Parsonsian tradition of examining values as a core concept.

This was a surprise! Talcott Parsons is not a name you hear every day in the world of sociology of technology. That’s odd, because as far as I can tell he’s one of these robust and straightforwardly scientific sociologists. The main complaint against him, if I’ve heard any, is that he’s dry. I’ve never heard, despite his being tied to structural functionalism, that his ideas have been substantively empirically refuted (unlike Durkheim, say).

So the mystery is…whatever happened to the legacy of Talcott Parsons? And how is it represented, if at all, in contemporary sociological research today?

One reason why we don’t hear much about Parsons may be because the sociological community moved from measuring “values” to measuring “beliefs”. Marsden and Swingle go on:

Cultural sociologists writing since the late 1970s however, have accented other elements of culture. These include, especially, beliefs and expressive symbols. Peterson’s (1979: 138) usage of “beliefs” refers to “existential statements about how the world operates that often serve to justify value and norms”. As such, they are less to be understood as desirable end-states in and of themselves, but instead as habits or styles of thought that people draw upon, especially in unstructured situations (Swidler, 1986).

Intuitively, this makes sense. When we look at the seemingly mortal combat of contemporary partisan rhetoric and tribalist propaganda, much of what we encounter is beliefs and differences in beliefs. As suggested in this text, beliefs justify values and norms, meaning that even values (which you might have thought are the source of all justification) get their meaning from a kind of worldview, rather than being held in a simple way.

That makes a lot of sense. There’s often a lot more commonality in values than in ways those values should be interpreted or applied. Everybody cares about fairness, for example. What people disagree about, often vehemently, is what is fair, and that’s because (I’ll argue here) people have widely varying beliefs about the world and what’s important.

To put it another way, the Humean model, where we have beliefs and values separately and then combine the two in an instrumental calculus, is wrong, and we’ve known it’s wrong since the ’70s. Instead, we have complexes of normatively thick beliefs that reinforce each other into a worldview. When we’re asked about our values, we are abstracting in a derivative way from this complex of frames, rather than getting at a more core feature of personality or culture.

A great book on this topic is Hilary Putnam’s The collapse of the fact/value dichotomy (2002), just for example. It would be nice if more of this metaethical theory and sociology of values surfaced in the values in design literature, despite its being distinctly off-trend.

References

Marsden, Peter V., and Joseph F. Swingle. “Conceptualizing and measuring culture in surveys: Values, strategies, and symbols.” Poetics 22.4 (1994): 269-289.

Mulligan, Deirdre K., Colin Koopman, and Nick Doty. “Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy.” Phil. Trans. R. Soc. A 374.2083 (2016): 20160118.

Nissenbaum, Helen. Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press, 2009.

Putnam, Hilary. The collapse of the fact/value dichotomy and other essays. Harvard University Press, 2002.

Wong, Richmond Y., et al. “Eliciting Values Reflections by Engaging Privacy Futures Using Design Workbooks.” (2017).

digital qualities: some meditations on methodology

Text is a kind of data that is both qualitative (interpretable for the qualities it conveys) and quantitative (characterized by certain amounts of certain abstract tokens arranged in a specific order).
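As a small illustration of the quantitative side, a crude tokenizer turns a sentence into counts of tokens, and counting adjacent pairs (bigrams) keeps a trace of their order. The regular expression and the example sentence are arbitrary choices for this sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer for illustration."""
    return re.findall(r"[a-z']+", text.lower())


text = "The culture of a population is not the culture of its loudest members."
tokens = tokenize(text)
counts = Counter(tokens)                    # certain amounts of certain tokens
bigrams = Counter(zip(tokens, tokens[1:]))  # a trace of their specific order

print(counts.most_common(3))     # [('the', 2), ('culture', 2), ('of', 2)]
print(bigrams[("culture", "of")])  # 2
```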

Statistical learning techniques are able to extract qualitative distinctions from quantitative data, through clustering processes for example. Non-parametric statistical methods allow qualitative distinctions to be extracted from quantitative data without specifying particular structure or features up front.
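For example, a one-dimensional single-linkage clustering splits sorted values wherever the gap between neighbours exceeds a threshold, so the number of groups is not specified up front. This is a toy stand-in with invented measurements, and it still has one tuning parameter, the gap threshold.

```python
def gap_clusters(values, max_gap):
    """Single-linkage clustering in one dimension: sort the values and start a
    new cluster wherever two neighbours are more than max_gap apart. The number
    of clusters is not fixed in advance."""
    clusters = []
    for x in sorted(values):
        if clusters and x - clusters[-1][-1] <= max_gap:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return clusters


# Hypothetical measurements, e.g. average sentence length (in words) per document.
lengths = [8.1, 7.9, 8.4, 15.2, 14.8, 15.5, 31.0, 29.7]
for cluster in gap_clusters(lengths, max_gap=2.0):
    print(cluster)
# [7.9, 8.1, 8.4]
# [14.8, 15.2, 15.5]
# [29.7, 31.0]
```

Three qualitatively distinct groups fall out of the numbers, though what to call them is still an interpretive act.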

Many cognitive scientists and computational neuroscientists believe that this is more or less how perception works. The neurons in our eyes (for example) provide a certain kind of data to downstream neurons, which activate according to quantifiable regularities in neuron activation. A qualitative difference that we perceive is due to a statistical aggregation of these inputs in the context of a prior, physically definite, field of neural connectivity.

A source of debate in the social sciences is the relationship between qualitative and quantitative research methods. As heir to the methods of harder sciences whose success is indubitable, quantitative research is often assumed to be credible up to the profound limits of its method. A significant amount of ink has been spilled distinguishing qualitative research from quantitative research and justifying it in the face of skeptical quantitative types.

Qualitative researchers, as a rule, work with text. This is trivially true because a limiting condition of qualitative research appears to be the creation of a document explicating the research conclusions. But if we are to believe several instructional manuals on qualitative research, then the work of, e.g., an ethnographer involves jottings, field notes, interview transcripts, media transcripts, coding of notes, axial coding of notes, theoretical coding of notes, or, more broadly, the noting of narratives (often written down), the interpreting of text, a hermeneutic exposition of hermeneutic expositions ad infinitum down an endless semiotic staircase.

Computer assisted qualitative data analysis software passes the Wikipedia test for “does it exist”.

Data processed by computers is necessarily quantitative. Hence, qualitative data is necessarily quantitative. This is unsurprising, since so much qualitative data is text. (See above).
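At the level of the machine this is literal: a string of field notes is stored as a sequence of integers. In Python, UTF-8 encoding exposes those integers directly, and a non-ASCII character takes more than one of them. The example note is made up.

```python
note = "café interview, p. 3"
encoded = note.encode("utf-8")

print(list(encoded[:5]))            # [99, 97, 102, 195, 169]: "é" takes two bytes
print(len(note), len(encoded))      # 20 21
print(encoded.decode("utf-8") == note)  # True: the numbers round-trip to the text
```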

We might ask: what makes the work qualitative researchers do qualitative as opposed to quantitative, if the data they work with is quantitative? We could answer: it’s their conclusions that are qualitative.

But so are the conclusions of a quantitative researcher. A hypothesis is, generally speaking, a qualitative assessment, that is then operationalized into a prediction whose correspondence with data can be captured quantitatively through a statistical model. The statistical apparatus is meant to guide our expectations of the generalizability of results.
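For instance, a prediction like “group A will score higher than group B on some survey item” can be checked with a permutation test, one standard way of asking how often a difference at least this large would appear if the group labels were arbitrary. The scores below are invented for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means. Returns the share
    of random relabelings whose absolute mean difference is at least as large
    as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_resamples


# Hypothetical scores on a 1-7 survey item for two groups.
group_a = [4, 5, 5, 6, 7, 5, 6]
group_b = [2, 3, 3, 4, 2, 3, 4]
p = permutation_test(group_a, group_b)
print(p)
```

Here the small p-value is the quantitative part; deciding what “higher” should mean and why it matters remains qualitative work.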

Maybe the qualitative researcher isn’t trying to get generalized results. Maybe they are just reporting a specific instance. Maybe generalizations are up to the individual interpreter. Maybe social scientific research can only apply and elaborate on an ideal type, tell a good story. All further insight is beyond the purview of the social sciences.

Hey, I don’t mean to be insensitive about this, but I’ve got two practical considerations: first, do you expect anyone to pay you for research that is literally ungeneralizable? That has no predictive or informative impact on the future? Second, if you believe that, aren’t you basically giving up all ground on social prediction to economists? Do you really want that?

Then there’s the mixed methods researcher. Or, the researcher who in principle admits that mixed methods are possible. Sure, the quantitative folks are cool. We’d just rather be interviewing people because we don’t like math.

That’s alright. Math isn’t for everybody. It would be nice if computers did it for us. (See above)

What some people say is: qualitative research generates hypotheses, quantitative research tests hypotheses.

Listen: that is totally buying into the hegemony of quantitative methods by relegating qualitative methods to an auxiliary role with no authority.

Let’s accept that hegemony as an assumption for a second, just to see where it goes. All authority comes from a quantitatively supported judgment. This includes the assessment of the qualitative researcher.

We might ask, “Where are the missing scientists?” about qualitative research, if it is to have any authority at all, even in its auxiliary role.

What would Bruno Latour do?

We could locate the missing scientists in the technological artifacts that qualitative researchers engage with. The missing scientists may lie within the computer assisted qualitative data analysis software, which dutifully treats qualitative data as numbers, and tests the data experimentally and in a controlled way. The user interface is the software’s experimental instrument, through which it elicits “qualitative” judgments from its users. Of course, to the software, the qualitative judgments are quantitative data about the cognitive systems of the software’s users, black boxes that nevertheless have a mysterious regularity to them. The better the coding of the qualitative data, the better the mysteries of the black box users are consolidated into regularities. From the perspective of the computer assisted qualitative data analysis software, the whole world, including its users, is quantitative. By delegating quantitative effort to this software, we conserve the total mass of science in the universe. The missing mass is in the software. Or, maybe, in the visual system of the qualitative researchers, which performs non-parametric statistical inference on the available sensory data as delivered by photoreceptors in the eye.

I’m sorry. I have to stop. Did you enjoy that? Did my Latourian analysis convince you of the primacy or at least irreducibility of the quantitative element within the social sciences?

I have a confession. Everything I’ve ever read by Latour smells like bullshit to me. If writing that here and now means I will never be employed in a university, then may God have mercy on the soul of academe, because its mind is rotten and its body dissolute. He is obviously a brilliant man but as far as I can tell nothing he writes is true. That said, if you are inclined to disagree, I challenge you to refute my Latourian analysis above, else weep before the might of quantification, which will forever dominate the process of inquiry, if not in man, then in our robot overlords and the unconscious neurological processes that prefigure them.

This is all absurd, of course. Simultaneously accepting the hegemony of quantitative methods and Latourian analysis has provided us with a reductio ad absurdum that compels us to negate some assumptions. If we discard Latourian analysis, then our quantitative “hegemony” dissolves as more and more quantitative work is performed by unthinking technology. All research becomes qualitative, a scholarly consideration of the poetry outputted by our software and instruments.

Nope, that’s not it either. Because somebody is building that software and those instruments, and that requires generalizability of knowledge, which so far qualitative methods have given up a precise claim to.

I’m going to skip some steps and cut to the chase:

I think the quantitative/qualitative distinction in social scientific research, and in research in general, is dumb.

I think researchers should recognize the fungibility of quantity and quality in text and other kinds of data. I think ethnographers and statistical learning theorists should warmly embrace each other and experience the bliss that is finding one’s complement.

Goodnight.