methodology | Digifesto

May 26, 2018

population traits, culture traits, and racial projects: a methods challenge #ica18

In a recent paper I’ve been working on with Mark Hannah, which he’s presenting this week at the International Communication Association conference, we take on the question of whether and how “big data” can be used to study the culture of a population.

By “big data” we meant, roughly, large social media data sets. The pitfalls of using this sort of data for any general study of a population are perhaps best articulated by Tufekci (2014). In short: studies based on social media data are often sampling on the dependent variable because they only consider the people representing themselves on social media, though this is only a small portion of the population. To put it another way, the sample suffers from the 1% rule of Internet cultures: for any on-line community, only 1% create content, 10% interact with the content somehow, and the rest lurk. The behavior and attitudes of the lurkers, in addition to any field effects in the “background” of the data (latent variables in the social field of production), are all out of band and so opaque to the analyst.

By “the culture of a population”, we meant something specific: the distribution of values, beliefs, dispositions, and tastes of a particular group of people. The best source we found on this was Marsden and Swingle (1994), an article from a time before the Internet had started to transform academia. Then and perhaps now, the best way to study the distribution of culture across a broad population was a survey. The idea is that you sample the population according to some responsible statistics, you ask them some questions about their values, beliefs, dispositions, and tastes, and you report the results. Voilà!

(Given the methodological divergence here, the fact that many people, especially ‘people on the Internet’, now view culture mainly through the lens of other people on the Internet is obviously a huge problem. Most people are not in this sample, and yet we pretend that it is representative because it’s easily available for analysis. Hence, our concept of culture (or cultures) is screwy, reflecting much more than is warranted whatever sorts of cultures are flourishing in a pseudonymous, bot-ridden, commercial attention economy.)

Can we productively combine social media data with survey methods to get a better method for studying the culture of a population? We think so. We propose the following as a general method framework:

(1) Figure out the population of interest by their stable, independent ‘population traits’ and look for their activity on social media. Sample from this.

(2) Do exploratory data analysis to inductively get content themes and observations about social structure from this data.

(3) Use the inductively generated themes from step (2) to design a survey addressing cultural traits of the population (beliefs, values, dispositions, tastes).

(4) Conduct a stratified sample specifically across social media creators, synthesizers (e.g. people who like, retweet, and respond), and the general population and/or known audience, and distribute the survey.

(5) Extrapolate the results to general conclusions.

(6) Validate the conclusions with other data or note discrepancies for future iterations.
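
Steps (4) and (5) can be sketched in code. What follows is a minimal illustration, not the procedure from the paper: it assumes the population shares of creators, synthesizers, and lurkers are known (here taken from the 1% rule quoted above, purely for illustration), and it reweights per-stratum survey means by those shares, which is a simple form of post-stratification. The function name, the stratum labels, and the survey responses are all hypothetical.

```python
from statistics import mean

# ASSUMPTION: population shares per stratum, borrowed from the 1% rule
# as stated above. Real shares would have to be estimated separately.
POPULATION_SHARE = {"creator": 0.01, "synthesizer": 0.10, "lurker": 0.89}


def poststratified_mean(responses, shares=POPULATION_SHARE):
    """Weight each stratum's mean response by its share of the population.

    responses: list of (stratum, value) pairs from the survey.
    """
    by_stratum = {}
    for stratum, value in responses:
        by_stratum.setdefault(stratum, []).append(value)

    missing = set(shares) - set(by_stratum)
    if missing:
        raise ValueError(f"no survey responses for strata: {sorted(missing)}")

    return sum(shares[s] * mean(by_stratum[s]) for s in shares)


# Hypothetical answers to one survey item (1 = agrees, 0 = does not).
responses = [
    ("creator", 1), ("creator", 1),
    ("synthesizer", 1), ("synthesizer", 0),
    ("lurker", 0), ("lurker", 0), ("lurker", 1), ("lurker", 0),
]

naive = mean([value for _, value in responses])  # treats the sample as representative
weighted = poststratified_mean(responses)        # corrects for over-sampled strata

print(f"naive estimate:    {naive:.4f}")
print(f"weighted estimate: {weighted:.4f}")
```

With these made-up numbers the unweighted mean is 0.5, while the reweighted estimate is about 0.28, because the over-sampled creators and synthesizers count for much less once their small population share is taken into account.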

I feel pretty good about this framework as a step forward, except that in the interest of time we had to sidestep what is maybe the most interesting question raised by it, which is: what’s the difference between a population trait and a cultural trait?

Here’s what we were thinking:

Population trait	Cultural trait
Location	Twitter use (creator, synthesizer, lurker, none)
Age	Political views: left, right, center
Permanent unique identifier	Attitude towards media
	Preferred news source
	Pepsi or coke?

One thing to note: we decided that traits about media production and consumption were a subtype of cultural traits. I.e., if you use Twitter, that’s a particular cultural trait that may be correlated with other cultural traits. That makes the problem of sampling on the dependent variable explicit.

But the other thing to note is that there are certain categories that we did not put on this list. Which ones? Gender, race, etc. Why not? Because choosing whether these are population traits or cultural traits opens a big bag of worms that is the subject of active political contest. That discussion was well beyond the scope of the paper!

The dicey thing about this kind of research is that we explicitly designed it to try to avoid investigator bias. That includes the bias of seeing the world through social categories that we might otherwise naturalize or reify. Naturally, though, if we were to actually conduct this method on a sample, such as, I dunno, a sample of Twitter-using academics, we would very quickly discover that certain social categories (men, women, person of color, etc.) were themes people talked about and so would be included as survey items under cultural traits.

That is not terrible. It’s probably safer to do that than to treat them like immutable, independent properties of a person. It does seem to leave something out though. For example, say one were to identify race as a cultural trait and then ask people to identify with a race. Then one takes the results, does a factor analysis, and discovers a factor that combines a racial affinity with media preferences and participation rates. It then identifies the prevalence of this factor in a certain region with a certain age demographic. One might object to this result as a representation of a racial category as entailing certain cultural categories, and leaving out the cultural minority within a racial demographic that wants more representation.

This is upsetting to some people when, for example, Facebook does this and allows advertisers to target things based on “ethnic affinity”. Presumably, Facebook is doing just this kind of factor analysis when they identify these categories.

Arguably, that’s not what this sort of science is for. But the fact that the objection seems pertinent is an informative intuition in its own right.

Maybe the right framework for understanding why this is problematic is Omi and Winant’s racial formation theory (2014). I’ve only recently started getting into this theory, at the recommendation of Bruce Haynes, who I look up to as an authority on race in America. According to racial projects theory, racial categories are stable because they include both representations of groups of people as having certain qualities and social structures controlling the distribution of resources. So, the white/black divide in the U.S. consists of both racial stereotypes and segregating urban policy; the divide is stable because of how the material and cultural factors reinforce each other.

This view is enlightening because it helps explain why hereditary phenotype, representations of people based on hereditary phenotype, requests for people to identify with a race even when this may not make any sense, policies about inheritance and schooling, etc. all are part of the same complex. When we were setting out to develop the method described above, we were trying to correct for a sampling bias in media while testing for the distribution of culture across some objectively determinable population variables. But the objective qualities (such as zip code) are themselves functions of the cultural traits when considered over the course of time. In short, our model, which just tabulates individual differences without looking at temporal mechanisms, is naive.

But it’s a start, if only to an interesting discussion.

References

Marsden, Peter V., and Joseph F. Swingle. “Conceptualizing and measuring culture in surveys: Values, strategies, and symbols.” Poetics 22.4 (1994): 269-289.

Omi, Michael, and Howard Winant. Racial formation in the United States. Routledge, 2014.

Tufekci, Zeynep. “Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls.” ICWSM 14 (2014): 505-514.

March 31, 2013

deep thoughts by jack handy

Information transfer just is the coming-into-dependence of two variables, which under the many worlds interpretation of quantum mechanics means the entanglement of the “worlds” of each variable (and, by extension, the networks of causally related variables of which they are a part). Information exchange collapses possibilities.
This holds up whether you take a subjectivist view of reality (and probability–Bayesian probability properly speaking) or an objectivist view. At their (dialectical?) limit, the two “irreconcilable” paradigms converge on a monist metaphysics that is absolutely physical and also ideal. (This was recognized by Hegel, who was way ahead of the game in a lot of ways.) It is the ideality of nature that allows it to be mathematized, though it’s important to note that mathematization does not exclude engagement with nature through other modalities, e.g. the emotional, the narrative, etc.

This means that characterizing the evolution of networks of information exchange by their physical properties (limits of information capacity of channels, etc.) is something to be embraced to better understand their impact on e.g. socially constructed reality, emic identity construction, etc. What the mathematics provide is a representation of what remains after so many diverse worlds are collapsed.
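
One standard way to put a number on the “coming-into-dependence of two variables” is mutual information, which is zero when two variables are independent and grows as knowing one tells you more about the other. The sketch below is only an illustration of that measure on toy data; the function name and the example observations are made up, and nothing here models the quantum-mechanical reading above.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Estimate I(X;Y) in bits from a list of observed (x, y) pairs.

    Uses the plug-in estimate: sum over observed (x, y) of
    p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    """
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


# Toy data: two variables that never influence each other...
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
# ...and two variables that always agree.
dependent = [(0, 0), (1, 1), (0, 0), (1, 1)]

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)

print(mi_independent, mi_dependent)
```

In the independent case the estimate is 0 bits; in the fully dependent case it is 1 bit, the whole entropy of a fair binary variable.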

A similar result, representing a broad consensus, might be attained dialectically, specifically through actual dialog. Whereas the mathematical accounting is likely to lead to reduction to latent variables that may not coincide with the lived experience of participants, a dialectical approach is more likely to result in a synthesis of perspectives at a higher level of abstraction. (Only a confrontation with nature as the embodiment of unconscious constraints is likely to force us to confront latent mechanisms.)

Whether or not such dialectical synthesis will result in a singular convergent truth is unknown, with various ideologies taking positions on the matter as methodological assumptions. Haraway’s feminist epistemology, eschewing rational consensus in favor of interperspectival translation, rejects a convergent (scientific, and she would say masculine) truth. But does this stand up to the simple objection that Haraway’s own claims about truth and method transcend individual perspective, making her guilty of performative contradiction?

Perhaps a deeper problem with the consensus view of truth, which I heard once from David Weinberger, is that the structure of debate may have fractal complexity. The fractal pluralectic can fray into infinite and infinitesimal disagreement at its borders. I’ve come around to agreeing with this view, uncomfortable as it is. However, within the fractal pluralectic we can still locate a convergent perspective based on the network topology of information flow. Some parts of the network are more central and brighter than others.

A critical question is to what extent the darkness and confusion in the dissonant periphery can be included within the perspective of the central, convergent parts of the network. Is there necessarily a Shadow? Without the noise, can there be a signal?

October 18, 2012

“Weird Twitter” art experiment method notes and observations

@bugbucket @sbenthall @hell_homer is this post some kind of super-elaborate performance art

— ask her to unblock me, I am very new to flirting (@third_eye_gape) October 17, 2012

First, I got to say: Weird twitter definitely exists, and it is bigger and weirder than I imagined.

I want to write up some notes on my methodology for determining this, but I feel like some self-disclosure is in order.

I’m a PhD student with research interests that include community formation on the internet and collective intelligence. I’ve been studying theories about how communities establish their boundaries using symbols, and am also interested in “collective sensemaking.”

I am 27 years old and have been on the internet for long enough to know what I’m doing. I am really into conceptual art.

I’ve been aware of what I’ve referred to as “weird twitter” for some time, and have been curious what’s going on. I love it and love that it exists. But I didn’t know if it was real, or just something I was peripherally aware of because I followed a few people. It was much, much deeper than I had the patience to venture into at the time, but I had no sense of its scale.

Unfortunately, doing analysis on a gigantic unstructured digital social network turns out to be one of the big challenges of contemporary social science research. You either need to slurp a lot of data into something that crunches numbers, or you have to painstakingly research individuals in a tedious way that is not going to give you any general results on the scale necessary for this problem. So I tried a new method.

This method, which I don’t have a good name for, is basically: call it names, and see if it answers back. In other words, trolling.

I did this in August on a lark.

That post was an experiment.

Suppose “weird twitter” did not exist. Then there would be no reason for anybody to identify with its content. It would be a blog post lost in obscurity, like most of my blog posts.
But if “weird twitter” did exist, then there was a chance that it would react to its label in a statistically significant way.

I’ll adopt some speculative language for a moment: what if “weird twitter” were a kind of collective intelligence? Is it self-aware?

There are many theories of how self-awareness arises. Some believe that a person’s self-awareness depends on their interacting socially with others. It will not be a self until it is treated like a self. Until then, it will exist in some pre-conscious, animal state.

Others have argued that the Internet is creating a “global brain” of collective intelligence. This raises questions implicated by but far more interesting than the question of whether “corporations are people”. In what ways can a collection of people be a person? Do they need to self-identify as a community before that happens?

So many interesting questions.

Of course, if it were true that “weird twitter” were just a bunch of people telling jokes, and not a community, identity, culture, or collective intelligence, then a blog post about them would be meaningless and ephemeral.

For fun, I made the post extra obtuse.

I should say: “Weird twitter” seemed like a fun bunch, mostly just a bunch of jokers who don’t take things too seriously. So there was no way such a post would be taken seriously unless, well, I was wrong, and some people took it very, very seriously.

I have never gotten more hate spam in my life. Holy crap.

It is a really good thing I have a thick skin, because the amount of abuse I’ve put up with in the past 48 hours has been intense. There has also been a pretty epic amount of disdain and even a little attempted character assassination….

A note about this:

Ok, I need to address this directly, partly because it is the sort of thing that can really ruin one’s reputation, and partly because I think it raises some pretty interesting questions on feminism on the internet.

Kimmy (@aRealLiveGhost) is a talented poet whose work I generally like and have recommended to others who appreciate poetry. (I think her reconfigurations of @horse_ebooks tweets are her best work.) As far as I know, she got her start just tweeting authentically. At some point, she started posting pictures of herself along with her poetry. She also seemed alarmed by the number of followers she was getting.

That was in January, which was before she was a minor internet celebrity with thousands of followers. However, one source (see comments to this post) has noted considerable overlap between “weird twitter” and the feminist twitter landscape. In light of this whole art project/experiment thing, Kimmy referred to that tweet, which generated some discussion.

As I’ve learned, this comment bothered Kimmy, and I’ve apologized. As I’ve explained, my intention was to point out that there might be some connection between (especially a woman) posting cute pictures of herself on the internet and her suddenly getting a lot of attention on the internet. The recent Violentacrez scandal highlights the extremes of this, and why I might be concerned on her behalf.

This comment, which some have called “anti-woman”, has been variously interpreted as:

“mansplaining”, presumably because I should know already that all women on the internet know that putting cute pictures of themselves will get them a lot of attention/followers/whatever.

insinuating that Kimmy’s success (in terms of Twitter followers I guess?) is undeserved or only because she put pictures of herself on the internet.

I take feminism rather seriously and so I found these accusations pretty hurtful, actually. But then I thought about it and realized that taken together, they make no sense. So, I’m over it.

EDIT: In the ensuing discussion over this, I’ve learned a lot about how feminists think about this comment. It’s reasonable for women to suspect that somebody making such a comment has hostile or demeaning intentions, and that problem is especially exacerbated in low-bandwidth computer mediated communication such as Twitter, where so much is left to interpretation. I regret saying it.

In other words, the experiment was a wild success in terms of generating a significant reaction. However, the results coming in were literally all over the map: random hate, denial that the phenomenon existed, direct confirmation that the phenomenon existed, questioning of the meaning of it. A surprising number of people telling me I had “ruined” something, or “didn’t get” something.

Basically, there was every possible angle of existential crisis represented in the response from the collective consciousness of weird twitter.

Or maybe subconscious. Some people on Twitter seem to see it as primarily an expression of the subconscious. Which would explain why it hates getting called out so much.

These results were nonetheless inconclusive. Weird twitter was being awakened from its subconscious, unreflective slumber. So I gave it a kick.

This post was of course the kind of postmodern ironic half-joke that seems to be so characteristic of “weird twitter” but I guess it went over the heads of a lot of people.

There’s a legitimate concern here that this post involved what the academic and mainstream press has termed “cyberbullying”. But I made a calculated decision that people who were actively being dickish about the whole thing to me directly were asking for it and could handle being made fun of. In case anyone else was concerned (one person who contacted me was), I spoke with @bugbucket and @hellhomer and we’re cool.

The point of the second post was:

As a measurement instrument. I had a good indication that Weird Twitter really did exist. But how big is it? I’ve got analytics set up, and figured there was no reason for somebody who wasn’t part of weird twitter to want to read a post about weird twitter. This would give a rough order of magnitude estimate at least.
To test the theory that an on-line community exists partly by negotiating its own symbolic boundaries, and to see if it would achieve self-consciousness if pressed on the issue.
To generate more data about digital communities reacting to external reification. The nice thing about all this is that Twitter stores probably 99% of all the relevant communication for this kind of identity formation process (or the failure of it), so at some point somebody might dig it up and check it out in more detail.

In case you are wondering, if you were to ask me “How many people do you think are part of Weird Twitter?”, I’d now say “about 3,000”, if you operationalize “weird twitter” as “the number of people who care enough about being called out as Weird Twitter to read an article about it”. There may, of course, be multiple or overlapping weird twitters. Maybe other parts of the “weird twitter” landscape could be identified by referring to other patterns of behavior. (Maybe there’s a weird twitter that tells completely different jokes than the ones identified in the original post.) Perhaps this only got to the most sensitive or curious bunch, those that actively click links. There’s also no accounting for factors like time zone.

Really the next thing to do would be to try to map out the actual social network structure.
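
As a rough idea of what that mapping could look like, here is a minimal sketch. It assumes you already have a list of who interacts with whom (which this experiment did not collect), treats every tie as undirected even though Twitter follows are directed, and uses made-up account names. It computes degree centrality, to see who sits in the middle of the network, and the connections two accounts share, a crude proxy for how embedded someone is in a cluster.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (account, account) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other accounts each account is tied to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def shared_connections(graph, a, b):
    """Accounts tied to both a and b."""
    return graph.get(a, set()) & graph.get(b, set())


# HYPOTHETICAL ties: a tight core (a, b, c, d) and one peripheral account.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("a", "d"), ("c", "d"),
    ("outsider", "d"),
]

graph = build_graph(edges)
centrality = degree_centrality(graph)

for account in sorted(centrality, key=centrality.get, reverse=True):
    print(account, round(centrality[account], 2))
print("shared with a:", shared_connections(graph, "outsider", "a"))
```

In this toy network the core accounts are tied to most of the others, while the peripheral account has a single tie and shares only one connection with a core member.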

Qualitatively, there were a lot of interesting reactions and questions raised in this process. I want to note them here before I forget:

Because of the tone of the initial post, I was estimated to be older than I am, and I got some criticism that I was some weird old guy invading somebody else’s space. One person, presumably a teenager, tweeted angrily that I was exploiting teenagers.
Lots of people reacted to the feeling of being watched or categorized. That’s ironic, because what people post on Twitter is openly available, and many of the members of this community have literally thousands of “followers”. And, Twitter data as a whole is being slurped and analyzed and categorized all the time algorithmically for research and marketing purposes. The amount of outrage created by a blog post that WASN’T based on observation of most of the system suggests that people in Weird Twitter really don’t get this.
One of the smartest responses I saw was somebody who suggested making their posts more private to avoid having them looked at by people like me. Yes, that is correct. I was pulling a prank on you. I am the least of your problems.
Those who I guess you could call the “thought leaders” in the Weird Twitter community are experts at managing information flow. While several members of the community passed around links to my post directly, others were quite deliberate in posting links to images that would not be traced back here. My favorite posts were those that obliquely acknowledged there was a controversy going on with no navigable links at all.
I was definitely “othered” throughout the whole process, despite the fact that I’ve been using Twitter and interacting with a few of the members of this community in a peripheral way for a while, and the claims by some of its members that it’s just a community of people making jokes that anyone can join. (If it is the latter, then I declare myself a member.) Since its central members appear to have more followers than they can keep track of, it’s not surprising that they would see me as an outsider, especially given the estranged language and alternative platform of the blog post. @hellhomer’s observations that I was unqualified to comment on the community because I only shared a small number of connections was evidence that online community membership can be operationalized as membership in a quasi-clique structure.
A lot of people assumed I’m planning on writing an academic article about this, and thought that would be exploitative. In reality, I think there’s no way in hell I would get this past the IRB. This was performance art. Y’all are suckers. Funniest were the people that got on my case about the flimsiness of my analysis or research methods. Funniest was the person that told me I really ought to be referencing Bruno Latour.
But, one day yeah maybe I’ll write an article about Weird Twitter. Obviously I’d go about it totally differently, though I might start with leads I’ve gotten through this project. I do believe that the best way to study radically transparent on-line communities is through radically transparent research (thanks Mel for introducing me to this term), which this experiment was an exercise in.
Who the hell posted this quora post on weird twitter? What’s their angle? Their insight that Weird Twitter is like the /b/ of Twitter is a bold claim, because few communities have had an impact on internet culture as great as /b/. Have any significant memes originated in Weird Twitter and escaped into the wild? Unclear. Are there other, similarly creative and unregulated pseudonymous communities in other social media?
I’ve been asked by one tweeter to ‘please explore the carefucker vs jokeman split amongst “weird twitter”‘. That is a useful research lead if I’ve ever seen one. “Carefucker” has not yet hit Urban Dictionary, but I guess the term is self-explanatory. Ironically, in my observations the most polished “jokemen” were also the most strategic and guarded about their references to being labeled, while the most authentically absurd appeared to be “carefucking”. I suspect that some folks were trying hard to be cool.
A significant portion of the reactions were people upset that I had “ruined” their “thing”, that thing which may or may not be weird twitter. If I had to guess, this is due to the perception that blog posts are less ephemeral than tweets, which is true, but also the illusion that what is phenomenologically ephemeral for them isn’t permanent in fact. As I said in my second post, there’s a weird power dynamic at work between blogs and tweets. But this is absurd. Because, if your attention span has been trained on blogs and not tweets, you realize that blog posts, too, are historically ephemeral. Most of the traffic to this post has been from Twitter itself. It is an artifact produced by Weird Twitter, not (as it has been accused of being) a voyeuristic or surveilling observation made on it from without. If this post has any significance within the history of that community, it will only be because the community’s consciousness of itself led to a kind of dissolution (or suicide), or because its significance has outshone its containment within Twitter itself. Only time will tell on that one.
I have heard a lot of complaints about the prominence of internet trolls sending death threats to especially feminist bloggers. I find that really interesting, because I generally appreciate feminists and do some research on internet security. It was pretty shocking how much vitriol I got exposed to for writing a blog post describing an internet community that maybe didn’t exist. I’ve assumed for the purpose of writing this that those people who attacked me were somehow motivated by anger at the blog post. But wouldn’t that be completely batshit? I mean, look at that first blog post. It’s dumb. I have an alternative theory, which is that there is a population on the internet that opportunistically hates on anybody who they think they can get away with hating on. This is a testable hypothesis, which if true would simplify the problem of cleaning up the mess. If hate speech on the internet were considered less a political issue and more an issue like spam detection and removal, I think the Internet would be a better place.

If you’ve read this far, then thank you for your interest. I’ve found this a very rewarding and insightful experience, and I hope you have gotten something out of it as well.
