Digifesto

Category: social media

correcting an error in my analysis

There is an error in my last post, where I was thinking through the interpretation of the 25,000,000-hit number reported for the Buzzfeed blue/black/white/whatever dress post. In that post I assumed that the distribution of viewers would be the standard one you see in on-line participation: a power law distribution with a long tail. Depending on which way you hold the diagram, the “tail” is either the enormous number of instances that occur only once (in this case, a visitor who goes to the page once and never again) or it’s the population of instances with bizarrely high occurrences (like that one guy who hit refresh on the page 100 times, and the woman who looked at the page 300 times, and…). You can turn one tail into the other by turning the histogram sideways and shaking really hard.
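
To make that picture concrete, here is a minimal sketch, with made-up parameters and no actual Buzzfeed data, of per-visitor view counts drawn from a discrete power law, showing both tails at once:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-visitor view counts from a discrete power law
# (Zipf with exponent ~2, a common shape for online participation).
visits = rng.zipf(a=2.0, size=100_000)

total_views = visits.sum()
one_and_done = (visits == 1).mean()    # one tail: the mass of single visits
heavy_hitters = np.sort(visits)[-10:]  # the other tail: obsessive refreshers

print(f"total views:         {total_views}")
print(f"share visiting once: {one_and_done:.0%}")
print(f"top 10 visit counts: {heavy_hitters}")
```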

The problem with this analysis is that it ignores the data I’ve been getting from a significant subset of the people I’ve talked to about this in passing: because the page contains some sort of well-crafted optical illusion, lots of people have looked at it once (and seen it as, say, a blue and black dress) and then looked at it again, seeing it as white and gold. In fact, the article seems designed to get the reader to do just that.

If I’m being somewhat abstract in my analysis, it’s because I’ve refused to go click on the link myself. I have read too much Adorno. I hear the drumbeat of fascism in all popular culture. I do not want to take part in intelligently designed collective effervescence if I can help it. This is my idiosyncrasy.

But this inferred stickiness of the dress image has consequences for the traffic analysis. I’m sure that whoever is actually looking at the metrics on the article is tracking repeat versus unique visitors. I wonder how deliberately the image was created with the idea of maximizing repeat visitation in mind, and what the observed correlation between repeat and unique visitors turned out to be. Repeated visits suggest sustained interest over time, whereas “mere” virality is a momentary spread of information over space. If you see content as a kind of property and sustained traffic over time as the value of that property, it makes sense to try to create things with staying power. Memetic globules forever gunking the crisscrossed manifold of attention. Culture.

Does this require a different statistical distribution to process properly? Is Cosma Shalizi right after all, and are these “power law” distributions just overhyped log-normal distributions? What happens when the generative process has a stickiness term? Is that just reflected in the power law distribution’s exponent? One day I will get a grip on this. Maybe I can do it working with mailing list data.
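
Here is a minimal sketch of the kind of generative process in question, entirely hypothetical: per-visitor interest is log-normal (per Shalizi’s suspicion), and a stickiness term multiplies each visitor’s views by a geometric number of double-takes:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Baseline "interest" per visitor: log-normal, per Shalizi's suspicion
# that many reported power laws are really log-normals.
base = np.ceil(rng.lognormal(mean=0.0, sigma=1.0, size=n)).astype(int)

# Hypothetical stickiness term: after each look, a visitor returns with
# probability 0.5 (the optical-illusion double-take), giving a geometric
# multiplier on views.
double_takes = rng.geometric(p=0.5, size=n)  # >= 1 looks per visitor
views = base * double_takes

# Crude tail summary via upper quantiles; a proper comparison would fit
# both candidate distributions to the survival curve
# (Clauset/Shalizi/Newman style).
for q in (0.5, 0.9, 0.99, 0.999):
    print(f"{q} quantile of per-visitor views: {np.quantile(views, q):.0f}")
```

The sketch only makes the problem vivid: a stickiness multiplier fattens the tail without declaring itself as either a power law or a log-normal, and distinguishing the two is exactly the fitting problem Shalizi raises.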

I’m writing this because over the weekend I was talking with a linguist and a philosopher about collective attention, a subject of great interest to me. It was the linguist who reported having looked at the dress twice and seeing it in different colors. The philosopher had not seen it. The latter’s research specialty was philosophy of mind, a kind of philosophy I care about a lot. I asked him whether in cases of collective attention the mental representation supervenes reductively on many individual minds or on something more than that. He said that this is a matter of current debate, but that he wants to argue that collective attention means more than my awareness of X, and my awareness of your awareness of X, ad infinitum. Ultimately I’m a mathematical person, and I am happy to see the limit of that infinite process as a thing in itself, with its relationship to what it reduces to mediated by the logic of infinitesimals. But perhaps even this is not enough. I gave the philosopher my recommendation of Soren Brier and Ulanowicz, who together I think provide the groundwork needed for an ontology of macroorganic mentality and representation. The operationalization of these theories is the goal of my work at Glass Bead Labs.

25,000,000 re: @ftrain

It was gratifying to read Paul Ford’s reluctant think piece about the recent dress meme epidemic.

The most interesting fact in the article was that Buzzfeed’s dress article has gotten 25 million views:

People are also keenly aware that BuzzFeed garnered 25 million views (and climbing) for its article about the dress. Twenty-five million is a very, very serious number of visitors in a day — the sort of traffic that just about any global media property would kill for (while social media is like, ho hum).

I’ve recently become interested in the question: how important is the Internet, really? Those of us who work closely with it every day see it as central to our lives. Logically, we would tend to extrapolate and think that it is central to everybody’s life. If we are used to sampling from others’ experience using social media, we would see that social media is very important in everybody’s life, confirming this suspicion.

This is obviously a kind of sampling bias though.

This is where the 25,000,000 figure comes in handy. My experience of the dress meme was that it was completely ubiquitous. Literally everybody I was following on Twitter who was tweeting that day at least referenced the dress. The meme also got to me via an email backchannel, and came up in a seminar. Perhaps you had a similar experience: you and everyone you knew was aware of this meme.

Let’s assume that 25 million is an indicator of the order of magnitude of people that learned about this meme. If you googled the dress question, you probably clicked the article. Maybe you clicked it twice. Maybe you clicked it twenty times and you are an outlier. Maybe you didn’t click it at all. It’s plausible that it evens out and the actual number of people who were aware of the meme is somewhere between 10 million and 50 million.
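
As a back-of-envelope version of that paragraph (the ratios are made up; only the 25 million figure comes from the article):

```python
# Convert 25M page views into a rough range of people aware of the meme,
# under hypothetical views-per-aware-person ratios.
views = 25_000_000

scenarios = {
    "heavy re-clicking (2.5 views/person)": 2.5,
    "one click each (1.0 views/person)": 1.0,
    "many aware who never clicked (0.5)": 0.5,
}
for label, ratio in scenarios.items():
    print(f"{label}: ~{views / ratio / 1e6:.0f} million people")
```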

That’s a lot of people. But–and this is really my point–it’s not that many people, compared to everybody. There are about 300 million people in the United States. There are over 7 billion people on the planet. Who are the roughly one tenth of the population who were interested in the dress? If you are reading this blog, they are probably people a lot like you or me. Who are the other ~90% of people in the U.S.?

I’ve got a bold hypothesis. My hypothesis is that the other 90% of people are people who have lives. I mean this in the sense of the idiom “get a life”, which has fallen out of fashion for some reason. Increasingly, I’m becoming interested in the vast but culturally foreign population of people who followed this advice at some point in their lives and did not turn back. Does anybody know of any good ethnographic work about them? Where do they hang out in the Bay Area?

‘Bad twitter’: exit, voice, and social media

I made the mistake in the past couple of days of checking my Twitter feed. I did this because there are some cool people on Twitter and I want to have conversations with them.

Unfortunately it wasn’t long before I started to read things that made me upset.

I used to think that a benefit of Twitter was that it allowed for exposure to alternative points of view. Of course you should want to see the other side, right?

But then there’s this: if you do that for long enough, you start to see each “side” make the same mistakes over and over again. It’s no longer enlightening. It’s just watching a train wreck in slow motion on repeat.

Hirschman’s Exit, Voice, and Loyalty is relevant to this. Presumably, over time, those who want a higher level of conversation Exit social media (and its associated news institutions, such as Salon.com) to more private channels, causing a deterioration in the quality of public discourse. Because social media sites have very strong network effects, they are robust to any revenue loss due to quality-sensitive Exiters, leaving a kind of monopoly-tyranny that Hirschman describes vividly thus:

While of undoubted benefit in the case of the exploitative, profit-maximizing monopolist, the presence of competition could do more harm than good when the main concern is to counteract the monopolist’s tendency toward flaccidity and mediocrity. For, in that case, exit-competition could just fatally weaken voice along the lines of the preceding section, without creating a serious threat to the organization’s survival. This was so for the Nigerian Railway Corporation because of the ease with which it could dip into the public treasury in case of deficit. But there are many other cases where competition does not restrain monopoly as it is supposed to, but comforts and bolsters it by unburdening it of its more troublesome customers. As a result, one can define an important and too little noticed type of monopoly-tyranny: a limited type, an oppression of the weak by the incompetent and an exploitation of the poor by the lazy which is the more durable and stifling as it is both unambitious and escapable. The contrast is stark indeed with totalitarian, expansionist tyrannies or the profit-maximizing, accumulation-minded monopolies which may have captured a disproportionate share of our attention.

It’s interesting to compare a Hirschman-inspired view of the decline of Twitter as a function of exit and voice to a Frankfurt School analysis of it in terms of the culture industry. It’s also interesting to compare this with boyd’s 2009 paper on “White flight in networked publics?” in which she chooses to describe the decline of MySpace in terms of the troubled history of race and housing.*

In particular, there are passages of Hirschman in which he addresses neighborhoods of “declining quality” and the exit and voice dynamics around them. It is interesting that the narrative of racialized housing policy and white flight is so salient to me lately that I could not read these passages of Hirschman without raising an eyebrow at the fact that he didn’t mention race in his analysis. Was this color-blind racism? Or am I now so socialized by the media to see racism and sexism everywhere that I assumed there were racial connotations when in fact he was talking about a general mechanism? Perhaps the salience of the white flight narrative to me has made me tacitly racist by making me assume that the perceived decline in neighborhood quality is due to race!

The only way I could know for sure what was causing what would be to conduct a rigorous empirical analysis I don’t have time for. And I’m an academic whose job is to conduct rigorous empirical analyses! I’m forced to conclude that without a more thorough understanding of the facts, any judgment either way will be a waste of time. I’m just doing my best over here and when push comes to shove I’m a pretty nice guy, my friends say. Nevertheless, it’s this kind of lazy baggage-slinging that is the bread and butter of the mass journalist today. Reputations earned and lost on the basis of political tribalism! It’s almost enough to make somebody think that these standards matter, or are the basis of a reasonable public ethics of some kind that must be enforced lest society fall into barbarism!

I would stop here except that I am painfully aware that as much as I know it to be true that there is a portion of the population that has exited the morass of social media and put it to one side, I know that many people have not. In particular, a lot of very smart, accomplished friends of mine are still wrapped up in a lot of stupid shit on the interwebs! (Pardon my language!) This is partly due to the fact that networked publics now mediate academic discourse, and so a lot of aspiring academics now feel they have to be clued in to social media to advance their careers. Suddenly, everybody who is anybody is a content farmer! There’s a generation who are looking up to jerks like us! What the hell?!?!

This has a depressing consequence. Since politically divisive content is popular content, and there is pressure for intellectuals to produce popular content, intellectuals have incentives to propagate politically divisive narratives instead of working towards reconciliation and the greater good. Or, alternatively, there is pressure to aim for the lowest common denominator of an audience.

At this point, I am forced to declare myself an elitist who is simply against provocation of any kind. It’s juvenile, is the problem. (Did I mention I just turned 30? I’m an adult now, swear to god.) I would keep this opinion to myself, but at that point I’m part of the problem by not exercising my Voice option. So here’s to blogging.

* I take a particular interest in danah boyd’s work because, in addition to being one of the original Internet-celebrity-academics-talking-about-the-Internet (and so aptly doubling as both foundational researcher and slightly implicated subject matter for this kind of rambling about social media and intellectualism; see below), she shares an alma mater with me (Brown) and is the star graduate of my own department (UC Berkeley’s School of Information), and so serves as a kind of role model.

I feel the need to write this footnote because while I am in the scholarly habit of treating all academic writers I’ve never met abstractly as if they are bundles of text subject to detached critique, other people think that academics are real people(!), especially academics themselves. Suddenly the purely intellectual pursuit becomes personal. Multiple simultaneous context collapses create paradoxes on the level of pragmatics that would make certain kinds of communication impossible if they are not ignored. This can be awkward but I get a kind of perverse pleasure out of leaving analytic puzzles to whoever comes next.

I’m having a related but eerier intellectual encounter with an Internet luminary in some other work I’m doing. I’m writing software to analyze a mailing list used by many prominent activists and professionals. Among the emails are some written by the late Aaron Swartz. In the process of working on the software, I accepted a pull request from a Swiss programmer I had never met, which brought in the Python package html2text as a dependency. Who wrote the html2text package? Aaron Swartz. Understand: I never met the guy, I am trying to map out how on-line communication mediates the emergent structure of the sociotechnical ecosystem of software and the Internet, and obviously I am interested, reflexively, in how my own communication and software production fits into that larger graph. (Or multigraph? Or multihypergraph?) Power law distributions of connectivity on all dimensions make this particular situation not terribly surprising. But it’s just one of many strange loops.

analysis of content vs. analysis of distribution of media

A theme that keeps coming up for me in work and conversation lately is the difference between analysis of the content of media and analysis of the distribution of media.

Analysis of content looks for the tropes, motifs, psychological intentions, unconscious historical influences, etc. of the media. Over Thanksgiving a friend of mine was arguing that the Scorpions were a dog whistle to white listeners, because that band made a deliberate move to distance themselves from the influence of black music on rock. Contrast this with Def Leppard. He reached this conclusion by listening carefully to the beats and contextualizing them in the historical conversations that were happening at the time.

Analysis of distribution looks at information flow and the systemic channels that shape it. How did the telegraph change patterns of communication? How did television? Radio? The Internet? Google? Facebook? Twitter? Ello? Who is paying for the distribution of this media? How far does the signal reach?

Each of these views is incomplete. Just as data underdetermines hypotheses, media underdetermines its interpretation. In both cases, a more complete understanding of the etiology of the data/media is needed to select between competing hypotheses. We can’t truly understand content unless we understand the channels through which it passes.

Analysis of distribution is more difficult than analysis of content because distribution is less visible. It is much easier to possess and study data/media than it is to possess and study the means of distribution. The means of distribution are a kind of capital. Those that study it from the outside must work hard to get anything better than a superficial view of it. Those on the inside work hard to get a deep view of it that stays up to date.

Part of the difficulty of analysis of distribution is that the system of distribution depends on the totality of information passing through it. Communication involves the dynamic engagement of both speakers and an audience. So a complete analysis of distribution must include an analysis of content for every piece of implicated content.

One thing that makes the content analysis necessary for analysis of distribution more difficult than what passes for content analysis simpliciter is that the former needs to take into account incorrect interpretation. Suppose you were trying to understand the popularity of Fascist propaganda in pre-WWII Germany and were interested in how the state owned the mass media channels. You could initially base your theory simply on how people were getting bombarded by the same information all the time. But you would at some point need to consider how the audience was reacting. Was it stirring feelings of patriotic national identity? Did they experience communal feelings with others sharing similar opinions? As propaganda offered interpretations of Shakespeare claiming he was secretly a German, and denounced other works as “degenerate art”, did the audience believe this content analysis? Did their belief in the propaganda allow them to continue to endorse the systems of distribution in which they took part?

This shows how the question of how media is interpreted is a political battle fought by many. Nobody fighting these battles is an impartial scientist. Since one gets an understanding of the means of distribution through impartial science, and since this understanding of the means of distribution is necessary for correct content analysis, we can dismiss most content analysis as speculative garbage, from a scientific perspective. What this kind of content analysis is instead is art. It can be really beautiful and important art.

On the other hand, since distribution analysis depends on the analysis of every piece of implicated content, distribution analysis is ultimately hopeless without automated methods for content analysis. This is one reason why machine learning techniques for analyzing text, images, and video are such a hot research area. While the techniques for optimizing supply chain logistics (for example) are rather old, the automated processing of media is a more subtle problem precisely because it involves the interpretation and reinterpretation by finite subjects.

By “finite subject” here I mean subjects that are inescapably limited by the boundaries of their own perspective. These limits are what makes their interpretation possible and also what makes their interpretation incomplete.
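
To make the automation point concrete, here is a minimal sketch of machine-assisted content analysis; the toy corpus is hypothetical, and TF-IDF features plus NMF topic extraction is just one standard pipeline among many:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Hypothetical toy corpus standing in for media flowing through a channel.
docs = [
    "the dress is blue and black",
    "the dress is white and gold",
    "propaganda stirs patriotic national identity",
    "state media denounces degenerate art",
    "optical illusion makes the dress change color",
]

# TF-IDF turns each document into a weighted bag of words.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# NMF factors the corpus into a small number of additive "topics".
nmf = NMF(n_components=2, random_state=0)
nmf.fit(X)

terms = tfidf.get_feature_names_out()
for i, weights in enumerate(nmf.components_):
    top = [terms[j] for j in weights.argsort()[::-1][:4]]
    print(f"topic {i}: {', '.join(top)}")
```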

things I’ve been doing while not looking at twitter

Twitter was getting me down so I went on a hiatus. I’m still on that hiatus. Instead of reading Twitter, I’ve been:

  • Reading Fred Turner’s The Democratic Surround. This is a great book about the relationship between media and democracy. Since a lot of my interest in Twitter has been because of my interest in the media and democracy, this gives me those kinds of jollies without the soap opera trainwreck of actually participating in social media.
  • Going to arts events. There was a staging of Rhinoceros at Berkeley. It’s an absurdist play in which a small French village is suddenly stricken by an epidemic wherein everybody is transformed into a rhinoceros. It’s probably an allegory for the rise of Communism or Fascism, but the play is written so that it’s completely ambiguous. Mainly it’s about conformity in general: perhaps ideological conformity, but just as easily conformity to non-ideology, to a state of nature (hence the animal form, rhinoceros). It’s a good play.
  • I’ve been playing Transistor. What an incredible game! The gameplay is appealingly designed and original, but beyond that it is powerfully written and atmospheric. In many ways it can be read as a commentary on the virtual realities of the Internet and the problems with them. Somehow there was more media attention to GamerGate than to this one actually great game. Too bad.
  • I’ve been working on papers, software, and research in anticipation of the next semester. Lots of work to do!

Above all, what’s great about unplugging from social media is that it isn’t actually unplugging at all. Instead, you can plug into a smarter, better, deeper world of content where people are more complex and reasonable. It’s elevating!

I’m writing this because some time ago it was a matter of debate whether or not you can ‘just quit Facebook’ etc. It turns out you definitely can and it’s great. Go for it!

(Happy to respond to comments but won’t respond to tweets until back from the hiatus)

The Facebook ethics problem is a political problem

So much has been said about the Facebook emotion contagion experiment. Perhaps everything has been said.

The problem with everything having been said is that by and large people’s ethical stances seem predetermined by their habitus.

By which I mean: most people don’t really care. People who care about what happens on the Internet care about it in whatever way is determined by their professional orientation on that matter. Obviously, some groups of people benefit from there being fewer socially imposed ethical restrictions on data scientific practice, either in an industrial or academic context. Others benefit from imposing those ethical restrictions, or cultivating public outrage on the matter.

If this is an ethical issue, what system of ethics are we prepared to use to evaluate it?

You could make an argument from, say, a utilitarian perspective, or a deontological perspective, or even a virtue ethics standpoint. Those are classic moves.

But nobody will listen to what a professionalized academic ethicist will say on the matter. If there’s anybody who does rigorous work on this, it’s probably somebody like Luciano Floridi. His work is great, in my opinion. But I haven’t found any other academics working in, say, policy who embrace his thinking. I’d love to be proven wrong.

But since Floridi does serious work on information ethics, that’s mainly an inconvenience to pundits. Instead we get heat, not light.

If this process resolves into anything like policy change–either governmental or internally at Facebook–it will be because of a process of agonistic politics. “Agonistic” here means fraught with conflicted interests. It may be redundant to modify ‘politics’ with ‘agonistic’, but it makes the point that the moves being made are strategic actions, aimed at gain for one’s person or group, more than they are communicative ones, aimed at consensus.

Because e.g. Facebook keeps public discussion fragmented through its EdgeRank algorithm, which even in its well-documented public version is full of apparent political consequences and flaws, there is no way for conversation within the Facebook platform to result in consensus. It is not, as has been observed by others, a public. In a trivial sense, it’s not a public because the data isn’t public. The data is (sort of) private. That’s not a bad thing. It just means that Facebook shouldn’t be where you go to develop a political consensus that could legitimize power.

Twitter is a little better for this, because it’s actually public. Facebook has zero reason to care about the public consensus of people on Twitter though, because those people won’t organize a consumer boycott of Facebook, because they can only reach people that use Twitter.

Facebook is a great–perhaps the greatest–example of what Habermas calls the steering media. “Steering,” because it’s how powerful entities steer public opinion. For Habermas, the steering media control language and therefore culture. When ‘mass’ media control language, citizens no longer use language to form collective will.

For individualized ‘social’ media that is arranged into filter bubbles through relevance algorithms, language is similarly controlled. But rather than having just a single commanding voice, you have the opportunity for every voice to be expressed at once. Through homophily effects in network formation, what you’d expect to see are very intense clusters of extreme cultures that see themselves as ‘normal’ and don’t interact outside of their bubble.
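
A minimal sketch of that homophily claim, with made-up parameters: nodes carry one of two ‘cultures’ and form ties within their culture far more often than across:

```python
import random

random.seed(0)
N = 200
culture = [i % 2 for i in range(N)]   # two hypothetical cultural groups
P_SAME, P_DIFF = 0.10, 0.01           # ties form far more easily within a group

edges = []
for i in range(N):
    for j in range(i + 1, N):
        p = P_SAME if culture[i] == culture[j] else P_DIFF
        if random.random() < p:
            edges.append((i, j))

within = sum(1 for i, j in edges if culture[i] == culture[j])
print(f"{len(edges)} edges, {within / len(edges):.0%} within-culture")
```

With these parameters roughly nine in ten ties land inside a bubble, so each cluster mostly sees itself; the ‘normal’ each bubble perceives is its own homophily.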

The irony is that the critical left, who should be making these sorts of observations, is itself a bubble within this system of bubbles. Since critical leftism is enacted in commercialized social media which evolves around it, it becomes recuperated in the Situationist sense. Critical outrage is tapped for advertising revenue, which spurs more critical outrage.

The dependence of contemporary criticality on commercial social media for its own diffusion means that, ironically, none of its practitioners are able to just quit Facebook like everyone else who has figured out how much Facebook sucks.

It’s not a secret that decentralized communication systems are the solution to this sort of thing. Stanford’s Liberation Tech group captures this ideology rather well. There’s a lot of good work on censorship-resistant systems, distributed messaging systems, etc. For people who are citizens in the free world, many of these alternative communication platforms where we are spared from algorithmic control are very old. Some people still use IRC for chat. I’m a huge fan of mailing lists, myself. Email is the original on-line social media, and one’s inbox is one’s domain. Everyone who is posting their stuff to Facebook could be posting to a WordPress blog. WordPress, by the way, has a lovely user interface these days and keeps adding “social” features like “liking” and “following”. This goes largely unnoticed, which is too bad, because Automattic, the company that runs WordPress, is really not evil at all.

So there are plenty of solutions to Facebook being manipulative and bad for democracy. Those solutions involve getting people off of Facebook and onto alternative platforms. That’s what a consumer boycott is. That’s how you get companies to stop doing bad stuff, if you don’t have regulatory power.

Obviously the real problem is that we don’t have a less politically problematic technology that does everything we want Facebook to do, only not the bad stuff. There are a lot of unsolved technical problems in getting that to work. I think I wrote a social media think piece about this once.

I think a really cool project that everybody who cares about this should be working on is designing and executing on building that alternative to Facebook. That’s a huge project. But just think about how great it would be if we could figure out how to fund, design, build, and market that. These are the big questions for political praxis in the 21st century.

notes

This article is making me doubt some of my earlier conclusions about the role of the steering media. Habermas, I’ve got to concede, is dated. As much as skeptics would like to show how social media fails to ‘democratize’ media (not in the sense of being justly won by elections, but rather in the original sense of being mob ruled), the fragmentation is real and the public is reciprocally involved in its own narration.

What can then be said of the role of new media in public discourse? Here are some hypotheses:

  • As a first order effect, new media exacerbates shocks, both endogenous and exogenous. See Didier Sornette‘s work on the application of self-exciting Hawkes processes to social systems like finance and Amazon reviews; a simulation sketch follows these notes. (I’m indebted to Thomas Maillart for introducing me to this research.) This changes the dynamics because rather than being Poisson distributed, new media intervention is strategically motivated.
  • As a second order effect, since new media act strategically, they must make predictive assessments of audience receptivity. New media suppliers must anticipate and cultivate demand. But demand is driven partly by environmental factors like information availability. See these notes on Dewey’s ethical theory for how taste can be due to environmental adaptation with no truly intrinsic desire–hence, the inappropriateness of modeling these dynamics straightforwardly with ‘utility functions’–which upsets neoclassical market modeling techniques. Hence the ‘social media marketer’ position that engages regularly in communication with an audience in order to cultivate a culture that is also a media market. Microcelebrity practices achieve not merely a passively received branding but an actively nurtured communicative setting. Communication here is transmission (Shannon, etc.) and/or symbolic interaction, on which community (Carey) supervenes.
  • Though not driven by neoclassical market dynamics simpliciter, new media is nevertheless competitive. We should expect new media suppliers to be fluidly territorial. This creates a higher-order incentive for curatorial intervention to maintain and distinguish one’s audience as culture. A critical open question here is to what extent these incentives drive endogenous differentiation, vs. to what extent media fragmentation results in efficient allocation of information (analogously to efficient use of information in markets). There is no a priori reason to suppose that the ad hoc assemblage of media infrastructures and regulations minimizes negative cultural externalities. (What are examples of negative cultural externalities? Fascism, ….)
  • Different media markets will have different dialects, which will have different expressive potential because of description lengths of concepts. (Algorithmic information theoretic interpretation of weak Sapir-Whorf hypothesis.) This is unavoidable because man is mortal (cannot approach convergent limits in a lifetime.) Some consequences (which have taken me a while to come around to, but here it is):
    1. Real intersubjective agreement is only provisionally and locally attainable.
    2. Language use, as a practical effect, has implications for future computational costs and therefore is intrinsically political.
    3. The poststructuralists are right after all. ::shakes fist at sky::
    4. That’s ok, we can still hack nature and create infrastructure; technical control resonates with physical computational layers that are not subject to wetware limitations. This leaves us, disciplinarily, with post-positivist engineering, post-structuralist hermeneutics enabling only provisional consensus and collective action (which can, at best, be ‘society made durable’ via technical implementation or cultural maintenance; see above on media market making), and critical reflection (advancing social computation directly).
  • There is a challenge to Pearl/Woodward causality here, in that mechanistic causation will be insensitive to higher-order effects. A better model for social causation would be Luhmann’s autopoiesis (cf. Brier, 2008). Ecological modeling (Ulanowicz) provides the best toolkit for showing interactions between autopoietic networks?
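
Here is the simulation sketch referenced in the first note: a self-exciting Hawkes process via Ogata-style thinning, with made-up parameters, in which every event temporarily raises the rate of further events:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up parameters: baseline rate MU, jump size ALPHA, decay BETA.
# ALPHA < BETA keeps the process stationary (each event spawns < 1 child).
MU, ALPHA, BETA, T = 0.5, 0.8, 1.2, 100.0

def intensity(t, events):
    """lambda(t) = MU + sum_i ALPHA * exp(-BETA * (t - t_i)) over past events."""
    past = np.array([e for e in events if e < t])
    return MU + (ALPHA * np.exp(-BETA * (t - past))).sum()

# Ogata-style thinning: propose candidate times from an upper-bounding rate,
# accept each with probability lambda(t) / bound.
events, t = [], 0.0
while t < T:
    bound = intensity(t, events) + ALPHA  # valid bound until the next event
    t += rng.exponential(1.0 / bound)
    if t < T and rng.random() < intensity(t, events) / bound:
        events.append(t)

print(f"Hawkes events in [0, {T:.0f}]: {len(events)}")
print(f"Plain Poisson at rate MU would average: {MU * T:.0f}")
```

With these made-up parameters the stationary rate works out to MU / (1 - ALPHA/BETA) = 1.5 events per unit time, triple the Poisson baseline, and the excess arrives in bursts; that burstiness is the shock exacerbation in the first note.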

This is not helping me write my dissertation prospectus at all.

How to tell the story about why stories don’t matter

I’m thinking of taking this seminar because I’m running into the problem it addresses: how do you pick a theoretical lens for academic writing?

This is related to a conversation I’ve found myself in repeatedly over the past weeks. A friend who studied Rhetoric insists that the narrative and framing of history is more important than the events and facts. A philosopher friend minimizes the historical impact of increased volumes of “raw footage”, because ultimately it’s the framing that will matter.

Yesterday I had the privilege of attending Techraking III, a conference put on by the Center for Investigative Reporting with the generous support and presence of Google. It was a conference about data journalism. The popular sentiment within the conference was that data doesn’t matter unless it’s told with a story, a framing.

I find this troubling because while I pay attention to this world and the way it frames itself, I also read the tech biz press carefully, and it tells a very different narrative. Data is worth billions of dollars. Even data exhaust, the data fumes that come from your information processing factory, can be recycled into valuable insights. Data is there to be mined for value. And if you are particularly genius at it, you can build an expert system that acts on the data without needing interpretation. You build an information processing machine that acts according to mechanical principles that approximate statistical laws, and these machines are powerful.

As social scientists realize they need to be data scientists, and journalists realize they need to be data journalists, there seems to be in practice a tacit admission of the data-driven counter-narrative. This tacit approval is contradicted by the explicit rhetoric that glorifies interpretation and narrative over data.

This is an interesting kind of contradiction, as it takes place as much in the psyche of the data scientist as anywhere else. It’s like the mouth doesn’t know what the hand is doing. This is entirely possible since our minds aren’t actually that coherent to start with. But it does make the process of collaboratively interacting with others in the data science field super complicated.

All this comes to a head when the data we are talking about isn’t something simple like sensor data about the weather but rather is something like text, which is both data and narrative simultaneously. We intuitively see the potential of treating narrative as something to be handled mechanically, statistically. We certainly see the effects of this in our daily lives. This is what the most powerful organizations in the world do all the time.

The irony is that the interpretivists, who are so quick to deny technological determinism, are the ones who are most vulnerable to being blindsided by “what technology wants.” Humanities departments are being slowly phased out, their funding cut. Why? Do they have an explanation for this? If interpretation/framing were as efficacious as they claim, they would be philosopher kings. So their sociopolitical situation contradicts their own rhetoric and ideology. Meanwhile, journalists who would like to believe that it’s the story that matters are, for the sake of job security, being corralled into classes to learn CSS, the style sheet language that determines, mechanically, the logic of formatting and presentation.

Sadly, neither mechanists nor interpretivists have much of an interest in engaging this contradiction. This is because interpretivists chase funding by reinforcing the narrative that they are critically important, and the work of mechanists speaks for itself in corporate accounting (an uninterpretive field) without explanation. So this contradiction falls mainly into the laps of those coordinating interaction between tribes. Managers who need to communicate between engineering and marketing. University administrators who have to juggle the interests of humanities and sciences. The leadership of investigative reporting non-profits who need to justify themselves to savvy foundations and who are removed enough from particular skillsets to be flexible.

Mechanized information processing is becoming the new epistemic center. (Forgive me:) the Google supercomputer approximating statistics has replaced Kantian transcendental reason as the grounds for the bourgeois understanding of the world. This is threatening, of course, to the plurality of perspectives that do not themselves internalize the logic of machine learning. Where machine intelligence has succeeded, then, it has been by juggling this multitude of perspectives (and frames) through automated, data-driven processes. Machine intelligence is not comprehensible to lay interpretivism. Interestingly, lay interpretivism isn’t comprehensible yet to machine intelligence–natural language processing has not yet advanced so far. It treats our communications like we treat ants in an ant farm: a blooming buzzing confusion of arbitrary quanta, fascinatingly complex for its patterns that we cannot see. And when it makes mistakes–and it does often–we feel its effects as a structural force beyond our control. A change in the user interface of Facebook that suddenly exposes drunken college photos to employers and abusive ex-lovers.

What theoretical frame is adequate to tell this story, the story that’s determining the shape of knowledge today? For Lyotard, the postmodern condition is one in which metanarratives about the organization of knowledge collapse and leave only politics, power, and language games. The postmodern condition has gotten us into our present condition: industrial machine intelligence presiding over interpretivists battling in paralogical language games. When the interpretivists strike back, it looks like hipsters or Weird Twitter–paralogy as a subculture of resistance that can’t even acknowledge its own role as resistance for fear of recuperation.

We need a new metanarrative to get out of this mess. But what kind of theory could possibly satisfy all these constituents?

Complications in Scholarly Hypertext

I’ve got a lot of questions about on-line academic publishing. A lot of this comes from career anxiety: I am not a very good academic because I don’t know how to write for academic conferences and journals. But I’m also coming from an industry that is totally eating the academy’s lunch when it comes to innovating and disseminating information. People within academia are increasingly feeling the disruptive pressure of alternative publication venues and formats, and moreover seeing the need for alternatives for the sake of the intellectual integrity of the whole enterprise. Open science, open data, reproducible research–these are keywords for new practices that are meant to restore confidence in science itself, in part by making it more accessible.

One manifestation of this trend is the transition of academic group blogs into academic quasi-journals or on-line magazines. I don’t know how common this is, but I recently had a fantastic experience of this writing for Ethnography Matters. Instead of going through an opaque and problematic academic review process, I worked with editor Rachelle Annechino to craft a piece about Weird Twitter that was appropriate for the edition and audience.

During the editing process, I tried to unload everything I had to say about Weird Twitter so that I could at last get past it. I don’t consider myself an ethnographer and I don’t want to write my dissertation on Weird Twitter. But Rachelle encouraged me to split off the pseudo-ethnographic section into a separate post, since the first half was more consistent with the Virtual Identity edition. (Interesting how the word “edition”, which has come to mean “all the copies of a specific issue of a newspaper”, in the digital context returns to its etymological roots as simply something published or produced (past participle)).

Which means I’m still left with the (impossible) task of doing an ethnography (something I’m not very well trained for) about Weird Twitter (which might not exist). Since I don’t want to violate the contextual integrity of Weird Twitter more than I already have, I’m reluctant to write about it in a non-Web-based medium.

This carries with it a number of challenges, not least of which is the reception on Twitter itself.

What my thesaurus and I do in the privacy of our home is our business and anyway entirely legal in the state of California. But I’ve come to realize that forced disclosure is an occupational hazard I need to learn to accept. What these remarks point to, though, is the tension between access to documents as data and access to documents as sources of information. The latter, as we know from Claude Shannon, requires an interpreter who can decode the language in which the information is written.

Expert language is a prison for knowledge and understanding. A prison for intellectually significant relationships. It is time to move beyond the institutional practices of triviledge

– Taylor and Saarinen, 1994, quoted in Kolb, 1997

Is it possible to get away from expert language in scholarly writing? Naively, one could ask experts to write everything “in plain English.” But that doesn’t do language justice: often (though certainly not always) new words express new concepts. Using a technical vocabulary fluently requires not just a thesaurus, but an actual understanding of the technical domain. I’ve been through the phase myself in which I thought I knew everything and so blamed anything written opaquely to me on obscurantism. Now I’m humbler and harder to understand.

What is so promising about hypertext as a scholarly medium is that it offers a solution to this problem. Wikipedia is successful because it directly links jargon to further content that explains it. Those with the necessary expertise to read something can get the intended meaning out of an article, and those that are confused by terminology can romp around learning things. Maybe they will come back to the original article later with an expanded understanding.

xkcd: The Problem with Wikipedia

Hypertext and hypertext-based reading practices are valuable for making one’s work open and accessible. But it’s not clear how to combine these with scholarly conventions on referencing and citations. Just to take Ethnography Matters as an example: for my article I used in-line linking and, where I got around to it, parenthetical bibliographic information. Contrast this with Heather Ford’s article in the same edition, which has no links and a section at the end for academic references. The APA has rules for citing web resources within an academic paper. What’s not clear is how directly linking citations within an academic hypertext document should work.
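
One possible convention, sketched with entirely hypothetical reference data: an in-line anchor whose title attribute carries the bibliographic record, so the citation works as hypertext and still parses as a reference:

```python
# Hypothetical convention: an in-line citation link that embeds APA-ish
# bibliographic metadata in the anchor's title attribute.
ref = {
    "author": "Ford, P.",
    "year": 2015,
    "title": "Hypothetical article title",
    "url": "https://example.com/article",
}

anchor = (
    f'<a href="{ref["url"]}" '
    f'title="{ref["author"]} ({ref["year"]}). {ref["title"]}.">'
    f'({ref["author"]}, {ref["year"]})</a>'
)
print(anchor)
```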

One reason for the lack of consensus around this issue is that citation formatting is a pain in the butt. For off-line documents, word processing software has provided myriad tools for streamlining bibliographic work. But for publishing academic work on the web, we write in markup languages or WYSIWYG editors.

Since standards on the web tend to evolve through “rough consensus and running code”, I expect we’ll see a standard for this sort of thing emerge when somebody builds a tool that makes it easy for them to follow. This leads me back to fantasizing about the Dissertron. This is a bit disturbing. As much as I’d like to get away from studying Weird Twitter, I see now that a Weird Twitter ethnography is the perfect test-bed for such a tool precisely because of the hostile scrutiny it would attract.

Aristotelian legislation and the virtual community

I dipped into Aristotle’s Politics today and was intrigued by William Ellis’ introduction.

Ellis claims that in Aristotle’s day, you would call on a legislator as an external consultant when you set about founding a new city or colony. There were a great variety of constitutions available to be studied. You would study them to become an expert in how to design a community’s laws. Classical political philosophy was part of the very real project of starting new communities supporting human flourishing.

We see a similar situation with on-line communities today. If cyberspace was an electronic frontier, it’s been bulldozed and is now a metropolis with suburbs and strip malls. But there is still innovation in on-line social life and social media infrastructure as users migrate between social networking services.

If Lessig is right and “code is law”, then the variety of virtual communities and the opportunity to found new ones renews the role of the Aristotelian legislator. We can ask questions like: should an on-line community be self-governing or run by an aristocracy? How can it sustain itself economically, or defend itself in (cyber-)wars? How can it best promote human flourishing? The arts? Justice?

It would be easy to trivialize these possibilities by noting that virtual life is not real life. But that would underestimate the shift that is occurring as economic and political engagement moves on-line. In recognition and anticipation of these changes, philosophy has a practical significance in comprehensive design.
