Category: academia

objectivity is powerful

Like “neoliberal”, “objectivity” in contemporary academic discourse is used only as a term of disparagement. It has fallen out of fashion to speak of “objectivity” in scientific language. It remains in fashion to be critical of objectivity in those disciplines that have been criticizing objectivity since at least the 1970s.

This is too bad because objectivity is great.

The criticism goes like this: scientists and journalists both used to say that they were being objective. There was a lot to this term. It could mean ‘disinterested’ or it could mean so rigorous as to be perfectly inter-subjective. It sounded good. But actually, all the scientists and journalists who claimed to be objective were sexist, racist, and lapdogs of the bourgeoisie. They used ‘objectivity’ as a way to exclude those who were interested in promoting social justice. Hence, anyone who claims to be objective is suspicious.

There are some more sophisticated arguments than this but their sophistication only weakens the main emotional thrust of the original criticism. The only reason for this sophistication is to be academically impressive, which is fundamentally useless, or to respond in good faith to criticisms, which is politically unnecessary and probably unwise.

Why is it unwise to respond in good faith to criticisms of a critique of objectivity? Because to concede that good faith response to criticism is epistemically virtuous would be to concede something to the defender of objectivity. Once you start negotiating with the enemy in terms of reasons, you become accountable to some kind of shared logic which transcends your personal subjectivity, or the collective subjectivity of those whose perspectives are channeled in your discourse.

In a world in which power is enacted and exerted through discourse, and in which cultural logics are just rules in a language game provisionally accepted by players, this rejection of objectivity is true resistance. The act of will that resists logical engagement with those in power will stymie that power. It’s what sticks it to the Man.

The problem is that however well-intentioned this strategy may be, it is dumb.

It is dumb because as everybody knows, power isn’t exerted mainly through discourse. Power is exerted through violence. And while it may be fun to talk about “cultural logics” if you are a particular kind of academic, and even fun to talk about how cultural logics can be violent, that is vague metaphorical poetry compared to something else that they could be talking about. Words don’t kill people. Guns kill people.

Put yourself in the position of somebody designing and manufacturing guns. What do you talk about with your friends and collaborators? If you think that power is about discourse, then you might think that these people talk about their racist political agenda, wherein they reproduce the power dynamics that they will wield to continue their military dominance.

They don’t though.

Instead what they talk about is the mechanics of how guns work and the technicalities of supply chain management. Where are they importing their gunpowder from and how much does it cost? How much will it go boom?

These conversations aren’t governed by “cultural logics.” They are governed by logic. Because logic is what preserves the intersubjective validity of their claims. That’s important because to successfully build and market guns, the gun has to go boom the same amount whether or not the person being aimed at shares your cultural logic.

This is all quite grim. “Of course, that’s the point: objectivity is the language of violence and power! Boo objectivity!”

But that misses the point. The point is not that objectivity is what powerful people dupe others into believing in order to stay powerful. The point is that objectivity is what powerful people strive for in order to stay powerful. Objectivity is powerful in ways that more subjectively justified forms of knowledge are not.

This is not a popular perspective. There are a number of reasons for this. One is that attaining objective understanding is a lot of hard work and most people are just not up for it. Another is that there are a lot of people who have made their careers arguing for a much more popular perspective, which is that “objectivity” is associated with evil people and therefore we should reject it as an epistemic principle. There will always be an audience for this view, who will be rendered powerless by it and become the self-fulfilling prophecy of the demagogues who encourage their ignorance.

frustrations with machine ethics

It’s perhaps because of the contemporary two cultures problem of tech and the humanities that machine ethics is in such a frustrating state.

Today I read danah boyd’s piece in The Message about technology as an arbiter of fairness. It’s another baffling conflation of data science with neoliberalism. This time, the assertion was that the ideology of the tech industry is neoliberalism, hence their idea of ‘fairness’ is individualist and corrosive to the social fabric. It’s not clear what backs up these kinds of assertions. They are more or less refuted by the fact that industrial data science is obsessed with our network of ties for marketing reasons. If anybody understands the failure of the myth of the atomistic individual, it’s “tech folks,” a category boyd uses to capture, I guess, everyone from marketing people at Google to venture capitalists to startup engineers to IBM researchers. You know, the homogeneous category that is “tech folks.”

This kind of criticism makes the mistake of thinking that a historic past is the right way to understand a rapidly changing present that is often more technically sophisticated than the critics understand. But critical academics have fallen into the trap of critiquing neoliberalism over and over again. One problem is that tech folks don’t spend a ton of time articulating their ideology in ways that are convenient for pop culture critique. Often their business models require rather sophisticated understandings of the market, etc. that don’t fit readily into that kind of mold.

What’s needed is substantive progress in computational ethics. Ok, so algorithms are ethically and politically important. What politics would you like to see enacted, and how do you go about implementing that? How do you do it in a way that attracts new users and is competitively funded so that it can keep up with the changing technology we use to access the web? These are the real questions. There is so little effort spent trying to answer them. Instead there’s just an endless series of op-eds bemoaning the way things continue to be bad, because that’s easier than having agency about making things better.

more on algorithms, judgment, polarization

I’m still pondering the most recent Tufekci piece about algorithms and human judgment on Twitter. It prompted some grumbling among data scientists. Sweeping statements about ‘algorithms’ do that, since to a computer scientist ‘algorithm’ is about as general a term as ‘math’.

In later conversation, Tufekci clarified that when she was calling out the potential problems of algorithmic filtering of the Twitter newsfeed, she was speaking to the problems of a newsfeed curated algorithmically for the sake of maximizing ‘engagement’. Or ads. Or, it is apparent on a re-reading of the piece, new members. She thinks an anti-homophily algorithm would maybe be a good idea, but that this is so unlikely according to the commercial logic of Twitter as to be a marginal point. And, meanwhile, she defends ‘human prioritization’ over algorithmic curation, despite the fact that homophily (not to mention preferential attachment) is arguably a negative consequence of social systems driven by human judgment.

I think inquiry into this question is important, but bound to be confusing to those who aren’t familiar in a deep way with network science, machine learning, and related fields. It’s also, I believe, helpful to have a background in cognitive science, because that’s a field which maintains that human judgment and computational systems are doing fundamentally commensurable kinds of work. When we think in a sophisticated way about crowdsourced labor, we use this sort of thinking. We acknowledge, for example, that human brains are better at the computational task of image recognition, so we employ Turkers to look at and label images. But those human judgments are then inputs to statistical processes that verify and check those judgments against each other. Later, those determinations that result from a combination of human judgment and algorithmic processing could be used in a search engine–which returns answers to questions based on human input. Search engines, then, are also a way of combining human and purely algorithmic judgment.
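To make that concrete, here is a minimal sketch (my own illustration, not any particular platform’s actual pipeline) of the simplest such statistical process: checking Turkers’ labels against each other by majority vote and recording how much they agree.

```python
from collections import Counter

def aggregate_labels(judgments):
    """Combine several workers' labels for each image by majority vote.

    judgments: dict mapping image id -> list of labels from different workers.
    Returns dict mapping image id -> (winning label, agreement ratio).
    """
    results = {}
    for image_id, labels in judgments.items():
        counts = Counter(labels)
        label, votes = counts.most_common(1)[0]
        results[image_id] = (label, votes / len(labels))
    return results

results = aggregate_labels({
    "img-1": ["cat", "cat", "dog"],
    "img-2": ["dog", "dog", "dog"],
})
# img-2 is unanimous: ('dog', 1.0); img-1 resolves to 'cat' with 2/3 agreement
```

Real crowdsourcing pipelines use more sophisticated models (weighting workers by estimated reliability, for instance), but the principle is the same: individual human judgments become inputs to an algorithmic check.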

What it comes down to is that virtually all of our interactions with the internet are built around algorithmic affordances. And these systems can be understood systematically if we reject the quantitative/qualitative divide at the ontological level. Reductive physicalism entails this rejection, but–and this is not to be understated–it pisses off or alienates people who do qualitative or humanities research.

This is old news. C.P. Snow’s The Two Cultures. The Science Wars. We’ve been through this before. Ironically, the polarization is algorithmically visible in the contemporary discussion about algorithms.*

The Two Cultures on Twitter?

It’s I guess not surprising that STS and cultural studies academics are still around and in opposition to the hard scientists. What’s maybe new is how much computer science now affects the public, and how the popular press appears to have allied itself with the STS and cultural studies view. I guess this must be because cultural anthropologists and media studies people are more likely to become journalists and writers, whereas harder science is pretty abstruse.

There’s an interesting conflation now from the soft side of the culture wars of science with power/privilege/capitalism that plays out again and again. I bump into it in the university context. I read about it all the time. Tufekci’s pessimism that the only algorithmic filtering Twitter would adopt would be one that essentially obeys the logic “of Wall Street” is, well, sad. It’s sad that an unfortunate pairing that is analytically contingent is historically determined to be so.

But there is also something deeply wrong about this view. Of course there are humanitarian scientists. Of course there is a nuanced center to the science wars “debate”. It’s just that the tedious framing of the science wars has been so pervasive and compelling, like a commercial jingle, that it’s hard to feel like there’s an audience for anything more subtle. How would you even talk about it?

* I need to confess: I think there was some sloppiness in that Medium piece. If I had had more time, I would have done something to check which conversations were actually about the Tufekci article, and which were just about whatever. I feel I may have misrepresented this in the post. For the sake of accessibility or to make the point, I guess. Also, I’m retrospectively skittish about exactly how distinct a cluster the data scientists were, and whether its insularity might have been an artifact of the data collection method. I’ve been building out poll.emic in fits and starts, mainly as a hobby. I built it originally because I wanted to at last understand Weird Twitter’s internal structure. The results were interesting but I never got to writing them up. Now I’m afraid that the culture has changed so much that I wouldn’t recognize it any more. But I digress. Is it even notable that social scientists from different disciplines would have very different social circles around them? Is the generalization too much? And are there enough nodes in this graph to make it a significant thing to say about anything, really? There could be thousands of academic tiffs I haven’t heard about that are just as important but which defy my expectations and assumptions. Or is the fact that Medium appears to have endorsed a particular small set of public intellectuals significant? How many Medium readers are there? Not as many as there are Twitter users, by several orders of magnitude, I expect. Who matters? Do academics matter? Why am I even studying these people as opposed to people who do more real things? What about all the presumably sane and happy people who are not pathologically on the Internet? Etc.

a response to “Big Data and the ‘Physics’ of Social Harmony” by @doctaj; also Notes towards ‘Criticality as ideology’

I’ve been thinking over Robin James’ “Big Data & the ‘Physics’ of Social Harmony“, an essay in three sections. The first discusses Singapore’s use of data science to detect terrorists and public health threats for the sake of “social harmony,” as reported by Harris in Foreign Policy. The second ties together Plato, Pentland’s “social physics”, and neoliberalism. The last discusses the limits to individual liberty proposed by J.S. Mill. The author admits it’s “all over the place.” I get the sense that it is a draft towards a greater argument. It is very thought-provoking and informative.

I take issue with a number of points in the essay. Underlying my disagreement is what I think is a political difference about the framing of “data science” and its impact on society. Since I am a data science practitioner who takes my work seriously, I would like this framing to be nuanced, recognizing both the harm and help that data science can do. I would like the debate about data science to be more concrete and pragmatic so that practitioners can use this discussion as a guide to do the right thing. I believe this will require discussion of data science in society to be informed by a technical understanding of what data science is up to. However, I think it’s also very important that these discussions rigorously take up the normative questions surrounding data sciences’ use. It’s with this agenda that I’m interested in James’ piece.

James is a professor of Philosophy and Women’s/Gender Studies and the essay bears the hallmarks of these disciplines. Situated in a Western and primarily anglophone intellectual tradition, it draws on Plato and Mill for its understanding of social harmony and liberalism. At the same time, it has the political orientation common to Gender Studies, alluding to the gendered division of economic labor, at times adopting Marxist terminology, and holding suspicion for authoritarian power. Plato is read as being the intellectual root of a “particular neoliberal kind of social harmony” that is “the ideal that informs data science.” James contrasts this ideal with the ideal of individual liberty, as espoused and then limited by Mill.

Where I take issue with James is that I think this line of argument is biased by its disciplinary formation. (Since this is more or less a truism for all academics, I suppose this is less a rebuttal than a critique.) Where I believe this is most visible is in her casting of Singapore’s ideal of social harmony as an upgrade of Plato, via the ideology of neoliberalism. She does not consider in the essay that Singapore’s ideal of social harmony might be rooted in Eastern philosophy, not Western philosophy. Though I have no special access or insight into the political philosophy of Singapore, this seems to me to be an important omission given that Singapore is ethnically 74.2% Chinese and has a Buddhist plurality.

Social harmony is a central concept in Eastern, especially Chinese, philosophy with deep roots in Confucianism and Daoism. A great introduction for those with background in Western philosophy who are interested in the philosophical contributions of Confucius is Fingarette’s Confucius: The Secular as Sacred. Fingarette discusses how Confucian thought is a reaction to the social upheaval and war of Ancient China’s Warring States Period, roughly 475 – 221 BC. Out of these troubling social conditions, Confucian thought attempts to establish conditions for peace. These include ritualized forms of social interaction at whose center is a benevolent Emperor.

There are many parallels with Plato’s political philosophy, but Fingarette makes a point of highlighting where Confucianism is different. In particular, the role of social ritual and ceremony as the basis of society is at odds with Western individualism. Political power is not a matter of contest of wills but the proper enactment of communal rites. It is like a dance. Frequently, the word “harmony” is used in the translation of Confucian texts to refer to the ideal of this functional, peaceful ceremonial society and, especially, its relationship with nature.

A thorough analysis of the use of data science for social control in light of Eastern philosophy would be an important and interesting work. I certainly haven’t done it. My point is simply that when we consider the use of data science for social control as a global phenomenon, it is dubious to see it narrowly in light of Western intellectual history and ideology. That includes rooting it in Plato, contrasting it with Mill, and characterizing it primarily as an expression of white neoliberalism. Expansive use of these Western tropes is a projection, a fallacy of “I think this way, therefore the world must.” This, I submit, is an occupational hazard of anyone who sees their work primarily as an analysis or critique of ideology.

In a lecture in 1965 printed in Knowledge and Human Interests, Habermas states:

The concept of knowledge-constitutive human interests already conjoins the two elements whose relation still has to be explained: knowledge and interest. From everyday experience we know that ideas serve often enough to furnish our actions with justifying motives in place of the real ones. What is called rationalization at this level is called ideology at the level of collective action. In both cases the manifest content of statements is falsified by consciousness’ unreflected tie to interests, despite its illusion of autonomy. The discipline of trained thought thus correctly aims at excluding such interests. In all the sciences routines have been developed that guard against the subjectivity of opinion, and a new discipline, the sociology of knowledge, has emerged to counter the uncontrolled influence of interests on a deeper level, which derive less from the individual than from the objective situation of social groups.

Habermas goes on to reflect on the interests driving scientific inquiry–“scientific” in the broadest sense of having to do with knowledge. He delineates:

  • Technical inquiry motivated by the drive for manipulation and control, or power
  • Historical-hermeneutic inquiry motivated by the drive to guide collective action
  • Critical, reflexive inquiry into how the objective situation of social groups controls ideology, motivated by the drive to be free or liberated

This was written in 1965. Habermas was positioning himself as a critical thinker; however, unlike some of the earlier Frankfurt School thinkers he drew on, he maintained that technical power was an objective human interest (see Bohman and Rehg). In the United States especially, criticality as a mode of inquiry took aim at the ideologies that aimed at white, bourgeois, and male power. Contemporary academic critique has since solidified as an academic discipline and wields political power. In particular, it is frequently enlisted as an expression of the interests of marginalized groups. In so doing, academic criticality has (in my view regrettably) become mere ideology. No longer interested in being scientifically disinterested, it has become a tool of rationalization. Its project is the articulation of changing historical conditions in certain institutionally recognized tropes. One of these tropes is the critique of capitalism, modernism, neoliberalism, etc. and their white male bourgeois heritage. Another is the feminist emphasis on domesticity as a dismissed form of economic production. This trope features in James’ analysis of Singapore’s ideal of social harmony:

Harris emphasizes that Singaporeans generally think that finely-tuned social harmony is the one thing that keeps the tiny city-state from tumbling into chaos. [1] In a context where resources are extremely scarce–there’s very little land, and little to no domestic water, food, or energy sources, harmony is crucial. It’s what makes society sufficiently productive so that it can generate enough commercial and tax revenue to buy and import the things it can’t cultivate domestically (and by domestically, I really mean domestically, as in, by ‘housework’ or the un/low-waged labor traditionally done by women and slaves/servants.) Harmony is what makes commercial processes efficient enough to make up for what’s lost when you don’t have a ‘domestic’ supply chain. (emphasis mine)

To me, this parenthetical is quite odd. There are other uses of the word “domestic” that do not specifically carry the connotation of women and slave/servants. For example, the economic idea of gross domestic product just means “an aggregate measure of production equal to the sum of the gross values added of all resident institutional units engaged in production (plus any taxes, and minus any subsidies, on products not included in the value of their outputs).” Included in that production is work done by men and high-wage laborers. To suggest that natural resources are primarily exploited by “domestic” labor in the ‘housework’ sense is bizarre given, say, agribusiness, industrial mining, etc.

There is perhaps an interesting etymological relationship here; does our use of ‘domestic’ in ‘domestic product’ have its roots in household production? I wouldn’t know. Does that same etymological root apply in Singapore? Was agriculture in East Asia traditionally the province of household servants in China and Southeast Asia (as opposed to independent farmers and their sons)? Regardless, domestic agricultural production is not housework now. So it’s mysterious that this detail should play a role in explaining Singapore’s emphasis on social harmony today.

So I think it’s safe to say that this parenthetical remark by James is due to her disciplinary orientation and academic focus. Perhaps it is a contortion to satisfy the audience of Cyborgology, which has a critical left-leaning politics. But Harris’s original article does not appear to support this interpretation. Rather, it only uses the word ‘harmony’ twice, and maintains a cultural sensitivity that James’ piece lacks, noting that Singapore’s use of data science may be motivated by a cultural fear of loss or risk.

The colloquial word kiasu, which stems from a vernacular Chinese word that means “fear of losing,” is a shorthand by which natives concisely convey the sense of vulnerability that seems coded into their social DNA (as well as their anxiety about missing out — on the best schools, the best jobs, the best new consumer products). Singaporeans’ boundless ambition is matched only by their extreme aversion to risk.

If we think that Harris is closer to the source here, then we do not need the projections of Western philosophy and neoliberal theory to explain what is really meant by Singapore’s use of data science. Rather, we can look to Singapore’s culture and perhaps its ideological origins in East Asian thinking. Confucius, not Plato.

* * *

If there is a disciplinary bias to American philosophy departments, it is that they exist to reproduce anglophone philosophy. This is a point that James has recently expressed herself…in fact while I have been in the process of writing this response.

Though I don’t share James’ political project, generally speaking I agree that effort spent on the reproduction of disciplinary terminology is not helpful to the philosophical and scientific projects. Terminology should be deployed for pragmatic reasons in service to objective interests like power, understanding, and freedom. On the other hand, language requires consistency to be effective, and education requires language. My own conclusion is that the scientific project can only be sustained now through disciplinary collapse.

When James suggests that old terms like metaphysics and epistemology prevent the de-centering of the “white supremacist/patriarchal/capitalist heart of philosophy”, she perhaps alludes to her recent coinage of “epistemontology” as a combination of epistemology and ontology, as a way of designating what neoliberalism is. She notes that she is trying to understand neoliberalism as an ideology, not as a historical period, and finds useful the definition that “neoliberals think everything in the universe works like a deregulated, competitive, financialized capitalist market.”

However helpful a philosophical understanding of neoliberalism as market epistemontology might be, I wonder whether James sees the tension between her statements about rejecting traditional terminology that reproduces the philosophical discipline and her interest in preserving the idea of “neoliberalism” in a way that can be taught in an introduction to philosophy class, a point she makes in a blog comment later. It is, perhaps, in the act of teaching that a discipline is reproduced.

The use of neoliberalism as a target of leftist academic critique has been challenged relatively recently. Craig Hickman, in a blog post about Luis Suarez-Villa, writes:

In fact Williams and Srinicek see this already in their first statement in the interview where they remind us that “what is interesting is that the neoliberal hegemony remains relatively impervious to critique from the standpoint of the latter, whilst it appears fundamentally unable to counter a politics which would be able to combat it on the terrain of modernity, technology, creativity, and innovation.” That’s because the ball has moved and the neoliberalist target has shifted in the past few years. The Left is stuck in waging a war it cannot win. What I mean by that is that it is at war with a target (neoliberalism) that no longer exists except in the facades of spectacle and illusion promoted in the vast Industrial-Media-Complex. What is going on in the world is now shifting toward the East and in new visions of technocapitalism of which such initiatives as Smart Cities by both CISCO (see here) and IBM and a conglomerate of other subsidiary firms and networking partners to build new 21st Century infrastructures and architectures to promote creativity, innovation, ultra-modernity, and technocapitalism.

Let’s face it capitalism is once again reinventing itself in a new guise and all the Foundations, Think-Tanks, academic, and media blitz hype artists are slowly pushing toward a different order than the older market economy of neoliberalism. So it’s time the Left begin addressing the new target and its ideological shift rather than attacking the boogeyman of capitalism’s past. Oh, true, the façade of neoliberalism will remain in the EU and U.S.A. and much of the rest of the world for a long while yet, so there is a need to continue our watchdog efforts on that score. But what I’m getting at is that we need to move forward and overtake this new agenda that is slowly creeping into the mix before it suddenly displaces any forms of resistance. So far I’m not sure if this new technocapitalistic ideology has even registered on the major leftist critiques beyond a few individuals like Luis Suarez-Villa. Mark Bergfield has a good critique of Suarez-Villa’s first book on Marx & Philosophy site: here.

In other words, the continuation of capitalist domination is due to its evolution relative to the stagnation of intellectual critiques of it. Or to put it another way, privilege is the capacity to evolve and not merely reproduce. Indeed, the language game of academic criticality is won by those who develop and disseminate new tropes through which to represent the interests of the marginalized. These privileged academics accomplish what Lyotard describes as “legitimation through paralogy.”

* * * * *

If James were working merely within academic criticality, I would be less interested in the work. But her aspirations appear to be higher, in a new political philosophy that can provide normative guidance in a world where data science is a technical reality. She writes:

Mill has already made–in 1859 no less–the argument that rationalizes the sacrifice of individual liberty for social harmony: as long as such harmony is enforced as a matter of opinion rather than a matter of law, then nobody’s violating anybody’s individual rights or liberties. This is, however, a crap argument, one designed to limit the possibly revolutionary effects of actually granting individual liberty as more than a merely formal, procedural thing (emancipating people really, not just politically, to use Marx’s distinction). For example, a careful, critical reading of On Liberty shows that Mill’s argument only works if large groups of people–mainly Asians–don’t get individual liberty in the first place. [2] So, critiquing Mill’s argument may help us show why updated data-science versions of it are crap, too. (And, I don’t think the solution is to shore up individual liberty–cause remember, individual liberty is exclusionary to begin with–but to think of something that’s both better than the old ideas, and more suited to new material/technical realities.)

It’s because of these more universalist ambitions that I think it’s fair to point out the limits of her argument. If a government’s idea of “social harmony” is not in fact white capitalist but premodern Chinese, if “neoliberalism” is no longer the dominant ideology but rather an idea of an ideology reproduced by a stagnating academic discipline, then these ideas will not help us understand what is going on in the contemporary world in which ‘data science’ is allegedly of such importance.

What would be better than this?

There is an empirical reality to the practices of data science. Perhaps it should be studied on its own terms, without disciplinary baggage.

picking a data backend for representing email in #python

I’m at a difficult crossroads with BigBang where I need to pick an appropriate data storage backend for my preprocessed mailing list data.

There are a lot of different aspects to this problem.

The first and most important consideration is speed. If you know anything about computer science, you know that it exists to quickly execute complex tasks that would take too long to do by hand. It’s odd writing that sentence since computational complexity considerations are so fundamental to algorithm design that this can go unspoken in most technical contexts. But since coming to grad school I’ve found myself writing for a more diverse audience, so…

The problem I’m facing is that in doing exploratory data analysis, I do not know all the questions I am going to ask yet. But any particular question will be impractical to ask unless I tune the underlying infrastructure to answer it. This chicken-and-egg problem means that the process of inquiry is necessarily constrained by the engineering options that are available.

This is not new in scientific practice. Notoriously, the field of economics in the 20th century was shaped by what was analytically tractable as formal, mathematical results. The nuance of contemporary modeling of complex systems is due largely to the fact that we now have computers to do this work for us. That means we can still have the intersubjectively verified rigor that comes with mathematization without trying to fit square pegs into round holes. (Side note: something mathematicians acknowledge that others tend to miss is that mathematics is based on dialectic proof and intersubjective agreement. This makes it much closer epistemologically to something like history as a discipline than it is to technical fields dedicated to prediction and control, like chemistry or structural engineering. Computer science is in many ways an extension of mathematics. Obviously, these formalizations are then applied to great effect. Their power comes from their deep intersubjective validity–in other words, their truth. Disciplines that have dispensed with intersubjective validity as a grounds for truth claims in favor of a more nebulous sense of diverse truths in a manifold of interpretation have difficulty understanding this and so are likely to see the institutional gains of computer scientists to be a result of political manipulation, as opposed to something more basic: mastery of nature, or more provocatively, use of force. This disciplinary dysfunction is one reason why these groups see their influence erode.)

For example, I have determined that in order to implement a certain query on the data efficiently, it would be best if another query were constant time. One way to do this is to use a database with an index.

However, setting up a database is something that requires extra work on the part of the programmer and so makes it harder to reproduce results. So far I have been keeping my processed email data “in memory” after it is pulled from files on the file system. This means that I have access to the data within the programming environment I’m most comfortable with, without depending on an external or parallel process. Fewer moving parts means that it is simpler to do my work.
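One lightweight middle ground is to build the index myself in memory. A minimal sketch–with hypothetical field names, since my actual message schema may differ–of how an in-memory index can make a lookup effectively constant time without adding an external database process:

```python
from collections import defaultdict

def build_sender_index(messages):
    """Build an in-memory index from sender address to messages.

    `messages` is assumed to be a list of dicts with a 'sender' key,
    as if parsed from mailing list archives. Construction costs O(n)
    once; each subsequent lookup is O(1) on average, like a database
    index, but with no extra process for others to set up when
    reproducing results.
    """
    index = defaultdict(list)
    for msg in messages:
        index[msg['sender']].append(msg)
    return index

messages = [
    {'sender': 'alice@example.org', 'subject': 'release plans'},
    {'sender': 'bob@example.org', 'subject': 'bug report'},
    {'sender': 'alice@example.org', 'subject': 're: bug report'},
]
index = build_sender_index(messages)
print(len(index['alice@example.org']))  # 2
```

The cost of this choice is that the index must be rebuilt every time the program starts, which is fine while the data fits in memory.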

So there is a tradeoff between the computational time of the software as it executes and the time and attention it takes me (and others who want to reproduce my results) to set up the environment in which the software runs. Since I am running this as an open source project and hope others will build on my work, I have every reason to be lazy, in a certain sense. Every inconvenience I suffer is one that will be suffered by everyone that follows me. There is a Kantian categorical imperative to keep things as simple as possible for people, to take any complex procedure and replace it with a script, so that others can do original creative thinking, solve the next problem. This is the imperative that those of us embedded in this culture have internalized. (G. Coleman notes that there are many cultures of hacking; I don’t know how prevalent these norms are, to be honest; I’m speaking from my experience.) It is what gives the social process of developing our software infrastructure a modernist sense of progress. We are part of something that is being built out.

There are also social and political considerations. I am building this project intentionally in a way that is embedded within the Scientific Python ecosystem, as they are also my object of study. Certain projects are trendy right now, and for good reason. At the Python Worker’s Party at Berkeley last Friday, I saw a great presentation of Blaze. Blaze is a project that allows programmers experienced with older idioms of scientific Python programming to transfer their skills to systems that can handle more data, like Spark. This is exciting for the Python community. In such a fast-moving field with multiple interoperating ecosystems, there is always the anxiety that one’s skills are no longer the best skills to have. Has your expertise been made obsolete? So there is a huge demand for tools that adapt one way of thinking to a new system. As more data has become available, people have engineered new sophisticated processing backends. Often these are not done in Python, which has a reputation for being very usable and accessible but slow to run in operation. Getting the usable programming interface to interoperate with the carefully engineered data backends is hard work, work that Matt Rocklin is doing while being paid by Continuum Analytics. That is sweet.

I’m eager to try out Blaze. But as I think through the questions I am trying to ask about open source projects, I’m realizing that they don’t fit easily into the kind of data processing that Blaze currently supports. Perhaps I’m being dense. If I knew better what I was asking, I could maybe figure out how to make it fit. But probably, what I’m looking at is data that is not “big”, that does not need the kind of power that these new tools provide. Currently my data fits on my laptop. It even fits in memory! Shouldn’t I build something that works well for what I need it for, and not worry about scaling at this point?

But I’m also trying to think long-term. What happens if and when it does scale up? What if I want to analyze ALL the mailing list data? Is that “big” data?

“Premature optimization is the root of all evil.” – Donald Knuth

the research lately

I’ve been working hard.

I wrote a lot, consolidating a lot of thinking about networked public design, digital activism, and Habermas. A lot of the thinking was inspired by Xiao Qiang’s course over a year ago, then a conversation with Nathan Mathias and Brian Keegan on Twitter, then building @TheTweetserve for Theorizing the Web. Interesting how these things accrete.

Through this, I think I’ve gotten a deeper understanding of Habermas’ system/lifeworld distinction than I’ve ever had before. Where I’m still weak on this is on his understanding of the role of law. There’s an angle in there about technology as regulation (a la Lessig) that ties things back to the recursive public. But of course Habermas was envisioning the normal kind of law–the potentially democratic law. Since the I School engages more with policy than it does with technicality, it would be good to have sharper thinking about this besides vague notions of the injustice or not of “the system”–how much of this rhetoric is owed to Habermas or the people he’s drawing on?

My next big writing project is going to be about Piketty and intellectual property, I hope. This is another argument that I’ve been working out for a long time–as an undergrad working on microeconomics of intellectual property, on the job at OpenGeo reading Lukacs for some reason, in grad school coursework. I tried to write something about this shortly after coming back to school but it went nowhere, partly because I was using anachronistic concepts and partly because the term “hacker” got weird political treatment due to some anti-startup yellow journalism.

The name of the imagined essay is “Free Capital.” It will try to trace the economic implications of free software and other open access technical designs, especially their impact on the relationship between capital and labor. It’s sort of an extension of this. I feel like there is more substance there to dig out, especially around liquidity and vendor- and employer- lock in. I’m imagining engaging some of the VC strategy press–I’ve been following the thinking of Kanyi Maqbela for a long time and always learning from it.

What I need to home in on in terms of economic modeling is under what conditions it’s in labor’s interest to work to produce open source IP or ‘free capital’, under what conditions it is in capital’s interest to invest in free capital, and what the macroeconomic implications of this are. It’s clear that capital will invest in free capital in order to unseat a monopoly–take Android for instance, or Firefox–but that this is (a) unstable and (b) difficult to take into account in measures of economic growth, since the gains in this case are to be had in the efficiency of the industrial organization rather than in the value of the innovation itself. Meanwhile, Matt Asay has been saying for years that the returns on open source investment are not high enough to attract serious investment, and industry experience appears to bear that out.

Meanwhile, Piketty argues that the main force for convergence in income is technology and skills diffusion. But these are exogenous to his model. Meanwhile, here in the Bay Area the gold rush rages on, and at least word on the grapevine is that VC money is having a harder and harder time finding high-return investments, and is sinking into lamer and lamer teams of recent Stanford undergrads.

My weakness in these arguments is that I don’t have data and don’t even know what predictions I’m making. It’s dangerously theoretical.

Meanwhile, my actual dissertation work progresses…slowly. I managed to get a lot done to get my preliminary results with BigBang ready for SciPy 2014. Since then I’ve switched it over to favor an Anaconda build and use IPython Notebooks internally–all good architectural changes, but it’s yak shaving. Now I’m hitting performance issues and need to think seriously about databases and data structures.

And then there’s the social work around it. These are good instincts–that I should be working on accessibility, polishing my communication, trying to encourage collaborators’ interest. I know how to start an open source project and it requires that. But then–what about the research? What about the whole point of the thing? Talking with Dave Kush today, he pointed me towards research on computational discourse analysis, which is where I think this needs to go. The material felt way over my head, a reminder that I’ve been barking up so many trees that are not where I think the real problem is. Mainly because I’ve been caught up in the politics of things. It’s bewildering how enriching but distracting the academic context is–how many barriers there are to sitting down and doing your best work. Petty disciplinary disputes, for example.

Theorizing the Web and SciPy conferences compared

I’ve just been through two days of tutorials at SciPy 2014–that stands for Scientific Python (the programming language). The last conference I went to was Theorizing the Web 2014. I wonder if I’m the first person to ever go to both conferences. Since I see my purpose in grad school as being a bridge node, I think it’s worthwhile to write something comparing the two.

Theorizing the Web was held in a “gorgeous warehouse space” in Williamsburg, the neighborhood of Brooklyn, New York that was full of hipsters ten years ago and now is full of baby carriages but still has gorgeous warehouse spaces and loft apartments. The warehouse spaces are actually gallery spaces that only look like warehouses from the outside. On the inside of the one where TtW was held, whole rooms with rounded interior corners were painted white, perhaps for a photo shoot. To call it a “warehouse” is to appeal to the blue-collar, industrial origins that Brooklyn gentrifiers invoke in order to distinguish themselves from the elites in Manhattan. During my visit to New York for the conference, I crashed on a friend’s air mattress in the Brooklyn neighborhood I had been gentrifying just a few years earlier. The speakers included empirical scientific researchers, but they were not the focus of the event. Rather, the emphasis was on theorizing in a way that is accessible to the public. The most anticipated speaker was a porn actress. Others were artists or writers of one sort or another. One was a sex worker who then wrote a book. Others were professors of sociology and communications. Another was a Buzzfeed editor.

SciPy is taking place in the AT&T Education and Conference Center in Austin, Texas, near the UT Austin campus. I’m writing from the adjoining hotel. The conference rooms we are using are in the basement; they seat many in comfortable mesh rolling chairs on tiers so everybody can see the dual projector screens. The attendees are primarily scientists who do computationally intensive work. One is a former marine biologist who now does bioinformatics mainly. Another team does robotics. Another does image processing on electron microscope of chromosomes. They are not trying to be accessible to the public. What they are trying to teach is hard enough to get across to others with similar expertise. It is a small community trying to enlarge itself by teaching others its skills.

At Theorizing the Web, the rare technologist spoke up to talk about the dangers of drones. In the same panel, it was pointed out how the people designing medical supply drones for use in foreign conflict zones were considering coloring them white, not black, to make them less intimidating. The implication was that drone designers are racist.

It’s true that the vast majority of attendees of the conference are white and male. To some extent, this is generational. Both tutorials I attended today–including the one on software for modeling multi-body dynamics, useful for designing things like walking robots–were interracial and taught by guys around my age. The audience has some older folks. These are not necessarily academics, but may be industry types or engineers whose firms are paying them to attend to train on cutting edge technology.

The afterparty the first night of Theorizing the Web was in a dive bar in Williamsburg. Brooklyn’s Williamsburg has dive bars the same way Virginia’s Williamsburg has a colonial village–they are a cherished part of its cultural heritage. But the venue was alienating for some. One woman from abroad confided to me that she was intimidated by how cool the bar felt. It was my duty as an American and a former New Yorker to explain that Williamsburg stopped being cool a long time ago.

I’m an introvert and am initially uneasy in basically any social setting. Tonight’s SciPy afterparty was in the downtown office of Enthought, in the Bank of America building. Enthought’s digs are on the 21st floor, with spacious personal offices and lots of whiteboards which display serious use. As an open source product/consulting/training company, it appears to be doing quite well. I imagine really cool people would find it rather banal.

I don’t think it’s overstating things to say that Theorizing the Web serves mainly those skeptical of the scientific project. Knowledge is conceived of as a threat to the known. One panelist at TtW described the problem of “explainer” sites–web sites whose purpose is to explain things that are going on to people who don’t understand them–when they try to translate cultural phenomena that they don’t understand. It was argued that even in cases where these cultural events are public, to capture that content and provide an interpretation or narration around it can be exploitative. Later, Kate Crawford, a very distinguished scholar on civic media, spoke to a rapt audience about the “conjoint anxieties” of Big Data. The anxieties of the watched are matched by the anxieties of the watchmen–like the NSA and, more implicitly, Facebook–who must always seek out more data in order to know things. The implication is that their political or economic agenda is due to a psychological complex–damning if true. In a brilliant rhetorical move that I didn’t quite follow, she tied this in to normcore, which I’m pretty sure is an Internet meme about a fake “fashion” trend in New York. Young people in New York go gaga for irony like this. For some reason earlier this year hipsters ironically wearing unstylish clothing became notable again.

I once met somebody from L.A. who told me their opinion of Brooklyn was that all nerds gathered in one place and thought they could decide what cool was just by saying so. At the time I had only recently moved to Berkeley and was still adjusting. Now I realize how parochial that zeitgeist is, however much I may still identify with it some.

Back in Austin, I have interesting conversations with folks at the SciPy party. One conversation is with two social scientists (demographic observation: one man, one woman) from New York that work on statistical analysis of violent crime in service to the city. They talk about the difficulty of remaining detached from their research subjects, who are eager to assist with the research somehow, though this would violate the statistical rigor of their study. Since they are doing policy research, objectivity is important. They are painfully aware of the limitations of their methods and the implications this has on those their work serves.

Later, I’m sitting alone when I’m joined by an electrical engineer turned programmer. He’s from Tennessee. We talk shop for a bit but the conversation quickly turns philosophical–about the experience of doing certain kinds of science, the role of rationality in human ethics, whether religion is an evolved human impulse and whether that matters. We are joined by a bioinformatics researcher from Paris. She tells us later that she has an applied math/machine learning background.

The problem in her field, she explains, is that for rare diseases it is very hard to find genetic causes because there isn’t enough data to do significant inference. Genomic data is very high-dimensional–thousands of genes–and for some diseases there may be fewer than fifty cases to study. Machine learning researchers are doing their best to figure out ways for researchers to incorporate “prior knowledge”–theoretical understanding from beyond the data available–to improve their conclusions.
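I don’t know the specific methods her field uses, but the basic effect of a prior when observations are scarce can be shown with a toy one-parameter regression. The data and penalty value here are invented for illustration; the L2 penalty encodes a prior belief that the coefficient is near zero, and the fewer the observations, the more that prior dominates:

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form slope for one-feature least squares with an L2 penalty.

    With penalty=0 this is the ordinary least-squares slope; a positive
    penalty shrinks the estimate toward zero, the prior mean. In a real
    genomics setting there are thousands of coefficients and a handful
    of cases, so some prior is needed just to make estimation determinate.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

# Two observations exactly consistent with slope 2.0 -- far less data
# than a realistic study, but enough to show the shrinkage effect.
xs, ys = [1.0, 2.0], [2.0, 4.0]
print(ridge_slope(xs, ys, 0.0))  # 2.0: no prior, fit the data exactly
print(ridge_slope(xs, ys, 5.0))  # 1.0: strong prior pulls the estimate toward 0
```

The art, as I understand it, is in choosing priors that encode genuine theoretical knowledge rather than arbitrary shrinkage.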

Over meals the past couple days I’ve been checking Twitter, where a lot of the intellectuals who organize Theorizing the Web or are otherwise prominent in that community are active. One extended conversation is about the relative failure of the open source movement to produce compelling consumer products. My theory is that this has to do with business models and the difficulty of coming up with upfront capital investment. But emotionally my response to that question is that it is misplaced: consumer products are trivial. Who cares?

Today, folks on Twitter are getting excited about using Adorno’s concept of the culture industry to critique Facebook’s emotional contagion experiment and other media manipulation. I find this both encouraging–it’s about time the Theorizing the Web community learned to embrace Frankfurt School thought–and baffling, because I believe they are misreading Adorno. The culture industry is that sector of the economy that produces cultural products, like Hollywood and television production companies. On the Internet, the culture industry is Buzzfeed, the Atlantic, and to a lesser extent (though this is surely masked by its own ideology) The New Inquiry. My honest opinion for a long time has been that the brand of “anticapitalist” criticality indulged in on-line is a politically impotent form of entertainment equivalent to the soap opera. A concept more appropriate for understanding Facebook’s role in controlling access to news and the formation of culture is Habermas’ idea of steering media.

He gets into this in Theory of Communicative Action, vol. 2, which is underrated in America probably due to its heaviness.

Preparing for SciPy 2014

I’ve been instructed to focus my attention on mid-level concepts rather than grand theory as I begin my empirical work.

This is difficult for me, as I tend to oscillate between thinking very big and thinking very narrowly. This is an occupational hazard of a developer. Technical minutiae accumulate into something durable and powerful. To sustain one’s motivation one has to be able to envision one’s tiny tasks (correcting the spelling of some word in a program) as steps towards a larger project.

I’m working in my comfort zone. I’ve got my software project open on GitHub and I’m preparing to present my preliminary results at SciPy 2014 next week. A colleague and mentor I met with today told me it’s not a conference for people racking up career points. It’s a conference for people to meet each other, get an update on how their community is doing as a whole, and to learn new skills from each other.

It’s been a few years since I’ve been to a developer conference. In my past career I went to FOSS4G, the open source geospatial conference, a number of times. In 2008, the conference was in South Africa. I didn’t know anybody, so I blogged about it, and got chastised for being too divisive. I wasn’t being sensitive to the delicate balance between the open source geospatial developer community and their greatest proprietary coopetitor, ESRI. I was being an ideologue at a time when the open source model in that industry was just at its inflection point, becoming mainstream. Obviously I didn’t understand the subtlety of the relationships, business and personal, threaded through the conference.

Later I attended FOSS4G in 2010 to pitch the project my team had recently launched, GeoNode. It was a very exciting time for me personally. I was very personally invested in the project, and I was so proud of my team and myself for pulling through on the beta release. In retrospect, building a system for serving spatial data modeled on a content management system seems like a no-brainer. Today there are plenty of data management startups and services out there, some industrial, some academic. But at the time we were ahead of the curve, thanks largely to the vision of Chris Holmes, who was at the time the wunderkind visionary president of OpenGeo.

Cholmes always envisioned OpenGeo turning into an anti-capitalist organization, a hacker coop with as much transparency as it could handle. If only it could get its business model right. It was incubating in a pre-crash bubble that thinned out over time. I was very into the politics of the organization when I joined it, but over time I became more cynical and embraced the economic logic I was being taught by the mature entrepreneurs who had been attracted to OpenGeo’s promise and standing in the geospatial world. While trying to wrap my head around managing developers, clients, and the budget around GeoNode, I began to see why businesses are the way they are, and how open source plays out in the industrial organization of the tech industry as a whole.

GeoNode, the project, remains a success. There is glory to that, though in retrospect I can claim little of it. I made many big mistakes and the success of the project has always been due to the very intelligent team working on it, as well as its institutional positioning.

I left OpenGeo because I wanted to be a scientist. I had spent four years there, and had found my way onto a project where we were building data plumbing for disaster reduction scientists and the military. OpenGeo had become a victim of its own success and outgrown its non-profit incubator, buckling under the weight of the demand for its services. I had deferred enrollment at Berkeley for a year to see GeoNode through to a place where it couldn’t get canned. My last major act was to raise funding for a v1.1 release that fixed the show-stopping bugs in the v1.0 version.

OpenGeo is now Boundless, a for-profit company. It’s better that way. It’s still doing revolutionary work.

I’ve been under the radar in the open source world for the three years I’ve been in grad school. But as I begin this dissertation work, I feel myself coming back to it. My research questions, in one framing, are about software ecosystem sustainability and management. I’m drawing from my experience participating in and growing open source communities and am trying to operationalize my intuitions from that work. At Berkeley I’ve discovered the scientific Python community, which I feel at home with since I learned about how to do open source from the inimitable Whit Morris, a Pythonista of the Plone cohort, among others.

After immersing myself in academia, I’m excited to get back into the open source development world. Some of the most intelligent and genuine people I’ve ever met work in that space. Like the sciences, it is a community of very smart and creative people with the privilege to pursue opportunity but with goals that go beyond narrow commercial interests. But it’s also in many ways a more richly collaborative and constructive community than the academic world. It’s not a prestige economy, where people are rewarded with scarce attention and even scarcer titles. It’s a constructive economy, where there is always room to contribute usefully, and to be recognized even in a small way for that contribution.

I’m going to introduce my research on the SciPy communities themselves. In the wake of the backlash against Facebook’s “manipulative” data science research, I’m relieved to be studying a community that has from the beginning wanted to be open about its processes. My hope is that my data scientific work will be a contribution to, not an exploitation of, the community I’m studying. It’s an exciting opportunity that I’ve been preparing for for a long time.

metaphorical problems with logical solutions

There are polarizing discourses on the Internet about the following three dichotomies:

  • Public vs. Private (information)
  • (Social) Inclusivity vs. Exclusivity
  • Open vs. Closed (systems, properties, communities)

Each of these pairings enlists certain metaphors and intuitions. Rarely are they precisely defined.

Due to their intuitive pull, it’s easy to draw certain naive associations. I certainly do. But how do they work together logically?

To what extent can we fill in other octants of this cube? Or is that way of modeling it too simplistic as well?

If privacy is about having contextual control over information flowing out of oneself, then that means that somebody must have the option of closing off some access to their information. To close off access is necessarily to exclude.


But it has been argued that open sociotechnical systems exclude as well by being inhospitable to those with greater need for privacy.


These conditionals limit the kinds of communities that can exist.


Social inclusivity in sociotechnical systems is impossible. There is no such thing as a sociotechnical system that works for everybody.

There are only three kinds of systems: open systems, private systems, or systems that are neither open nor private. We can call the latter leaky systems.
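This taxonomy can be made mechanical. A toy sketch–the two predicates are my own simplification, not settled definitions–classifying a system by whether anyone can access its information and whether participants control how their information flows out:

```python
def classify(anyone_can_access, participants_control_outflow):
    """Classify a sociotechnical system under the taxonomy above.

    A system can be both open and private in this sense: participants
    control disclosure, and what they disclose is accessible to all.
    'leaky' is the residual case, where information escapes without
    participant control.
    """
    kinds = set()
    if anyone_can_access:
        kinds.add('open')
    if participants_control_outflow:
        kinds.add('private')
    if not kinds:
        kinds.add('leaky')
    return kinds

print(sorted(classify(True, True)))    # ['open', 'private']
print(sorted(classify(False, False)))  # ['leaky']
```

Note that in this framing 'open' and 'private' are not exclusive, which anticipates the point below about open systems and disclosure.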

These binary logical relations capture only the limiting properties of these systems. If there has ever been an open system, it is the Internet; but everyone knows that even the Internet isn’t truly open because of access issues.

The difference between a private system and a leaky system is participants’ ability to control how their data escapes the system.

But in this case, systems that we call ‘open’ are often private systems, since participants choose whether or not to put information into the open.

So is the only question whether and when information is disclosed vs. leaked?

