Digifesto

frustrations with machine ethics

It’s perhaps because of the contemporary two cultures problem of tech and the humanities that machine ethics is in such a frustrating state.

Today I read danah boyd’s piece in The Message about technology as an arbiter of fairness. It’s another baffling conflation of data science with neoliberalism. This time, the assertion was that the ideology of the tech industry is neoliberalism, hence its idea of ‘fairness’ is individualist and set against the social fabric. It’s not clear what backs up these kinds of assertions. They are more or less refuted by the fact that industrial data science is obsessed with our network of ties for marketing reasons. If anybody understands the failure of the myth of the atomistic individual, it’s “tech folks,” a category boyd uses to capture, I guess, everyone from marketing people at Google to venture capitalists to startup engineers to IBM researchers. You know, the homogeneous category that is “tech folks.”

This kind of criticism makes the mistake of thinking that a historic past is the right way to understand a rapidly changing present that is often more technically sophisticated than the critics understand. But critical academics have fallen into the trap of critiquing neoliberalism over and over again. One problem is that tech folks don’t spend a ton of time articulating their ideology in ways that are convenient for pop culture critique. Often their business models require rather sophisticated understandings of the market, etc. that don’t fit readily into that kind of mold.

What’s needed is substantive progress in computational ethics. Ok, so algorithms are ethically and politically important. What politics would you like to see enacted, and how do you go about implementing that? How do you do it in a way that attracts new users and is competitively funded, so that it can keep up with the changing technology we use to access the web? These are the real questions. There is so little effort spent trying to answer them. Instead there’s just an endless series of op-eds bemoaning the way things continue to be bad, because that’s easier than exercising agency to make things better.

notes towards benign superintelligence j/k

Nick Bostrom will give a book talk on campus soon. My departmental seminar on “Algorithms as Computation and Culture” has opened with a paper on the ethics of algorithms and a paper on accumulated practical wisdom regarding machine learning. Of course, these are related subjects.

Jenna Burrell recently trolled me in order to get me to give up my own opinions on the matter, which are rooted in a philosophical functionalism. I’ve learned just now that these opinions may depend on obsolete philosophy of mind. I’m not sure. R. Scott Bakker’s blog post against pragmatic functionalism makes me wonder: what do I believe again? I’ve been resting on a position established when I was deeper into this stuff seven years ago. A lot has happened since then.

I’m turning into a historicist perhaps due to lack of imagination or simply because older works are more accessible. Cybernetic theories of control–or, electrical engineering theories of control–are as relevant, it seems, to contemporary debates as machine learning, which to the extent it depends on stochastic gradient descent is just another version of cybernetic control anyway, right?
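
The analogy can be made concrete with a toy sketch (my own illustration with made-up data, not drawn from any cybernetics text): stochastic gradient descent is a negative-feedback loop in which the error signal steers the controlled quantity (the model parameters) back toward a set point (the loss minimum), exactly the structure of a classical controller.

```python
import random

# Toy illustration: SGD as a negative-feedback controller.
# We fit y = w * x to noisy observations of a system whose true
# gain is 3.0; the prediction error is fed back to correct w.

random.seed(0)
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in range(20)]

w = 0.0      # controlled quantity (the parameter)
lr = 0.001   # feedback gain (the learning rate)
for _ in range(2000):
    x, y = random.choice(data)   # one noisy observation of the plant
    error = w * x - y            # deviation from the set point
    w -= lr * error * x          # negative-feedback correction

print(round(w, 2))  # should land close to the true gain, 3.0
```

The learning rate plays the same stability role as a controller's gain: set it too high (e.g. 0.01 here) and the loop overshoots and diverges rather than settling.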

Ashwin Parameswaran’s blog post about Beniger’s The Control Revolution illustrates this point well. To a first approximation, we are simply undergoing the continuation of prophecies of the 20th century, only more thoroughly. Over and over, and over, and over, and over, like a monkey with a miniature cymbal.

One property of a persistent super-intelligent infrastructure of control would be our inability to comprehend it. Our cognitive models, constructed over the course of a single lifetime with constraints on memory in both time and space, limited to a particular hypothesis space, could simply be outgunned by the complexity of the sociotechnical system in which they are embedded. I tried to get at this problem with work on computational asymmetry but didn’t find the right audience. I’ve just learned there’s been work on this in finance, which makes sense, as that’s where it’s most directly relevant today.

more on algorithms, judgment, polarization

I’m still pondering the most recent Tufekci piece about algorithms and human judgment on Twitter. It prompted some grumbling among data scientists. Sweeping statements about ‘algorithms’ do that, since to a computer scientist ‘algorithm’ is about as general a term as ‘math’.

In later conversation, Tufekci clarified that when she was calling out the potential problems of algorithmic filtering of the Twitter newsfeed, she was speaking to the problems of a newsfeed curated algorithmically for the sake of maximizing ‘engagement’. Or ads. Or, it is apparent on a re-reading of the piece, new members. She thinks an anti-homophily algorithm would maybe be a good idea, but that this is so unlikely given the commercial logic of Twitter as to be a marginal point. And, meanwhile, she defends ‘human prioritization’ over algorithmic curation, despite the fact that homophily (not to mention preferential attachment) is arguably a negative consequence of a social system driven by human judgment.

I think inquiry into this question is important, but bound to be confusing to those who aren’t familiar in a deep way with network science, machine learning, and related fields. It’s also, I believe, helpful to have a background in cognitive science, because that’s a field which maintains that human judgment and computational systems are doing fundamentally commensurable kinds of work. When we think in a sophisticated way about crowdsourced labor, we use this sort of thinking. We acknowledge, for example, that human brains are better at the computational task of image recognition, so we employ Turkers to look at and label images. But those human judgments are then inputs to statistical processes that verify and check those judgments against each other. Later, the determinations that result from a combination of human judgment and algorithmic processing can be used in a search engine–which returns answers to questions based on human input. Search engines, then, are also a way of combining human and purely algorithmic judgment.
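
The verification step can be sketched in miniature. This is a hypothetical toy aggregator with invented labels, not any production crowdsourcing pipeline: redundant human judgments are collected per image and checked against each other by majority vote, with the agreement rate serving as a confidence score.

```python
from collections import Counter

def aggregate_labels(labels_by_item):
    """Majority-vote aggregation of redundant crowd labels.

    labels_by_item: dict mapping item id -> list of worker labels.
    Returns dict mapping item id -> (winning label, agreement rate).
    """
    results = {}
    for item, worker_labels in labels_by_item.items():
        counts = Counter(worker_labels)
        label, votes = counts.most_common(1)[0]
        results[item] = (label, votes / len(worker_labels))
    return results

# Hypothetical Turker labels for three images
labels = {
    "img1": ["cat", "cat", "cat"],   # unanimous -> agreement 1.0
    "img2": ["cat", "dog", "dog"],   # majority "dog" -> agreement 2/3
    "img3": ["dog", "dog", "cat"],
}
print(aggregate_labels(labels))
```

Items with low agreement can then be routed back for more human judgments, which is the human/algorithmic feedback loop the paragraph above describes.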

What it comes down to is that virtually all of our interactions with the internet are built around algorithmic affordances. And these systems can be understood systematically if we reject the quantitative/qualitative divide at the ontological level. Reductive physicalism entails this rejection, but–and this is not to be understated–it pisses off or alienates people who do qualitative or humanities research.

This is old news. C.P. Snow’s The Two Cultures. The Science Wars. We’ve been through this before. Ironically, the polarization is algorithmically visible in the contemporary discussion about algorithms.*

The Two Cultures on Twitter?

It’s I guess not surprising that STS and cultural studies academics are still around and in opposition to the hard scientists. What’s maybe new is how much computer science now affects the public, and how the popular press appears to have allied itself with the STS and cultural studies view. I guess this must be because cultural anthropologists and media studies people are more likely to become journalists and writers, whereas harder science is pretty abstruse.

There’s an interesting conflation now, from the soft side of the culture wars, of science with power/privilege/capitalism that plays out again and again. I bump into it in the university context. I read about it all the time. Tufekci’s pessimism that the only algorithmic filtering Twitter would adopt would be one that essentially obeys the logic “of Wall Street” is, well, sad. It’s sad that a pairing which is analytically contingent appears historically determined to be so.

But there is also something deeply wrong about this view. Of course there are humanitarian scientists. Of course there is a nuanced center to the science wars “debate”. It’s just that the tedious framing of the science wars has been so pervasive and compelling, like a commercial jingle, that it’s hard to feel like there’s an audience for anything more subtle. How would you even talk about it?

* I need to confess: I think there was some sloppiness in that Medium piece. If I had had more time, I would have done something to check which conversations were actually about the Tufekci article, and which were just about whatever. I feel I may have misrepresented this in the post. For the sake of accessibility, or to make the point, I guess. Also, I’m retrospectively skittish about exactly how distinct a cluster the data scientists were, and whether its insularity might have been an artifact of the data collection method. I’ve been building out poll.emic in fits and starts, mainly as a hobby. I built it originally because I wanted to at last understand Weird Twitter’s internal structure. The results were interesting, but I never got around to writing them up. Now I’m afraid that the culture has changed so much that I wouldn’t recognize it any more. But I digress. Is it even notable that social scientists from different disciplines would have very different social circles around them? Is the generalization too much? And are there enough nodes in this graph to make it a significant thing to say about anything, really? There could be thousands of academic tiffs I haven’t heard about that are just as important but which defy my expectations and assumptions. Or is the fact that Medium appears to have endorsed a particular small set of public intellectuals significant? How many Medium readers are there? Not as many as there are Twitter users, by several orders of magnitude, I expect. Who matters? Do academics matter? Why am I even studying these people as opposed to people who do more real things? What about all the presumably sane and happy people who are not pathologically on the Internet? Etc.

a mathematical model of collective creativity

I love my Mom. One reason I love her is that she is so good at asking questions.

I thought I was on vacation today, but then my Mom started to ask me questions about my dissertation. What is my dissertation about? Why is it interesting?

I tried to explain: I’m interested in studying how these people working on scientific software work together. That could be useful in the design of new research infrastructure.

M: Ok, so like…GitHub? Is that something people use to share their research? How do they find each other using that?

S: Well, people can follow each others repositories to get notifications. Or they can meet each other at conferences and learn what people are working on. Sometimes people use social media to talk about what they are doing.

M: That sounds like a lot of different ways of learning about things. Could your research be about how to get them all to talk about it in one place?

S: Yes, maybe. In some ways GitHub is already serving as that central repository these days. One application of my research could be about how to design, say, an extension to GitHub that connects people. There’s a lot of research on ‘link formation’ in the social media context–well I’m your friend, and you have this other friend, so maybe we should be friends. Maybe the story is different for collaborators. I have certain interests, and somebody else does too. When are our interests aligned, so that we’d really want to work together on the same thing? And how do we resolve disputes when our interests diverge?

M: That sounds like what open source is all about.

S: Yeah!

M: Could you build something like that that wasn’t just for software? Say I’m a researcher and I’m interested in studying children’s education, and there’s another researcher who is interested in studying children’s education. Could you build something like that in your…your D-Lab?

S: We’ve actually talked about building an OKCupid for academic research! The trick there would be bringing together researchers interested in different things, but with different skills. Maybe somebody is really good at analyzing data, and somebody else is really good at collecting data. But it’s a lot of work to build something nice. Not as easy as “build it and they will come.”

M: But if it was something like what people are used to using, like OKCupid, then…

S: It’s true that would be a really interesting project. But it’s not exactly my research interest. I’m trying really hard to be a scientist. That means working on problems that aren’t immediately appreciable by a lot of people. There are a lot of applications of what I’m trying to do, but I won’t really know what they are until I get the answers to what I’m looking for.

M: What are you looking for?

S: I guess, well…I’m looking for a mathematical model of creativity.

M: What? Wow! And you think you’re going to find that in your data?

S: I’m going to try. But I’m afraid to say that. People are going to say, “Why aren’t you studying artists?”

M: Well, the people you are studying are doing creative work. They’re developing software, they’re scientists…

S: Yes.

M: But they aren’t like Beethoven writing a symphony, it’s like…

S: …a craft.

M: Yes, a craft. But also, it’s a lot of people working together. It’s collective creativity.

S: Yes, that’s right.

M: You really should write that down. A mathematical model of collective creativity! That gives me chills. I really hope you’ll write that down.

Thanks, Mom.

a response to “Big Data and the ‘Physics’ of Social Harmony” by @doctaj; also notes towards ‘criticality as ideology’

I’ve been thinking over Robin James’ “Big Data & the ‘Physics’ of Social Harmony”, an essay in three sections. The first discusses Singapore’s use of data science to detect terrorists and public health threats for the sake of “social harmony,” as reported by Harris in Foreign Policy. The second ties together Plato, Pentland’s “social physics”, and neoliberalism. The last discusses the limits to individual liberty proposed by J.S. Mill. The author admits it’s “all over the place.” I get the sense that it is a draft towards a greater argument. It is very thought-provoking and informative.

I take issue with a number of points in the essay. Underlying my disagreement is what I think is a political difference about the framing of “data science” and its impact on society. Since I am a data science practitioner who takes my work seriously, I would like this framing to be nuanced, recognizing both the harm and help that data science can do. I would like the debate about data science to be more concrete and pragmatic so that practitioners can use this discussion as a guide to do the right thing. I believe this will require discussion of data science in society to be informed by a technical understanding of what data science is up to. However, I think it’s also very important that these discussions rigorously take up the normative questions surrounding data science’s use. It’s with this agenda that I’m interested in James’ piece.

James is a professor of Philosophy and Women’s/Gender Studies and the essay bears the hallmarks of these disciplines. Situated in a Western and primarily anglophone intellectual tradition, it draws on Plato and Mill for its understanding of social harmony and liberalism. At the same time, it has the political orientation common to Gender Studies, alluding to the gendered division of economic labor, at times adopting Marxist terminology, and holding suspicion for authoritarian power. Plato is read as being the intellectual root of a “particular neoliberal kind of social harmony” that is “the ideal that informs data science.” James contrasts this ideal with the ideal of individual liberty, as espoused and then limited by Mill.

Where I take issue with James is that I think this line of argument is biased by its disciplinary formation. (Since this is more or less a truism for all academics, I suppose this is less a rebuttal than a critique.) Where I believe this is most visible is in her casting of Singapore’s ideal of social harmony as an upgrade of Plato, via the ideology of neoliberalism. She does not consider in the essay that Singapore’s ideal of social harmony might be rooted in Eastern philosophy, not Western philosophy. Though I have no special access or insight into the political philosophy of Singapore, this seems to me an important omission given that Singapore is 74.2% ethnically Chinese, with a Buddhist plurality.

Social harmony is a central concept in Eastern, especially Chinese, philosophy with deep roots in Confucianism and Daoism. A great introduction for those with background in Western philosophy who are interested in the philosophical contributions of Confucius is Fingarette’s Confucius: The Secular as Sacred. Fingarette discusses how Confucian thought is a reaction to the social upheaval and war of Ancient China’s Warring States Period, roughly 475 – 221 BC. Out of these troubling social conditions, Confucian thought attempts to establish conditions for peace. These include ritualized forms of social interaction at whose center is a benevolent Emperor.

There are many parallels with Plato’s political philosophy, but Fingarette makes a point of highlighting where Confucianism is different. In particular, the role of social ritual and ceremony as the basis of society is at odds with Western individualism. Political power is not a matter of contest of wills but the proper enactment of communal rites. It is like a dance. Frequently, the word “harmony” is used in the translation of Confucian texts to refer to the ideal of this functional, peaceful ceremonial society and, especially, its relationship with nature.

A thorough analysis of the use of data science for social control in light of Eastern philosophy would be an important and interesting work. I certainly haven’t done it. My point is simply that when we consider the use of data science for social control as a global phenomenon, it is dubious to see it narrowly in light of Western intellectual history and ideology. That includes rooting it in Plato, contrasting it with Mill, and characterizing it primarily as an expression of white neoliberalism. Expansive use of these Western tropes is a projection, a fallacy of “I think this way, therefore the world must.” This, I submit, is an occupational hazard of anyone who sees their work primarily as the analysis and critique of ideology.

In a lecture in 1965 printed in Knowledge and Human Interests, Habermas states:

The concept of knowledge-constitutive human interests already conjoins the two elements whose relation still has to be explained: knowledge and interest. From everyday experience we know that ideas serve often enough to furnish our actions with justifying motives in place of the real ones. What is called rationalization at this level is called ideology at the level of collective action. In both cases the manifest content of statements is falsified by consciousness’ unreflected tie to interests, despite its illusion of autonomy. The discipline of trained thought thus correctly aims at excluding such interests. In all the sciences routines have been developed that guard against the subjectivity of opinion, and a new discipline, the sociology of knowledge, has emerged to counter the uncontrolled influence of interests on a deeper level, which derive less from the individual than from the objective situation of social groups.

Habermas goes on to reflect on the interests driving scientific inquiry–“scientific” in the broadest sense of having to do with knowledge. He delineates:

  • Technical inquiry motivated by the drive for manipulation and control, or power
  • Historical-hermeneutic inquiry motivated by the drive to guide collective action
  • Critical, reflexive inquiry into how the objective situation of social groups controls ideology, motivated by the drive to be free or liberated

This was written in 1965. Habermas was positioning himself as a critical thinker; however, unlike some of the earlier Frankfurt School thinkers he drew on, he maintained that technical power was an objective human interest (see Bohman and Rehg). In the United States especially, criticality as a mode of inquiry took aim at the ideologies that sustained white, bourgeois, and male power. Contemporary academic critique has since solidified as an academic discipline and wields political power. In particular, it is frequently enlisted as an expression of the interests of marginalized groups. In so doing, academic criticality has (in my view regrettably) become mere ideology. No longer interested in being scientifically disinterested, it has become a tool of rationalization. Its project is the articulation of changing historical conditions in certain institutionally recognized tropes. One of these tropes is the critique of capitalism, modernism, neoliberalism, etc. and their white male bourgeois heritage. Another is the feminist emphasis on domesticity as a dismissed form of economic production. This trope features in James’ analysis of Singapore’s ideal of social harmony:

Harris emphasizes that Singaporeans generally think that finely-tuned social harmony is the one thing that keeps the tiny city-state from tumbling into chaos. [1] In a context where resources are extremely scarce–there’s very little land, and little to no domestic water, food, or energy sources, harmony is crucial. It’s what makes society sufficiently productive so that it can generate enough commercial and tax revenue to buy and import the things it can’t cultivate domestically (and by domestically, I really mean domestically, as in, by ‘housework’ or the un/low-waged labor traditionally done by women and slaves/servants.) Harmony is what makes commercial processes efficient enough to make up for what’s lost when you don’t have a ‘domestic’ supply chain. (emphasis mine)

To me, this parenthetical is quite odd. There are other uses of the word “domestic” that do not specifically carry the connotation of women and slave/servants. For example, the economic idea of gross domestic product just means “an aggregate measure of production equal to the sum of the gross values added of all resident institutional units engaged in production (plus any taxes, and minus any subsidies, on products not included in the value of their outputs).” Included in that production is work done by men and high-wage laborers. To suggest that natural resources are primarily exploited by “domestic” labor in the ‘housework’ sense is bizarre given, say, agribusiness, industrial mining, etc.

There is perhaps an interesting etymological relationship here; does our use of ‘domestic’ in ‘domestic product’ have its roots in household production? I wouldn’t know. Does the same etymological root apply in Singapore? Was agriculture traditionally the province of household servants in China and Southeast Asia (as opposed to independent farmers and their sons)? Regardless, domestic agricultural production is not housework now. So it’s mysterious that this detail should play a role in explaining Singapore’s emphasis on social harmony today.

So I think it’s safe to say that this parenthetical remark by James is due to her disciplinary orientation and academic focus. Perhaps it is a contortion to satisfy the audience of Cyborgology, which has a critical, left-leaning politics. But Harris’s original article does not appear to support this interpretation. Rather, it uses the word ‘harmony’ only twice, and maintains a cultural sensitivity that James’ piece lacks, noting that Singapore’s use of data science may be motivated by a cultural fear of loss or risk.

The colloquial word kiasu, which stems from a vernacular Chinese word that means “fear of losing,” is a shorthand by which natives concisely convey the sense of vulnerability that seems coded into their social DNA (as well as their anxiety about missing out — on the best schools, the best jobs, the best new consumer products). Singaporeans’ boundless ambition is matched only by their extreme aversion to risk.

If we think that Harris is closer to the source here, then we do not need the projections of Western philosophy and neoliberal theory to explain what is really meant by Singapore’s use of data science. Rather, we can look to Singapore’s culture and perhaps its ideological origins in East Asian thinking. Confucius, not Plato.

* * *

If there is a disciplinary bias in American philosophy departments, it is that they exist to reproduce anglophone philosophy. This is a point that James has recently expressed herself…in fact while I have been in the process of writing this response.

Though I don’t share James’ political project, generally speaking I agree that effort spent on the reproduction of disciplinary terminology is not helpful to the philosophical and scientific projects. Terminology should be deployed for pragmatic reasons in service to objective interests like power, understanding, and freedom. On the other hand, language requires consistency to be effective, and education requires language. My own conclusion is that the scientific project can only be sustained now through disciplinary collapse.

When James suggests that old terms like metaphysics and epistemology prevent the de-centering of the “white supremacist/patriarchal/capitalist heart of philosophy”, she perhaps alludes to her recent coinage of “epistemontology” as a combination of epistemology and ontology, as a way of designating what neoliberalism is. She notes that she is trying to understand neoliberalism as an ideology, not as a historical period, and finds useful the definition that “neoliberals think everything in the universe works like a deregulated, competitive, financialized capitalist market.”

However helpful a philosophical understanding of neoliberalism as market epistemontology might be, I wonder whether James sees the tension between her statements about rejecting traditional terminology that reproduces the philosophical discipline and her interest in preserving the idea of “neoliberalism” in a way that can be taught in an introduction to philosophy class, a point she makes in a blog comment later. It is, perhaps, in the act of teaching that a discipline is reproduced.

The use of neoliberalism as a target of leftist academic critique has been challenged relatively recently. Craig Hickman, in a blog post about Luis Suarez-Villa, writes:

In fact Williams and Srinicek see this already in their first statement in the interview where they remind us that “what is interesting is that the neoliberal hegemony remains relatively impervious to critique from the standpoint of the latter, whilst it appears fundamentally unable to counter a politics which would be able to combat it on the terrain of modernity, technology, creativity, and innovation.” That’s because the ball has moved and the neoliberalist target has shifted in the past few years. The Left is stuck in waging a war it cannot win. What I mean by that is that it is at war with a target (neoliberalism) that no longer exists except in the facades of spectacle and illusion promoted in the vast Industrial-Media-Complex. What is going on in the world is now shifting toward the East and in new visions of technocapitalism of which such initiatives as Smart Cities by both CISCO (see here) and IBM and a conglomerate of other subsidiary firms and networking partners to build new 21st Century infrastructures and architectures to promote creativity, innovation, ultra-modernity, and technocapitalism.

Let’s face it capitalism is once again reinventing itself in a new guise and all the Foundations, Think-Tanks, academic, and media blitz hype artists are slowly pushing toward a different order than the older market economy of neoliberalism. So it’s time the Left begin addressing the new target and its ideological shift rather than attacking the boogeyman of capitalism’s past. Oh, true, the façade of neoliberalism will remain in the EU and U.S.A. and much of the rest of the world for a long while yet, so there is a need to continue our watchdog efforts on that score. But what I’m getting at is that we need to move forward and overtake this new agenda that is slowly creeping into the mix before it suddenly displaces any forms of resistance. So far I’m not sure if this new technocapitalistic ideology has even registered on the major leftist critiques beyond a few individuals like Luis Suarez-Villa. Mark Bergfield has a good critique of Suarez-Villa’s first book on Marx & Philosophy site: here.

In other words, the continuation of capitalist domination is due to its evolution relative to the stagnation of intellectual critiques of it. Or to put it another way, privilege is the capacity to evolve and not merely reproduce. Indeed, the language game of academic criticality is won by those who develop and disseminate new tropes through which to represent the interests of the marginalized. These privileged academics accomplish what Lyotard describes as “legitimation through paralogy.”

* * * * *

If James were working merely within academic criticality, I would be less interested in the work. But her aspirations appear to be higher, in a new political philosophy that can provide normative guidance in a world where data science is a technical reality. She writes:

Mill has already made–in 1859 no less–the argument that rationalizes the sacrifice of individual liberty for social harmony: as long as such harmony is enforced as a matter of opinion rather than a matter of law, then nobody’s violating anybody’s individual rights or liberties. This is, however, a crap argument, one designed to limit the possibly revolutionary effects of actually granting individual liberty as more than a merely formal, procedural thing (emancipating people really, not just politically, to use Marx’s distinction). For example, a careful, critical reading of On Liberty shows that Mill’s argument only works if large groups of people–mainly Asians–don’t get individual liberty in the first place. [2] So, critiquing Mill’s argument may help us show why updated data-science versions of it are crap, too. (And, I don’t think the solution is to shore up individual liberty–cause remember, individual liberty is exclusionary to begin with–but to think of something that’s both better than the old ideas, and more suited to new material/technical realities.)

It’s because of these more universalist ambitions that I think it’s fair to point out the limits of her argument. If a government’s idea of “social harmony” is not in fact white capitalist but premodern Chinese, if “neoliberalism” is no longer the dominant ideology but rather an idea of an ideology reproduced by a stagnating academic discipline, then these ideas will not help us understand what is going on in the contemporary world in which ‘data science’ is allegedly of such importance.

What would be better than this?

There is an empirical reality to the practices of data science. Perhaps it should be studied on its own terms, without disciplinary baggage.

picking a data backend for representing email in #python

I’m at a difficult crossroads with BigBang where I need to pick an appropriate data storage backend for my preprocessed mailing list data.

There are a lot of different aspects to this problem.

The first and most important consideration is speed. If you know anything about computer science, you know that it exists to quickly execute complex tasks that would take too long to do by hand. It’s odd writing that sentence since computational complexity considerations are so fundamental to algorithm design that this can go unspoken in most technical contexts. But since coming to grad school I’ve found myself writing for a more diverse audience, so…

The problem I’m facing is that in doing exploratory data analysis, I do not know all the questions I am going to ask yet. But any particular question will be impractical to ask unless I tune the underlying infrastructure to answer it. This chicken-and-egg problem means that the process of inquiry is necessarily constrained by the engineering options that are available.

This is not new in scientific practice. Notoriously, the field of economics in the 20th century was shaped by what was analytically tractable as formal, mathematical results. The nuance of contemporary modeling of complex systems is due largely to the fact that we now have computers to do this work for us. That means we can still have the intersubjectively verified rigor that comes with mathematization without trying to fit square pegs into round holes. (Side note: something mathematicians acknowledge that others tend to miss is that mathematics is based on dialectic proof and intersubjective agreement. This makes it much closer epistemologically to something like history as a discipline than it is to technical fields dedicated to prediction and control, like chemistry or structural engineering. Computer science is in many ways an extension of mathematics. Obviously, these formalizations are then applied to great effect. Their power comes from their deep intersubjective validity–in other words, their truth. Disciplines that have dispensed with intersubjective validity as grounds for truth claims, in favor of a more nebulous sense of diverse truths in a manifold of interpretation, have difficulty understanding this and so are likely to see the institutional gains of computer scientists as a result of political manipulation, as opposed to something more basic: mastery of nature, or more provocatively, use of force. This disciplinary dysfunction is one reason why these groups see their influence erode.)

For example, I have determined that in order to implement a certain query on the data efficiently, it would be best if another query were constant time. One way to do this is to use a database with an index.
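For instance, here is a minimal sketch with SQLite–hypothetical sample data and schema, not BigBang’s actual design. An index on the Message-ID column turns the lookup from a linear scan into an effectively constant-time query:

```python
import sqlite3

# An in-memory database for illustration; a real backend would persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (msg_id TEXT, sender TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [("<a@example>", "alice@example.com", "hello"),
     ("<b@example>", "bob@example.com", "re: hello")],
)

# Without an index, looking up a message by its Message-ID scans every row;
# with an index, the lookup is a tree search–effectively constant time here.
conn.execute("CREATE INDEX idx_msg_id ON messages (msg_id)")

row = conn.execute(
    "SELECT sender FROM messages WHERE msg_id = ?", ("<b@example>",)
).fetchone()
```

That constant-time lookup is then what makes the more complicated query over it cheap enough to run interactively.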

However, setting up a database is something that requires extra work on the part of the programmer and so makes it harder to reproduce results. So far I have been keeping my processed email data “in memory” after it is pulled from files on the file system. This means that I have access to the data within the programming environment I’m most comfortable with, without depending on an external or parallel process. Fewer moving parts means that it is simpler to do my work.
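The in-memory approach can be sketched with nothing but the standard library–the sample message below is made up, not real archive data:

```python
import mailbox
import os
import tempfile

# A tiny stand-in mbox file, playing the role of a downloaded archive.
raw = (
    "From alice@example.com Thu Jan  1 00:00:00 2014\n"
    "From: alice@example.com\n"
    "Subject: hello\n"
    "\n"
    "Hi everyone.\n"
)
fd, path = tempfile.mkstemp(suffix=".mbox")
with os.fdopen(fd, "w") as f:
    f.write(raw)

# Pull everything into plain in-memory structures: no database process to
# set up, fewer moving parts, and the data lives right in the Python session.
messages = [
    {"from": m["From"], "subject": m["Subject"], "body": m.get_payload()}
    for m in mailbox.mbox(path)
]
```

From here, `messages` is just a list of dicts, and any question is a list comprehension away–no external service required to reproduce the analysis.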

So there is a tradeoff between the computational time of the software as it executes and the time and attention it takes me (and others who want to reproduce my results) to set up the environment in which the software runs. Since I am running this as an open source project and hope others will build on my work, I have every reason to be lazy, in a certain sense. Every inconvenience I suffer is one that will be suffered by everyone that follows me. There is a Kantian categorical imperative to keep things as simple as possible for people, to take any complex procedure and replace it with a script, so that others can do original creative thinking, solve the next problem. This is the imperative that those of us embedded in this culture have internalized. (G. Coleman notes that there are many cultures of hacking; I don’t know how prevalent these norms are, to be honest; I’m speaking from my experience.) It is what makes this process of developing our software infrastructure a social one, with a modernist sense of progress. We are part of something that is being built out.

There are also social and political considerations. I am building this project intentionally in a way that is embedded within the Scientific Python ecosystem, as they are also my object of study. Certain projects are trendy right now, and for good reason. At the Python Worker’s Party at Berkeley last Friday, I saw a great presentation of Blaze. Blaze is a project that allows programmers experienced with older idioms of scientific Python programming to transfer their skills to systems that can handle more data, like Spark. This is exciting for the Python community. In such a fast moving field with multiple interoperating ecosystems, there is always the anxiety that one’s skills are no longer the best skills to have. Has your expertise been made obsolete? So there is a huge demand for tools that adapt one way of thinking to a new system. As more data has become available, people have engineered new sophisticated processing backends. Often these are not done in Python, which has a reputation for being very usable and accessible but slow to run in operation. Getting the usable programming interface to interoperate with the carefully engineered data backends is hard work, work that Matt Rocklin is doing while being paid by Continuum Analytics. That is sweet.

I’m eager to try out Blaze. But as I think through the questions I am trying to ask about open source projects, I’m realizing that they don’t fit easily into the kind of data processing that Blaze currently supports. Perhaps I’m being dense. If I knew better what I was asking, I could maybe figure out how to make it fit. But probably, what I’m looking at is data that is not “big”, that does not need the kind of power that these new tools provide. Currently my data fits on my laptop. It even fits in memory! Shouldn’t I build something that works well for what I need it for, and not worry about scaling at this point?

But I’m also trying to think long-term. What happens if and when it does scale up? What if I want to analyze ALL the mailing list data? Is that “big” data?

“Premature optimization is the root of all evil.” – Donald Knuth

the research lately

I’ve been working hard.

I wrote a lot, consolidating a lot of thinking about networked public design, digital activism, and Habermas. A lot of the thinking was inspired by Xiao Qiang’s course over a year ago, then a conversation with Nathan Mathias and Brian Keegan on Twitter, then building @TheTweetserve for Theorizing the Web. Interesting how these things accrete.

Through this, I think I’ve gotten a deeper understanding of Habermas’ system/lifeworld distinction than I’ve ever had before. Where I’m still weak on this is on his understanding of the role of law. There’s an angle in there about technology as regulation (a la Lessig) that ties things back to the recursive public. But of course Habermas was envisioning the normal kind of law–the potentially democratic law. Since the I School engages more with policy than it does with technicality, it would be good to have sharper thinking about this besides vague notions of the injustice or not of “the system”–how much of this rhetoric is owed to Habermas or the people he’s drawing on?

My next big writing project is going to be about Piketty and intellectual property, I hope. This is another argument that I’ve been working out for a long time–as an undergrad working on microeconomics of intellectual property, on the job at OpenGeo reading Lukacs for some reason, in grad school coursework. I tried to write something about this shortly after coming back to school but it went nowhere, partly because I was using anachronistic concepts and partly because the term “hacker” got weird political treatment due to some anti-startup yellow journalism.

The name of the imagined essay is “Free Capital.” It will try to trace the economic implications of free software and other open access technical designs, especially their impact on the relationship between capital and labor. It’s sort of an extension of this. I feel like there is more substance there to dig out, especially around liquidity and vendor and employer lock-in. I’m imagining engaging some of the VC strategy press–I’ve been following the thinking of Kanyi Maqbela for a long time and always learning from it.

What I need to home in on in terms of economic modeling is under what conditions it’s in labor’s interest to work to produce open source IP or ‘free capital’, under what conditions it’s in capital’s interest to invest in free capital, and what the macroeconomic implications of this are. It’s clear that capital will invest in free capital in order to unseat a monopoly–take Android, for instance, or Firefox–but that this is (a) unstable and (b) difficult to take into account in measures of economic growth, since the gains in this case are to be had in the efficiency of the industrial organization rather than in the value of the innovation itself. Meanwhile, Matt Asay has been saying for years that the returns on open source investment are not high enough to attract serious investment, and industry experience appears to bear that out.

Meanwhile, Piketty argues that the main force for convergence in income is technology and skills diffusion. But these are exogenous to his model. Meanwhile, here in the Bay Area the gold rush rages on, and at least word on the grapevine is that VCs are having a harder and harder time finding high-return investments, and are sinking their money into lamer and lamer teams of recent Stanford undergrads.

My weakness in these arguments is that I don’t have data and don’t even know what predictions I’m making. It’s dangerously theoretical.

Meanwhile, my actual dissertation work progresses…slowly. I managed to get a lot done to get my preliminary results with BigBang ready for SciPy 2014. Since then I’ve switched it over to favor an Anaconda build and use IPython Notebooks internally–all good architectural changes, but it’s yak shaving. Now I’m hitting performance issues and need to make some serious considerations about databases and data structures.

And then there’s the social work around it. They are good instincts–that I should be working on accessibility, polishing my communication, trying to encourage collaborators’ interest. I know how to start an open source project and it requires that. But then–what about the research? What about the whole point of the thing? Talking with Dave Kush today, he pointed me towards research on computational discourse analysis, which is where I think this needs to go. The material felt way over my head, a reminder that I’ve been barking up trees far from where I think the real problem lies. Mainly because I’ve been caught up in the politics of things. It’s bewildering how enriching but distracting the academic context is–how many barriers there are to sitting down and doing your best work. Petty disciplinary disputes, for example.

responding to @npdoty on ethics in engineering

Nick Doty wrote a thorough and thoughtful response to my earlier post about the Facebook research ethics problem, correcting me on a number of points.

In particular, he highlights how academic ethicists like Floridi and Nissenbaum have an impact on industry regulation. It’s worth reading for sure.

Nick writes from an interesting position. Since he works for the W3C himself, he is closer to the policy decision makers on these issues. I think this, as well as his general erudition, give him a richer view of how these debates play out. Contrast that with the debate that happens for public consumption, which is naturally less focused.

In trying to understand scholarly work on these ethical and political issues of technology, I’m struck by how differences in where writers and audiences are coming from lead to communication breakdown. The recent blast of popular scholarship about ‘algorithms’, for example, is bewildering to me. I had the privilege of learning what an algorithm was fairly early. I learned about quicksort in an introductory computing class in college. While certainly an intellectual accomplishment, quicksort is politically quite neutral.
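To make that concrete, here is quicksort in a typical introductory form–a recursive partition around a pivot, and nothing about it carries any political valence:

```python
def quicksort(xs):
    """Sort a list by recursively partitioning it around a pivot element."""
    if len(xs) <= 1:
        return xs  # a list of zero or one elements is already sorted
    pivot, rest = xs[0], xs[1:]
    less = [x for x in rest if x < pivot]
    more = [x for x in rest if x >= pivot]
    return quicksort(less) + [pivot] + quicksort(more)

result = quicksort([3, 1, 4, 1, 5, 9, 2, 6])  # [1, 1, 2, 3, 4, 5, 6, 9]
```

(This is the simple functional version taught in class; production implementations sort in place and choose pivots more carefully, but the idea is the same.)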

What’s odd is how certain contemporary popular scholarship seeks to introduce an unknowing audience to algorithms not via their basic properties–their pseudocode form, their construction from more fundamental computing components, their running time–but via their application in select and controversial contexts. Is this good for public education? Or is this capitalizing on the vagaries of public attention?

My democratic values are being sorely tested by the quality of public discussion on matters like these. I’m becoming more content with the fact that in reality, these decisions are made by self-selecting experts in inaccessible conversations. To hope otherwise is to downplay the genuine complexity of technical problems and the amount of effort it takes to truly understand them.

But if I can sit complacently with my own expertise, this does not seem like a political solution. The FCC’s willingness to accept public comment, which normally does not elicit the response of a mass action, was just tested by Net Neutrality activists. I see from the linked article that other media-related requests for comments were similarly swamped.

The crux, I believe, is the self-referential nature of the problem–that the mechanics of information flow among the public are both what’s at stake (in terms of technical outcomes) and what drives the process to begin with, when it’s democratic. This is a recipe for a chaotic process. Perhaps there are no attractor or steady states.

Following Rash’s analysis of Habermas and Luhmann’s disagreement as to the fate of complex social systems, we’ve got at least two possible outcomes for how these debates play out. On the one hand, rationality may prevail. Genuine interlocutors, given enough time and with shared standards of discourse, can arrive at consensus about how to act–or, what technical standards to adopt, or what patches to accept into foundational software. On the other hand, the layering of those standards on top of each other, and the reaction of users to them as they build layers of communication on top of the technical edifice, can create further irreducible complexity. With that complexity comes further ethical dilemmas and political tensions.

A good desideratum for a communications system that is used to determine the technicalities of its own design is that its algorithms should intelligently manage the complexity of arriving at normative consensus.

This is truly unfortunate

This is truly unfortunate.

In one sense, this indicates that the majority of Facebook users have no idea how computers work. Do these Facebook users also know that their use of a word processor, or their web browser, or their Amazon purchases, are all mediated by algorithms? Do they understand that what computers do–more or less all they ever do–is mechanically execute algorithms?

I guess not. This is a massive failure of the education system. Perhaps we should start mandating that students read this well-written HowStuffWorks article, “What is a computer algorithm?” That would clear up a lot of confusion, I think.

The Facebook ethics problem is a political problem

So much has been said about the Facebook emotion contagion experiment. Perhaps everything has been said.

The problem with everything having been said is that by and large people’s ethical stances seem predetermined by their habitus.

By which I mean: most people don’t really care. People who care about what happens on the Internet care about it in whatever way is determined by their professional orientation on that matter. Obviously, some groups of people benefit from there being fewer socially imposed ethical restrictions on data scientific practice, either in an industrial or academic context. Others benefit from imposing those ethical restrictions, or cultivating public outrage on the matter.

If this is an ethical issue, what system of ethics are we prepared to use to evaluate it?

You could make an argument from, say, a utilitarian perspective, or a deontological perspective, or even a virtue ethics standpoint. Those are classic moves.

But nobody will listen to what a professionalized academic ethicist will say on the matter. If there’s anybody who does rigorous work on this, it’s probably somebody like Luciano Floridi. His work is great, in my opinion. But I haven’t found any other academics who work in, say, policy that embrace his thinking. I’d love to be proven wrong.

But since Floridi does serious work on information ethics, that’s mainly an inconvenience to pundits. Instead we get heat, not light.

If this process resolves into anything like policy change–either governmental or internally at Facebook–it will be because of a process of agonistic politics. “Agonistic” here means fraught with conflicted interests. It may be redundant to modify ‘politics’ with ‘agonistic’ but it makes the point that the moves being made are strategic actions, aimed at gain for one’s person or group, more than they are communicative ones, aimed at consensus.

Because e.g. Facebook keeps public discussion fragmented through its EdgeRank algorithm, which even in its well-documented public version is full of apparent political consequences and flaws, there is no way for conversation within the Facebook platform to result in consensus. It is not, as has been observed by others, a public. In a trivial sense, it’s not a public because the data isn’t public. The data is (sort of) private. That’s not a bad thing. It just means that Facebook shouldn’t be where you go to develop a political consensus that could legitimize power.
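For reference, the publicly described version of EdgeRank scores each edge roughly as affinity × weight × time decay, summed per story. A toy sketch–the decay function and all the numbers here are my own stand-ins, since the production ranking system is proprietary and far more complicated:

```python
def edgerank(edges, now):
    """Toy sketch of the publicly described EdgeRank idea:
    score = sum over edges of affinity * weight * time_decay.
    Illustrative only; the real algorithm is proprietary."""
    def decay(created):
        # Hypothetical decay: an edge's contribution shrinks with age in days.
        age_days = (now - created) / 86400.0
        return 1.0 / (1.0 + age_days)
    return sum(e["affinity"] * e["weight"] * decay(e["created"]) for e in edges)

# A day-old comment from a close tie vs. a fresh like from a weak tie:
edges = [
    {"affinity": 0.9, "weight": 2.0, "created": 0},      # comment, one day old
    {"affinity": 0.2, "weight": 0.5, "created": 86400},  # like, just now
]
score = edgerank(edges, now=86400)
```

Even this cartoon version shows the fragmenting mechanism: because affinity differs per user, every user sees a differently ranked slice of the discussion, so no two people are reasoning from the same record.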

Twitter is a little better for this, because it’s actually public. Facebook has zero reason to care about the public consensus of people on Twitter though, because those people won’t organize a consumer boycott of Facebook, because they can only reach people that use Twitter.

Facebook is a great–perhaps the greatest–example of what Habermas calls the steering media. “Steering,” because it’s how powerful entities steer public opinion. For Habermas, the steering media control language and therefore culture. When ‘mass’ media control language, citizens no longer use language to form collective will.

For individualized ‘social’ media that is arranged into filter bubbles through relevance algorithms, language is similarly controlled. But rather than having just a single commanding voice, you have the opportunity for every voice to be expressed at once. Through homophily effects in network formation, what you’d expect to see are very intense clusters of extreme cultures that see themselves as ‘normal’ and don’t interact outside of their bubble.

The irony is that the critical left, who should be making these sorts of observations, is itself a bubble within this system of bubbles. Since critical leftism is enacted in commercialized social media which evolves around it, it becomes recuperated in the Situationist sense. Critical outrage is tapped for advertising revenue, which spurs more critical outrage.

The dependence of contemporary criticality on commercial social media for its own diffusion means that, ironically, none of them are able to just quit Facebook like everyone else who has figured out how much Facebook sucks.

It’s not a secret that decentralized communication systems are the solution to this sort of thing. Stanford’s Liberation Tech group captures this ideology rather well. There’s a lot of good work on censorship-resistant systems, distributed messaging systems, etc. For people who are citizens in the free world, many of these alternative communication platforms where we are spared from algorithmic control are very old. Some people still use IRC for chat. I’m a huge fan of mailing lists, myself. Email is the original on-line social media, and one’s inbox is one’s domain. Everyone who is posting their stuff to Facebook could be posting to a WordPress blog. WordPress, by the way, has a lovely user interface these days and keeps adding “social” features like “liking” and “following”. This goes largely unnoticed, which is too bad, because Automattic, the company that runs WordPress, is really not evil at all.

So there are plenty of solutions to Facebook being manipulative and bad for democracy. Those solutions involve getting people off of Facebook and onto alternative platforms. That’s what a consumer boycott is. That’s how you get companies to stop doing bad stuff, if you don’t have regulatory power.

Obviously the real problem is that we don’t have a less politically problematic technology that does everything we want Facebook to do, only not the bad stuff. There are a lot of unsolved technical challenges in getting that to work. I think I wrote a social media think piece about this once.

I think a really cool project that everybody who cares about this should be working on is designing and executing on building that alternative to Facebook. That’s a huge project. But just think about how great it would be if we could figure out how to fund, design, build, and market that. These are the big questions for political praxis in the 21st century.
