Another rant about academia and open source

by Sebastian Benthall

A few weeks ago I went to a great talk by Victoria Stodden about how there’s a crisis of confidence in scientific research that depends on heavy computing. Long story short, because the data and code aren’t openly available, the results aren’t reproducible. That means there’s no check on prior research, and bad results can slip through and be the foundation for future work. This is bad.

Stodden’s solution was to push forward within the scientific community and possibly in legislation (i.e., as a requirement on state-funded research) for open data and code in research. Right on!

Then, something intriguing: somebody in the audience asked how this relates to open source development. Stodden, who just couldn’t stop saying amazing things that needed to be said that day, answered by saying that scientists have a lot to learn from the “open source world”, because they know how to build strong communities around their (open) work.

Looking around the room at this point, I saw several scientists toying with their laptops. I don’t think they were listening.

It’s a difficult thing coming from an open source background and entering academia, because the norms are close, but off.

The other day I posted some criticism of, and questions about, Bruno Latour, a theorist with a lot of influence in the department, to an informal departmental mailing list. The reactions to that thread ranged pretty much all across the board, but one of the surprising ones was along the lines of “I’m not going to do your work for you by answering your question about Latour.” In other words, RTFM. Except, in this case, “the manual” was a book or two of dense academic literature in a field that I was just beginning to dip into.

I don’t want to make too much of this response, since there were a lot of extenuating circumstances, but it did strike me as an indication of one of the cultural divides between open source development and academic scholarship. In the former, you want as many people as possible to understand and use your cool new thing because that enriches your community and makes you feel better about your contribution to the world. For some kinds of scholars, being the only one who understands a thing is a kind of distinction that gives you pride and job opportunities, so you don’t really want other people to know as much as you about it.

Similarly for computationally heavy sciences: if you think your job is to get grants to fund your research, you don’t really want anybody picking through it and telling you your methodology was busted. In an Internet Security course this semester, I’ve had the pleasure of reading John McHugh’s Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Off-line Intrusion Detection System Evaluation as Performed by Lincoln Laboratory. In this incredible paper, McHugh explains why a particular DARPA-funded Lincoln Labs Intrusion Detection research paper is BS, scientifically speaking.

In open source development, we would call McHugh’s paper a bug report. We would say, “McHugh is a great user of our research because he went through and tested for all these bugs, and even has recommendations about how to fix them. This is fantastic! The next release is going to be great.”

In the world of security research, Lincoln Labs complained to the publisher and got the article pulled.

Ok, so security research is a new field with a lot of tough phenomena to deal with and not a ton of time to read up on 300 years of epistemology, philosophy of science, statistical learning theory, or each other’s methodological critiques. I’m not faulting the research community at all. However, it does show some of the trouble that happens in a field that is born out of industry and military funding concerns without the pretensions or emphasis on reproducible truth-discovery that you get in, say, physics.

All of this, it so happens, is what Lyotard describes in his monograph, The Postmodern Condition (1979). Lyotard argues that because of cybernetics and information technologies, because of Wittgenstein, because of the “collapse of metanarratives” that would make anybody believe in anything silly like “truth”, there’s nothing left to legitimize knowledge except Winning.

You can win in two ways: you can research something that helps somebody beat somebody else up or consume more, so that they give you funding. Or you can win by not losing, by pulling some wild theoretical stunt that puts you out of range of everybody else so that they can’t come after you. You become good at critiquing things in ways that sound smart, and tell people who disagree with you that they haven’t read your canon. You hope that if they call your bluff and read it, they will be so converted by the experience that they will leave you alone.

Some, but certainly not all, of academia seems like this. You can still find people around who believe in epistemic standards: rational deduction, dialectical critique resolving to a consensus, sound statistical induction. Often people will see these as just a kind of meta-methodology in service to a purely pragmatic ideal of something that works well or looks pretty or makes you think in a new way, but that in itself isn’t so bad. Not everybody should be anal about methodology.

But these standards are in tension with the day to day of things, because almost nobody really believes that they are after true ideas any more. It’s so easy to be cynical or territorial.

What seems to be missing is a sense of common purpose in academic work. Maybe it’s the publication incentive structure, maybe it’s because academia is an ideological proxy for class or sex warfare, maybe it’s because of a lot of big egos, maybe it’s the collapse of meta-narratives.

In FOSS development, there’s a secret ethic that’s not particularly well articulated by either the Free Software Movement or the Open Source Initiative, but which I believe is shared by a lot of developers. It goes something like this:

I’m going to try to build a totally great new thing. It’s going to be a lot of work, but it will be worth it because it’s going to be so useful and cool. Gosh, it would be helpful if other people worked on it with me, because this is a lonely pursuit and having others work with me will help me know I’m not chasing after a windmill. If somebody wants to work on it with me, I’m going to try hard to give them what they need to work on it. But hell, even if somebody tells me they used it and found six problems in it, that’s motivating; that gives me something to strive for. It means I have (or had) a user. Users are awesome; they make my heart swell with pride. Also, bonus, having lots of users means people want to pay me for services or hire me or let me give talks. But it’s not like I’m trying to keep others out of this game, because there is just so much that I wish we could build and not enough time! Come on! Let’s build the future together!

I think this is the sort of ethic that leads to the kind of community building that Stodden was talking about. It requires a leap of faith: that your generosity will pay off and that the world won’t run out of problems to be solved. It requires self-confidence because you have to believe that you have something (even something small) to offer that will make you a respected part of an open community without walls to shelter you from criticism. But this ethic is the relentlessly spreading meme of the 21st century and it’s probably going to be victorious by the start of the 22nd. So if we want our academic work to have staying power, we had better get on this wagon early so we can benefit from the centrality effects in the growing openly collaborative academic network.

I heard David Weinberger give a talk last year on his new book Too Big to Know, in which he argued that “the next Darwin” was going to be actively involved in social media as a research methodology. Tracing their research notes will involve an examination of their inbox and Facebook feed to see what conversations were happening, because just so much knowledge transfer is happening socially and digitally and it’s faster and more contextual than somebody spending a weekend alone reading books in a library. He’s right, except maybe for one thing, which is that this digital dialectic (or pluralectic) implies that “the next Darwin” isn’t just one dude, Darwin, with his own ‘-ism’ and pernicious Social adherents. Rather, it means that the next great theory of the origin of species is going to be built by a massive collaborative effort in which lots of people will take an active part. The historical record will show their contributions not just with the clumsy granularity of conference publications and citations, but with the minute granularity of thousands of traced conversations. The theory itself will probably be too complicated for any one person to understand, but that’s OK, because it will be well architected and there will be plenty of domain experts to go to if anyone has problems with any particular part of it. And it will be growing all the time and maybe competing with a few other theories. For a while people might have to dual boot their brains until somebody figures out how to virtualize Foucauldian Quantum Mechanics on an Organic Data Splicing ideological platform, but one day some crazy scholar-hacker will find a way.

“Cool!” they will say, throwing a few bucks towards the Kickstarter project for a musical instrument that plays to the tune of the uncollapsed probabilistic power dynamics playing out between our collated heartbeats.

Does that future sound good? Good. Because it’s already starting. It’s just an evolution of the way things have always been, and I’m pretty sure based on what I’ve been hearing that it’s a way of doing things that’s picking up steam. It’s just not “normal” yet. Generation gap, maybe. That’s cool. At the rate things are changing, it will be here before you know it.