Digifesto


Another rant about academia and open source

A few weeks ago I went to a great talk by Victoria Stodden about how there’s a crisis of confidence in scientific research that depends on heavy computing. Long story short, because the data and code aren’t openly available, the results aren’t reproducible. That means there’s no check on prior research, and bad results can slip through and be the foundation for future work. This is bad.

Stodden’s solution was to push forward within the scientific community and possibly in legislation (i.e., as a requirement on state-funded research) for open data and code in research. Right on!

Then, something intriguing: somebody in the audience asked how this relates to open source development. Stodden, who just couldn’t stop saying amazing things that needed to be said that day, answered that scientists have a lot to learn from the “open source world”, because open source developers know how to build strong communities around their (open) work.

Looking around the room at this point, I saw several scientists toying with their laptops. I don’t think they were listening.

It’s a difficult thing to come from an open source background and enter academia, because the norms are close, but off.

The other day I posted to an informal departmental mailing list some criticism of, and questions about, a theorist with a lot of influence in the department, Bruno Latour. There were a lot of reactions to that thread, ranging pretty much all across the board, but one of the surprising ones was along the lines of “I’m not going to do your work for you by answering your question about Latour.” In other words, RTFM. Except, in this case, “the manual” was a book or two of dense academic literature in a field that I was just beginning to dip into.

I don’t want to make too much of this response, since there were a lot of extenuating circumstances, but it did strike me as an indication of one of the cultural divides between open source development and academic scholarship. In the former, you want as many people as possible to understand and use your cool new thing, because that enriches your community and makes you feel better about your contribution to the world. For some kinds of scholars, being the only one who understands a thing is a kind of distinction that gives you pride and job opportunities, so you don’t really want other people to know as much as you about it.

Similarly for computationally heavy sciences: if you think your job is to get grants to fund your research, you don’t really want anybody picking through it and telling you your methodology was busted. In an Internet Security course this semester, I’ve had the pleasure of reading John McHugh’s Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Off-line Intrusion Detection System Evaluation as Performed by Lincoln Laboratory. In this incredible paper, McHugh explains why a particular DARPA-funded Lincoln Labs Intrusion Detection research paper is BS, scientifically speaking.

In open source development, we would call McHugh’s paper a bug report. We would say, “McHugh is a great user of our research because he went through and tested for all these bugs, and even has recommendations about how to fix them. This is fantastic! The next release is going to be great.”

In the world of security research, Lincoln Labs complained to the publisher and got the article pulled.

Ok, so security research is a new field with a lot of tough phenomena to deal with and not a ton of time to read up on 300 years of epistemology, philosophy of science, statistical learning theory, or each other’s methodological critiques. I’m not faulting the research community at all. However, it does show some of the trouble that happens in a field that is born out of industry and military funding concerns without the pretensions or emphasis on reproducible truth-discovery that you get in, say, physics.

All of this, it so happens, is what Lyotard describes in his monograph, The Postmodern Condition (1979). Lyotard argues that because of cybernetics and information technologies, because of Wittgenstein, because of the “collapse of metanarratives” that would make anybody believe in anything silly like “truth”, there’s nothing left to legitimize knowledge except Winning.

You can win in two ways: you can research something that helps somebody beat somebody else up or consume more, so that they give you funding. Or you can win by not losing, by pulling some wild theoretical stunt that puts you out of range of everybody else so that they can’t come after you. You become good at critiquing things in ways that sound smart, and tell people who disagree with you that they haven’t read your canon. You hope that if they call your bluff and read it, they will be so converted by the experience that they will leave you alone.

Some, but certainly not all, of academia seems like this. You can still find people around who believe in epistemic standards: rational deduction, dialectical critique resolving to a consensus, sound statistical induction. Often people will see these as just a kind of meta-methodology in service to a purely pragmatic ideal of something that works well or looks pretty or makes you think in a new way, but that in itself isn’t so bad. Not everybody should be anal about methodology.

But these standards are in tension with the day to day of things, because almost nobody really believes that they are after true ideas any more. It’s so easy to be cynical or territorial.

What seems to be missing is a sense of common purpose in academic work. Maybe it’s the publication incentive structure, maybe it’s because academia is an ideological proxy for class or sex warfare, maybe it’s because of a lot of big egos, maybe it’s the collapse of meta-narratives.

In FOSS development, there’s a secret ethic that’s not particularly well articulated by either the Free Software Movement or the Open Source Initiative, but which I believe is shared by a lot of developers. It goes something like this:

I’m going to try to build a totally great new thing. It’s going to be a lot of work, but it will be worth it because it’s going to be so useful and cool. Gosh, it would be helpful if other people worked on it with me, because this is a lonely pursuit and having others work with me will help me know I’m not chasing after a windmill. If somebody wants to work on it with me, I’m going to try hard to give them what they need to work on it. But hell, even if somebody tells me they used it and found six problems in it, that’s motivating; that gives me something to strive for. It means I have (or had) a user. Users are awesome; they make my heart swell with pride. Also, bonus, having lots of users means people want to pay me for services or hire me or let me give talks. But it’s not like I’m trying to keep others out of this game, because there is just so much that I wish we could build and not enough time! Come on! Let’s build the future together!

I think this is the sort of ethic that leads to the kind of community building that Stodden was talking about. It requires a leap of faith: that your generosity will pay off and that the world won’t run out of problems to be solved. It requires self-confidence because you have to believe that you have something (even something small) to offer that will make you a respected part of an open community without walls to shelter you from criticism. But this ethic is the relentlessly spreading meme of the 21st century and it’s probably going to be victorious by the start of the 22nd. So if we want our academic work to have staying power we better get on this wagon early so we can benefit from the centrality effects in the growing openly collaborative academic network.

I heard David Weinberger give a talk last year on his new book Too Big to Know, in which he argued that “the next Darwin” was going to be actively involved in social media as a research methodology. Tracing their research notes will involve an examination of their inbox and Facebook feed to see what conversations were happening, because so much knowledge transfer is happening socially and digitally, and it’s faster and more contextual than somebody spending a weekend alone reading books in a library. He’s right, except maybe for one thing, which is that this digital dialectic (or pluralectic) implies that “the next Darwin” isn’t just one dude, Darwin, with his own ‘-ism’ and pernicious Social Darwinist adherents. Rather, it means that the next great theory of the origin of species is going to be built by a massive collaborative effort in which lots of people will take an active part. The historical record will show their contributions not just with the clumsy granularity of conference publications and citations, but with the minute granularity of thousands of traced conversations. The theory itself will probably be too complicated for any one person to understand, but that’s OK, because it will be well architected and there will be plenty of domain experts to go to if anyone has problems with any particular part of it. And it will be growing all the time and maybe competing with a few other theories. For a while people might have to dual boot their brains until somebody figures out how to virtualize Foucauldian Quantum Mechanics on an Organic Data Splicing ideological platform, but one day some crazy scholar-hacker will find a way.

“Cool!” they will say, throwing a few bucks towards the Kickstarter project for a musical instrument that plays to the tune of the uncollapsed probabilistic power dynamics playing out between our collated heartbeats.

Does that future sound good? Good. Because it’s already starting. It’s just an evolution of the way things have always been, and I’m pretty sure based on what I’ve been hearing that it’s a way of doing things that’s picking up steam. It’s just not “normal” yet. Generation gap, maybe. That’s cool. At the rate things are changing, it will be here before you know it.

Notes on Open Access for academic works

You could read this blog post, or you could watch this YouTube video and get about 50% of the written information.

I attended a meeting last week about Open Access publishing at Berkeley. As is well-known by now, most academic publishing is a ruthless industry that stifles innovation by making it expensive to access academic journals. (Never mind for a minute that this industry is only possible because of academia’s unhealthy dependence on these journals as a currency of prestige.) Thankfully, principles of ‘openness’ are swiftly descending on the academy.

Three interesting things came up in the meeting. The first was the existence of hybrid open access journals. These try to bridge the gap between open access journals (which generally let authors retain copyright and make works available on the web) and normal journals by charging authors a premium for making their articles openly available in an otherwise closed journal.

This sounds good for about two seconds until you think about it and realize that the publisher is essentially ransoming the openness of the work, and making the author incur the cost. Often the fees charged by hybrid publishers for openness are exorbitant.

It’s worth noting that open access publishers tend to charge authors for publication as well. Also, in many cases universities or their libraries have started subsidizing their faculty to publish openly. (That makes sense, since it cuts down on library subscription costs!)

The difference between open access and closed publishing, then, appears to be that in the case of open access publishing, authors (or the university they are associated with? unclear) get to maintain copyright. Also, the fees tend to be more reasonable. These may be related. The openness of the content means that publishers don’t reap monopoly profits, so it makes no sense for the OA journal to charge academics for profit lost due to open content. OA journals will run leaner. They will also, in a just world, be more competitive, but that will require a shift in the way academics view prestige as being associated with a journal’s name or ‘impact’.

Which brings me to impact rankings. Since academics need to compete on how good their research is, they need a filter mechanism for demonstrating the value of their work. This has traditionally been benchmarked against journal publication, and in particular which journals one publishes in. Journals are ranked by various estimates of impact factor–who reads it, who cites it, who takes it seriously.

I haven’t looked into it carefully, but I would be willing to bet that the definition of impact factor is viciously circular to the advantage of any existing journal with “high impact.” It is precisely this estimation of “high impact” that gives journals the power to get academics to provide free content (articles) and free labor (editors) and then charge libraries for access to the tune of extraordinary profits.
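For reference, the standard two-year impact factor is computed roughly as follows (a sketch of the usual definition; the notation is mine):

```latex
% Two-year impact factor of journal J in year Y:
% recent citations divided by recent citable output.
\mathrm{IF}_J(Y) =
  \frac{\text{citations received in year } Y \text{ to items } J \text{ published in } Y-1 \text{ and } Y-2}
       {\text{number of citable items } J \text{ published in } Y-1 \text{ and } Y-2}
```

The numerator counts citations, and citations flow disproportionately to journals that people already read, which is exactly the circular flavor in question.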

This is a bad system. The solution, article level metrics (where the use and impact of the work itself, not the impact of the journal in which it is published, is what counts; maybe a no-brainer), is being pushed forward by the Public Library of Science, a leading Open Access publisher, but at the time of this writing article level metrics are covered by only the stubbiest of lonesome stubs on Wikipedia.

The other interesting thing I learned was about the growing trend of prestigious universities mandating that faculty publish open access. Harvard, MIT, Princeton, Stanford, and Duke are apparently on board for this already. By the domino logic of academic prestige competition, this means a sea change is afoot.

There are some objections to this trend that are quickly countered. The main one appears to come from the humanities, where there are many small “society-based” journals that use a traditional business model to publish works. In my imagination, these journals are a bit like private poetry magazines, or n+1.

As a result, these university-wide open access mandates come with a strong opt-out clause. Faculty can get permission to publish in a closed way, if they really really want to.

Then why is this a big deal? It turns out that it’s about bargaining power. It’s not that Harvard, MIT, and the rest are no longer publishing in Nature or other big-name journals. It’s just that they can negotiate special deals with the major publishers that allow the universities to maintain copyright. With that copyright, they can then publish the works online with a university-based publishing tool.

What does this mean for other schools? Well, it means that open access journals are going to become more legitimate and traditional journals are in trouble unless they can change their business models. And it means that more and more universities are going to have an easier time using their bargaining power to change the way academic publishing works.

At Berkeley (and I believe this is generally true of other universities) the decision to go open access is a faculty decision, to be made at the Faculty Senate. I didn’t get a sense from the meeting of when these meetings take place or how likely the faculty was to take the dive, but I’d like to look into it more.

Academic holy wars and conference acceptances

I’m inspired by Mel Chua’s recent posts about the culture shock of entering academia from the open source world. I don’t have her humility about it; I’m convinced academia is doing a lot of things wrong more or less from the get-go, so I’m more or less looking for problems. That said, one came up at lunch with an old friend who’s finishing up his PhD.

My friend Joe reports that, when papers are submitted to conferences, academic holy war disputes will sometimes affect whether papers get accepted.

Ok, maybe that doesn’t sound like much of a surprise, but it’s an interesting mechanism.

According to Joe, conference papers are reviewed by attendees. Informally, somebody who gets their paper accepted is required to review 3 or so other papers. Nothing bad there.

However, when there is a “holy war” — a major division within the field about a basic theoretical or methodological issue — these religious persuasions will affect the reviews and lead to some papers being rejected despite what we could suppose to be their objective merits.

Is this bad? Is it any different from the open source process? I think so.

But not because of the dispute itself. There’s got to be some substance to these kinds of theoretical and methodological differences. Gosh, open source is full of divisive holy wars, and in general they are a good thing, since competing camps race to innovate and prove that Python is a better programming language than Ruby, or whatever.

The difference in the academic domain is that these conferences are a bottleneck for publication and accreditation, and the conference process itself is not easily forked. So the paper selection process is not merely curatorial, in the sense of selecting papers of interest for the attendees. Rather, rejected papers are silenced and discredited.

Some balance has to be struck. There has to be some venue for a real conflict of ideas, because unless you fight the holy war, how can you find out who is right? On the other hand, since individuals’ reputations are tied to the success of their religion, there is an incentive for doctrinaire and skulduggerous rejection of opposing papers, without regard to how much those papers contribute to the field.

What’s the solution? We could imagine a more open, web-based unconference system for accepting papers. There could be the same requirement that one has to review other papers in order to get one’s own paper included. Reviews could include rating metadata that affects a paper’s prominence within the conference; reviewers could also be rated (for their comments) to give them additional clout within the community. Then you could track discrepancies in people’s ratings on controversial items to detect where the holy wars are at and correct for them statistically when awarding credit.
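As a toy illustration of the kind of statistical correction I have in mind, here is a minimal sketch in Python. It assumes open numeric ratings, estimates each reviewer’s systematic bias, and flags papers whose reviews stay polarized even after correction; all the names, ratings, and the threshold are hypothetical.

```python
# Toy "holy war detector" for open conference reviews.
# reviews[paper_id] is a list of (reviewer_id, rating on a 1-5 scale).

from statistics import mean, pstdev

reviews = {
    "p1": [("alice", 5), ("bob", 5), ("carol", 4)],  # consensus accept
    "p2": [("alice", 5), ("bob", 1), ("carol", 5)],  # polarized: a holy war?
    "p3": [("alice", 2), ("bob", 2), ("carol", 1)],  # consensus reject
}

def reviewer_bias(reviews):
    """Each reviewer's mean deviation from the per-paper mean rating."""
    deviations = {}
    for ratings in reviews.values():
        paper_mean = mean(r for _, r in ratings)
        for reviewer, rating in ratings:
            deviations.setdefault(reviewer, []).append(rating - paper_mean)
    return {reviewer: mean(ds) for reviewer, ds in deviations.items()}

def score_papers(reviews, polarization_threshold=1.0):
    """Subtract each reviewer's bias, then flag papers whose corrected
    ratings still disagree strongly instead of averaging the dispute away."""
    bias = reviewer_bias(reviews)
    results = {}
    for paper, ratings in reviews.items():
        corrected = [rating - bias[reviewer] for reviewer, rating in ratings]
        results[paper] = {
            "score": round(mean(corrected), 2),
            "contested": pstdev(corrected) > polarization_threshold,
        }
    return results

if __name__ == "__main__":
    for paper, verdict in sorted(score_papers(reviews).items()):
        print(paper, verdict)  # p2 comes out contested; p1 and p3 do not
```

The point of the “contested” flag is that a polarized paper would get routed to open debate rather than silently rejected, so the holy war gets surfaced instead of settled by gatekeeping.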