Tag: open source development

thinking about meritocracy in open source communities

There has been a trend in open source development culture over the past ten years or so. It is the rejection of ‘meritocracy’. Just now, I saw this Post-Meritocracy Manifesto, originally created by Coraline Ada Ehmke. It is exactly what it sounds like: an explicit rejection of meritocracy, specifically in open source development. It captures a recent progressive wing of software development culture. It is attracting signatories.

I believe this is a “trend” because I’ve noticed a more subtle expression of similar ideas a few months ago. This came up when we were coming up with a Code of Conduct for BigBang. We wound up picking the Contributor Covenant Code of Conduct, though there’s still some open questions about how to integrate it with our Governance policy.

This Contributor Covenant is widely adopted and the language of it seems good to me. I was surprised though when I found the rationale for it specifically mentioned meritocracy as a problem the code of conduct was trying to avoid:

Marginalized people also suffer some of the unintended consequences of dogmatic insistence on meritocratic principles of governance. Studies have shown that organizational cultures that value meritocracy often result in greater inequality. People with “merit” are often excused for their bad behavior in public spaces based on the value of their technical contributions. Meritocracy also naively assumes a level playing field, in which everyone has access to the same resources, free time, and common life experiences to draw upon. These factors and more make contributing to open source a daunting prospect for many people, especially women and other underrepresented people.

If it looks familiar, it may be because it was written by the same author, Coraline Ada Ehmke.

I have to admit that though I’m quite glad that we have a Code of Conduct now in BigBang, I’m uncomfortable with the ideological presumptions of its rationale and the rejection of ‘meritocracy’. There is a lot packed into this paragraph that is open to productive disagreement and which is not necessary for a commitment to the general point that harassment is bad for an open source community.

Perhaps this would be easier for me to ignore if this political framing did not mirror so many other political tensions today, and if open source governance were not something I’ve been so invested in understanding. I’ve taught a course on open source management, and BigBang spun out of that effort as an experiment in scientific analysis of open source communities. I am, I believe, deep in on this topic.

So what’s the problem? The problem is that I think there’s something painfully misaligned about criticism of meritocracy in culture at large and open source development, which is a very particular kind of organizational form. There is also perhaps a misalignment between the progressive politics of inclusion expressed in these manifestos and what many open source communities are really trying to accomplish. Surely there must be some kind of merit that is not in scare quotes, or else there would not be any good open source software to use a raise a fuss about.

Though it does not directly address the issue, I’m reminded of an old email discussion on the Numpy mailing list that I found when I was trying to do ethnographic work on the Scientific Python community. It was a response by John Hunter, the creator of Matplotlib, in response to concerns raised when Travis Oliphant, the leader of NumPy, started Continuum Analytics and there were concerns about corporate control over NumPy. Hunter quite thoughtfully, in my opinion, debunked the idea that open source governance should be a ‘democracy’, like many people assume institutions ought to be by default. After a long discussion about how Travis had great merit as a leader, he argued:

Democracy is something that many of us have grown up by default to consider as the right solution to many, if not most or, problems of governance. I believe it is a solution to a specific problem of governance. I do not believe democracy is a panacea or an ideal solution for most problems: rather it is the right solution for which the consequences of failure are too high. In a state (by which I mean a government with a power to subject its people to its will by force of arms) where the consequences of failure to submit include the death, dismemberment, or imprisonment of dissenters, democracy is a safeguard against the excesses of the powerful. Generally, there is no reason to believe that the simple majority of people polled is the “best” or “right” answer, but there is also no reason to believe that those who hold power will rule beneficiently. The democratic ability of the people to check to the rule of the few and powerful is essential to insure the survival of the minority.

In open source software development, we face none of these problems. Our power to fork is precisely the power the minority in a tyranical democracy lacks: noone will kill us for going off the reservation. We are free to use the product or not, to modify it or not, to enhance it or not.

The power to fork is not abstract: it is essential. matplotlib, and chaco, both rely *heavily* on agg, the Antigrain C++ rendering library. At some point many years ago, Maxim, the author of Agg, decided to change the license of Agg (circa version 2.5) to GPL rather than BSD. Obviously, this was a non-starter for projects like mpl, scipy and chaco which assumed BSD licensing terms. Unfortunately, Maxim had a new employer which appeared to us to be dictating the terms and our best arguments fell on deaf ears. No matter: mpl and Enthought chaco have continued to ship agg 2.4, pre-GPL, and I think that less than 1% of our users have even noticed. Yes, we forked the project, and yes, noone has noticed. To me this is the ultimate reason why governance of open source, free projects does not need to be democratic. As painful as a fork may be, it is the ultimate antidote to a leader who may not have your interests in mind. It is an antidote that we citizens in a state government may not have.

It is true that numpy exists in a privileged position in a way that matplotlib or scipy does not. Numpy is the core. Yes, Continuum is different than STScI because Travis is both the lead of Numpy and the lead of the company sponsoring numpy. These are important differences. In the worst cases, we might imagine that these differences will negatively impact numpy and associated tools. But these worst case scenarios that we imagine will most likely simply distract us from what is going on: Travis, one of the most prolific and valuable contributers to the scientific python community, has decided to refocus his efforts to do more. And that is a very happy moment for all of us.

This is a nice articulation of how forking, not voting, is the most powerful governance mechanism in open source development, and how it changes what our default assumptions about leadership ought to be. A critical but I think unacknowledged question is to how the possibility of forking interacts with the critique of meritocracy in organizations in general, and specifically what that means for community inclusiveness as a goal in open source communities. I don’t think it’s straightforward.

The node.js fork — something new to think about

For Classics we are reading Albert Hirschman’s Exit, Voice, and Loyalty. Oddly, though normally I hear about ‘voice’ as an action from within an organization, the first few chapters of the book (including the introduction of the Voice concept itselt), are preoccupied with elaborations on the neoclassical market mechanism. Not what I expected.

I’m looking for interesting research use cases for BigBang, which is about analyzing the sociotechnical dynamics of collaboration. I’m building it to better understand open source software development communities, primarily. This is because I want to create a harmonious sociotechnical superintelligence to take over the world.

For a while I’ve been interested in Hadoop’s interesting case of being one software project with two companies working together to build it. This is reminiscent (for me) of when we started GeoExt at OpenGeo and Camp2Camp. The economics of shared capital are fascinating and there are interesting questions about how human resources get organized in that sort of situation. In my experience, there becomes a tension between the needs of firms to differentiate their products and make good on their contracts and the needs of the developer community whose collective value is ultimately tied to the robustness of their technology.

Unfortunately, building out BigBang to integrate with various email, version control, and issue tracking backends is a lot of work and there’s only one of me right now to both build the infrastructure, do the research, and train new collaborators (who are starting to do some awesome work, so this is paying off.) While integrating with Apache’s infrastructure would have been a smart first move, instead I chose to focus on Mailman archives and git repositories. Google Groups and whatever Apache is using for their email lists do not publish their archives in .mbox format, which is pain for me. But luckily Google Takeout does export data from folks’ on-line inbox in .mbox format. This is great for BigBang because it means we can investigate email data from any project for which we know an insider willing to share their records.

Does a research ethics issue arise when you start working with email that is openly archived in a difficult format, then exported from somebody’s private email? Technically you get header information that wasn’t open before–perhaps it was ‘private’. But arguably this header information isn’t personal information. I think I’m still in the clear. Plus, IRB will be irrelevent when the robots take over.

All of this is a long way of getting around to talking about a new thing I’m wondering about, the Node.js fork. It’s interesting to think about open source software forks in light of Hirschman’s concepts of Exit and Voice since so much of the activity of open source development is open, virtual communication. While you might at first think a software fork is definitely a kind of Exit, it sounds like IO.js was perhaps a friendly fork of just somebody who wanted to hack around. In theory, code can be shared between forks–in fact this was the principle that GitHub’s forking system was founded on. So there are open questions (to me, who isn’t involved in the Node.js community at all and is just now beginning to wonder about it) along the lines of to what extent a fork is a real event in the history of the project, vs. to what extent it’s mythological, vs. to what extent it’s a reification of something that was already implicit in the project’s sociotechnical structure. There are probably other great questions here as well.

A friend on the inside tells me all the action on this happened (is happening?) on the GitHub issue tracker, which is definitely data we want to get BigBang connected with. Blissfully, there appear to be well supported Python libraries for working with the GitHub API. I expect the first big hurdle we hit here will be rate limiting.

Though we haven’t been able to make integration work yet, I’m still hoping there’s some way we can work with MetricsGrimoire. They’ve been a super inviting community so far. But our software stacks and architecture are just different enough, and the layers we’ve built so far thin enough, that it’s hard to see how to do the merge. A major difference is that while MetricsGrimoire tools are built to provide application interfaces around a MySQL data backend, since BigBang is foremost about scientific analysis our whole data pipeline is built to get things into Pandas dataframes. Both projects are in Python. This too is a weird microcosm of the larger sociotechnical ecosystem of software production, of which the “open” side is only one (important) part.