Digifesto

Tag: open source software

Open Source Software (OSS) and regulation by algorithms

It has long been argued that technology, especially built infrastructure, has political implications (Winner, 1980). With the rise of the Internet as the dominating form of technological infrastructure, Lessig (1999), among others, argued that software code is a regulating force parallel to the law. By extension of this argument, we would expect open source software to be a regulating force in society.

This is not the case. There is a lot of open source software and much of it is very important. But there’s no evidence to say that open source software, in and of itself, regulates society except in the narrow sense in that the communities that build and maintain it are necessarily constrained by its technical properties.

On the other hand, there are countless platforms and institutions that deploy open source software as part of their activity, which does have a regulating force on society. The Big Tech companies that are so powerful that they seem to rival some states are largely built on an “open core” of software. Likewise for smaller organizations. OSS is simply part of the the contemporary software production process, and it is ubiquitous.

Most widely used programming languages are open source. Perhaps a good analogy for OSS is that it is a collection of languages, and literatures in those languages. These languages and much of their literature are effectively in the public domain. We might say the same thing about English or Chinese or German or Hindi.

Law, as we know it in the modern state, is a particular expression of language with purposeful meaning. It represents, at its best, a kind of institutional agreement that constraints behavior based on its repetition and appeals to its internal logic. The Rule of Law, as we know it, depends on the social reproduction of this linguistic community. Law Schools are the main means of socializing new lawyers, who are then credentialed to participate in and maintain the system which regulates society. Lawyers are typically good with words, and their practice is in a sense constrained by their language, but only in the widest of Sapir-Whorf senses. Law is constrained more the question of which language is institutionally recognized; indeed, those words and phrases that have been institutionally ratified are the law.

Let’s consider again the generative question of whether law could be written in software code. I will leave aside for a moment whether or not this would be desirable. I will entertain the idea in part because I believe that it is inevitable, because of how the algorithm is the form of modern rationality (Totaro and Ninno, 2014) and the evolutionary power of rationality.

A law written in software would need to be written in a programming language and this would all but entail that it is written on an “open core” of software. Concretely: one might write laws in Python.

The specific code in the software law might or might not be open. There might one day be a secretive authoritarian state with software laws that are not transparent or widely known. Nothing rules that out.

We could imagine a more democratic outcome as well. It would be natural, in a more liberal kind of state, for the software laws to be open on principle. The definitions here become a bit tricky: the designation of “open source software” is one within the schema of copyright and licensing. Could copyright laws and license be written in software? In other words, could the ‘openness’ of the software laws be guaranteed by their own form? This is an interesting puzzle for computer scientists and logicians.

For the sake of argument, suppose that something like this is accomplished. Perhaps it is accomplished merely by tradition: the institution that ratifies software laws publishes these on purpose, in order to facilitate healthy democratic debate about the law.

Even with all this in place, we still don’t have regulation. We have now discussed software legislation, but not software execution and enforcement. If software is only as powerful as the expression of a language. A deployed system, running that software, is an active force in the world. Such a system implicates a great many things beyond the software itself. It requires computers and and networking infrastructure. It requires databases full of data specific to the applications for which its ran.

The software dictates the internal logic by which a system operates. But that logic is only meaningful when coupled with an external societal situation. The membrane between the technical system and the society in which it participates is of fundamental importance to understanding the possibility of technical regulation, just as the membrane between the Rule of Law and society–which we might say includes elections and the courts in the U.S.–is of fundamental importance to understanding the possibility of linguistic regulation.

References

Lessig, L. (1999). Code is law. The Industry Standard18.

Hildebrandt, M. (2015). Smart technologies and the end (s) of law: Novel entanglements of law and technology. Edward Elgar Publishing.

Totaro, P., & Ninno, D. (2014). The concept of algorithm as an interpretative key of modern rationality. Theory, Culture & Society31(4), 29-49.

Winner, L. (1980). Do artifacts have politics?. Daedalus, 121-136.

thinking about meritocracy in open source communities

There has been a trend in open source development culture over the past ten years or so. It is the rejection of ‘meritocracy’. Just now, I saw this Post-Meritocracy Manifesto, originally created by Coraline Ada Ehmke. It is exactly what it sounds like: an explicit rejection of meritocracy, specifically in open source development. It captures a recent progressive wing of software development culture. It is attracting signatories.

I believe this is a “trend” because I’ve noticed a more subtle expression of similar ideas a few months ago. This came up when we were coming up with a Code of Conduct for BigBang. We wound up picking the Contributor Covenant Code of Conduct, though there’s still some open questions about how to integrate it with our Governance policy.

This Contributor Covenant is widely adopted and the language of it seems good to me. I was surprised though when I found the rationale for it specifically mentioned meritocracy as a problem the code of conduct was trying to avoid:

Marginalized people also suffer some of the unintended consequences of dogmatic insistence on meritocratic principles of governance. Studies have shown that organizational cultures that value meritocracy often result in greater inequality. People with “merit” are often excused for their bad behavior in public spaces based on the value of their technical contributions. Meritocracy also naively assumes a level playing field, in which everyone has access to the same resources, free time, and common life experiences to draw upon. These factors and more make contributing to open source a daunting prospect for many people, especially women and other underrepresented people.

If it looks familiar, it may be because it was written by the same author, Coraline Ada Ehmke.

I have to admit that though I’m quite glad that we have a Code of Conduct now in BigBang, I’m uncomfortable with the ideological presumptions of its rationale and the rejection of ‘meritocracy’. There is a lot packed into this paragraph that is open to productive disagreement and which is not necessary for a commitment to the general point that harassment is bad for an open source community.

Perhaps this would be easier for me to ignore if this political framing did not mirror so many other political tensions today, and if open source governance were not something I’ve been so invested in understanding. I’ve taught a course on open source management, and BigBang spun out of that effort as an experiment in scientific analysis of open source communities. I am, I believe, deep in on this topic.

So what’s the problem? The problem is that I think there’s something painfully misaligned about criticism of meritocracy in culture at large and open source development, which is a very particular kind of organizational form. There is also perhaps a misalignment between the progressive politics of inclusion expressed in these manifestos and what many open source communities are really trying to accomplish. Surely there must be some kind of merit that is not in scare quotes, or else there would not be any good open source software to use a raise a fuss about.

Though it does not directly address the issue, I’m reminded of an old email discussion on the Numpy mailing list that I found when I was trying to do ethnographic work on the Scientific Python community. It was a response by John Hunter, the creator of Matplotlib, in response to concerns raised when Travis Oliphant, the leader of NumPy, started Continuum Analytics and there were concerns about corporate control over NumPy. Hunter quite thoughtfully, in my opinion, debunked the idea that open source governance should be a ‘democracy’, like many people assume institutions ought to be by default. After a long discussion about how Travis had great merit as a leader, he argued:

Democracy is something that many of us have grown up by default to consider as the right solution to many, if not most or, problems of governance. I believe it is a solution to a specific problem of governance. I do not believe democracy is a panacea or an ideal solution for most problems: rather it is the right solution for which the consequences of failure are too high. In a state (by which I mean a government with a power to subject its people to its will by force of arms) where the consequences of failure to submit include the death, dismemberment, or imprisonment of dissenters, democracy is a safeguard against the excesses of the powerful. Generally, there is no reason to believe that the simple majority of people polled is the “best” or “right” answer, but there is also no reason to believe that those who hold power will rule beneficiently. The democratic ability of the people to check to the rule of the few and powerful is essential to insure the survival of the minority.

In open source software development, we face none of these problems. Our power to fork is precisely the power the minority in a tyranical democracy lacks: noone will kill us for going off the reservation. We are free to use the product or not, to modify it or not, to enhance it or not.

The power to fork is not abstract: it is essential. matplotlib, and chaco, both rely *heavily* on agg, the Antigrain C++ rendering library. At some point many years ago, Maxim, the author of Agg, decided to change the license of Agg (circa version 2.5) to GPL rather than BSD. Obviously, this was a non-starter for projects like mpl, scipy and chaco which assumed BSD licensing terms. Unfortunately, Maxim had a new employer which appeared to us to be dictating the terms and our best arguments fell on deaf ears. No matter: mpl and Enthought chaco have continued to ship agg 2.4, pre-GPL, and I think that less than 1% of our users have even noticed. Yes, we forked the project, and yes, noone has noticed. To me this is the ultimate reason why governance of open source, free projects does not need to be democratic. As painful as a fork may be, it is the ultimate antidote to a leader who may not have your interests in mind. It is an antidote that we citizens in a state government may not have.

It is true that numpy exists in a privileged position in a way that matplotlib or scipy does not. Numpy is the core. Yes, Continuum is different than STScI because Travis is both the lead of Numpy and the lead of the company sponsoring numpy. These are important differences. In the worst cases, we might imagine that these differences will negatively impact numpy and associated tools. But these worst case scenarios that we imagine will most likely simply distract us from what is going on: Travis, one of the most prolific and valuable contributers to the scientific python community, has decided to refocus his efforts to do more. And that is a very happy moment for all of us.

This is a nice articulation of how forking, not voting, is the most powerful governance mechanism in open source development, and how it changes what our default assumptions about leadership ought to be. A critical but I think unacknowledged question is to how the possibility of forking interacts with the critique of meritocracy in organizations in general, and specifically what that means for community inclusiveness as a goal in open source communities. I don’t think it’s straightforward.

Note: Nick Doty has written a nice response to this on his blog.

The node.js fork — something new to think about

For Classics we are reading Albert Hirschman’s Exit, Voice, and Loyalty. Oddly, though normally I hear about ‘voice’ as an action from within an organization, the first few chapters of the book (including the introduction of the Voice concept itselt), are preoccupied with elaborations on the neoclassical market mechanism. Not what I expected.

I’m looking for interesting research use cases for BigBang, which is about analyzing the sociotechnical dynamics of collaboration. I’m building it to better understand open source software development communities, primarily. This is because I want to create a harmonious sociotechnical superintelligence to take over the world.

For a while I’ve been interested in Hadoop’s interesting case of being one software project with two companies working together to build it. This is reminiscent (for me) of when we started GeoExt at OpenGeo and Camp2Camp. The economics of shared capital are fascinating and there are interesting questions about how human resources get organized in that sort of situation. In my experience, there becomes a tension between the needs of firms to differentiate their products and make good on their contracts and the needs of the developer community whose collective value is ultimately tied to the robustness of their technology.

Unfortunately, building out BigBang to integrate with various email, version control, and issue tracking backends is a lot of work and there’s only one of me right now to both build the infrastructure, do the research, and train new collaborators (who are starting to do some awesome work, so this is paying off.) While integrating with Apache’s infrastructure would have been a smart first move, instead I chose to focus on Mailman archives and git repositories. Google Groups and whatever Apache is using for their email lists do not publish their archives in .mbox format, which is pain for me. But luckily Google Takeout does export data from folks’ on-line inbox in .mbox format. This is great for BigBang because it means we can investigate email data from any project for which we know an insider willing to share their records.

Does a research ethics issue arise when you start working with email that is openly archived in a difficult format, then exported from somebody’s private email? Technically you get header information that wasn’t open before–perhaps it was ‘private’. But arguably this header information isn’t personal information. I think I’m still in the clear. Plus, IRB will be irrelevent when the robots take over.

All of this is a long way of getting around to talking about a new thing I’m wondering about, the Node.js fork. It’s interesting to think about open source software forks in light of Hirschman’s concepts of Exit and Voice since so much of the activity of open source development is open, virtual communication. While you might at first think a software fork is definitely a kind of Exit, it sounds like IO.js was perhaps a friendly fork of just somebody who wanted to hack around. In theory, code can be shared between forks–in fact this was the principle that GitHub’s forking system was founded on. So there are open questions (to me, who isn’t involved in the Node.js community at all and is just now beginning to wonder about it) along the lines of to what extent a fork is a real event in the history of the project, vs. to what extent it’s mythological, vs. to what extent it’s a reification of something that was already implicit in the project’s sociotechnical structure. There are probably other great questions here as well.

A friend on the inside tells me all the action on this happened (is happening?) on the GitHub issue tracker, which is definitely data we want to get BigBang connected with. Blissfully, there appear to be well supported Python libraries for working with the GitHub API. I expect the first big hurdle we hit here will be rate limiting.

Though we haven’t been able to make integration work yet, I’m still hoping there’s some way we can work with MetricsGrimoire. They’ve been a super inviting community so far. But our software stacks and architecture are just different enough, and the layers we’ve built so far thin enough, that it’s hard to see how to do the merge. A major difference is that while MetricsGrimoire tools are built to provide application interfaces around a MySQL data backend, since BigBang is foremost about scientific analysis our whole data pipeline is built to get things into Pandas dataframes. Both projects are in Python. This too is a weird microcosm of the larger sociotechnical ecosystem of software production, of which the “open” side is only one (important) part.