Digifesto

Category: open source software

the make-or-buy decision (TCE) in software and cybersecurity

The paradigmatic case of transaction cost economics (TCE) is the make-or-buy decision. A firm, F, needs something, C. Do they make it in-house or do they buy it from somewhere else?

If the firm makes it in-house, they will incur some bureaucratic overhead costs in addition to the costs of production. But they will also be able to specialize C for their purposes. They can institute their own internal quality controls. And so on.

If the firm buys it on the open market from some other firm, say, G, they don’t pay the overhead costs. They do lose the benefits of specialization, and the quality controls are only those based on economic competitive pressure on suppliers.

There is an intermediate option, which is a contract between F and G which establishes an ongoing relationship between the two firms. This contract creates a field in which C can be specialized for F, and there can be assurances of quality, while the overhead is distributed efficiently between F and G.

This situation is both extremely common in business practice and not well handled by neoclassical, orthodox economics. It’s the case that TCE is tremendously preoccupied with.


My background and research is in the software industry, which is rife with cases like these.

Developers are constantly faced with make-or-buy decisions for software components. In principle, they can develop any component themselves. In practice, this is rarely cost-effective.

Open source software components are a prevalent solution to this problem. They can be thought of as a very strange market in which all the prices are zero. The most popular open source libraries are very generic, having little “asset specificity” in TCE terms.

The lack of contract between developers and open source components/communities is sometimes seen as a source of hazard in using open source components. The recent event-stream hack, where an upstream component was injected with malicious code by a developer who had taken over maintaining the package, illustrates the problems of outsourcing technical dependencies without a contract. In this case, the quality problem is manifest as a supply chain cybersecurity problem.

In Williamson’s analysis, these kinds of hazards are what drive firms away from purchasing on spot markets and towards contracting or in-house development. In practice, open source support companies fill the role of the responsible entity G with which firm F can build a relationship.


“the privatization of public functions”

An emerging theme from the conference on Trade Secrets and Algorithmic Systems was that legal scholars have become concerned about the privatization of public functions. For example, the use of proprietary risk assessment tools instead of the discretion of judges who are supposed to be publicly accountable is a problem. More generally, use of “trade secrecy” in court settings to prevent inquiry into software systems is bogus and moves more societal control into the realm of private ordering.

Many remedies were proposed. Most involved some kind of disclosure and audit to experts. The most extreme form of disclosure is making the software and, where it’s a matter of public record, training data publicly available.

It is striking to me to be encountering the call for government use of open source systems because…this is not a new issue. The conversation about federal use of open source software was alive and well over five years ago. Then, the arguments were about vendor lock-in; now, they are about accountability of AI. But the essential problem of whether core governing logic should be available to public scrutiny, and the effects of its privatization, have been the same.

If we are concerned with the reliability of a closed and large-scale decision-making process of any kind, we are dealing with problems of credibility, opacity, and complexity. The prospects of an efficient market for these kinds of systems are dim. These same market conditions govern the sustainability of open source infrastructure. Failures in sustainability manifest as software vulnerabilities, which are one of the key reasons why governments are warned against OSS now, even though measuring and comparing OSS vulnerabilities against proprietary ones is methodologically highly fraught.

open source sustainability and autonomy, revisited

Some recent chats with Chris Holdgraf and colleagues at NYU interested in “critical digital infrastructure” have gotten me thinking again about the sustainability and autonomy of open source projects.

I’ll admit to having had naive views about this topic in the past. Certainly, doing empirical data science work on open source software projects has given me a firmer perspective on things. Here are what I feel are the hardest-earned insights on the matter:

  • There is tremendous heterogeneity in open source software projects. Almost all quantitative features of these projects follow log-normal distributions (a minimal sketch of how one might check this claim for a single metric follows this list). This suggests that the keys to open source software success are myriad and exogenous (how the technology fits into the larger ecosystem, how outside funding and recognition are secured, …) rather than endogenous (community policies, etc.). While many open source projects start as hobby or unpaid academic projects, those that go on to be successful find one or more funding sources. This funding is an exogenous factor.
  • The most significant exogenous factors to an open source software project’s success are the industrial organization of private tech companies. Developing an open technology is part of the strategic repertoire of these companies: for example, to undermine the position of a monopolist, developing an open source alternative decreases barriers to market entry and allows for a more competitive field in that sector. Another example: Google funded Mozilla for so long arguably to deflect antitrust action over Google Chrome.
  • There is some truth to Chris Kelty’s idea of open source communities as recursive publics: cultures with an autonomy of their own, able to assert political independence at the boundaries of other political forces. This autonomy comes from: the way developers of OSS gain specific and valuable human capital in the process of working with the software and their communities; the way institutions begin to depend on OSS as part of their technical stack, creating an installed base; and the way many different institutions may support the same project, creating competition for the scarce human capital of the developers. Essentially, at the point where the software, the skills needed to deploy it effectively, and the community of people with those skills are self-organized, the OSS community has gained some economic and political autonomy. Often this autonomy will manifest itself in some kind of formal organization, whether a foundation, a non-profit, or a company like Redhat or Canonical or Enthought. If the community is large and diverse enough, it may have multiple organizations supporting it. This is in principle good for the autonomy of the project but may also reflect political tensions that can lead to a schism or fork.
  • In general, since OSS development is internally most often very fluid, with the primary regulatory mechanism being the fork, the shape of OSS communities is more determined by exogenous factors than endogenous ones. When exogenous demand for the technology rises, the OSS community can find itself with a ‘surplus’, which can be channeled into autonomous operations.
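To make the log-normal point above more concrete, here is a minimal sketch of how one might check the claim for a single metric. The data here are synthetic stand-ins, not measurements from any actual study; only the general numpy/scipy recipe is the point.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a per-project metric such as commit counts
# or contributor counts across many OSS projects (illustrative only).
rng = np.random.default_rng(42)
commit_counts = rng.lognormal(mean=4.0, sigma=1.5, size=5000)

# Fit a log-normal directly, then sanity-check by testing the
# log-transformed data for normality.
shape, loc, scale = stats.lognorm.fit(commit_counts, floc=0)
log_counts = np.log(commit_counts)
ks_stat, p_value = stats.kstest(
    log_counts, "norm", args=(log_counts.mean(), log_counts.std())
)

print(f"fitted sigma: {shape:.2f}, median: {scale:.1f}")
print(f"KS test on log-transformed data: stat={ks_stat:.3f}, p={p_value:.3f}")
```

With real project data in place of the synthetic array, the same recipe makes the heavy-tailed character of these metrics easy to see.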

fancier: scripts to help manage your Twitter account, in Python

My Twitter account has been a source of great entertainment, distraction, and abuse over the years. It is time that I brought it under control. I am too proud and too cheap to buy a professional grade Twitter account manager, and so I’ve begun developing a new suite of tools in Python that will perform the necessary tasks for me.

I’ve decided to name these tools fancier, because the art and science of breeding domestic pigeons is called pigeon fancying. Go figure.

The project is now available on GitHub, and of course I welcome any collaboration or feedback!

At the time of this writing, the project has only one feature: it searches through the accounts you follow on Twitter, finds those that have been inactive for 90 days and don’t follow you back, and then unfollows them.

This is a common thing to try to do when grooming and/or professionalizing your Twitter account. I saw a script for this shared in a pastebin years ago, but couldn’t find it again. There are some on-line services that will help you do this, but they charge a fee to do it at scale. Ergo: the open source solution. Voila!
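For the curious, here is a minimal sketch of that logic using the tweepy library. This is an illustration under my own assumptions, not the fancier code itself; the credentials are placeholders, and the method names assume the tweepy 3.x API.

```python
from datetime import datetime, timedelta

import tweepy  # assuming the tweepy 3.x API; fancier's actual code may differ

# Placeholder credentials -- supply your own app's keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

cutoff = datetime.utcnow() - timedelta(days=90)  # tweepy 3.x returns naive UTC datetimes
followers = set(api.followers_ids())

for friend_id in api.friends_ids():
    if friend_id in followers:
        continue  # they follow back, so leave them alone
    user = api.get_user(user_id=friend_id)
    last_tweet = getattr(user, "status", None)  # absent if the account has never tweeted
    if last_tweet is None or last_tweet.created_at < cutoff:
        print(f"unfollowing inactive non-follower: @{user.screen_name}")
        api.destroy_friendship(user_id=friend_id)
```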

moved BigBang core repository to DATACTIVE organization

I made a small change this evening which I feel really, really good about.

I transferred the BigBang project from my personal GitHub account to the datactive organization.

I’m very grateful for DATACTIVE‘s interest in BigBang and am excited to turn over the project infrastructure to their stewardship.

trust issues and the order of law and technology cf @FrankPasquale

I’ve cut to the last chapter of Pasquale’s The Black Box Society, “Towards an Intelligible Society.” I’m interested in where the argument goes. Now that I’ve gotten through it, I see that the penultimate chapter has Pasquale’s specific policy recommendations. But as I’m not just reading for policy and framing but also for tone and underlying theoretical commitments, I think it’s worth recording some first impressions before doubling back.

These are some points Pasquale makes in the concluding chapter that I wholeheartedly agree with:

  • A universal basic income would allow more people to engage in high risk activities such as the arts and entrepreneurship and more generally would be great for most people.
  • There should be publicly funded options for finance, search, and information services. A great way to provide these would be to fund the development of open source algorithms for finance and search. I’ve been into this idea for so long and it’s great to see a prominent scholar like Pasquale come to its defense.
  • Regulatory capture (or, as he elaborates following Charles Lindblom, “regulatory circularity”) is a problem. Revolving door participation in government and business makes government regulation an unreliable protector of the public interest.

There is quite a bit in the conclusion about the specifics of regulating the finance industry. There is an impressive amount of knowledge presented about this, and I’ll admit much of it is over my head. I’ll probably have a better sense of it if I get to reading the chapter that is specifically about finance.

There are some things that I found bewildering or off-putting.

For example, there is a section on “Restoring Trust” that talks about how an important problem is that we don’t have enough trust in the reputation and search industries. His solution is to increase the penalties that the FTC and FCC can impose on Google and Facebook for, e.g., privacy violations. The current penalties are too trivial to be an effective deterrent. But, Pasquale argues,

It is a broken enforcement model, and we have black boxes to thank for much of this. People can’t be outraged by what they can’t understand. And without some public concern about the trivial level of penalties for lawbreaking here, there are no consequences for the politicians ultimately responsible for them.

The logic here is a little mad. Pasquale is saying that people are not outraged enough by search and reputation companies to demand harsher penalties, and this is a problem because people don’t trust these companies enough. The solution is to convince people to trust these companies less–get outraged by them–in order to get them to punish the companies more.

This is a bit troubling, but makes sense based on Pasquale’s theory of regulatory circularity, which turns politics into a tug-of-war between interests:

The dynamic of circularity teaches us that there is no stable static equilibrium to be achieved between regulators and regulated. The government is either pushing industry to realize some public values in its activities (say, by respecting privacy or investing in sustainable growth), or industry is pushing regulators to promote its own interests.

There’s a simplicity to this that I distrust. It suggests, for one, that there are no public pressures on industry besides the government, such as consumers’ buying power. A lot of Pasquale’s arguments depend on the monopolistic power of certain tech giants. But while network effects are strong, it’s not clear whether this is such a problem that consumers have no buying power in the market. In many cases tech giants compete with each other even when it looks like they aren’t. For example, many, many people have both Facebook and Gmail accounts. Since there is somewhat redundant functionality in both, consumers can rather seamlessly allocate their time, which is tied to advertising revenue, according to which service they feel better serves them, or which is best reputationally. So social media (which is a bit like a combination of a search and reputation service) is not a monopoly. Similarly, if people have multiple search options available to them because, say, they have both Siri on their smartphone and can search Google directly, then that provides an alternative search market.

Meanwhile, government officials are also often self-interested. If the road to hell for industry is to provide free web services to people to attain massive scale, then abuse economic lock-in to extract value from customers, then lobby for further rent-seeking, there is a similar road to hell in government. It starts with populist demagoguery, leads to stable government appointment, and then leverages that power for rents in status.

So, power is power. Everybody tries to get power. The question is what you do once you get it, right?

Perhaps I’m reading between the lines too much. Of course, my evaluation of the book should depend most on the concrete policy recommendations which I haven’t gotten to yet. But I find it unfortunate that what seems to be a lot of perfectly sound history and policy analysis is wrapped in a politics of professional identity that I find very counterproductive. The last paragraph of the book is:

Black box services are often wondrous to behold, but our black-box society has become dangerously unstable, unfair, and unproductive. Neither New York quants nor California engineers can deliver a sound economy or a secure society. Those are the tasks of a citizenry, which can perform its job only as well as it understands the stakes.

Implicitly, New York quants and California engineers are not citizens, to Pasquale, a law professor based in Maryland. Do all real citizens live around Washington, DC? Are they all lawyers? If the government were to start providing public information services, either by hosting them themselves or by funding open source alternatives, would he want everyone designing these open algorithms (who would be quants or engineers, I presume) to move to DC? Do citizens really need to understand the stakes in order to get this to happen? When have citizens, en masse, understood anything, really?

Based on what I’ve read so far, The Black Box Society is an expression of a lack of trust in the social and economic power associated with quantification and computing that took off in the past few dot-com booms. Since expressions of lack of trust for these industries are nothing new, one might wonder (under the influence of Foucault) how the quantified order and the critique of the quantified order manage to coexist and recreate a system of discipline that includes both and maintains its power as a complex of superficially agonistic forces. I give sincere credit to Pasquale for advocating both serious income redistribution and public investment in open technology as ways of disrupting that order. But when he falls into the trap of engendering partisan distrust, he loses my confidence.

Innovation, automation, and inequality

What is the economic relationship between innovation, automation, and inequality?

This is a recurring topic in the discussion of technology and the economy. It comes up when people are worried about a new innovation (such as data science) that threatens their livelihood. It also comes up in discussions of inequality, such as in Piketty’s Capital in the Twenty-First Century.

For technological pessimists, innovation implies automation, and automation suggests the transfer of surplus from many service providers to a technological monopolist providing a substitute service at greater scale (scale being one of the primary benefits of automation).

For Piketty, it’s the spread of innovation, in the sense of the education of skilled labor, that is the primary force counteracting capitalism’s tendency towards inequality and (he suggests) the implied instability. Given the importance Piketty places on this process, he treats it hardly at all in his book.

Whether or not you buy Piketty’s analysis, the preceding discussion indicates how innovation can cut both for and against inequality. When there is innovation in capital goods, this increases inequality. When there is innovation in a kind of skilled technique that can be broadly taught, that decreases inequality by increasing the relative value of labor to capital (which is generally much more concentrated than labor).

I’m a software engineer in the Bay Area and realize that it’s easy to overestimate the importance of software in the economy at large. This is apparently an easy mistake for other people to make as well. Matthew Rognlie, the economist who has been declared Piketty’s latest and greatest challenger, thinks that software is an important new form of capital and draws certain conclusions based on this.

I agree that software is an important form of capital–exactly how important I cannot yet say. One reason why software is an especially interesting kind of capital is that it exists ambiguously as both a capital good and as a skilled technique. While naively one can consider software as an artifact in isolation from its social environment, in the dynamic information economy a piece of software is only as good as the sociotechnical system in which it is embedded. Hence, its value depends both on its affordances as a capital good and its role as an extension of labor technique. It is perhaps easiest to see the latter aspect of software by considering it a form of extended cognition on the part of the software developer. The human capital required to understand, reproduce, and maintain the software is attained by, for example, studying its source code and documentation.

All software is a form of innovation. All software automates something. There has been a lot written about the potential effects of software on inequality through its function in decision-making (for example: Solon Barocas, Andrew D. Selbst, “Big Data’s Disparate Impact” (link).) Much less has been said about the effects of software on inequality through its effects on industrial organization and the labor market. After having my antennas up for this for many reasons, I’ve come to a conclusion about why: it’s because the intersection between those who are concerned about inequality in society and those that can identify well enough with software engineers and other skilled laborers is quite small. As a result there is not a ready audience for this kind of analysis.

However unreceptive society may be to it, I think it’s still worth making the point that we already have a very common and robust compromise in the technology industry that recognizes software’s dual role as a capital good and labor technique. This compromise is open source software. Open source software can exist both as an unalienated extension of its developer’s cognition and as a capital good playing a role in a production process. Human capital tied to the software is liquid between the software’s users. Surplus due to open software innovations goes first to the software users, then second to the ecosystem of developers who sell services around it. Contrast this with the proprietary case, where surplus goes mainly to a singular entity that owns and sells the software rights as a monopolist. The former case is vastly better if one considers societal equality a positive outcome.

This has straightforward policy implications. As an alternative to Piketty’s proposed tax on capital, any policies that encourage open source software are ones that combat societal inequality. This includes procurement policies, which need not increase government spending. On the contrary, if governments procure primarily open software, that should lead to savings over time as their investment leads to a more competitive market for services. Equivalently, R&D funding to open science institutions results in more income equality than equivalent funding provided to private companies.

Hirschman, Nigerian railroads, and poor open source user interfaces

Hirschman says he got the idea for Exit, Voice, and Loyalty when studying the failure of the Nigerian railroad system to improve quality despite the availability of trucking as a substitute for long-range shipping. Conventional wisdom among economists at the time was that the quality of a good would suffer when it was provisioned by a monopoly. But why would a business that faced healthy competition not undergo the management changes needed to improve quality?

Hirschman’s answer is that because the trucking option was so readily available as an alternative, there wasn’t a need for consumers to develop their capacity for voice. The railroads weren’t hearing the complaints about their service, they were just seeing a decline in use as their customers exited. Meanwhile, because it was a monopoly, loss in revenue wasn’t “of utmost gravity” to the railway managers either.

The upshot of this is that it’s only when customers are locked in that voice plays a critical role in the recuperation mechanism.

This is interesting for me because I’m interested in the role of lock-in in software development. In particular, one argument made in favor of open source software is that because it is not technology held by a single firm, users of the software are not locked in. Their switching costs are reduced, making the market more liquid and, in theory, more favorable.

You can contrast this with proprietary enterprise software, where vendor lock-in is a principal part of the business model, since this establishes the “installed base” and customer support armies are necessary for managing disgruntled customer voice. Or, in the case of social media such as Facebook, network effects create a kind of perceived consumer lock-in and consumer voice gets articulated by everybody from Twitter activists to journalists to high-profile academics.

As much as it pains me to admit it, this is one good explanation for why the user interfaces of a lot of open source software projects are so bad, at least if you combine this mechanism with the idea that user-centered design is what makes interfaces good. Open source projects generally make it easy to complain about the software. If they know what they are doing at all, they make it clear how to engage the developers as a user. There is a kind of rumor out there that open source developers are unfriendly towards users, and this is perhaps true when users are accustomed to the kind of customer support that’s available for a product with customer lock-in. It’s precisely this difference between exit culture and voice culture, driven by the fundamental economics of the industry, that creates this perception. Enterprise open source business models (I’m thinking of models like the Pentaho ‘beekeeper’) theoretically provide a corrective to this by acting as an intermediary between consumer voice and developer exit.

A testable hypothesis is whether, and to what extent, a software project’s responsiveness to tickets scales with the number of downstream dependent projects. In software development, technical architecture is a reasonable proxy for industrial organization. A widely used project has network effects that increase switching costs for its downstream users. How do exit and voice work in this context?
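As a rough sketch of how one might operationalize that hypothesis: assuming you had already assembled per-project measures of issue responsiveness and counts of downstream dependents (the dataframe and its values below are entirely hypothetical), a rank correlation is a natural first pass.

```python
import pandas as pd
from scipy import stats

# Hypothetical dataset: one row per project, with a median time-to-first-response
# on issues (in hours) and a count of downstream dependent projects.
projects = pd.DataFrame({
    "project": ["a", "b", "c", "d", "e"],
    "median_response_hours": [4.0, 30.0, 2.5, 72.0, 10.0],
    "downstream_dependents": [1200, 15, 4800, 3, 340],
})

# Spearman rank correlation: does responsiveness scale with downstream usage?
rho, p = stats.spearmanr(
    projects["downstream_dependents"], projects["median_response_hours"]
)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```

A strongly negative rho would suggest that heavily depended-upon projects answer tickets faster; the interesting question would then be which mechanism (voice, funding, paid maintainers) is doing the work.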

The node.js fork — something new to think about

For Classics we are reading Albert Hirschman’s Exit, Voice, and Loyalty. Oddly, though normally I hear about ‘voice’ as an action from within an organization, the first few chapters of the book (including the introduction of the Voice concept itself) are preoccupied with elaborations on the neoclassical market mechanism. Not what I expected.

I’m looking for interesting research use cases for BigBang, which is about analyzing the sociotechnical dynamics of collaboration. I’m building it to better understand open source software development communities, primarily. This is because I want to create a harmonious sociotechnical superintelligence to take over the world.

For a while I’ve been interested in Hadoop’s case of being one software project with two companies working together to build it. This is reminiscent (for me) of when we started GeoExt at OpenGeo and Camp2Camp. The economics of shared capital are fascinating, and there are interesting questions about how human resources get organized in that sort of situation. In my experience, a tension develops between the needs of firms to differentiate their products and make good on their contracts and the needs of the developer community, whose collective value is ultimately tied to the robustness of their technology.

Unfortunately, building out BigBang to integrate with various email, version control, and issue tracking backends is a lot of work, and there’s only one of me right now to build the infrastructure, do the research, and train new collaborators (who are starting to do some awesome work, so this is paying off). While integrating with Apache’s infrastructure would have been a smart first move, instead I chose to focus on Mailman archives and git repositories. Google Groups and whatever Apache is using for their email lists do not publish their archives in .mbox format, which is a pain for me. But luckily Google Takeout does export data from folks’ on-line inboxes in .mbox format. This is great for BigBang because it means we can investigate email data from any project for which we know an insider willing to share their records.
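Getting a Takeout-style .mbox into a dataframe doesn’t take much. Here is a minimal sketch using the standard library’s mailbox module and pandas; it is an illustration of the idea, not BigBang’s actual loading code, and the path at the bottom is a placeholder.

```python
import mailbox

import pandas as pd


def mbox_to_dataframe(path):
    """Pull basic header fields from an .mbox archive into a pandas DataFrame."""
    rows = []
    for message in mailbox.mbox(path):
        rows.append({
            "from": message.get("From"),
            "to": message.get("To"),
            "date": message.get("Date"),
            "subject": message.get("Subject"),
            "message_id": message.get("Message-ID"),
            "in_reply_to": message.get("In-Reply-To"),
        })
    df = pd.DataFrame(rows)
    # Parse dates where possible; malformed headers become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce", utc=True)
    return df


# Example usage (placeholder path):
# df = mbox_to_dataframe("takeout-inbox.mbox")
```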

Does a research ethics issue arise when you start working with email that is openly archived in a difficult format, then exported from somebody’s private email? Technically you get header information that wasn’t open before–perhaps it was ‘private’. But arguably this header information isn’t personal information. I think I’m still in the clear. Plus, IRB will be irrelevant when the robots take over.

All of this is a long way of getting around to talking about a new thing I’m wondering about: the Node.js fork. It’s interesting to think about open source software forks in light of Hirschman’s concepts of Exit and Voice, since so much of the activity of open source development is open, virtual communication. While you might at first think a software fork is definitely a kind of Exit, it sounds like IO.js was perhaps a friendly fork by somebody who just wanted to hack around. In theory, code can be shared between forks–in fact this was the principle that GitHub’s forking system was founded on. So there are open questions (to me, who isn’t involved in the Node.js community at all and is just now beginning to wonder about it) along the lines of: to what extent is a fork a real event in the history of the project, to what extent is it mythological, and to what extent is it a reification of something that was already implicit in the project’s sociotechnical structure? There are probably other great questions here as well.

A friend on the inside tells me all the action on this happened (is happening?) on the GitHub issue tracker, which is definitely data we want to get BigBang connected with. Blissfully, there appear to be well supported Python libraries for working with the GitHub API. I expect the first big hurdle we hit here will be rate limiting.
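As a first stab at the GitHub side, a minimal sketch with the PyGithub library might look like the following. The token and the repository name are placeholders, the 100-issue cap is just to be gentle on the rate limit, and none of this is (yet) how BigBang actually does it.

```python
import pandas as pd
from github import Github  # PyGithub

# Placeholder token; unauthenticated requests get a much lower rate limit.
gh = Github("YOUR_ACCESS_TOKEN")
remaining, limit = gh.rate_limiting
print(f"API calls remaining: {remaining}/{limit}")

repo = gh.get_repo("nodejs/node")  # placeholder repository
rows = []
for issue in repo.get_issues(state="all"):
    rows.append({
        "number": issue.number,
        "title": issue.title,
        "user": issue.user.login if issue.user else None,
        "created_at": issue.created_at,
        "closed_at": issue.closed_at,
        "comments": issue.comments,
    })
    if len(rows) >= 100:  # stay well under the rate limit for a first pass
        break

# The BigBang pipeline wants Pandas dataframes, so land the data there.
issues_df = pd.DataFrame(rows)
print(issues_df.head())
```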

Though we haven’t been able to make integration work yet, I’m still hoping there’s some way we can work with MetricsGrimoire. They’ve been a super inviting community so far. But our software stacks and architecture are just different enough, and the layers we’ve built so far thin enough, that it’s hard to see how to do the merge. A major difference is that while MetricsGrimoire tools are built to provide application interfaces around a MySQL data backend, since BigBang is foremost about scientific analysis our whole data pipeline is built to get things into Pandas dataframes. Both projects are in Python. This too is a weird microcosm of the larger sociotechnical ecosystem of software production, of which the “open” side is only one (important) part.