Digifesto

Autonomy as link between privacy and cybersecurity

A key aspect of the European approach to privacy and data protection regulation is that it is rooted in the idea of an individual's autonomy. Unlike an American view of privacy, which suggests that privacy is important only because it implies some kind of substantive harm (such as reputational loss or discrimination), European law understands that personal data matters because of its relevance to a person's self-control.

Autonomy etymologically is “self-law”. It is traditionally associated with the concept of rationality and the ability to commit oneself to duty. My colleague Jake Goldenfein argues that autonomy is the principle that one has the power to express one’s own narrative about oneself, and for that narrative to have power. Uninterpretable and unaccountable surveillance, “nudging”, manipulation, profiling, social sorting, and so on are all in a sense an attack on autonomy. They interfere with the individual’s capacity to self-rule.

It is rarer to connect the idea of autonomy to cybersecurity, though here too the etymology of the words weighs in favor of it. Cyber- has its root in the Greek kybernetes: steersman, governor, pilot, or rudder. To be secure means to be free from threat. So cybersecurity, for a person or organization, is the freedom of their self-control from external threat. Cybersecurity is the condition of being free to control oneself: to be autonomous.

Understood in this way, privacy is just one kind of cybersecurity: the cybersecurity of the individual person. We can speak additionally of the cybersecurity of an infrastructure, such as a power grid, or of an organization, such as a bank, or of a device, such as a smartphone. What both the privacy and cybersecurity discussions implicate are questions of the ontology of the entities involved and their ability to control themselves and control each other.

Open Source Software (OSS) and regulation by algorithms

It has long been argued that technology, especially built infrastructure, has political implications (Winner, 1980). With the rise of the Internet as the dominating form of technological infrastructure, Lessig (1999), among others, argued that software code is a regulating force parallel to the law. By extension of this argument, we would expect open source software to be a regulating force in society.

This is not the case. There is a lot of open source software and much of it is very important. But there is no evidence that open source software, in and of itself, regulates society except in the narrow sense that the communities that build and maintain it are necessarily constrained by its technical properties.

On the other hand, there are countless platforms and institutions that deploy open source software as part of their activity, which does have a regulating force on society. The Big Tech companies that are so powerful that they seem to rival some states are largely built on an "open core" of software. Likewise for smaller organizations. OSS is simply part of the contemporary software production process, and it is ubiquitous.

Most widely used programming languages are open source. Perhaps a good analogy for OSS is that it is a collection of languages, and literatures in those languages. These languages and much of their literature are effectively in the public domain. We might say the same thing about English or Chinese or German or Hindi.

Law, as we know it in the modern state, is a particular expression of language with purposeful meaning. It represents, at its best, a kind of institutional agreement that constrains behavior based on its repetition and appeals to its internal logic. The Rule of Law, as we know it, depends on the social reproduction of this linguistic community. Law schools are the main means of socializing new lawyers, who are then credentialed to participate in and maintain the system which regulates society. Lawyers are typically good with words, and their practice is in a sense constrained by their language, but only in the widest of Sapir-Whorf senses. Law is constrained more by the question of which language is institutionally recognized; indeed, those words and phrases that have been institutionally ratified are the law.

Let's consider again the generative question of whether law could be written in software code. I will leave aside for a moment whether or not this would be desirable. I will entertain the idea in part because I believe it is inevitable, given that the algorithm is the form of modern rationality (Totaro and Ninno, 2014) and given the evolutionary power of rationality.

A law written in software would need to be written in a programming language and this would all but entail that it is written on an “open core” of software. Concretely: one might write laws in Python.
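To make this concrete, here is a deliberately trivial sketch (my own illustration, not a proposal for an actual legal instrument) of what a "statute" expressed as executable Python might look like. All of the names and thresholds are invented for the example.

```python
# A toy "law in code": a hypothetical benefit-eligibility rule written as a
# Python function. Names and thresholds are invented for illustration only.

from dataclasses import dataclass

@dataclass
class Person:
    age: int
    annual_income: float

def eligible_for_benefit(person: Person, income_ceiling: float = 20_000.0) -> bool:
    """Hypothetical rule: adults below an income ceiling qualify for the benefit."""
    return person.age >= 18 and person.annual_income < income_ceiling

print(eligible_for_benefit(Person(age=30, annual_income=15_000.0)))  # True
print(eligible_for_benefit(Person(age=16, annual_income=5_000.0)))   # False
```

Of course, as discussed below, writing the rule is the easy part; legislation is not the same as execution and enforcement.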

The specific code in the software law might or might not be open. There might one day be a secretive authoritarian state with software laws that are not transparent or widely known. Nothing rules that out.

We could imagine a more democratic outcome as well. It would be natural, in a more liberal kind of state, for the software laws to be open on principle. The definitions here become a bit tricky: the designation of "open source software" is one within the schema of copyright and licensing. Could copyright laws and licenses be written in software? In other words, could the 'openness' of the software laws be guaranteed by their own form? This is an interesting puzzle for computer scientists and logicians.

For the sake of argument, suppose that something like this is accomplished. Perhaps it is accomplished merely by tradition: the institution that ratifies software laws publishes these on purpose, in order to facilitate healthy democratic debate about the law.

Even with all this in place, we still don't have regulation. We have now discussed software legislation, but not software execution and enforcement. Software on its own is only as powerful as the expression of a language. A deployed system, running that software, is an active force in the world. Such a system implicates a great many things beyond the software itself. It requires computers and networking infrastructure. It requires databases full of data specific to the applications for which it is run.

The software dictates the internal logic by which a system operates. But that logic is only meaningful when coupled with an external societal situation. The membrane between the technical system and the society in which it participates is of fundamental importance to understanding the possibility of technical regulation, just as the membrane between the Rule of Law and society–which we might say includes elections and the courts in the U.S.–is of fundamental importance to understanding the possibility of linguistic regulation.

References

Lessig, L. (1999). Code is law. The Industry Standard, 18.

Hildebrandt, M. (2015). Smart technologies and the end(s) of law: Novel entanglements of law and technology. Edward Elgar Publishing.

Totaro, P., & Ninno, D. (2014). The concept of algorithm as an interpretative key of modern rationality. Theory, Culture & Society, 31(4), 29-49.

Winner, L. (1980). Do artifacts have politics? Daedalus, 109(1), 121-136.

The diverging philosophical roots of U.S. and E.U. privacy regimes

For those in the privacy scholarship community, there is an awkward truth that European data protection law is going in a different direction from U.S. Federal privacy law. A thorough realpolitik analysis of how the current U.S. regime regarding personal data has been constructed over time to advantage large technology companies can be found in Cohen's Between Truth and Power (2019). There is, to be sure, a corresponding story to be told about EU data protection law.

Adjacent, somehow, to the operations of political power are the normative arguments leveraged both in the U.S. and in Europe for their respective regimes. Legal scholarship, however remote from actual policy change, remains a form of moral inquiry. It is possible, still, that through the professional training of lawyers and policy-makers, some form of ethical imperative can take root. Democratic interventions into the operations of power, while unlikely, are still in principle possible: but only if education stays true to principle and does not succumb to mere ideology.

This is not easy for educational institutions to accomplish. Higher education certainly is vulnerable to politics. A stark example of this was the purging of Marxist intellectuals from American academic institutions under McCarthyism. Intellectual diversity in the United States has suffered ever since. However, this was only possible because Marxism as a philosophical movement is extraneous to the legal structure of the United States. It was never embedded at a legal level in U.S. institutions.

There is a simple historical reason for this. The U.S. legal system was founded under a different set of philosophical principles; that philosophical lineage still impacts us today. The Founding Fathers were primarily influenced by John Locke. Locke rose to prominence in Britain when the Whigs, a new bourgeois class of Parliamentarian merchant leaders, rose to power, contesting the earlier monarchy. Locke's political contributions were a treatise pointing out the absurdity of the Divine Right of Kings, the prevailing political ideology of the time, and a second treatise arguing for a natural right to property based on the appropriation of nature. This latter political philosophy was very well aligned with Britain's new national project of colonialist expansion. With the founding of the United States, it was enshrined in the Constitution. The liberal system of rights that we enjoy in the U.S. is founded in the Lockean tradition.

Intellectual progress in Europe did not halt with Locke. Locke's ideas were taken up by David Hume, who introduced arguments so agitating that they famously woke Immanuel Kant, in Germany, from his "dogmatic slumber", leading him to develop a new, highly systematic account of morality and epistemology. Among the innovations in this work was the idea that human freedom is grounded in the dignity of being an autonomous person. The source of dignity is not based in a natural process such as the tilling of land. It is rather based on transcendental facts about what it means to be human. The key to morality is treating people as ends, not means; in other words, not using people as tools toward other aims, but as aims in themselves.

If this sounds overly lofty to an American audience, it's because this philosophical tradition has never taken hold in American education. In both the United Kingdom and the United States, Kantian philosophy has always been outside the mainstream. The tradition of Locke, through Hume, has continued on in what philosophers call "analytic philosophy". This philosophy has taken on the empiricist view that the only source of knowledge is individual experience. It has transformed over the centuries but continues to orbit around the individual and their rights, grounded in pragmatic considerations, and learning normative rules through the case-by-case approach of the Common Law.

From Kant, a different “continental philosophy” tradition produced Hegel, who produced Marx. We can trace from Kant’s original arguments about how morality is based on the transcendental dignity of the individual to the moralistic critique that Marx made against capitalism. Capitalism, Marx argued, impugns the dignity of labor because it treats it like a means, not an end. No such argument could take root in a Lockean system, because Lockean ethics has no such prescription against treating others instrumentally.

Germany lost its way at the start of the 20th century. But the post-war regime, funded by the Marshall plan, directed by U.S. constitutional scholars as well as repatriating German intellectuals, had the opportunity to rewrite their system of governance. They did so along Kantian lines: with statutory law, reflecting a priori rational inquiry, instead of empiricist Common Law. They were able to enshrine into their system the Kantian basis of ethics, with its focus on autonomy.

Many of the intellectuals influencing the creation of the new German state were “Marxist” in the loose sense that they were educated in the German continental intellectual tradition which, at that time, included Marx as one of its key figures. By the mid-20th century they had naturally surpassed this ideological view. However, as a consequence, the McCarthyist attack on Marxism had the effect of also purging some of the philosophical connection between German and U.S. legal education. Kantian notions of autonomy are still quite foreign to American jurisprudence. Legal arguments in the United States draw instead on a vast collection of other tools based on a much older and more piecemeal way of establishing rights. But are any of these tools up to the task of protecting human dignity?

The EU is very much influenced by Germany and the German legal system. The EU has the Kantian autonomy ethic at the heart of its conception of human rights. This philosophical commitment has recently expressed itself in the EU’s assertion of data protection law through the GDPR, whose transnational enforcement clauses have brought this centuries-old philosophical fight into contemporary legal debate in legal jurisdictions that predate the neo-Kantian legal innovations of Continental states.

The puzzle facing American legal scholars is this: while industry advocates and representatives tend to object to the strength of the GDPR, arguing that it is unworkable and/or based on poorly defined principles, the data protections that it offers seem so far to be compelling to users, and the shifting expectations around privacy in part induced by it are having effects on democratic outcomes (such as the CCPA). American legal scholars now have to try to make sense of the GDPR's rules and find a normative basis for them. How can these expansive ideas of data protection, which some have had the audacity to argue constitute a new right (Hildebrandt, 2015), be grafted onto the Common Law, empiricist legal system in a way that gives it the legitimacy of being an authentically American project? Is there a way to explain data protection law that does not require the transcendental philosophical apparatus which, if adopted, would force the American mind to reconsider in a fundamental way the relationship between individuals and the collective, labor and capital, and other cornerstones of American ideology?

There may or may not be. Time will tell. My own view is that the corporate powers, which flourished under the Lockean judicial system because of the weaknesses in that philosophical model of the individual and her rights, will instinctively fight what is in fact a threatening conception of the person as autonomous by virtue of their transcendental similarity with other people. American corporate power will not bother to make a philosophical case at all; it will operate in the domain of realpolitik so well documented by Cohen. Even if this is so, it is notable that so much intellectual and economic energy is now being exerted in the friction around so powerful an idea.

References

Cohen, J. E. (2019). Between Truth and Power: The Legal Constructions of Informational Capitalism. Oxford University Press, USA.

Hildebrandt, M. (2015). Smart technologies and the end(s) of law: Novel entanglements of law and technology. Edward Elgar Publishing.

Notes on Krusell & Smith, 1998 and macroeconomic theory

I’m orienting towards a new field through my work on HARK. A key paper in this field is Krusell and Smith, 1998 “Income and wealth heterogeneity in the macroeconomy.” The learning curve here is quite steep. These are, as usual, my notes as I work with this new material.

Krusell and Smith are approaching the problem of macroeconomic modeling on a broad foundation. Within this paradigm, the economy is imagined as a large collection of people/households/consumers/laborers. These exist at a high level of abstraction and are imagined to be intergenerationally linked. A household might be an immortal dynasty.

There is only one good: capital. Capital works in an interesting way in the model. It is produced every time period by a combination of labor and other capital. It is distributed to the households, apportioned as both a return on household capital and as a wage for labor. It is also consumed each period, for the utility of the households. So all the capital that exists does so because it was created by labor in a prior period, but then saved from immediate consumption, then reinvested.

In other words, capital in this case is essentially money. All other "goods" are abstracted away into this single form of capital. The key thing about money is that it can be saved and reinvested, or consumed for immediate utility.

Households can also labor, when they have a job. There is an unemployment rate, and in the model households are uniformly likely to be employed or not, no matter how much money they have. The wage return on labor is determined by an aggregate economic productivity function. There are good and bad economic periods, determined exogenously and randomly, and employment rates are determined accordingly. One major impetus for saving is insurance against bad times.

The problem raised by Krusell and Smith in this, what they call their 'baseline model', is that because all households are the same, the equilibrium distribution of wealth is far too even compared with realistic data: it is more normally distributed than log-normally distributed. This is implicitly a critique of all prior macroeconomics, which had used the "representative agent" assumption. All agents were represented by one agent, so all agents end up approximately as wealthy as one another.

Obviously, this is not the case. This work was done in the late 90’s, when the topic of wealth inequality was not nearly as front-and-center as it is in, say, today’s election cycle. It’s interesting that one reason why it might have not been front and center was because prior to 1998, mainstream macroeconomic theory didn’t have an account of how there could be such inequality.

The Krusell-Smith model's explanation for inequality is, it must be said, a politically conservative one. They introduce minute differences in the utility discount factor. The discount factor is the weight a household places on future utility relative to today's utility. If your discount factor is low, you discount the future heavily and want to consume more today. If it is high, you're more willing to save for tomorrow.

Krusell and Smith show that teeny tiny differences in the discount factor, even if they are subject to a random walk around a mean with some persistence within households, lead to huge wealth disparities. Their conclusion is that "Poor households are poor because they've chosen to be poor", by not saving more for the future.
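To get an intuition for the mechanism, here is a minimal sketch in Python. This is emphatically not Krusell and Smith's solution method (they solve for a rational-expectations equilibrium with an approximate aggregate law of motion, and their calibration differs); I simply assume an ad hoc saving rule that rises with the discount factor, plus made-up parameter values, to show how small differences in patience compound into large differences in wealth.

```python
import numpy as np

rng = np.random.default_rng(0)

n_households = 10_000
n_periods = 500

# Small dispersion in discount factors around a common mean.
# These values are assumptions for illustration, not the 1998 calibration.
beta = np.clip(rng.normal(0.96, 0.005, n_households), 0.90, 0.995)

wealth = np.ones(n_households)
r = 0.03      # return on saved capital (held fixed here; in the real model it
              # is determined endogenously by the aggregate capital stock)
wage = 1.0

# Ad hoc behavioral rule: more patient households save a larger share.
saving_rate = 0.05 + 0.25 * (beta - beta.min()) / (beta.max() - beta.min())

for t in range(n_periods):
    employed = rng.random(n_households) < 0.9   # exogenous employment shock
    resources = (1 + r) * wealth + wage * employed
    wealth = saving_rate * resources

print(f"Wealth share of the top 1%: "
      f"{np.sort(wealth)[-n_households // 100:].sum() / wealth.sum():.1%}")
print(f"90th/10th percentile wealth ratio: "
      f"{np.percentile(wealth, 90) / np.percentile(wealth, 10):.1f}")
```

The toy version will not reproduce their quantitative results, but it shows the qualitative point: dispersion in patience translates into much larger dispersion in wealth.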

I've heard, like one does, all kinds of critiques of economics as an ideological discipline. It's striking to read a landmark paper in the field with this conclusion. It strikes directly against other mainstream political narratives. For example, there is no accounting of "privilege" or inter-generational transfer of social capital in this model. And while they acknowledge that other papers discuss whether larger holdings of household capital lead to larger rates of return, Krusell and Smith sidestep this and make it about household saving.

The tools and methods in the paper are quite fascinating. I’m looking forward to more work in this domain.

References

Krusell, P., & Smith, Jr, A. A. (1998). Income and wealth heterogeneity in the macroeconomy. Journal of Political Economy, 106(5), 867-896.

Herbert Simon and the missing science of interagency

Few have ever written about the transformation of organizations by information technology with the clarity of Herbert Simon. Simon worked at a time when disciplines were being reconstructed and a shift was taking place. Older models of economic actors as profit maximizing agents able to find their optimal action were giving way as both practical experience and the exact sciences told a different story.

The rationality employed by firms today is not the capacity to choose the best action–what Simon calls substantive rationality. It is the capacity to engage in steps to discover better ways of acting–procedural rationality.

So we proceed step by step from the simple caricature of the firm depicted in textbooks to the complexities of real firms in the real world of business. At each step towards realism, the problem gradually changes from choosing the right course of action (substantive rationality) to finding ways of calculating, very approximately, where a good course of action lies (procedural rationality). With this shift, the theory of the firm becomes a theory of estimation under uncertainty and a theory of computation.

Simon goes on to briefly describe the fields that he believes are poised to drive the strategic behavior of firms. These are Operations Research (OR) and artificial intelligence (AI). The goal of both these fields is to translate problems into mathematical specifications that can be executed by computers. There is some variation within these fields as to whether they aim at satisficing solutions or perfect answers to combinatorial problems, but for the purposes of this article they are the same; certainly the fields have cross-pollinated much since 1969.

Simon's analysis was prescient. The impact of OR and AI on organizations simply can't be overstated. My purpose in writing this is to point to the still unsolved analytical problems of this paradigm. Simon notes that the computational techniques he refers to percolate only so far up the corporate ladder.

OR and AI have been applied mainly to business decisions at the middle levels of management. A vast range of top management decisions (e.g., strategic decisions about investment, R&D, specialization and diversification, recruitment, development, and retention of managerial talent) are still mostly handled traditionally, that is, by experienced executives' exercise of judgment.

Simon’s proposal for how to make these kinds of decisions more scientific is the paradigm of “expert systems”, which did not, as far as I know, take off. However, these were early days, and indeed at large firms AI techniques are used to make these kinds of executive decisions. Though perhaps equally, executives defend their own prerogative for human judgment, for better or for worse.

The unsolved scientific problem that I find very motivating is based on a subtle divergence in how the intellectual fields have proceeded. Surely the economic value and consequences of business activities are wrapped up not in the behavior of an individual firm, but of many firms. Even a single firm contains many agents. While in the past the need for mathematical tractability led to assumptions of perfect rationality for these agents, we are now far past that and "the theory of the firm becomes a theory of estimation under uncertainty and a theory of computation." But the theory of decision-making under uncertainty and the theory of computation are largely poised to address problems of solving a single agent's specific task. The OR or AI system fulfills a specific function of middle management; it does not, by and large, oversee the interactions between departments, and so on. The complexity of what is widely called "politics" is not yet captured within the paradigms of AI, though anybody with an ounce of practical experience would note that politics is part of almost any organizational life.

How can these kinds of problems be addressed scientifically? What’s needed is a formal, computational framework for modeling the interaction of heterogeneous agents, and a systematic method of comparing the validity of these models. Interagential activity is necessarily quite complex; this is complexity that does not fit well into any available machine learning paradigm.
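As a gesture toward what I mean, here is a toy sketch (my own, not an established formalism): heterogeneous agents whose private objectives interact through a shared constraint, so that the outcome is a property of the population and the resolution rule rather than of any single optimizer. The names and rules are purely illustrative.

```python
# Toy model of "interagency": departments with heterogeneous objectives
# competing over a shared budget. Names and rules are purely illustrative.

from dataclasses import dataclass

@dataclass
class Department:
    name: str
    aggressiveness: float  # fraction of the total budget it asks for

    def propose(self, total_budget: float) -> float:
        return self.aggressiveness * total_budget

def allocate(departments, total_budget: float) -> dict:
    """A crude 'political' resolution: scale all proposals to fit the budget."""
    proposals = {d.name: d.propose(total_budget) for d in departments}
    scale = total_budget / sum(proposals.values())
    return {name: round(p * scale, 2) for name, p in proposals.items()}

departments = [Department("R&D", 0.7), Department("Ops", 0.5), Department("Sales", 0.3)]
print(allocate(departments, total_budget=100.0))
# No single department's optimization problem describes this outcome; it
# depends on the whole population of agents and on the resolution rule.
```

Validating models like this against real organizations, systematically, is the part that I do not think we yet know how to do.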

References

Simon, H. A. (1969). The sciences of the artificial. Cambridge, MA: MIT Press.

“Private Companies and Scholarly Infrastructure”

I'm proud to link to this blog post on the Cornell Tech Digital Life Initiative blog by Jake Goldenfein, Daniel Griffin, Eran Toch, and myself.

The academic funding scandals plaguing 2019 have highlighted some of the more problematic dynamics between tech industry money and academia (see e.g. Williams 2019, Orlowski 2017). But the tech industry’s deeper impacts on academia and knowledge production actually stem from the entirely non-scandalous relationships between technology firms and academic institutions. Industry support heavily subsidizes academic work. That support comes in the form of direct funding for departments, centers, scholars, and events, but also through the provision of academic infrastructures like communications platforms, computational resources, and research tools. In light of the reality that infrastructures are themselves political, it is imperative to unpack the political dimensions of scholarly infrastructures provided by big technology firms, and question whether they might problematically impact knowledge production and the academic field more broadly.

Goldenfein, Benthall, Griffin, and Toch, “Private Companies and Scholarly Infrastructure – Google Scholar and Academic Autonomy”, 2019

Among other topics, the post is about how the reorientation of academia onto commercial platforms possibly threatens the autonomy that is a necessary condition of the objectivity of science (Bourdieu, 2004).

This is perhaps a cheeky argument. Questioning whether Big Tech companies have an undue influence on academic work is not a popular move because so much great academic work is funded by Big Tech companies.

On the other hand, calling into question the ethics of Big Tech companies is now so mainstream that it is actively debated in the Democratic 2020 primary by front-running candidates. So we are well within the Overton window here.

On a philosophical level (which is not the primary orientation of the joint work), I wonder how much these concerns are about the relationship between capitalist modes of production and ideology and academic scholarship in general, and how much this specific manifestation (Google Scholar becoming the site of a disciplinary collapse (Benthall, 2015) in scholarly metrics) is significant. Like many contemporary problems in society and technology, the "problem" may be that a technical intervention that at one point seemed desirable to challengers (in the Fligstein (1997) field-theory sense) is now having a political impact that is questioned and resisted by incumbents. I.e., while there has always been a critique of the system, the system has changed and so the critique comes from a different social source.

References

Benthall, S. (2015). Designing networked publics for communicative action. Interface, 1(1), 3.

Bourdieu, P. (2004). Science of science and reflexivity. Polity.

Fligstein, N. (1997). Social skill and institutional theory. American Behavioral Scientist, 40(4), 397-405.

Orlowski, A. (2017). Academics “funded” by Google tend not to mention it in their work. The Register, 13 July 2017.

Williams, O. (2019). How Big Tech funds the debate on AI Ethics. New Statesman America, 6 June 2019 < https://www.newstatesman.com/science-tech/technology/2019/06/how-big-tech-funds-debate-ai-ethics>.

Ashby’s Law and AI control

I've recently discovered Ashby's Law, also known as the First Law of Cybernetics, by reading Stafford Beer's "Designing Freedom" lectures. Ashby's Law is a powerful idea, one I've been grasping at intuitively for some time. For example, here I was looking for something like it and thought I could get it from the Data Processing Inequality in information theory. I have not yet grokked the mathematical definition of Ashby's Law, which I gather is in Ross Ashby's An Introduction to Cybernetics. Though I am not sure yet, I expect the formulation there can use an update. But if I am right about its main claims, I think the argument of this post will stand.

Ashby’s Law is framed in terms of ‘variety’, which is the number of states that it is possible for a system to be in. A six-sided die has six possible states (if you’re just looking at the top of it). A laptop has many more. A brain has many more even than that. A complex organization with many people in it, all with laptops, has even more. And so on.

The law can be stated in many ways. One of them is that:

When the variety or complexity of the environment exceeds the capacity of a system (natural or artificial) the environment will dominate and ultimately destroy that system.

The law is about the relationship between a system and its environment. Or, in another sense, it is about a system to be controlled and a second system that tries to control it. The claim is that, to be effective, the control unit needs at least as much variety as the system it is meant to control.
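For reference, the quantitative form of the law, as it is usually summarized in secondary sources (I have not verified this against Ashby's own text, so treat it as my gloss), looks something like this:

```latex
% Law of Requisite Variety, in its commonly cited logarithmic form:
% the variety (entropy) of outcomes O is bounded below by the variety of
% disturbances D minus the variety of the regulator R.
\[
  H(O) \;\geq\; H(D) - H(R)
\]
% In slogan form: "only variety can absorb variety." A regulator can reduce
% the variety of outcomes only by having enough variety of its own to match
% the disturbances it must counteract.
```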

This reminds me of an argument I had with a superintelligence theorist back when I was thinking about such things. The Superintelligence people, recall, worry about an AI getting the ability to improve itself recursively and causing an “intelligence explosion”. Its own intelligence, so to speak, explodes, surpassing all other intelligent life and giving it total domination over the fate of humanity.

Here is the argument that I posed a few years ago, reframed in terms of Ashby’s Law:

  • The AI in question is a control unit, C, and the world it would control is the system, S.
  • For the AI to have effective domination over S, C would need at least as much variety as S.
  • But S includes C within it. The control unit is part of the larger world, so S has at least as much variety as C, and generally more.
  • Hence, no C can perfectly control S.

Superintelligence people will no doubt be unsatisfied by this argument. The AI need not be effective in the sense dictated by Ashby’s Law. It need only be capable of outmaneuvering humans. And so on.

However, I believe the argument gets at why it is difficult for complex control systems to ever truly master the world around them. It is very difficult for a control system to have effective control over itself, let alone itself in a larger systemic context, without some kind of order constraining the behavior of the total system (the system including the control unit) imposed from without. The idea that it is possible to gain total mastery or domination through an AI or better data systems is a fantasy, because the technical controls add their own complexity to the world that is to be controlled.

This is a bit of a paradox, as it raises the question of how any control units work at all. I'll leave this for another day.

Bridging between transaction cost and traditional economics

Some time ago I was trying to get my head around transaction cost economics (TCE) because of its implications for the digital economy and cybersecurity. (1, 2, 3, 4, 5). I felt like I had a good grasp of the relevant theoretical claim of TCE which is the interaction between asset specificity and the make-or-buy decision. But I didn’t have a good sense of the mechanism that drove that claim.

I worked it out yesterday.

Recall that in the make or buy decision, a firm is determining whether or not to make some product in-house or to buy it from the market. This is a critical decision made by software and data companies, as often these businesses operate by assembling components and data streams into a new kind of service; these services often are the components and data streams used in other firms. And so on.

The most robust claim of TCE is that if the asset (component, service, data stream) is very specific to the application of the firm, then the firm will be more likely to make it. If the asset is more general-purpose, then the firm will buy it as a commodity on the market.

Why is this? TCE does not attempt to describe this phenomenon in a mathematical model, at least as far as I have found. Nevertheless, this can be worked out with a much more general model of the economy.

Assume that for some technical component there are fixed costs f and marginal costs c. Consider two extreme cases: in case A, the asset is so specific that only one firm will want to buy it. In case B, the asset is very general, so there are many firms that want to purchase it.

In case A, a vendor will have costs of f + c and so will only make the good if the buyer can compensate them at least that much. At the point where the buyer is paying for both the fixed and marginal costs of the product, they might as well own it! If there are other discovered downstream uses for the technology, that’s a revenue stream. Meanwhile, since the vendor in this case will have lock-in power over the buyer (because switching will mean paying the fixed cost to ramp up a new vendor), that gives the vendor market power. So, better to make the asset.

In case B, there's broader market demand. It's likely that there are already multiple vendors in place who have made the fixed cost investment. The price to the buying firm is going to be closer to c, since the market price converges over time toward the marginal cost, as opposed to c + f, which includes the fixed cost. Because there are multiple vendors, lock-in is not such an issue. Hence the good becomes a commodity.
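A toy numerical comparison makes the contrast vivid (the numbers are made up; this is only a sketch of the reasoning above, not a model from the TCE literature):

```python
# Toy make-or-buy comparison under asset specificity (illustrative numbers only).

f = 100.0   # fixed cost of developing the component
c = 10.0    # marginal cost of supplying it, per unit or per period

# Case A: the asset is firm-specific, so there is a single buyer.
# A vendor must recover f + c from that one buyer, so buying costs roughly as
# much as making -- and it leaves the buyer exposed to lock-in.
price_specific = f + c

# Case B: the asset is general-purpose, with many buyers effectively sharing
# the fixed cost; competition pushes the price toward the marginal cost c.
n_buyers = 50
price_commodity = c + f / n_buyers

print(f"Price of a firm-specific asset: {price_specific:.2f}")
print(f"Price of a commodity asset:     {price_commodity:.2f}")
```

The asymmetry between the two prices, plus the lock-in risk in case A, is the whole mechanism behind the make-or-buy result.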

A few notes on the implications of this for the informational economy:

  • Software libraries have high fixed cost and low marginal cost. The tendency of companies to tilt to open source cores with their products built on top is a natural result of the market. The modularity of open source software is in part explained by the ways “asset specificity” is shaped exogenously by the kinds of problems that need to be solved. The more general the problem, the more likely the solution has been made available open source. Note that there is still an important transaction cost at work here, the search cost. There’s just so many software libraries.
  • Data streams can vary a great deal as to whether and how they are asset specific. When data streams are highly customized to the downstream buyer, they are specific; the customization is both costly to the vendor and adding value to the buyer. However, it’s rarely possible to just “make” data: it needs to be sourced from somewhere. When firms buy data, it is normally in a subscription model that takes into account industrial organization issues (such as lock in) within the pricing.
  • Engineering talent, and related labor costs, are interesting in that for a proprietary system, engineering human capital gains tend to be asset specific, while for open technologies engineering skill is a commodity. The structure of the ‘tech business’, which requires mastery of open technology in order to build upon it a proprietary system, is a key dynamic that drives the software engineering practice.

There are a number of subtleties I’m missing in this account. I mentioned search costs in software libraries. There’s similar costs and concerns about the inherent riskiness of a data product: by definition, a data product is resolving some uncertainty with respect to some other goal or values. It must always be a kind of credence good. The engineering labor market is quite complex in no small part because it is exposed to the complexities of its products.

The ontology of software, revisited

I’m now a software engineer again after many years doing and studying other things. My first-person experience, my phenomenological relationship with this practice, is different this time around. I’ve been meaning to jot down some notes based on that fresh experience. Happily, there’s resonance with topics of my academic focus as well. I’m trying to tease out these connections.

To briefly recap: There’s a recurring academic discourse around technology ethics. Roughly speaking, it starts with a concern about a newish technology that has media or funding agency interest. Articles then get written capitalizing on this hot topic; these articles are fractured according to the disciplinary background of their authors.

  • Engineers try to come up with an improved version of the technology.
  • Lawyers try to come up with ways to regulate the production and use of the technology broadly speaking.
  • Organizational sociologists come up with institutional practices (‘ethics boards’, ‘contestability’) which would prevent the technology from being misused.
  • Critical theorists argue that the technology would be less worrisome if representational desiderata within the field of technology production were better.
  • … and so on.

This is a very active and interesting discourse, but from my (limited) perspective, it rarely impacts industry practice. This isn't because people in industry don't care about the ethical implications of their work. It's because people in industry are engaged full-time in a different discourse: the discourse of industry practitioners.

My industrial background is in software development and data science. Obviously there are other kinds of industrial work–hardware, biotech, etc. But it’s fair to say that a great deal of the production of “technology” in the 21st century is, specifically, software development. And my point here is that software development has its own field of discourse that is rich and vivid and a full-time job to keep up with. Here’s some examples of what I’m getting at:

  • There is always-already a huge world of communication between engineers about what technologies are interesting, how to use them effectively, how they compare with prior technologies, the implications of these trends for technical careers, and so on. Browse Hacker News. Look at industry software conferences.
  • There's also a huge world of industrial discussion about the social practices of software development. A lot of my knowledge of this is a bit dated. But as I come back to industry, I find myself looking back to now-classic sources on how to work effectively on software. I'm linking to articles from Joel Spolsky's blog. I'm ordering a copy of Fred Brooks's classic The Mythical Man-Month.
  • I’m reading documentation, endlessly, about how to configure and use the various SaaS, IaaS, PaaS, etc. tools that are now necessary parts of full-stack development. When the documentation is limited, I’m engaging with customer service people of technical products, who have their own advice, practices, etc.

This is a complex world of literature and practice. Part of what makes it complex is that it is always-already densely documented and self-referential, enacted by smart and literate people, most of whom are quite socially skilled. It’s people working full-time jobs in a field that is now over 40 years old.

I've argued in other posts that if we want to solve the 'technology ethics' problem, we should see it as an economic problem. At a high level, I still believe that's true. I want to qualify that point, though, and say: now that I'm back in a more engaged position with respect to the field of technical production, I believe there are institutional/organizational ways to address broader social concerns through interventions on engineering practice.

What is missing, in my view, is a sincere engagement with the nitty-gritty of engineering practice itself. I know there are anthropologists who think they do this. I haven't read anybody who really does it in their writing, and I believe the reason is that anthropologists writing for other academic anthropologists are not going to produce what would actually be useful here: a guide for product and project management that would likely recapitulate a lot of conventional (but too often ignored) wisdom about software engineering "best practices", such as documentation, testing, and the articulation of use cases. These are the kinds of things that improve technical quality in a real way.

Now that I write this, I recall that the big ethics research teams at, say, Google, do stuff like this. It’s great.


I was going to say something about the ontology of software.

Recall: I have a position on the ontology of data, which I’ve called Situated Information Flow Theory (SIFT). I worked hard on it. According to SIFT, an information flow is a causal flow situated in a network of other causal relations. The meaning of the information depends on that causally defined situation.

What then is software?

"Software" refers to sets of instructions written by people in a specialized "programming" language as text data, which is then interpreted or compiled by a machine. In paradigmatic industrial practice (I'm simplifying, bear with me), ultimately these instructions will be used to control the behavior of a machine that interfaces with the world in a real-time, consequential way. This latter machine is referred to, internally, as being "in production".

When you're programming a technical product, first you write software "in development". You are writing drafts of code. You get your colleagues to review it. You link up the code you wrote to the code the other team wrote and you see if it works together. There is a long and laborious process of building tests for new requirements and fixing the code so that it meets those requirements. There are designs, and redesigns, of internal and external facing features. The complexity of the total task is divided up into modules; the boundaries of those modules shift over time. The social structure of the team adapts as new modules become necessary.

There is an isomorphism, a well documented phenomenon in organizational social theory, between the technology being created and the social structure that creates it. The team structure mirrors the software architecture.

When the pieces are in place adequately enough–and when the investors/management has grown impatient enough–the software is finally “deployed to production”. It “goes live”. What was an internal exercise is now a process with reputational consequences for the business, as well as possibly real consequences for the users of the technology.

Inevitably, the version of the product “in production” is not complete. There are errors. There are new features requested. So the technology firm now organizes itself around several “cycles” running at different frequencies in parallel. There’s a “development cycle” of writing new software code. There’s a “release cycle” of packaging new improvements into bundles that are documented and tested for quality. The releases are deployed to production on a schedule. Different components may have different development and release cycles. The impedance match or mismatch between these cycles becomes its own source of robustness or risk. (I’ve done some empirical research work on this.)

What does this mean for the ontology of software?

The first thing it means is that the notion that software is a static artifact, something like either a physical object (like a bicycle) or a publication (like a book), is mostly irrelevant to what's happening. The software production process depends on the fluidity of source code. When software is deployed "as a service", it's dubious for it to qualify as a "creative work", subject to copyright law, except by virtue of legal inertia. Something totally different is going on.

The second thing it means is that the live technical product is an ongoing institutional accomplishment. It’s absurd to ever say that humans are not “in the loop”. This is one of the big insights of the critical/anthro reaction to “Big Tech” in the past five years or so. But it has also been common knowledge within the industry for fifteen years or so.

The third thing it means is that software is the structuring of a system of causal relations. Software, when it's deployed, determines what causes what. See above for a definition of the nature of information: it's a causal flow situated in other causal relations. The link between software and information is then quite clear and direct. Software (as far as it goes) is a definition of a causal situation.

The fourth thing it means is that software products are the result of agreement between people. Software only makes it into production if it has gotten there through agreed-upon processes by the team that deploys it. The strength of software is in the collective input that went into it. In a sense, software is much more like a contract, in legal terms, than it is like a creative work. In the extended network of human and machine actors, software is the result of, the expression of, self-regulation first. Only secondarily does it, in Lessig’s terms, become a regulatory force more broadly.

What is software? Software is a form of social structure.