artificial intelligence

June 13, 2026

LLMs as computation

LLMs are now “doing” a lot of technical system design and are the object of a great deal of computer science research. However, I’ve been surprised that much of the research that crosses my way (admittedly likely not a great sample) treats LLMs as a general form of intelligence without treating them as a form of computation. I expect that some combination of theory of computation (such as algorithmic information theory) and structural economics is needed to get a rigorous handle on the AI economy. This blog post contains some notes toward this end.

As we all know, an LLM is a collection of neural network weights, trained on a massive amount of information, which consumes tokens and emits predicted next tokens. Simplifying a bit, we can model an LLM as a machine that, given a string of tokens, emits a string of tokens.

Let $\Sigma$ be the set of tokens, $\Sigma^*$ be the space of token strings of any length. Perhaps an LLM is a function:

$$L: \Sigma^* \rightarrow \Sigma^*$$

Really, this is LLM “inference”. I’m omitting the inherent stochasticity of LLMs — more realistically, $L$ would be a conditional probability distribution. But leave that aside for now.
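A sketch of what the stochastic version might look like, assuming standard autoregressive decoding. The symbols $P$, $o_t$, and $o_{<t}$ are notation introduced only for this illustration:

$$P(o \mid i) = \prod_{t=1}^{|o|} P(o_t \mid i, o_{<t})$$

Here $o_t$ is the $t$-th output token and $o_{<t}$ is the output emitted so far. The deterministic $L$ above then corresponds to fixing a decoding rule, such as always taking the most probable next token.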

Assuming that $L$ can consume as input any string, and in principle produce as output any string, what we have here is a class of “universal programming language”, another formal mathematical construct. “Universal programming languages” appear in algorithmic information theory.

The simplest form of “universal programming language” is the print function. It repeats as output anything put into it. People (including myself) once joked that LLMs are glorified autocomplete; they clearly do more than this. The weights must matter.

Really, LLMs are parameterized functions — the parameters $\theta$ are weights of the neural network.

$$L_\theta: \Sigma^* \rightarrow \Sigma^*$$

The weights are a compression of a great deal of training data $\mathbf{D}$. Let’s assume training has converted this data to a set of weights $T(\mathbf{D}) \rightarrow \mathbf{\theta}$. We can refer to this foundation model as $\mathbf{L_\theta}$ or $\mathbf{L_D}$.

What else can you do with these models? You can provide them ‘context’ — additional strings as input. You can fine-tune them on more data. And you can use them for ‘reasoning’ by chaining inputs and outputs.

Context: allow multiple string inputs $L_\theta(c, i) \rightarrow o$
Fine-tuning: $T(L_D, d) \rightarrow L_{D + d}$ — further compresses additional data $d$ into the model weights
Reasoning: $L^n_\theta(i) = L_\theta(L_\theta(\ldots L_\theta(i) \ldots)) \rightarrow o$ applies the model recursively $n$ times
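To make these three operations concrete, here is a minimal sketch in Python. Everything in it is a toy assumption for illustration: the "weights" are just a lookup table from the last token to the next one, and the names `train`, `fine_tune`, `make_model`, `with_context`, and `reason` are hypothetical, not any real library's API.

```python
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]
Theta = Dict[str, str]              # toy "weights": last token -> next token
Model = Callable[[Tokens], Tokens]  # L: Sigma* -> Sigma*


def print_model(tokens: Tokens) -> Tokens:
    """The 'print function': returns its input unchanged."""
    return tokens


def train(data: List[Tokens]) -> Theta:
    """Toy T(D) -> theta: compress sequences into a next-token table."""
    theta: Theta = {}
    for seq in data:
        for prev, nxt in zip(seq, seq[1:]):
            theta[prev] = nxt  # later examples overwrite earlier ones
    return theta


def fine_tune(theta: Theta, extra: List[Tokens]) -> Theta:
    """Toy T(L_D, d) -> L_{D+d}: fold additional data into the weights."""
    tuned = dict(theta)
    tuned.update(train(extra))
    return tuned


def make_model(theta: Theta) -> Model:
    """Build L_theta: append the predicted next token, if there is one."""
    def model(tokens: Tokens) -> Tokens:
        if not tokens or tokens[-1] not in theta:
            return tokens
        return tokens + (theta[tokens[-1]],)
    return model


def with_context(model: Model, context: Tokens) -> Model:
    """Context: L_theta(c, i), modeled here as prepending c to the input."""
    return lambda tokens: model(context + tokens)


def reason(model: Model, tokens: Tokens, n: int) -> Tokens:
    """Reasoning: apply the model to its own output n times."""
    out = tokens
    for _ in range(n):
        out = model(out)
    return out


base = make_model(train([("a", "b", "c")]))
tuned = make_model(fine_tune(train([("a", "b", "c")]), [("c", "d")]))
```

With these definitions, `reason(base, ("a",), 2)` walks from `a` to `c`, while the fine-tuned model can continue one step further to `d`. A real LLM is of course nothing like a lookup table; the point is only the shape of the composition.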

So if we want to look at the data and computation pipeline of an LLM based system, we get something like:

$$(T^n(D, d_1, \ldots, d_n))^m(c, i) \rightarrow o$$

I.e., we train on a base data set and several fine-tuning data sets, pick context and an input, and run inference some number of times. Each of these steps has a cost function, and we can then compute the average costs of solving various sets of problems given the available data, and other statistics. This then can be used to design the most efficient pipelines and markets.
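As a rough sketch of that cost accounting, assume (purely for illustration) that every step has a constant cost per token. The class `PipelineCosts`, the function names, and all the numbers below are hypothetical; real cost functions would be far less linear.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PipelineCosts:
    """Assumed constant per-token costs for each stage of the pipeline."""
    train_per_token: float
    fine_tune_per_token: float
    inference_per_token: float


def pipeline_cost(
    costs: PipelineCosts,
    base_tokens: int,             # size of D
    fine_tune_tokens: List[int],  # sizes of d_1, ..., d_n
    context_tokens: int,          # size of c
    input_tokens: int,            # size of i
    steps: int,                   # m, the number of inference passes
    queries: int,                 # how many problems are solved
) -> float:
    """Total cost: one-off training and tuning plus per-query inference."""
    training = costs.train_per_token * base_tokens
    tuning = costs.fine_tune_per_token * sum(fine_tune_tokens)
    per_query = costs.inference_per_token * (context_tokens + input_tokens) * steps
    return training + tuning + per_query * queries


def average_cost(costs: PipelineCosts, base_tokens: int,
                 fine_tune_tokens: List[int], context_tokens: int,
                 input_tokens: int, steps: int, queries: int) -> float:
    """Average cost per solved problem, amortizing training over queries."""
    total = pipeline_cost(costs, base_tokens, fine_tune_tokens,
                          context_tokens, input_tokens, steps, queries)
    return total / queries


example = PipelineCosts(train_per_token=2.0, fine_tune_per_token=1.0,
                        inference_per_token=0.5)
total = pipeline_cost(example, 100, [10, 20], 3, 1, 2, 10)
mean = average_cost(example, 100, [10, 20], 3, 1, 2, 10)
```

Even this crude version shows the trade-off: training and fine-tuning are fixed costs amortized over all queries, while context length and the number of reasoning steps are paid on every query.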

I would be interested in hearing from anybody about whether and how this faithfully captures the essentials of LLMs as a form of computation. This is my ‘mental model’. I have left out tool use and interactivity, among other things, but those can be added in easily.

Why am I writing this? Because I think that clearly articulating the formal properties of LLMs brings a number of issues to light.

First, it foregrounds the importance of training data. Famously, the transformer architecture is very general, and early innovation in LLMs was largely about scaling it up to greater amounts of data. If we are interested in the behavior of LLMs, the training data and the training algorithm are the parts that are not “black boxes” to the model creators.

As we look at the future of LLMs in the economy, we will be looking at the results of differential access to data, as well as what data is commonly available. This shares a lot of patterns with previous iterations of concerns over “big data”, but this is obscured today because of the charisma of the models themselves.

Second, it makes explicit how information can flow and transform into a system output. The information comes first from training and fine-tuning data, then from context, then from system input. If the training and inference algorithms are general enough, none of the information relevant to a specific task comes from those parts of the system. Those algorithms are ‘general computing’.

Third, it breaks up training and inference. While training and inference are not so different in terms of information flow, they are in practice quite different because of their physical and economic costs. Currently, training is more expensive than inference. So, we see a race to, expensively, train general models with more and more data, so that less and less data is needed in context at inference time, and fewer steps are needed during reasoning. A structural model that distinguishes these can discriminate between several investment hypotheses in this space.

Fourth, by revealing LLMs as a form of general data processing and computation, it deflates (in what I think is a good and necessary way) the tendency to see ‘model evaluations’ as the best way to enforce AI accuracy, fairness, privacy, and so on. My general frustration with the model evaluation literature is that if LLMs are a flavor of universal programming language by design, then there will, by definition, always be a jailbreak or a hallucination available to them. A lot of work on ‘guardrails’ at the model level seems to be about making certain kinds of outputs more difficult or expensive to get. As we’ve seen, there will be open models, and they will get fine-tuned by hobbyists and others to get around the guardrails, and so that’s not going to be an effective strategy long term.

This means that a lot of AI product design and regulation seems to be about shifting around the cost functions for achieving certain kinds of outputs with certain data. If ‘bad’ behaviors are expensive, and ‘good’ behaviors are cheap, then we have, in a sense, succeeded. But this means that the underlying economics must be part of the analysis for it to have forward-going relevance and replicability. Today’s model capabilities are a function of the latest investment at the training and inference level, as well as of the data flow of context and inputs, which may go back into training. The entire pipeline produces ‘the intelligence’, and it does so at physical and economic cost. Computer science research, per se, with its focus on the currently available digital artifacts, is not going to achieve lasting results unless it expands its purview to these broader systems and considerations. Likewise, evaluations of models alone will not provide us the reliable theoretical knowledge needed to steer public policy. We must take into account production costs and data pipelines.

February 17, 2026

updates and stubbornness about superintelligence

We seem to be in a new moment of media excitement about the implications of artificial intelligence. This time, the moment is driven by the experience of software engineers and other knowledge workers who are automating their work with ‘agents’: Claude Code, etc. The latest generation of models and services is really good at doing things.

Does this change anything about my “position on AI” and superintelligence in particular?

I wrote a brief paper in 2017 about Bostrom’s Superintelligence argument. I concluded that algorithmic self-improvement at the software level would not produce superintelligence. Rather, intelligence growth is limited by data and hardware.

In 2025, this conclusion still holds up, as we’ve seen that the recent impressive advances in AI have depended on tremendous capital expenditure on data centers, high-performing chips, and energy. They have also depended on well-publicized efforts to collect all the text known to humankind for training data.

About 8 years ago when I was thinking about this, I wrote a bit about the connection between the Superintelligence argument and the Frankfurt School’s views on instrumental reason and capitalism. The alignment of AI with capital has been borne out, and has been written about by many others. What is striking about the current moment is just how on-the-nose that alignment is in the US, in terms of the full stack of energy, hardware, models, applications, and then some.

So, so far, no update.

In 2021 I published an article saying that we already had artificial systems with the capacity to outperform individual humans at many tasks. They were and still are called corporations or firms. We also had replaced markets with platforms, which are similarly more performant in terms of reducing transaction costs. In that article, Jake Goldenfein and I argue that what ultimately matters are the purposes of the social system that operates the AI technology.

I believe this argument also continues to hold up. The successful models and services we are seeing are corporate accomplishments. The corporation is still the relevant unit of analysis when considering AI.

There are a number of interesting things happening now which I think are undertheorized:

What is the real economics of AI, given that the supply chains are so long and complex, consisting of both material and intellectual inputs, and the market for demand is uncertain? This is the trillion dollar question in terms of valuations, and it’s unanswered. The empirics here are not very good because things are far out of equilibrium.
Put another way: what does AI mean for the relationships between capital, corporations, labor, and consumers? Some of these relationships are mediated by rules about corporate law, intellectual property and data use, and so are determinable by law rather than technology. Information law therefore is a key point of political intervention in an economic system that is otherwise determined by laws of nature (energy, computation, etc.).

To put it another way: superintelligence has been happening and continues to happen. Some of this is due to laws of nature. But there is still a meaningful point of human intervention, which is the laws of humanity. Designing and implementing those laws well remains an important challenge.

One last thought. I’ve been inspired by Beniger’s The Control Revolution (1986) which is a historical account of the information economy in terms of cybernetics and information theory. You can ask an AI to tell you more about it, but one item comes to mind: that each new information technology first seems to threaten the jobs of people doing information work, and then leads to an expanded number of information jobs. This has to do with the way complexity is and is not managed by the technology. There’s an open question whether this generation of AI is any different. The question is truly open, but my hunch at the moment is that today’s AI systems are creating a lot more complexity than they are controlling. We will see.

May 28, 2025

I’m building something new

Push has come to shove, and I have started building something new.

I’m not building it alone, thank God, but it’s a very petite open source project at the moment.

I’m convinced that it is actually something new. Some of my colleagues are excited to hear that what I’m building will exist for them shortly. That’s encouraging! I also tried to build this thing in the context of another project that’s doing something similar. I was told that I was being too ambitious and couldn’t pull it off. That wasn’t exactly encouraging, but was good evidence that I’m actually doing something new. I will now do this thing and take the credit. So much the better for me.

What is it that I’m building?

Well, you see, it’s software for modeling economic systems. Or sociotechnical systems. Or, broadly speaking, complex systems with agents in them. Also, doing statistics with those models: fitting them to data, understanding what emergent properties occur in them, exploring counterfactuals, and so on.

I will try to answer some questions I wish somebody would ask me.

Q: Isn’t that just agent-based modeling? Why aren’t you just using NetLogo or something?

A: Agent-based modeling (ABM) is great, but it’s a very expansive term that means a lot of things. Very often, ABMs consist of agents whose behavior is governed by simple rules, rather than being directed towards accomplishing goals. That notion of “agent” in ABM is almost entirely opposed to the notion of “agent” used in AI (propagated by Stuart Russell, for example). To AI people, goal-directedness is essential for agency. I’m not committed to rational behavior in this framework; I’m not an economist! But I think it is a requirement to be able to train agents’ decision rules with respect to their goals.

There are a couple other ways in which I’m not doing paradigmatic ABM with this project. One is that I’m not focused on agents moving in 2D or 3D space. Rather, I’m much more interested in the settings defined by systems of structural equations. So, more continuous state spaces. I’m basing this work on years of contributing to heterogeneous agent macroeconomics tooling, and my frustration with that paradigm. So, no turtles on patches. I anticipate spatial and even geospatial extensions to what I’m building would be really cool and useful. But I’m not there yet.

I think what I’m working on is ABM in the extended sense that Rob Axtell and Doyne Farmer use the term, and I hope to one day show them what I’m doing and for them to think it’s cool.

Q: Wait, is this about AI agents, as in Generative AI?

A: Ahaha… mainly no, but a little yes. I’m talking about “agents” in the more general sense used before the GenAI industry tried to make the word about them. I don’t see Generative AI or LLMs to be a fundamental part of what I’m building. However, I do see what I’m building as a tool for evaluating the economic impact and trustworthiness of GenAI systems by modeling their supply chains and social consequences. And I can imagine deeper integrations with “(generative) agentic AI” down the line. I am building a tool, and an LLM might engage it through “tool use”. It’s also I suppose possible to make the agents inside the model use LLMs somehow, though I don’t see a good reason for that at the moment.

Q: Does it use AI at all?

A: Yes! I mean, perhaps you know that “AI” has meant many things and much of what it has meant is now considered quite mundane. But it does use deep learning, which is something that “AI” means now. In particular, part of the core functionality that I’m trying to build into it is a flexible version of the deep learning econometrics methods invented not-too-long-ago by Lilia and Serguei Maliar. I hope to one day show this project to them, and for them to think it’s cool. Deep learning methods have become quite popular in economics, and this is in some ways yet-another-deep-learning-economics project. I hope it has a few features that distinguish it.

Q: How is this different from somebody else’s deep learning economics analysis package?

A: Great question! There are a few key ways that it’s different. One is that it’s designed around a clean separation between model definition and solution algorithms. There will be no model-specific solution code in this project. It’s truly intended to be a library, comparable to scikit-learn, but for systems of agents. In fact, I’m calling this project scikit-agent. You heard it here first!

Separating the model definitions from the solution algorithms means that there’s a lot more flexibility in how models are composed. This framework is based on the idea that parts of a model can be “blocks” which can be composed into more complex models. The “blocks” are bundles of structural equations, which can include state, control, and reward variables.
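To give a feel for the idea, here is a small sketch of what composable blocks could look like. This is emphatically not the scikit-agent API: the `Block` class, the `compose` function, and the toy consumption and income equations are all hypothetical, invented just for this illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Values = Dict[str, float]
Equation = Callable[[Values], float]


@dataclass
class Block:
    """A hypothetical bundle of structural equations."""
    name: str
    equations: Dict[str, Equation]                     # state variables, in order
    controls: List[str] = field(default_factory=list)  # variables chosen by agents
    rewards: Dict[str, Equation] = field(default_factory=dict)

    def simulate(self, values: Values, rules: Dict[str, Equation]) -> Values:
        """Evaluate controls (via decision rules), then states, then rewards."""
        out = dict(values)
        for control in self.controls:
            out[control] = rules[control](out)
        for var, equation in self.equations.items():
            out[var] = equation(out)
        for var, equation in self.rewards.items():
            out[var] = equation(out)
        return out


def compose(blocks: List[Block]) -> Callable[[Values, Dict[str, Equation]], Values]:
    """Chain blocks so each one sees the variables produced by the previous."""
    def simulate(values: Values, rules: Dict[str, Equation]) -> Values:
        out = dict(values)
        for block in blocks:
            out = block.simulate(out, rules)
        return out
    return simulate


# Toy example: an agent consumes c out of resources m, saves a, earns income.
consumption = Block(
    name="consumption",
    controls=["c"],
    equations={"a": lambda v: v["m"] - v["c"]},
    rewards={"u": lambda v: math.sqrt(v["c"])},
)
income = Block(name="income", equations={"m": lambda v: 1.25 * v["a"] + 1.0})

period = compose([consumption, income])
result = period({"m": 8.0}, {"c": lambda v: 0.5 * v["m"]})
```

Note that the decision rule for `c` is passed in separately from the block definitions. That is the separation I care about: the blocks say what the environment is, and a solution algorithm (here just a hand-written rule, but in principle a trained neural network) supplies the agent's behavior.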

These ‘blocks’ are symbolically defined systems or environments. “Solving” the agent strategies in the multi-agent environment will be done with deep learning, otherwise known as artificial neural networks. So I think that it will be fair to call this framework a “neurosymbolic AI system”. I hope that saying that makes it easier to find funding for it down the line :)

Q: That sounds a little like causal game theory, or multi-agent influence diagrams. Are those part of this?

A: In fact, yes, so glad you asked. I think there’s a deep equivalence between multi-agent influence diagrams and ABM/computational economics which hasn’t been well explored. There are small notational differences that keep these communities from communicating. There are also a handful of substantively difficult theoretical issues that need to be settled with respect to, say, under what conditions a dynamic structural causal game can be solved using multi-agent reinforcement learning. These are cool problems, and I hope the thing I’m building implements good solutions to them.

Q: So, this is a framework for modeling dynamic Pearlian causal models, with multiple goal-directed agents, solving those models for agent strategy, and using those models econometrically?

A: Exactly.

Q: Does the thing you are building have any practical value? Or is it just more weird academic code?

A: I have high hopes that this thing I’m building could have a lot of practical value. Rigorous analysis of complex sociotechnical and economic systems remains a hard problem in finance, for example, as well as in public policy, insurance, international relations, and other fields. I do hope what I’m building interfaces well with real data to help with significant decision-making. These are problems that Generative AI is quite bad at, I believe. I’m trying to build a strong, useful foundation for working with statistical models that include agents in them. This is more difficult than regression or even ‘transformer’-based learning from media, because the agents are solving optimization problems inside the model.

Q: What are the first applications you have in mind for this tool?

A: I’m motivated to build this because I think it’s needed to address questions in technology policy and design. This is the main subject of my NSF-funded research over the past several years. Here are some problems I’m actively working on which overlap with the scope of this tool:

Integrating Differential Privacy and Contextual Integrity. I have a working paper with Rachel Cummings where we use Structural Causal Games (SCGs) to set up the parameter tuning of a differentially private system as a mechanism design problem. The tool I’m building will be great at representing SCGs. With it, the theoretical technique can be used in practice.
Understanding the Effects of Consumer Finance Policy. I’m working with an amazing team on a project modeling consumer lending and the effects of various consumer protection regulations. We are looking at anti-usury laws, nondiscrimination rules, forgiveness of negative information, the use of alternative data by fintech companies, and so on. This policy analysis involves comparing a number of different scenarios and looking for what regulations produce which results, robustly. I’m building a tool to solve this problem.
AI Governance and Fiduciary Duties. A lot of people have pointed out that effective AI governance requires an understanding of AI supply chains. AI services rely on complex data flows through multiple actors, which are often imperfectly aligned and incompletely contracted. They also depend on physical data centers and the consumption of energy. This raises many questions around liability and quality control that wind up ultimately being about institutional design rather than neural network architectures. I’m building a tool to help reason through these institutional design questions. In other words, I’m building a new way to do threat modeling on AI supply chain ecosystems.

Q: Really?

A: Yes! I sometimes have a hard time wrapping my own head around what I’m doing, which is why I’ve written out this blog post. But I do feel very good about what I’m working on at the moment. I think it has a lot of potential.

June 9, 2024

Envisioning the Future of Computer and Information Science Research: Some Ideas

A recent Dear Colleagues Letter from the National Science Foundation Directorate for Computer and Information Science and Engineering (CISE) calls for proposals for projects to envision research priorities. It is specifically not for research itself, but for promising ways to surface and communicate new R&D directions.

Essentially, the CISE directorate is asking for people to figure out a way to identify the future of computer and information science research. Just, you know, putting it out there.

The CISE Directorate is roughly 38 years old at the time of this writing, and computing and information science have, in that time, transformed pretty much everything.

At the same time, at this present moment, there’s a sense in which computer science feels… saturated. Maybe, indeed, lacking in future vision.

Why do I feel this is so? At least four reasons:

a) In the 90’s and 00’s, so much of the potential of computer science was being discovered and unleashed by startups. Even the companies that are today Big Tech were, then, startups. Now, notoriously, a lot of startups are just weird offshoots of Big Tech companies designed to be absorbed back in when legal or market conditions are favorable. So the technical research agenda is being set by huge companies with in-house research, rather than by a loose network of innovators.

b) “Artificial Intelligence” has for a long time meant “anything that computers can’t do yet”, with the Turing Test as one of the examples of what was still an unsolved problem in computer science. Deep learning has been blasting these unsolved problems out of the water for almost a decade now. I’d argue that the reason the newish LLM-powered chatbots appear so ominously to be a form of “general” AI is that they command natural language so convincingly, which is the key challenge of the Turing Test. So, computer science is running out of unsolved problems.

c) At the same time, this so widely hyped and lauded generation of AI, which has been credited with potentially literally apocalyptic powers, has gotten over the hump of the Gartner hype cycle, and it still can’t get hands right. On the other hand, it is supposed to be making software engineering obsolete as a profession, which would in principle cut down on the demand for computer science research.

d) It is now very clear that the success of computing and information science basic research depends on its uptake in commercial and industrial settings, and that these economics depend on business, legal, and social logic that is outside the scope of computer science research per se. Computer and information science research is not successful in virtue of, but rather in spite of, its agnosticism about social context. And, increasingly, that social context is being included within the scope of computer and information science.

So, what is to be done?

One answer, which I intend seriously, is imperialism. By this I mean the expansion of computer and information science research into areas beyond its core. Another answer is that it can occupy itself by adapting to critique. I actually think a combination of both is the best answer.

By imperialism, I mean searching for unsolved problems in other sciences, and trying to crack them with computational methods. This has been done already with Go and protein folding. But most problems in the social sciences remain unsolved problems, computationally. There are indeed parts of the social sciences that are opaque to themselves and without the guiding light of computational theory.

By adapting to critique, I mean responding to the now ample critical literature, mainly produced by humanistic scholars (some legal, some STS, etc.) which aims to show the shortcomings of computer science methodology. Indeed, a lot of “information science” today operates at this critical or political level. Humanistic critique tends to stop at the level of anthropological observation.

What is not yet solved is the internalization of these critiques into computational and information theory and methods, which entails advances in the foundations of computational social science.

There are at least three research arenas that I know of which are getting at parts of these problems.

a) The Agent Foundations research agendas (e.g. PIBBSS, Causal Incentives) that have spun out of the AI Safety research communities. This work has come to understand that some foundational advances in what an agent is, in terms of computation and information, are needed to address longtermist AI safety concerns, and perhaps also more pressing problems of AI compliance in the short term. This has quite a bit of funding from Effective Altruist philanthropists.

b) Various computational institutional theory projects that can be found in the vicinity of Metagov. A lot of this is motivated by the idea of the truly self-governing digital community, a long-held Internet dream, one which got an influx of funding and interest from the blockchain boom. That blockchain/crypto flavor has left it, to some, with a funny smell. But some more academic avenues such as the Institutional Grammar Research Initiative have a more based academic stance.

c) Research into the computational foundations of agent-based modeling, such as that led by Michael Wooldridge and Anisoara Calinescu at Oxford University. Part of the interdisciplinary social science mix at the Institute for New Economic Thinking, this research vein develops computational methods that push the limits of what social systems can be modeled with computers.

The problem with social scientific problems is that they are extremely hard. They can involve multiple agents in intractable situations. Today, we have almost no social systems that are not also sociotechnical systems where the technology is creating complications, so modeling these systems is recursive and perhaps necessarily approximate. To me, these problems remain philosophically tantalizing, when so many issues seem already to be reducible to fundamentals. Maybe this is the direction of the future of computer and information science research.

July 6, 2020

Notes about “Data Science and the Decline of Liberal Law and Ethics”

Jake Goldenfein and I have put up on SSRN our paper, “Data Science and the Decline of Liberal Law and Ethics”. I’ve mentioned it on this blog before as something I’m excited about. It’s also been several months since we’ve finalized it, and I wanted to quickly jot some notes about it based on considerations going into it and since then.

The paper was the result of a long and engaged collaboration with Jake which started from a somewhat different place. We considered the question, “What is sociopolitical emancipation in the paradigm of control?” That was a mouthful, but it captured what we were going for:

Like a lot of people today, we are interested in the political project of freedom. Not just freedom in narrow, libertarian senses that have proven to be self-defeating, but in broader senses of removing social barriers and systems of oppression. We were ambivalent about the form that would take, but figured it was a positive project almost anybody would be on board with. We called this project emancipation.
Unlike a certain prominent brand of critique, we did not begin from an anthropological rejection of the realism of foundational mathematical theory from STEM and its application to human behavior. In this paper, we did not make the common move of suggesting that the source of our ethical problems is one that can be solved by insisting on the terminology or methodological assumptions of some other discipline. Rather, we took advances in, e.g., AI as real scientific accomplishments that are telling us how the world works. We called this scientific view of the world the paradigm of control, due to its roots in cybernetics.

I believe our work is making a significant contribution to the “ethics of data science” debate because it is quite rare to encounter work that is engaged with both projects. It’s common to see STEM work with no serious moral commitments or valence. And it’s common to see the delegation of what we would call emancipatory work to anthropological and humanistic disciplines: the STS folks, the media studies people, even critical X (race, gender, etc.) studies. I’ve discussed the limitations of this approach, however well-intentioned, elsewhere. Often, these disciplines argue that the “unethical” aspect of STEM is because of their methods, discourses, etc. To analyze things in terms of their technical and economic properties is to lose the essence of ethics, which is aligned with anthropological methods that are grounded in respectful, phenomenological engagement with their subjects.

This division of labor between STEM and anthropology has, in my view (I won’t speak for Jake) made it impossible to discuss ethical problems that fit uneasily in either field. We tried to get at these. The ethical problem is instrumentality run amok because of the runaway economic incentives of private firms combined with their expanded cognitive powers as firms, a la Herbert Simon.

This is not a terribly original point and we hope it is not, ultimately, a fringe political position either. If Martin Wolf can write for the Financial Times that there is something threatening to democracy about “the shift towards the maximisation of shareholder value as the sole goal of companies and the associated tendency to reward management by reference to the price of stocks,” so can we, and without fear that we will be targeted in the next red scare.

So what we are trying to add is this: there is a cognitivist explanation for why firms can become so enormously powerful relative to individual “natural persons”, one that is entirely consistent with the STEM foundations that have become dominant in places like, most notably, UC Berkeley (for example) as “data science”. And, we want to point out, the consequences of that knowledge, which we take to be scientific, run counter to the liberal paradigm of law and ethics. This paradigm, grounded in individual autonomy and privacy, is largely the paradigm animating anthropological ethics! So we are, a bit obliquely, explaining why the data science ethics discourse has gelled in the ways that it has.

We are not satisfied with the current state of ‘data science ethics’ because to the extent that they cling to liberalism, we fear that they miss and even obscure the point, which can best be understood in a different paradigm.

We left as unfinished the hard work of figuring out what the new, alternative ethical paradigm that took cognitivism, statistics, and so on seriously would look like. There are many reasons beyond the conference publication page limit why we were unable to complete the project. The first of these is that, as I’ve been saying, it’s terribly hard to convince anybody that this is a project worth working on in the first place. Why? My view of this may be too cynical, but my explanations are that either (a) this is an interdisciplinary third rail because it upsets the balance of power between different academic departments, or (b) this is an ideological third rail because it successfully identifies a contradiction in the current sociotechnical order in a way that no individual is incentivized to recognize, because that order incentivizes individuals to disperse criticism of its core institutional logic of corporate agency, or (c) it is so hard for any individual to conceive of corporate cognition because of how it exceeds the capacity of human understanding that speaking in this way sounds utterly speculative to a lot of people. The problem is that it requires attributing cognitive and adaptive powers to social forms, and a successful science of social forms is, at best, in the somewhat gnostic domain of complex systems research.

The latter are rarely engaged in technology policy but I think it’s the frontier.

References

Benthall, Sebastian and Goldenfein, Jake, Data Science and the Decline of Liberal Law and Ethics (June 22, 2020). Ethics of Data Science Conference – Sydney 2020 (forthcoming). Available at SSRN: https://ssrn.com/abstract=


January 9, 2019

computational institutions as non-narrative collective action

Nils Gilman recently pointed to a book chapter that confirms the need for “official futures” in capitalist institutions.

Great book on the necessity of official futures: “Collectively held images of how the future will unfold are critical because they free economic actors from paralyzing doubt, enabling them to commit resources and coordinate decisions even if those expectations prove inaccurate.” https://t.co/4oCHWgSSNp
— Nils Gilman (@nils_gilman) January 8, 2019

Nils indulged me in a brief exchange that helped me better grasp at a bothersome puzzle.

There is a certain class of intellectuals that insist on the primacy of narratives as a mode of human experience. These tend to be, not too surprisingly, writers and other forms of storytellers.

There is a different class of intellectuals that insists on the primacy of statistics. Statistics does not make it easy to tell stories because it is largely about the complexity of hypotheses and our lack of confidence in them.

The narrative/statistic divide could be seen as a divide between academic disciplines. It has often been taken to be, I believe wrongly, the crux of the “technology ethics” debate.

I questioned Nils as to whether his generalization stood up to statistically driven allocation of resources; i.e., those decisions made explicitly on probabilistic judgments. He argued that in the end, management and collective action require consensus around narrative.

In the (literal) final analysis, the various quantitative possibilities get distilled into narratives — that’s how they get used to drive decisions and collective action.
— Nils Gilman (@nils_gilman) January 8, 2019

In other words, what keeps narratives at the center of human activity is that (a) humans are in the loop, and (b) humans are collectively in the loop.

The idea that communication is necessary for collective action is one I used to put great stock in when studying Habermas. For Habermas, consensus, and especially linguistic consensus, is how humanity moves together. Habermas contrasted this mode of knowledge aimed at consensus and collective action with technical knowledge, which is aimed at efficiency. Habermas envisioned a society ruled by communicative rationality, deliberative democracy; following this line of reasoning, this communicative rationality would need to be a narrative rationality. Even if this rationality is not universal, it might, in Habermas’s later conception of governance, be shared by a responsible elite. Lawyers and a judiciary, for example.

The puzzle that recurs again and again in my work has been the challenge of communicating how technology has become an alternative form of collective action. The claim made by some that technologists are a social “other” makes more sense if one sees them (us) as organizing around non-narrative principles of collective behavior.

It is, I believe, beyond serious dispute that well-constructed, statistically based collective decision-making processes perform better than many alternatives. In the field of future predictions, Philip Tetlock’s work on superforecasting teams and prior work on expert political judgment has long stood as an empirical challenge to the supposed primacy of narrative-based forecasting. This challenge has not been taken up; it seems rather one-sided. One reason for this may be that the rationale for the effectiveness of these techniques rests ultimately in the science of statistics.

It is now common to insist that Artificial Intelligence should be seen as a sociotechnical system and not as a technological artifact. I wholeheartedly agree with this position. However, it is sometimes implied that to understand AI as a sociotechnical system, one must understand it in narrative terms. This is an error; it would imply that the collective actions made to build an AI system and the technology itself are held together by narrative communication.

But if the whole purpose of building an AI system is to collectively act in a way that is more effective because of its facility with the nuances of probability, then the narrative lens will miss the point. The promise and threat of AI is that it delivers a different, often more effective form of collective or institution. I’ve suggested that computational institution might be the best way to refer to such a thing.


December 18, 2017

The Data Processing Inequality and bounded rationality

I have long harbored the hunch that information theory, in the classic Shannon sense, and social theory are deeply linked. It has proven to be very difficult to find an audience for this point of view or an opportunity to work on it seriously. Shannon’s information theory is widely respected in engineering disciplines; many social theorists who are unfamiliar with it are loath to admit that something from engineering should carry essential insights for their own field. Meanwhile, engineers are rarely interested in modeling social systems.

I’ve recently discovered an opportunity to work on this problem through my dissertation work, which is about privacy engineering. Privacy is a subtle social concept but also one that has been rigorously formalized. I’m working on formal privacy theory now and have been reminded of a theorem from information theory: the Data Processing Inequality. What strikes me about this theorem is that it captures a point that comes up again and again in social and political problems, though it’s a point that’s almost never addressed head on.

The Data Processing Inequality (DPI) states that if three random variables $X$, $Y$, and $Z$ are arranged in a Markov chain $X \rightarrow Y \rightarrow Z$, then $I(X,Z) \leq I(X,Y)$, where $I$ stands for mutual information. Mutual information is a measure of how much two random variables carry information about each other. If $I(X,Y) = 0$, the variables are independent. $I(X,Y) \geq 0$ always; that’s just a mathematical fact about how it’s defined.
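To make the inequality concrete, here is a minimal numerical sketch. The setup is my own illustration and not part of the original argument: $X$ is a fair coin, and the two steps of the chain are hypothetical noisy binary channels with flip probabilities 0.1 and 0.2. The function names are likewise invented for the example.

```python
import math
from itertools import product


def mutual_information(joint):
    """Mutual information in bits, given a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def noisy_copy(flip):
    """Binary channel {(input, output): prob} that flips its input with probability `flip`."""
    return {
        (i, o): (flip if i != o else 1.0 - flip)
        for i, o in product((0, 1), repeat=2)
    }


# Assumed toy chain X -> Y -> Z: Z depends on X only through Y.
p_x = {0: 0.5, 1: 0.5}
x_to_y = noisy_copy(0.1)
y_to_z = noisy_copy(0.2)

joint_xy, joint_xz = {}, {}
for x, y, z in product((0, 1), repeat=3):
    p = p_x[x] * x_to_y[(x, y)] * y_to_z[(y, z)]
    joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + p
    joint_xz[(x, z)] = joint_xz.get((x, z), 0.0) + p

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(f"I(X,Y) = {i_xy:.3f} bits, I(X,Z) = {i_xz:.3f} bits")
```

Whatever flip probabilities are chosen, the second number should never exceed the first: processing $Y$ further can only lose information about $X$.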

The implications of this for psychology, social theory, and artificial intelligence are, I think, rather profound. It provides a way of thinking about bounded rationality in a simple and generalizable way, something I’ve been struggling to figure out for a long time.

Suppose that there’s a big world out there, $W$, and there’s an organism, or a person, or a sociotechnical organization within it, $Y$. The world is big and complex, which implies that it has a lot of informational entropy, $H(W)$. Through whatever sensory apparatus is available to $Y$, it acquires some kind of internal sensory state. Because this organism is much smaller than the world, its entropy is much lower. There are many fewer possible states that the organism can be in, relative to the number of states of the world: $H(W) \gg H(Y)$. This in turn bounds the mutual information between the organism and the world, since $I(W,Y) = H(Y) - H(Y|W)$ and conditional entropy is never negative: $I(W,Y) \leq H(Y)$.

Now let’s suppose the actions that the organism takes, $Z$, depend only on its internal state. It is an agent, reacting to its environment. Whatever these actions are, they can only be as calibrated to the world as the agent had capacity to absorb the world’s information. Since $W \rightarrow Y \rightarrow Z$ is a Markov chain, the DPI gives $I(W,Z) \leq I(W,Y) \leq H(Y) \ll H(W)$. The implication is that the more limited the mental capacity of the organism, the more its actions will be approximately independent of the state of the world that precedes it.
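The bound can be checked on a toy case. This is only a sketch under assumptions of my own: a uniform world with 256 states, an organism whose internal state keeps just two bits of the world, and an action that depends only on that internal state. The functions `sense` and `act` are hypothetical stand-ins for a sensory apparatus and a policy.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, index):
    """Marginal distribution of one coordinate of a dict {(a, b): probability}."""
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], 0.0) + p
    return out


def mutual_information(joint):
    """Mutual information in bits, given a dict {(a, b): probability}."""
    pa, pb = marginal(joint, 0), marginal(joint, 1)
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


N_WORLD = 256  # assumed number of equally likely world states


def sense(w):
    """Hypothetical sensor: the organism retains only 4 distinct states."""
    return w % 4


def act(y):
    """Hypothetical policy: the action depends only on the internal state."""
    return y // 2


p_w = {w: 1.0 / N_WORLD for w in range(N_WORLD)}
joint_wy = {(w, sense(w)): p for w, p in p_w.items()}
joint_wz = {(w, act(sense(w))): p for w, p in p_w.items()}

h_w = entropy(p_w)
h_y = entropy(marginal(joint_wy, 1))
i_wy = mutual_information(joint_wy)
i_wz = mutual_information(joint_wz)
print(f"H(W)={h_w:.1f}  H(Y)={h_y:.1f}  I(W,Y)={i_wy:.1f}  I(W,Z)={i_wz:.1f}")
```

However large `N_WORLD` is made, the action can never carry more information about the world than the organism’s own entropy allows.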

There are a lot of interesting implications of this for social theory. Here are a few cases that come to mind.

I've written quite a bit here (blog links) and here (arXiv) about Bostrom’s superintelligence argument and why I’m generally not concerned with the prospect of an artificial intelligence taking over the world. My argument is that there are limits to how much an algorithm can improve itself, and these limits put a stop to exponential intelligence explosions. I’ve been criticized on the grounds that I don’t specify what the limits are, and that if the limits are high enough then maybe relative superintelligence is possible. The Data Processing Inequality gives us another tool for estimating the bounds of an intelligence based on the range of physical states it can possibly be in. How calibrated can a hegemonic agent be to the complexity of the world? It depends on the capacity of that agent to absorb information about the world; that can be measured in information entropy.

A related case is a rendering of Scott’s Seeing Like a State arguments. Why is it that “high modernist” governments failed to successfully control society through scientific intervention? One reason is that the complexity of the system they were trying to manage vastly outsized the complexity of the centralized control mechanisms. Centralized control was very blunt, causing many social problems. Arguably, behavioral targeting and big data centers today equip controlling organizations with more informational capacity (more entropy), but they still get it wrong sometimes, causing privacy violations, because they can’t model the entirety of the messy world we’re in.

The Data Processing Inequality is also helpful for explaining why the world is so messy. There are a lot of different agents in the world, and each one only has so much bandwidth for taking in information. This means that most agents are acting almost independently from each other. The guiding principle of society isn’t signal, it’s noise. That explains why there are so many disorganized heavy tail distributions in social phenomena.

Importantly, if we let the world at any time slice be informed by the actions of many agents acting nearly independently from each other in the slice before, then that increases the entropy of the world. This increases the challenge for any particular agent to develop an effective controlling strategy. For this reason, we would expect the world to get more out of control the more intelligent agents are on average. The popularity of the personal computer perhaps introduced a lot more entropy into the world, distributed in an agent-by-agent way. Moreover, powerful controlling data centers may increase the world’s entropy, rather than reducing it. So even if, for example, Amazon were to try to take over the world, the existence of Baidu would be a major obstacle to its plans.

There are a lot of assumptions built into these informal arguments and I’m not wedded to any of them. But my point here is that information theory provides useful tools for thinking about agents in a complex world. There’s potential for using it for modeling sociotechnical systems and their limitations.


November 5, 2017

Managerialism as political philosophy

Technologically mediated spaces and organizations are frequently described by their proponents as alternatives to the state. From David Clark’s maxim of Internet architecture, “We reject: kings, presidents and voting. We believe in: rough consensus and running code”, to cyberanarchist efforts to bypass the state via blockchain technology, to the claims that Google and Facebook, as they mediate between billions of users, are relevant non-state actors in international affairs, to Lessig’s (1999) ever prescient claim that “Code is Law”, there is undoubtedly something going on with technology’s relationship to the state which is worth paying attention to.

There is an intellectual temptation (one that I myself am prone to) to take seriously the possibility of a fully autonomous technological alternative to the state. Something like a constitution written in source code has an appeal: it would be clear, precise, and presumably based on something like a consensus of those who participate in its creation. It is also an idea that can be frightening (Give up all control to the machines?) or ridiculous. The example of The DAO, the Ethereum ‘decentralized autonomous organization’ that raised millions of dollars only to have them stolen in a technical hack, demonstrates the value of traditional legal institutions which protect the parties that enter contracts with processes that ensure fairness in their interpretation and enforcement.

It is more sociologically accurate, in any case, to consider software, hardware, and data collection not as autonomous actors but as parts of a sociotechnical system that maintains and modifies them. This is obvious to practitioners, who spend their lives negotiating the social systems that create technology. For those for whom it is not obvious, there are reams of literature on the social embeddedness of “algorithms” (Gillespie, 2014; Kitchin, 2017). These themes are recited again in recent critical work on Artificial Intelligence; there are those that wisely point out that a functioning artificially intelligent system depends on a lot of labor (those who created and cleaned data, those who built the systems they are implemented on, those that monitor the system as it operates) (Kelkar, 2017). So rather than discussing the role of particular technologies as alternatives to the state, we should shift our focus to the great variety of sociotechnical organizations.

One thing that is apparent, when taking this view, is that states, as traditionally conceived, are themselves sociotechnical organizations. This is, again, an obvious point well illustrated in economic histories such as (Beniger, 1986). Communications infrastructure is necessary for the control and integration of society, let alone effective military logistics. The relationship between the state and those industrial actors developing this infrastructure, whether it be building roads, running a postal service, laying rail or telegraph wires, telephone wires, satellites, Internet protocols, and now social media, has always been interesting and a story of great fortunes and shifts in power.

What is apparent after a serious look at this history is that political theory, especially liberal political theory as it developed from the 1700s onward as a theory of the relationship between individuals bound by social contract emerging from nature to develop a just state, leaves out essential scientific facts of the matter of how society has ever been governed. Control of communications and control infrastructure has never been equally dispersed and has always been a source of power. Late modern rearticulations of liberal theory and reactions against it (Rawls and Nozick, both) leave out technical constraints on the possibility of governance and even the constitution of the subject on which a theory of justice would have its ground.

Were political theory to begin from a more realistic foundation, it would need to acknowledge the existence of sociotechnical organizations as a political unit. There is a term for this view, “managerialism”, which, as far as I can tell, is used somewhat pejoratively, like “neoliberalism”. As an “-ism”, it’s implied that managerialism is an ideology. When we talk about ideologies, what we are doing is looking from an external position onto an interdependent set of beliefs in their social context and identifying, through genealogical method or logical analysis, how those beliefs are symptoms of underlying causes that are not precisely as represented within those beliefs themselves. For example, one critiques neoliberal ideology, which purports that markets are the best way to allocate resources and advocates for the expansion of market logic into more domains of social and political life, by pointing out that markets are great for reallocating resources to capitalists, who bankroll neoliberal ideologues, but that many people who are subject to neoliberal policies do not benefit from them. While this is a bit of a parody of both neoliberalism and the critiques of it, you’ll catch my meaning.

We might avoid the pitfalls of an ideological managerialism (I’m not sure what those would be, exactly, having not read the critiques) by taking from it, to begin with, only the urgency of describing social reality in terms of organization and management without assuming any particular normative stake. It will be argued that this is not a neutral stance because to posit that there is organization, and that there is management, is to offend certain kinds of (mainly academic) thinkers. I get the sense that this offendedness is similar to the offense taken by certain critical scholars to the idea that there is such a thing as scientific knowledge, especially social scientific knowledge. Namely, it is an offense taken to the idea that a patently obvious fact entails one’s own ignorance of otherwise very important expertise. This is encouraged by the institutional incentives of social science research. Social scientists are required to maintain an aura of expertise even when their particular sub-discipline excludes from its analysis the very systems of bureaucratic and technical management that its university depends on. University bureaucracies are, strangely, in the business of hiding their managerialist reality from their own faculty, as alternative avenues of research inquiry are of course compelling in their own right. When managerialism cannot be contested on epistemic grounds (because the bluff has been called), it can be rejected on aesthetic grounds: managerialism is not “interesting” to a discipline, perhaps because it does not engage with the personal and political motivations that constitute it.

What sets managerialism apart from other ideologies, however, is that when we examine its roots in social context, we do not discover a contradiction. Managerialism is not, as far as I can tell, successful as a popular ideology. Managerialism is attractive only to that rare segment of the population that works closely with bureaucratic management. It is here that the technical constraints of information flow and its potential uses, the limits of autonomy especially as it confronts the autonomies of others, the persistence of hierarchy despite the purported flattening of social relations, and so on become unavoidable features of life. And though one discovers in these situations plenty of managerial incompetence, one also comes to terms with why that incompetence is a necessary feature of the organizations that maintain it.

Little of what I am saying here is new, of course. It is only new in relation to more popular or appealing forms of criticism of the relationship between technology, organizations, power, and ethics. So often the political theory implicit in these critiques is a form of naive egalitarianism that sees a differential in power as an ethical red flag. Since technology can give organizations a lot of power, this generates a lot of heat around technology ethics. Starting from the perspective of an ethicist, one sees an uphill battle against an increasingly inscrutable and unaccountable sociotechnical apparatus. What I am proposing is that we look at things a different way. If we start from general principles about technology and its role in organizations (the kinds of principles one would get from an analysis of microeconomic theory, artificial intelligence as a mathematical discipline, and so on), one can try to formulate managerial constraints that truly confront society. These constraints are part of how subjects are constituted and should inform what we see as “ethical”. If we can broker between these hard constraints and the societal values at stake, we might come up with a principle of justice that, if unpopular, may at least be realistic. This would be a contribution, at the end of the day, to political theory, not as an ideology, but as a philosophical advance.

References

Beniger, James R. The Control Revolution: Technological and Economic Origins of the Information Society. (1986).

Bird, Sarah, et al. “Exploring or Exploiting? Social and Ethical Implications of Autonomous Experimentation in AI.” (2016).

Gillespie, Tarleton. “The relevance of algorithms.” Media technologies: Essays on communication, materiality, and society 167 (2014).

Kelkar, Shreeharsh. “How (Not) to Talk about AI.” Platypus, 12 Apr. 2017, blog.castac.org/2017/04/how-not-to-talk-about-ai/.

Kitchin, Rob. “Thinking critically about and researching algorithms.” Information, Communication & Society 20.1 (2017): 14-29.

Lessig, Lawrence. “Code is law.” The Industry Standard 18 (1999).


May 16, 2017

Similarities between the cognitive science/AI and complex systems/MAS fields

One of the things that made the research traditions of cognitive science and artificial intelligence so great was the duality between them.

Cognitive science tried to understand the mind at the same time that artificial intelligence tried to discover methods for reproducing the functions of cognition artificially. Artificial intelligence techniques became hypotheses for how the mind worked, and empirically confirmed theories of how the mind worked inspired artificial intelligence techniques.

There was a lot of criticism of these fields at one point. Writers like Hubert Dreyfus, Lucy Suchman, and Winograd and Flores critiqued especially heavily one paradigm that’s now called “Good Old Fashioned AI”–the kind of AI that used static, explicit representations of the world instead of machine learning.

That was a really long time ago and now machine learning and cognitive psychology (including cognitive neuroscience) are in happy conversation, with much more successful models of learning that by and large have absorbed the critiques of earlier times.

Some people think that these old critiques still apply to modern methods in AI. Isn’t AI still AI? I believe the main confusion is that lots of people don’t know that “computable” has a precise mathematical meaning: a function is computable if it is a partial recursive function or, equivalently, if a Turing machine can calculate it. It just so happens that computers, the devices we know and love, can compute any computable function.

So what changed in AI was not that they were using computation to solve problems, but the way they used computation. Similarly, while there was a period where cognitive psychology tried to model mental processes using a particular kind of computable representation, and these models are now known to be inaccurate, that doesn’t mean that the mind doesn’t perform other forms of computation.

A similar kind of relationship is going on between the study of complex systems, especially complex social systems, and the techniques of multi-agent system modeling. Multi-agent system modeling is, as Epstein clarifies, about generative modeling of social processes that is computable in the mathematical sense, but the fact that physical computers are involved is incidental. Multi-agent systems are supposed to be a more realistic way of modeling agent interactions than, say, neoclassical game theory, in the same way that machine learning is a more realistic way of modeling cognition than GOFAI.

Despite (or, more charitably, because of) the critiques leveled against them, cognitive science and artificial intelligence have developed into widely successful and highly respected fields. Given that, we should expect complex systems/multi-agent systems research to follow a similar trajectory.


March 19, 2017

artificial life, artificial intelligence, artificial society, artificial morality

“Everyone” “knows” what artificial intelligence is and isn’t and why it is and isn’t a transformative thing happening in society and technology and industry right now.

But the fact is that most of what “we” “call” artificial intelligence is really just increasingly sophisticated ways of solving a single class of problems: optimization.

Essentially what’s happened in AI is that all empirical inference problems can be modeled as Bayesian problems, which are then solved using variational inference methods, which essentially turn the Bayesian statistical problem into a solvable optimization problem and then solve it.
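As a rough illustration of what “turning inference into optimization” means, here is a sketch of variational inference on a deliberately tiny problem of my own choosing: estimating the mean of a unit-variance Gaussian under a standard normal prior. The data values, the step size, and all names are assumptions for the example. The variational family is $q(\mu) = \mathcal{N}(m, s^2)$, and plain gradient ascent on the evidence lower bound (ELBO) recovers the exact posterior, which is known in closed form for this conjugate model.

```python
import math

# Hypothetical toy data; model: x_i ~ Normal(mu, 1), prior mu ~ Normal(0, 1).
data = [1.2, 0.7, 1.9, 1.4, 0.8]
n, total = len(data), sum(data)


def elbo(m, log_s):
    """Closed-form evidence lower bound for q(mu) = Normal(m, s^2)."""
    s2 = math.exp(2.0 * log_s)
    expected_loglik = -0.5 * n * math.log(2.0 * math.pi) - 0.5 * sum(
        (x - m) ** 2 + s2 for x in data
    )
    expected_logprior = -0.5 * math.log(2.0 * math.pi) - 0.5 * (m ** 2 + s2)
    q_entropy = 0.5 * math.log(2.0 * math.pi * math.e * s2)
    return expected_loglik + expected_logprior + q_entropy


# Gradient ascent on the ELBO over (m, log s); step size chosen by hand.
m, log_s, step = 0.0, 0.0, 0.05
start = elbo(m, log_s)
for _ in range(2000):
    s2 = math.exp(2.0 * log_s)
    grad_m = total - (n + 1) * m        # d ELBO / d m
    grad_log_s = 1.0 - (n + 1) * s2     # d ELBO / d log(s)
    m += step * grad_m
    log_s += step * grad_log_s

# Exact posterior for this conjugate model, for comparison.
exact_mean = total / (n + 1)
exact_var = 1.0 / (n + 1)
print(f"variational: mean={m:.4f}, var={math.exp(2.0 * log_s):.4f}")
print(f"exact:       mean={exact_mean:.4f}, var={exact_var:.4f}")
```

In realistic models the ELBO has no closed form and its gradient is estimated by sampling, but the structure is the same: a statistical question is answered by climbing an objective function.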

Advances in optimization have greatly expanded the number of things computers can accomplish as part of a weak AI research agenda.

Frequently these remarkable successes in Weak AI are confused with an impending revolution in what used to be called Strong AI but which now is more frequently called Artificial General Intelligence, or AGI.

Recent interest in AGI has spurred a lot of interesting research. How could it not be interesting? It is also, for me, extraordinarily frustrating research because I find the philosophical precommitments of most AGI researchers baffling.

One insight that I wish made its way more frequently into discussions of AGI is an insight made by the late Francisco Varela, who argued that you can’t really solve the problem of artificial intelligence until you have solved the problem of artificial life. This is for the simple reason that only living things are really intelligent in anything but the weak sense of being capable of optimization.

Once being alive is taken as a precondition for being intelligent, the problem of understanding AGI implicates a profound and fascinating problem of understanding the mathematical foundations of life. This is a really amazing research problem that for some reason is never ever discussed by anybody.

Let’s assume it’s possible to solve this problem in a satisfactory way. That’s a big If!

Then a theory of artificial general intelligence should be able to show how some artificial living organisms are and others are not intelligent. I suppose what’s most significant here is the shift in thinking of AI in terms of “agents”, a term so generic as to be perhaps at the end of the day meaningless, to thinking of AI in terms of “organisms”, which suggests a much richer set of preconditions.

I have similar grief over contemporary discussion of machine ethics. This is a field with fascinating, profound potential. But much of what machine ethics boils down to today are trolley problems, which are as insipid as they are troublingly intractable. There’s other, better machine ethics research out there, but I’ve yet to see something that really speaks to properly defining the problem, let alone solving it.

This is perhaps because for a machine to truly be ethical, as opposed to just being designed and deployed ethically, it must have moral agency. I don’t mean this in some bogus early Latourian sense of “wouldn’t it be fun if we pretended seatbelts were little gnomes clinging to our seats” but in an actual sense of participating in moral life. There’s a good case to be made that the latter is not something easily reducible to decontextualized action or function, but rather has to do with how one participates more broadly in social life.

I suppose this is a rather substantive metaethical claim to be making. It may be one that’s at odds with common ideological trainings in Anglophone countries where it’s relatively popular to discuss AGI as a research problem. It has more in common, intellectually and philosophically, with continental philosophy than analytic philosophy, whereas “artificial intelligence” research is in many ways a product of the latter. This perhaps explains why these two fields are today rather disjoint.

Nevertheless, I’d happily make the case that the continental tradition has developed a richer and more interesting ethical tradition than what analytic philosophy has given us. Among other reasons, this is because of how it is able to situate ethics as a function of a more broadly understood social and political life.

I postulate that what is characteristic of social and political life is that it involves the interaction of many intelligent organisms. Which of course means that to truly understand this form of life and how one might recreate it artificially, one must understand artificial intelligence and, transitively, artificial life.

Only once artificial society is sufficiently well understood could we approach the problem of artificial morality, or how to create machines that truly act according to moral or ethical ideals.
