Digifesto

Category: open source software

I’m building something new

Push has come to shove, and I have started building something new.

I’m not building it alone, thank God, but it’s a very petite open source project at the moment.

I’m convinced that it is actually something new. Some of my colleagues are excited to hear that what I’m building will exist for them shortly. That’s encouraging! I also tried to build this thing in the context of another project that’s doing something similar. I was told that I was being too ambitious and couldn’t pull it off. That wasn’t exactly encouraging, but was good evidence that I’m actually doing something new. I will now do this thing and take the credit. So much the better for me.

What is it that I’m building?

Well, you see, it’s software for modeling economic systems. Or sociotechnical systems. Or, broadly speaking, complex systems with agents in them. Also, doing statistics with those models: fitting them to data, understanding what emergent properties occur in them, exploring counterfactuals, and so on.

I will try to answer some questions I wish somebody would ask me.

Q: Isn’t that just agent-based modeling? Why aren’t you just using NetLogo or something?

A: Agent-based modeling (ABM) is great, but it’s a very expansive term that means a lot of things. Very often, ABMs consist of agents whose behavior is governed by simple rules, rather than directed towards accomplishing goals. That notion of “agent” in ABM is almost entirely opposed to the notion of “agent” used in AI — propagated by Stuart Russell, for example. To AI people, goal-directedness is essential for agency. I’m not committed to rational behavior in this framework — I’m not an economist! But I do think it’s a requirement that agents’ decision rules can be trained with respect to their goals.

There are a couple other ways in which I’m not doing paradigmatic ABM with this project. One is that I’m not focused on agents moving in 2D or 3D space. Rather, I’m much more interested in the settings defined by systems of structural equations. So, more continuous state spaces. I’m basing this work on years of contributing to heterogeneous agent macroeconomics tooling, and my frustration with that paradigm. So, no turtles on patches. I anticipate spatial and even geospatial extensions to what I’m building would be really cool and useful. But I’m not there yet.

I think what I’m working on is ABM in the extended sense that Rob Axtell and Doyne Farmer use the term, and I hope to one day show them what I’m doing and for them to think it’s cool.

Q: Wait, is this about AI agents, as in Generative AI?

A: Ahaha… mainly no, but a little yes. I’m talking about “agents” in the more general sense used before the GenAI industry tried to make the word about them. I don’t see Generative AI or LLMs as a fundamental part of what I’m building. However, I do see what I’m building as a tool for evaluating the economic impact and trustworthiness of GenAI systems by modeling their supply chains and social consequences. And I can imagine deeper integrations with “(generative) agentic AI” down the line. I am building a tool, and an LLM might engage it through “tool use”. It’s also, I suppose, possible to make the agents inside the model use LLMs somehow, though I don’t see a good reason for that at the moment.

Q: Does it use AI at all?

A: Yes! I mean, perhaps you know that “AI” has meant many things and much of what it has meant is now considered quite mundane. But it does use deep learning, which is something that “AI” means now. In particular, part of the core functionality that I’m trying to build into it is a flexible version of the deep learning econometrics methods invented not-too-long-ago by Lilia and Serguei Maliar. I hope to one day show this project to them, and for them to think it’s cool. Deep learning methods have become quite popular in economics, and this is in some ways yet-another-deep-learning-economics project. I hope it has a few features that distinguish it.
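To give a flavor of what that method looks like (this is a minimal, self-contained sketch with invented parameter values, not scikit-agent code), the Maliars’ approach parameterizes a decision rule with a neural network and trains it on simulated paths of the model, for example by maximizing expected discounted lifetime reward:

```python
# A minimal sketch, not scikit-agent code: the toy consumption-savings model and
# all parameter values here are invented for illustration. The idea, in the spirit
# of the Maliars' deep learning approach, is to parameterize a decision rule with
# a neural network and train it by maximizing simulated discounted lifetime reward.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Policy network: maps current wealth to the fraction of wealth consumed.
policy = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1), nn.Sigmoid())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

beta, R, T, batch = 0.96, 1.03, 50, 256  # discount factor, gross return, horizon, batch size

for step in range(2000):
    w = 0.5 + 2.0 * torch.rand(batch, 1)            # random initial wealth levels
    lifetime_utility = torch.zeros(batch, 1)
    for t in range(T):
        share = policy(w)                           # consumption share chosen by the network
        c = share * w + 1e-6                        # consumption, kept strictly positive
        lifetime_utility = lifetime_utility + (beta ** t) * torch.log(c)
        income = 0.1 * torch.randn(batch, 1).exp()  # stochastic labor income
        w = R * (w - c) + income                    # budget constraint / law of motion
    loss = -lifetime_utility.mean()                 # maximize expected lifetime utility
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The appeal is that the same trick scales to much higher-dimensional state spaces than conventional dynamic programming.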

Q: How is this different from somebody else’s deep learning economics analysis package?

A: Great question! There are a few key ways that it’s different. One is that it’s designed around a clean separation between model definition and solution algorithms. There will be no model-specific solution code in this project. It’s truly intended to be a library, comparable to scikit-learn, but for systems of agents. In fact, I’m calling this project scikit-agent. You heard it here first!

Separating the model definitions from the solution algorithms means that there’s a lot more flexibility in how models are composed. This framework is based on the idea that parts of a model can be “blocks” which can be composed into more complex models. The “blocks” are bundles of structural equations, which can include state, control, and reward variables.

These ‘blocks’ are symbolically defined systems or environments. “Solving” the agent strategies in the multi-agent environment will be done with deep learning, otherwise known as artificial neural networks. So I think that it will be fair to call this framework a “neurosymbolic AI system”. I hope that saying that makes it easier to find funding for it down the line :)
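To illustrate, here is a hypothetical sketch of what a block could look like; the names and structure are made up for this post and are not the actual scikit-agent API. The symbolic part is the bundle of shocks, dynamics, and rewards; the variables marked as controls are what the deep learning step later solves for.

```python
# Hypothetical sketch of the "block" idea; names and structure are illustrative,
# not the actual scikit-agent API.
import math

consumption_block = {
    "shocks": {"theta": ("lognormal", {"mean": 1.0, "std": 0.1})},  # income shock
    "dynamics": {
        "m": lambda a, R, theta: R * a + theta,  # market resources (state)
        "c": "control",                          # consumption (chosen by the agent)
        "a": lambda m, c: m - c,                 # end-of-period assets (state)
    },
    "reward": {"u": lambda c: math.log(c)},      # period utility
}

portfolio_block = {
    "dynamics": {
        "alpha": "control",                      # risky share (chosen by the agent)
        "R": lambda alpha, Rfree, Rrisky: (1 - alpha) * Rfree + alpha * Rrisky,
    },
}

# Composition: a richer model is the union of blocks that share variable names
# (here, the portfolio block supplies the return R used by the consumption block).
model = [portfolio_block, consumption_block]
```

Composing blocks then amounts to wiring them together by shared variable names, so the same consumption block can be reused with or without, say, a portfolio choice block, and the control variables can be handed off to a neural-network training routine like the one sketched above.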

Q: That sounds a little like causal game theory, or multi-agent influence diagrams. Are those part of this?

A: In fact, yes, so glad you asked. I think there’s a deep equivalence between multi-agent influence diagrams and ABM/computational economics which hasn’t been well explored. There are small notational differences that keep these communities from communicating. There are also a handful of substantively difficult theoretical issues that need to be settled with respect to, say, under what conditions a dynamic structural causal game can be solved using multi-agent reinforcement learning. These are cool problems, and I hope the thing I’m building implements good solutions to them.
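As a purely illustrative example of the notational bridge (nothing here is project code, and the node names are made up), the structure of a two-agent influence diagram is just a labeled directed graph, which is easy to write down with networkx:

```python
# Illustrative only: encoding the structure of a two-agent influence diagram
# (a MAID) as a labeled directed graph. Node names are invented for this example.
import networkx as nx

maid = nx.DiGraph()

# One chance node, one decision node per agent, one utility node per agent.
maid.add_node("theta", kind="chance")
maid.add_node("D_firm", kind="decision", agent="firm")
maid.add_node("D_regulator", kind="decision", agent="regulator")
maid.add_node("U_firm", kind="utility", agent="firm")
maid.add_node("U_regulator", kind="utility", agent="regulator")

# Edges into a decision node form that agent's information set;
# edges into a utility node are the variables the payoff depends on.
maid.add_edges_from([
    ("theta", "D_firm"),
    ("D_firm", "D_regulator"),
    ("theta", "U_regulator"),
    ("D_firm", "U_firm"),
    ("D_regulator", "U_firm"),
    ("D_regulator", "U_regulator"),
])
```

Read this way, a MAID is just the causal graph of a multi-agent structural model with the decision rules left unspecified, which is exactly what a solution algorithm is asked to fill in.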

Q: So, this is a framework for modeling dynamic Pearlian causal models, with multiple goal-directed agents, solving those models for agent strategy, and using those models econometrically?

A: Exactly.

Q: Does the thing you are building have any practical value? Or is it just more weird academic code?

A: I have high hopes that this thing I’m building could have a lot of practical value. Rigorous analysis of complex sociotechnical and economic systems remains a hard problem in finance, for example, as well as in public policy, insurance, international relations, and other fields. I do hope what I’m building interfaces well with real data to help with significant decision-making. These are problems that Generative AI is quite bad at, I believe. I’m trying to build a strong, useful foundation for working with statistical models that include agents in them. This is more difficult than regression or even ‘transformer’-based learning from media, because the agents are solving optimization problems inside the model.

Q: What are the first applications you have in mind for this tool?

A: I’m motivated to build this because I think it’s needed to address questions in technology policy and design. This is the main subject of my NSF-funded research over the past several years. Here are some problems I’m actively working on which overlap with the scope of this tool:

  • Integrating Differential Privacy and Contextual Integrity. I have a working paper with Rachel Cummings where we use Structural Causal Games (SCGs) to set up the parameter tuning of a differentially private system as a mechanism design problem. The tool I’m building will be great at representing SCGs. With it, the theoretical technique can be used in practice.
  • Understanding the Effects of Consumer Finance Policy. I’m working with an amazing team on a project modeling consumer lending and the effects of various consumer protection regulations. We are looking at anti-usury laws, nondiscrimination rules, forgiveness of negative information, the use of alternative data by fintech companies, and so on. This policy analysis involves comparing a number of different scenarios and looking for what regulations produce which results, robustly. I’m building a tool to solve this problem.
  • AI Governance and Fiduciary Duties. A lot of people have pointed out that effective AI governance requires an understanding of AI supply chains. AI services rely on complex data flows through multiple actors, which are often imperfectly aligned and incompletely contracted. They also depend on physical data centers and the consumption of energy. This raises many questions around liability and quality control that wind up ultimately being about institutional design rather than neural network architectures. I’m building a tool to help reason through these institutional design questions. In other words, I’m building a new way to do threat modeling on AI supply chain ecosystems.

Q: Really?

A: Yes! I sometimes have a hard time wrapping my own head around what I’m doing, which is why I’ve written out this blog post. But I do feel very good about what I’m working on at the moment. I think it has a lot of potential.

Open Source Computational Economics: The State of the Art

Last week I spoke at PyData NYC 2023 about “Open Source Computational Economics: The State of the Art”.

It was a very nice conference, packed with practical guidance on using Python in machine learning workflows, interesting people, and some talks that were further afield. Mine was the most ‘academic’ talk that I saw there: it concerned recent developments in computational economics and what they mean for open source economics tooling.

The talk discussed DYNARE, a widely known toolkit for representative agent modeling in a DSGE framework, and also more recently developed packages such as QuantEcon, Dolo, and HARK. It then outlined how dynamic programming solutions to high-dimensional heterogeneous agent problems have run into computational complexity constraints. Then, excitingly, how deep learning has been used to solve these models very efficiently, which greatly expands the scope of what can be modeled! This part of the talk drew heavily on Maliar, Maliar, and Winant (2021) and Chen, Didisheim, and Scheidegger (2023).

The talk concluded with some predictions about where computational economics is going. More standardized ways of formulating problems, coupled with reliable methods for encoding these problems into deep learning training routines, offer a promising path forward for exploring a wide range of new models.

Slides are included below.

References

Chen, H., Didisheim, A., & Scheidegger, S. (2021). Deep Surrogates for Finance: With an Application to Option Pricing. Available at SSRN 3782722.

Maliar, L., Maliar, S., & Winant, P. (2021). Deep learning for solving dynamic economic models. Journal of Monetary Economics, 122, 76-101.

the make or buy decision (TCE) in software and cybersecurity

The paradigmatic case of transaction cost economics (TCE) is the make-or-buy decision. A firm, F, needs something, C. Do they make it in-house or do they buy it from somewhere else?

If the firm makes it in-house, they will incur some bureaucratic overhead costs in addition to the costs of production. But they will also be able to specialize C for their purposes. They can institute their own internal quality controls. And so on.

If the firm buys it on the open market from some other firm, say, G, they don’t pay the overhead costs. They do lose the benefits of specialization, and the quality controls are only those based on economic competitive pressure on suppliers.

There is an intermediate option, which is a contract between F and G which establishes an ongoing relationship between the two firms. This contract creates a field in which C can be specialized for F, and there can be assurances of quality, while the overhead is distributed efficiently between F and G.

This situation is both extremely common in business practice and not well handled by neoclassical, orthodox economics. It’s the case that TCE is tremendously preoccupied with.


My background and research are in the software industry, which is rife with cases like these.

Developers are constantly faced with make-or-buy decisions about software components. In principle, they can develop any component themselves. In practice, this is rarely cost-effective.

In software, open source software components are a prevalent solution to this problem. This can be thought of as a very strange market where all the prices are zero. The most popular open source libraries are very generic, having little “asset specificity” in TCE terms.

The lack of contract between developers and open source components/communities is sometimes seen as a source of hazard in using open source components. The recent event-stream hack, where an upstream component was injected with malicious code by a developer who had taken over maintaining the package, illustrates the problems of outsourcing technical dependencies without a contract. In this case, the quality problem is manifest as a supply chain cybersecurity problem.

In Williamson’s analysis, these kinds of hazards are what drive firms away from purchasing on spot markets and towards contracting or in-house development. In practice, open source support companies fill the role of the responsible entity G that firm F can build a relationship with.

“the privatization of public functions”

An emerging theme from the conference on Trade Secrets and Algorithmic Systems was that legal scholars have become concerned about the privatization of public functions. For example, the use of proprietary risk assessment tools instead of the discretion of judges who are supposed to be publicly accountable is a problem. More generally, use of “trade secrecy” in court settings to prevent inquiry into software systems is bogus and moves more societal control into the realm of private ordering.

Many remedies were proposed. Most involved some kind of disclosure and audit to experts. The most extreme form of disclosure is making the software and, where it’s a matter of public record, training data publicly available.

It is striking to me to be encountering the call for government use of open source systems because…this is not a new issue. The conversation about federal use of open source software was alive and well over five years ago. Then, the arguments were about vendor lock-in; now, they are about accountability of AI. But the essential problem of whether core governing logic should be available to public scrutiny, and the effects of its privatization, have been the same.

If we are concerned with the reliability of a closed and large-scale decision-making process of any kind, we are dealing with problems of credibility, opacity, and complexity. The prospects of an efficient market for these kinds of systems are dim. These market conditions are the conditions of sustainability of open source infrastructure. Failures in sustainability are manifest as software vulnerabilities, which are one of the key reasons why governments are warned against OSS now, though measuring and comparing OSS vulnerabilities against proprietary ones is methodologically fraught.

open source sustainability and autonomy, revisited

Some recent chats with Chris Holdgraf and colleagues at NYU interested in “critical digital infrastructure” have gotten me thinking again about the sustainability and autonomy of open source projects.

I’ll admit to having had naive views about this topic in the past. Certainly, doing empirical data science work on open source software projects has given me a firmer perspective on things. Here are what I feel are the hardest earned insights on the matter:

  • There is tremendous heterogeneity in open source software projects. Almost all quantitative features of these projects follow log-normal distributions. This suggests that the keys to open source software success are myriad and exogenous (how the technology fits in the larger ecosystem, how outside funding and recognition is accomplished, …) rather than endogenous (community policies, etc.). While many open source projects start as hobby and unpaid academic projects, those that go on to be successful find one or more funding sources. This funding is an exogenous factor.
  • The most significant exogenous factor in an open source software project’s success is the industrial organization of private tech companies. Developing an open technology is part of the strategic repertoire of these companies: for example, to undermine the position of a monopolist, developing an open source alternative decreases barriers to market entry and allows for a more competitive field in that sector. Another example: Google funded Mozilla for so long arguably to deflect antitrust action over Google Chrome.
  • There is some truth to Chris Kelty’s idea of open source communities as recursive publics, cultures that have autonomy that can assert political independence at the boundaries of other political forces. This autonomy comes from: the way developers of OSS get specific and valuable human capital in the process of working with the software and their communities; the way institutions begin to depend on OSS as part of their technical stack, creating an installed base; and how many different institutions may support the same project, creating competition for the scarce human capital of the developers. Essentially, at the point where the software and the skills needed to deploy it effectively and the community of people with those skills is self-organized, the OSS community has gained some economic and political autonomy. Often this autonomy will manifest itself in some kind of formal organization, whether a foundation, a non-profit, or a company like Redhat or Canonical or Enthought. If the community is large and diverse enough it may have multiple organizations supporting it. This is in principle good for the autonomy of the project but may also reflect political tensions that can lead to a schism or fork.
  • In general, since OSS development is internally most often very fluid, with the primary regulatory mechanism being the fork, the shape of OSS communities is more determined by exogenous factors than endogenous ones. When exogenous demand for the technology rises, the OSS community can find itself with a ‘surplus’, which can be channeled into autonomous operations.

fancier: scripts to help manage your Twitter account, in Python

My Twitter account has been a source of great entertainment, distraction, and abuse over the years. It is time that I brought it under control. I am too proud and too cheap to buy a professional grade Twitter account manager, and so I’ve begun developing a new suite of tools in Python that will perform the necessary tasks for me.

I’ve decided to name these tools fancier, because the art and science of breeding domestic pigeons is called pigeon fancying. Go figure.

The project is now available on GitHub, and of course I welcome any collaboration or feedback!

At the time of this writing, the project has only one feature: it searches through who you follow on Twitter, finds which accounts have been inactive for 90 days and don’t follow you back, and then unfollows them.

This is a common thing to try to do when grooming and/or professionalizing your Twitter account. I saw a script for this shared in a pastebin years ago, but couldn’t find it again. There are some on-line services that will help you do this, but they charge a fee to do it at scale. Ergo: the open source solution. Voila!
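For the curious, here is a minimal sketch of the idea using tweepy; it is not the actual fancier code, and it assumes you have standard OAuth 1.0a user credentials for the v1.1 Twitter API.

```python
# Minimal sketch of the idea (not the actual fancier code), assuming tweepy and
# OAuth 1.0a user credentials for the v1.1 Twitter API. Placeholder strings below
# stand in for real keys and tokens.
import datetime
import tweepy

auth = tweepy.OAuth1UserHandler(
    "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET"
)
api = tweepy.API(auth, wait_on_rate_limit=True)

me = api.verify_credentials()
cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=90)

for friend in tweepy.Cursor(api.get_friends, count=200).items():
    # Skip accounts that have tweeted within the last 90 days.
    last_tweet = getattr(friend, "status", None)
    if last_tweet is not None and last_tweet.created_at.replace(tzinfo=None) > cutoff:
        continue
    # Skip accounts that follow back.
    source, _ = api.get_friendship(source_id=me.id, target_id=friend.id)
    if source.followed_by:
        continue
    api.destroy_friendship(user_id=friend.id)
```

The per-account friendship lookups are slow at scale because of rate limits, which is what wait_on_rate_limit is for.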

moved BigBang core repository to DATACTIVE organization

I made a small change this evening which I feel really, really good about.

I transferred the BigBang project from my personal GitHub account to the datactive organization.

I’m very grateful for DATACTIVE’s interest in BigBang and am excited to turn over the project infrastructure to their stewardship.

trust issues and the order of law and technology cf @FrankPasquale

I’ve cut to the last chapter of Pasquale’s The Black Box Society, “Towards an Intelligible Society.” I’m interested in where the argument goes. Now that I’ve gotten through it, I see that the penultimate chapter has Pasquale’s specific policy recommendations. But as I’m not just reading for policy and framing but also for tone and underlying theoretical commitments, I think it’s worth recording some first impressions before doubling back.

These are some points Pasquale makes in the concluding chapter that I wholeheartedly agree with:

  • A universal basic income would allow more people to engage in high risk activities such as the arts and entrepreneurship and more generally would be great for most people.
  • There should be publicly funded options for finance, search, and information services. A great way to provide these would be to fund the development of open source algorithms for finance and search. I’ve been into this idea for so long and it’s great to see a prominent scholar like Pasquale come to its defense.
  • Regulatory capture (or, as he elaborates following Charles Lindblom, “regulatory circularity”) is a problem. Revolving door participation in government and business makes government regulation an unreliable protector of the public interest.

There is quite a bit in the conclusion about the specifics of regulating the finance industry. There is an impressive amount of knowledge presented about this, and I’ll admit much of it is over my head. I’ll probably have a better sense of it if I get to reading the chapter that is specifically about finance.

There are some things that I found bewildering or off-putting.

For example, there is a section on “Restoring Trust” that talks about how an important problem is that we don’t have enough trust in the reputation and search industries. His solution is to increase the penalties that the FTC and FCC can impose on Google and Facebook for their privacy violations, for example. The current penalties are too trivial to be effective deterrence. But, Pasquale argues,

It is a broken enforcement model, and we have black boxes to thank for much of this. People can’t be outraged by what they can’t understand. And without some public concern about the trivial level of penalties for lawbreaking here, there are no consequences for the politicians ultimately responsible for them.

The logic here is a little mad. Pasquale is saying that people are not outraged enough by search and reputation companies to demand harsher penalties, and this is a problem because people don’t trust these companies enough. The solution is to convince people to trust these companies less–get outraged by them–in order to get them to punish the companies more.

This is a bit troubling, but makes sense based on Pasquale’s theory of regulatory circularity, which turns politics into a tug-of-war between interests:

The dynamic of circularity teaches us that there is no stable static equilibrium to be achieved between regulators and regulated. The government is either pushing industry to realize some public values in its activities (say, by respecting privacy or investing in sustainable growth), or industry is pushing regulators to promote its own interests.

There’s a simplicity to this that I distrust. It suggests, for one, that there are no public pressures on industry besides the government, such as consumers’ buying power. A lot of Pasquale’s arguments depend on the monopolistic power of certain tech giants. But while network effects are strong, it’s not clear whether this is such a problem that consumers have no market buy-in. In many cases tech giants compete with each other even when it looks like they aren’t. For example, many, many people have both Facebook and Gmail accounts. Since there is somewhat redundant functionality in both, consumers can rather seamlessly allocate their time, which is tied to advertising revenue, according to which service they feel better serves them, or which is best reputationally. So social media (which is a bit like a combination of a search and reputation service) is not a monopoly. Similarly, if people have multiple search options available to them because, say, they have both Siri on their smartphone and can search Google directly, then that provides an alternative search market.

Meanwhile, government officials are also often self-interested. If the road to hell for industry is to provide free web services to attain massive scale, then abuse economic lock-in to extract value from customers, then lobby for further rent-seeking, there is a similar road to hell in government. It starts with populist demagoguery, leads to stable government appointments, and then leverages that power for rents in status.

So, power is power. Everybody tries to get power. The question is what you do once you get it, right?

Perhaps I’m reading between the lines too much. Of course, my evaluation of the book should depend most on the concrete policy recommendations which I haven’t gotten to yet. But I find it unfortunate that what seems to be a lot of perfectly sound history and policy analysis is wrapped in a politics of professional identity that I find very counterproductive. The last paragraph of the book is:

Black box services are often wondrous to behold, but our black-box society has become dangerously unstable, unfair, and unproductive. Neither New York quants nor California engineers can deliver a sound economy or a secure society. Those are the tasks of a citizenry, which can perform its job only as well as it understands the stakes.

Implicitly, New York quants and California engineers are not citizens, to Pasquale, a law professor based in Maryland. Do all real citizens live around Washington, DC? Are they all lawyers? If the government were to start providing public information services, either by hosting them themselves or by funding open source alternatives, would he want everyone designing these open algorithms (who would be quants or engineers, I presume) to move to DC? Do citizens really need to understand the stakes in order to get this to happen? When have citizens, en masse, understood anything, really?

Based on what I’ve read so far, The Black Box Society is an expression of a lack of trust in the social and economic power associated with quantification and computing that took off in the past few dot-com booms. Since expressions of lack of trust for these industries are nothing new, one might wonder (under the influence of Foucault) how the quantified order and the critique of the quantified order manage to coexist and recreate a system of discipline that includes both and maintains its power as a complex of superficially agonistic forces. I give sincere credit to Pasquale for advocating both serious income redistribution and public investment in open technology as ways of disrupting that order. But when he falls into the trap of engendering partisan distrust, he loses my confidence.

Innovation, automation, and inequality

What is the economic relationship between innovation, automation, and inequality?

This is a recurring topic in the discussion of technology and the economy. It comes up when people are worried about a new innovation (such as data science) that threatens their livelihood. It also comes up in discussions of inequality, such as in Piketty’s Capital in the Twenty-First Century.

For technological pessimists, innovation implies automation, and automation suggests the transfer of surplus from many service providers to a technological monopolist providing a substitute service at greater scale (scale being one of the primary benefits of automation).

For Piketty, it’s the spread of innovation, in the sense of the education of skilled labor, that is the primary force counteracting capitalism’s tendency towards inequality and (he suggests) the implied instability. For all the importance Piketty places on this process, he treats it hardly at all in his book.

Whether or not you buy Piketty’s analysis, the preceding discussion indicates how innovation can cut both for and against inequality. When there is innovation in capital goods, this increases inequality. When there is innovation in a kind of skilled technique that can be broadly taught, that decreases inequality by increasing the relative value of labor to capital (which is generally much more concentrated than labor).

I’m a software engineer in the Bay Area and realize that it’s easy to overestimate the importance of software in the economy at large. This is apparently an easy mistake for other people to make as well. Matthew Rognlie, the economist who has been declared Piketty’s latest and greatest challenger, thinks that software is an important new form of capital and draws certain conclusions based on this.

I agree that software is an important form of capital–exactly how important I cannot yet say. One reason why software is an especially interesting kind of capital is that it exists ambiguously as both a capital good and as a skilled technique. While naively one can consider software as an artifact in isolation from its social environment, in the dynamic information economy a piece of software is only as good as the sociotechnical system in which it is embedded. Hence, its value depends both on its affordances as a capital good and its role as an extension of labor technique. It is perhaps easiest to see the latter aspect of software by considering it a form of extended cognition on the part of the software developer. The human capital required to understand, reproduce, and maintain the software is attained by, for example, studying its source code and documentation.

All software is a form of innovation. All software automates something. There has been a lot written about the potential effects of software on inequality through its function in decision-making (for example: Solon Barocas, Andrew D. Selbst, “Big Data’s Disparate Impact” (link).) Much less has been said about the effects of software on inequality through its effects on industrial organization and the labor market. After having my antennas up for this for many reasons, I’ve come to a conclusion about why: it’s because the intersection between those who are concerned about inequality in society and those that can identify well enough with software engineers and other skilled laborers is quite small. As a result there is not a ready audience for this kind of analysis.

However unreceptive society may be to it, I think it’s still worth making the point that we already have a very common and robust compromise in the technology industry that recognizes software’s dual role as a capital good and labor technique. This compromise is open source software. Open source software can exist both as an unalienated extension of its developer’s cognition and as a capital good playing a role in a production process. Human capital tied to the software is liquid between the software’s users. Surplus due to open software innovations goes first to the software users, then second to the ecosystem of developers who sell services around it. Contrast this with the proprietary case, where surplus goes mainly to a singular entity that owns and sells the software rights as a monopolist. The former case is vastly better if one considers societal equality a positive outcome.

This has straightforward policy implications. As an alternative to Piketty’s proposed tax on capital, any policies that encourage open source software are ones that combat societal inequality. This includes procurement policies, which need not increase government spending. On the contrary, if governments procure primarily open software, that should lead to savings over time as their investment leads to a more competitive market for services. Equivalently, R&D funding to open science institutions results in more income equality than equivalent funding provided to private companies.