scientific explanation

Explainable AI and computational approaches to macroeconomic theory

I have spent some time working with and around people concerned with the ethical implications of AI. A question that arises frequently in that context is to what extent automated decisions made by computational systems are “explainable” or “scrutable” (e.g. Selbst and Barocas, 2018). An important motivation for this line of inquiry is the idea that for AI systems to be effectively regulated by the Rule of Law, they need to be comprehensible to lawyers and understood within lawyerly discursive practice (Hildebrandt, 2015). This is all very interesting, but analyses of the problem and its potential solutions rarely transcend the disciplinary silos from which the ‘explainability’ concerns originate. I’ve written my opinions about this quite a bit on this blog and I won’t reiterate them.

Instead, I’ve changed what I’m working on. Now I am contributing to open source software libraries for computational methods in macroeconomics, such as the Heterogeneous Agents Resources and toolKit (HARK). This is challenging and rewarding work. One reason it is challenging and rewarding is that it bumps up against many key issues in the way computational methods are changing social science education. This is in many ways related to the explainable AI problem, though in some sense it is the opposite side of the coin.

I’ll try to explain. Macroeconomic theory, which deals with such problems as how the economy as a whole reacts to changing trends in saving, consumption, and employment, and how agents within the economy react to those aggregate phenomena, has a long history associated with some major heavyweight economists: Keynes, Mankiw, etc. It is a deeply mathematical field that is taken seriously by central banks around the world and, by extension, private banks as well. Regulating the economy is an important job that requires expertise and is intrinsically quantitative; whatever one may think about the field of economics in general or its specific manifestations in history, it’s undeniable that the world needs economists of one kind or another.

So we have here a form of public policy expertise that is not discursive in the same sense that lawyerly practice is discursive. Economics has always imagined itself to be a science, however hotly contested that claim may be. It is also a field that does not shy away from having specialized disciplinary knowledge that must be accessed through demanding training. So economics would seem to be a good domain for computational methods to take root.

I’m finding that there are still challenges of interpretation in this field, but that they are somewhat different. Consider for now only the class of economic models that are built from a priori assumptions without any fitting to empirical data. Classically, economic models were constrained by their analytic tractability, meaning the ability of the economist to derive the results of the model through symbolic manipulation of the model’s mathematical terms. This led to the adoption of many assumptions of questionable realism, which have arguably led to some of the discrediting of economic theory since. But it also led to models that had closed form solutions, which have the dual advantage of being easy to compute (in terms of computational cost) and being easy to interpret, because the relationship between variables is explicit.

With computational models, the modeler has more flexibility. They can plug in the terms of the model and run a simulation to compute the result. But while the relationship between the simulation’s inputs and outputs may be observable in some sense, it is not proven. The simulation is not as good for purposes of exposition, teaching, or explanation.
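To make the contrast between closed-form and computational solutions concrete, here is a minimal Python sketch built on a textbook two-period consumption-savings problem with log utility. This model is an illustrative assumption, not taken from HARK or from any model discussed in the post, and the function names are hypothetical. The agent chooses first-period consumption c1 to maximize log(c1) + beta * log(c2), where c2 = R * (wealth - c1). The closed form c1 = wealth / (1 + beta) shows at a glance that the interest rate R drops out; the brute-force grid search finds the same number, but only for the particular parameters it was run on.

```python
import math


def closed_form_c1(wealth: float, beta: float) -> float:
    # Solving the first-order condition 1/c1 = beta / (wealth - c1) by hand
    # gives a fixed share of wealth; the interest rate R does not appear.
    return wealth / (1.0 + beta)


def grid_search_c1(wealth: float, beta: float, R: float, n_grid: int = 10_000) -> float:
    # Compute instead of derive: try many values of c1 strictly between
    # 0 and wealth, and keep the one with the highest lifetime utility.
    best_c1, best_u = 0.0, -math.inf
    for i in range(1, n_grid):
        c1 = wealth * i / n_grid
        c2 = R * (wealth - c1)  # savings earn gross return R
        u = math.log(c1) + beta * math.log(c2)
        if u > best_u:
            best_c1, best_u = c1, u
    return best_c1
```

Calling grid_search_c1(100.0, 0.96, R) for several values of R returns, up to the grid spacing, the same answer as closed_form_c1(100.0, 0.96) each time. The simulation lets one observe that pattern case by case; only the derivation explains why it holds.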

This is quite interesting, as it is a case where the explainability of a computational system is problematic but not because of a numeric or technical illiteracy on the part of the model reader, or of any intentional secrecy, but rather because of the complexity of the simulation (Burrell, 2016). In this discussion I am considering model building only, not model fitting, so the complexity in this case does not come from the noisiness of reality and the data it provides. Rather, the complexity results entirely from the internals of the model.

It is often said, half in jest but with real truth, that most machine learning today is some form of glorified (generalized) linear regression. The class of models considered by machine learning methods today is infinitely wide but ultimately shallow. Even when the need to understand the underlying phenomenon is abandoned, the available algorithms and hardware constraints limit machine-learned models to those that are tractable by, say, a GPU.

But something else can be known.

References

Burrell, Jenna. “How the machine ‘thinks’: Understanding opacity in machine learning algorithms.” Big Data & Society 3.1 (2016): 2053951715622512.

Hildebrandt, Mireille. Smart technologies and the end(s) of law: Novel entanglements of law and technology. Edward Elgar Publishing, 2015.

Selbst, Andrew D., and Solon Barocas. “The intuitive appeal of explainable machines.” Fordham L. Rev. 87 (2018): 1085.

Why disorganized heavy tail distributions?

I wrote too soon.

Miller and Page (2009) do indeed address “fat tail” distributions explicitly in the same chapter on Emergence discussed in my last post.

However, they do not touch on the possibility that fat tail distributions might be log normal distributions generated by the Central Limit Theorem, as is well-documented by Mitzenmacher (2004).

Instead, they explicitly make a different case. They argue that there are two kinds of complexity:

disorganized complexity, where extreme values balance each other out to produce average aggregate behavior according to the Law of Large Numbers and the Central Limit Theorem.
organized complexity, where positive and negative feedback can result in extreme outcomes, best characterized by power law or “heavy tail” distributions. Preferential attachment is an example of a feedback based mechanism for generating power law distributions (in the specific case of network degrees).
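As an illustration of the feedback mechanism in the second item, here is a minimal Python sketch of preferential attachment in its simplest form: each new node adds one edge, attaching to an existing node with probability proportional to that node’s current degree. The function name, the two-node starting graph, and the network size are illustrative assumptions, not details from Miller and Page.

```python
import random


def preferential_attachment_degrees(n_nodes: int, seed: int = 0) -> list[int]:
    """Grow a network one node at a time and return each node's degree."""
    rng = random.Random(seed)
    degrees = [1, 1]    # start from two nodes joined by a single edge
    endpoints = [0, 1]  # node i appears here once for every edge it touches
    for new_node in range(2, n_nodes):
        # A uniform draw from `endpoints` selects an existing node with
        # probability proportional to its degree: this is the positive feedback.
        target = rng.choice(endpoints)
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([new_node, target])
    return degrees
```

With degrees = preferential_attachment_degrees(10_000), the average degree is just under 2, while the best-connected nodes end up with degrees far above that, because nodes that are already well connected keep attracting new links.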

Indeed, this rough breakdown of possible scientific explanations (the relatively orderly null-hypothesis world of normal distributions, and the chaotic, more accurately rendered world of heavy tail distributions) was the one I had before I started studying complex systems and statistics more seriously in grad school.

Only later did I come to the conclusion that this is a pervasive error, because of the ease with which log normal distributions (which may be “disorganized”) can be confused with power law distributions (which tend to be explained by “organized” processes). I am a bit disappointed that Miller and Page repeat this error, but then again their book was written in 2009. I wonder whether the methodological realization (which I assume I’m not alone in, since I sometimes hear it confirmed informally in conversations with smart people) is relatively recent.
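One way to see how easily the two are confused: a common informal test for a power law is whether the survival function P(X > x) looks like a straight line on log-log axes. The minimal Python sketch below assumes a log normal with illustrative parameters mu = 0 and sigma = 2 (arbitrary choices), evaluates its exact survival function over the upper tail, from one to three standard deviations above the median in log terms, and fits a straight line by ordinary least squares. The function names are hypothetical.

```python
import math


def lognormal_log_survival(log_x: float, mu: float, sigma: float) -> float:
    # log P(X > x) for X = exp(Normal(mu, sigma^2)), computed from log(x)
    # using the complementary error function.
    z = (log_x - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def loglog_line_fit(mu: float = 0.0, sigma: float = 2.0,
                    z_lo: float = 1.0, z_hi: float = 3.0,
                    n: int = 50) -> tuple[float, float]:
    # Points evenly spaced in log(x) across the upper tail of the distribution.
    log_x = [mu + sigma * (z_lo + (z_hi - z_lo) * i / (n - 1)) for i in range(n)]
    log_s = [lognormal_log_survival(t, mu, sigma) for t in log_x]
    # Ordinary least squares of log S(x) on log x; R^2 measures straightness.
    mx, my = sum(log_x) / n, sum(log_s) / n
    sxx = sum((a - mx) ** 2 for a in log_x)
    syy = sum((b - my) ** 2 for b in log_s)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_x, log_s))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared
```

For these parameters the fitted line is close to straight (R² well above 0.9) across almost two orders of magnitude in x, with a negative slope that could be mistaken for a power-law exponent, even though nothing about a log normal requires feedback.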

Because this is so rarely discussed directly, I think it may be worth pondering exactly why disorganized heavy tail distributions are not favored in the literature. There are several reasons I can think of, which I’ll offer informally here as possibilities or hypotheses.

One reason that I’ve argued for before here is that organized processes are more satisfying as explanations than disorganized processes. Most people are not very good at thinking about probabilities (Tetlock and Gardner (2016) have a great, accessible discussion of why this is the case). So to the extent that the Law of Large Numbers or the Central Limit Theorem has true explanatory power, it may not be the kind of explanation most people are willing to entertain. This apparently includes scientists. Rather, a simple explanation in terms of feedback may be the kind of thing that feels like a robust scientific finding, even if there’s something spurious about it when viewed rigorously. (This is related, I think, to arguments about the end of narrative in social science.)

Another reason why disorganized heavy tail distributions may be underutilized as scientific explanations is that it is counter-intuitive that a disorganized process can produce such extreme inequality in outcomes.

This has to do with the key transformation that distinguishes a normal distribution from a log normal one. A normal distribution is the bell-shaped distribution one gets (approximately, by the Central Limit Theorem) when one adds a large number of independent random variables.

The log normal distribution is a heavy tail distribution one gets by multiplying a large number of positively valued independent random variables. While it does have a bell or hump, the top of the bell is not at the arithmetic mean, because the sides of the bell are skewed in size. But this is not necessarily because of the dominance of any particular factor (as would be expected if, for example, a single factor were involved in a positive feedback loop). Rather, it is the mathematical fact of many factors multiplied creating extraordinarily high values which creates the heavy right-hand side of the bell.
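A minimal Python sketch of the contrast, assuming purely for illustration that each factor is drawn uniformly from [0.5, 1.5]: adding 20 such factors gives a roughly symmetric bell whose mean and median nearly coincide, while multiplying the same factors gives a right-skewed distribution whose mean sits well above its median. Taking logs turns the product back into a sum, which is why the Central Limit Theorem makes the product approximately log normal. The parameter values and the function name are assumptions.

```python
import math
import random
import statistics


def sums_and_products(n_factors: int = 20, n_samples: int = 10_000, seed: int = 0):
    """Return (sums, products) computed from the same positive random factors."""
    rng = random.Random(seed)
    sums, products = [], []
    for _ in range(n_samples):
        factors = [rng.uniform(0.5, 1.5) for _ in range(n_factors)]
        sums.append(sum(factors))            # additive: approximately normal
        products.append(math.prod(factors))  # multiplicative: approximately log normal
    return sums, products


sums, products = sums_and_products()
log_products = [math.log(p) for p in products]  # log of a product = sum of logs

for name, xs in (("sum", sums), ("product", products), ("log(product)", log_products)):
    print(f"{name:>12}: mean={statistics.mean(xs):.3f} median={statistics.median(xs):.3f}")
```

No single factor dominates any of the large products; each is simply the result of many moderate factors happening to line up.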

One way to put it is that rather than having a “deep” positive feedback loop where a single factor amplifies itself many times over, disorganized heavy tails have “shallow” positive feedback where each of many factors has a single and simultaneous amplifying effect on the impact of all the others. This amplification effect is, like multiplication itself, commutative, which means that no single factor can be considered to be causally prior to the others.

Once again, this defies specificity in an explanation, which may be for some people an explanatory desideratum.

But these extreme values are somehow the ones that people demand specific explanations for. This is related, I believe, to the desire for a causal lever with which people can change outcomes, especially their own personal outcomes.

There’s an important political question implicated by all this, which is: why are wealth and power concentrated in the hands of the very few?

One explanation that must be considered is the possibility that society is accumulated history, and over thousands of years innumerable independent factors have affected the distribution of wealth and power. Though rather disorganized, these factors amplify each other multiplicatively, resulting in the distribution that we see today.
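This story can be sketched in a few lines of Python, under loudly stated assumptions: 5,000 hypothetical agents start with identical wealth, and in each of 50 periods every agent’s wealth is multiplied by an independent random factor drawn uniformly from [0.5, 1.5]. No agent has any built-in advantage and there is no feedback beyond the multiplication itself. All numbers and names here are illustrative, not estimates of any real economy.

```python
import random


def top_share(wealth: list[float], top_fraction: float = 0.01) -> float:
    # Share of total wealth held by the richest `top_fraction` of agents.
    ranked = sorted(wealth, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


def simulate_wealth(n_agents: int = 5_000, n_periods: int = 50, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    wealth = [1.0] * n_agents  # everyone starts out equal
    for _ in range(n_periods):
        # Each period, each agent's wealth is scaled by an independent,
        # identically distributed positive factor: many small, disorganized causes.
        wealth = [w * rng.uniform(0.5, 1.5) for w in wealth]
    return wealth


wealth = simulate_wealth()
print(f"top 1% share: {top_share(wealth):.2f}")
```

Under perfect equality the top 1% would hold exactly 1% of the total; in this sketch it ends up holding many times that, though the exact figure depends on the seed and the assumed parameters.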

The problem with this explanation is that it seems there is little to be done about this state of affairs. A person can affect a handful of the factors that contribute to their own wealth or the wealth of another, but if there are thousands of them then it’s hard to get a grip. One must view the other as simply lucky or unlucky. How can one politically mobilize around that?

References

Miller, John H., and Scott E. Page. Complex adaptive systems: An introduction to computational models of social life. Princeton University Press, 2009.

Mitzenmacher, Michael. “A brief history of generative models for power law and lognormal distributions.” Internet Mathematics 1.2 (2004): 226-251.

Tetlock, Philip E., and Dan Gardner. Superforecasting: The art and science of prediction. Random House, 2016.

Digifesto

Tag: scientific explanation

November 4, 2019

Explainable AI and computational approaches to macroeconomic theory

July 11, 2017

Why disorganized heavy tail distributions?