On Cilliers on “complex systems”
Mireille Hildebrandt has been surfacing the work of Cilliers on complex systems.
I’ve had a longstanding interest in the modeling of what are variously called “complex systems” or “complex adaptive systems” to study the formation of social structure, particularly as it might inform technical and policy design. I’m thrilled to be working on projects along these lines now, at last.
So naturally I’m intrigued by Hildebrandt’s use of Cilliers to, it seems, humble if not delegitimize the aspirations of complex systems modeling. But what, precisely, is Cilliers’s argument? Let’s look at the accessible “What can we learn from a theory of complexity?”.
First, what is a “complex system” to Cilliers?
I will not provide a detailed description of complexity here, but only summarize the general characteristics of complex systems as I see them.
1. Complex systems consist of a large number of elements that in themselves can be simple.
2. The elements interact dynamically by exchanging energy or information. These interactions are rich. Even if specific elements only interact with a few others, the effects of these interactions are propagated throughout the system. The interactions are nonlinear.
3. There are many direct and indirect feedback loops.
4. Complex systems are open systems—they exchange energy or information with their environment—and operate at conditions far from equilibrium.
5. Complex systems have memory, not located at a specific place, but distributed throughout the system. Any complex system thus has a history, and the history is of cardinal importance to the behavior of the system.
6. The behavior of the system is determined by the nature of the interactions, not by what is contained within the components. Since the interactions are rich, dynamic, fed back, and, above all, nonlinear, the behavior of the system as a whole cannot be predicted from an inspection of its components. The notion of “emergence” is used to describe this aspect. The presence of emergent properties does not provide an argument against causality, only against deterministic forms of prediction.
7. Complex systems are adaptive. They can (re)organize their internal structure without the intervention of an external agent.
Certain systems may display some of these characteristics more prominently than others. These characteristics are not offered as a definition of complexity, but rather as a general, low-level, qualitative description. If we accept this description (which from the literature on complexity theory appears to be reasonable), we can investigate the implications it would have for social or organizational systems.
This all looks quite standard at first glance, except for point (6), which contains not only a description (the system exhibits emergent properties) but also a pointed claim about “deterministic forms of prediction”.
Cilliers hedges against actually defining his terms. This is, it seems, consistent with his position (expressed later).
The next section then presents some thematic, but tentative, consequences of thinking of organizations as complex systems. Presumably, this is all material he justifies elsewhere, as these are very loose arguments.
The point which Hildebrandt is gesturing towards is in the section “WHAT WE CANNOT LEARN FROM A THEORY OF COMPLEXITY”. This is Cilliers’s Negative Argument. There is indeed an argument here:
Looking at the positive aspects we discussed above, you will notice that none is specific. They are all heuristic, in the sense that they provide a general set of guidelines or constraints. Perhaps the best way of putting it is to say that a theory of complexity cannot help us to take in specific positions, to make accurate predictions. This conclusion follows inevitably from the basic characteristics discussed above.
At this point, Cilliers has twice (here, and in point (6) of the definition earlier) mentioned “prediction”, a concept with a precise mathematical meaning within the field of statistics. There is a red flag here: the use of the terms “accurate predictions” and “deterministic forms of prediction”. What do these mean? Would a statistician say that any prediction is, strictly speaking, accurate or deterministic? Likely not. They could provide a confidence interval, or Bayesian posterior odds: how much they would be willing to bet on an outcome. But these may not be what Cilliers means by “prediction”. There is a folk-theoretic, qualitative sense of “prediction” which is sometimes used talismanically by those who engage in foresight but eschew statistical prediction: scenario planners, for example.
This is a different semantic world. On the formal modeling side, when one constructs a model and, as one does today, runs it as a simulation, one runs it multiple times over with different input parameters in order to get a sense of the probability distribution of outcomes. “Prediction”, to the formal science community, is the discovery of this distribution, not the pinpointing of a particular outcome. Very often this distribution will have high variance: a wide range of possible values, and therefore very little confidence that it will land on any particular one. This is nevertheless considered a satisfactory outcome for a model. The model “predicts” that the result will be drawn from a high-variance distribution.
For example, consider the toss of a single six-sided die. Nobody can predict the outcome deterministically. The “prediction” one can make is that it will land on 1, 2, 3, 4, 5, or 6, with equal odds.
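To make this concrete, here is a minimal sketch in Python. The die model and the run count are my own illustrative choices, not anything from Cilliers; the point is only that “prediction” here means estimating a distribution over outcomes by repeated simulation.

```python
import random
from collections import Counter

def roll_die(rng: random.Random) -> int:
    """One run of a trivially simple stochastic 'model': a six-sided die."""
    return rng.randint(1, 6)

def predict(model, n_runs: int = 100_000, seed: int = 42) -> dict:
    """'Prediction' here means estimating the distribution over outcomes
    by running the model many times, not pinpointing a single value."""
    rng = random.Random(seed)
    counts = Counter(model(rng) for _ in range(n_runs))
    return {outcome: count / n_runs for outcome, count in sorted(counts.items())}

# The model "predicts" a (roughly) uniform distribution over 1..6, which is
# a perfectly satisfactory result even though no single outcome can be
# predicted deterministically.
print(predict(roll_die))
```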
So, already, we see Cilliers disconnected from mainstream statistical practice. If “deterministic prediction” is not what statisticians mean by prediction even for simple systems, then they certainly do not believe they could make such a prediction about a complex system.
This is not the only time when Cilliers appears unfamiliar with the mathematical grounds that his argument gestures at. The following paragraph is quite disturbing:
In order to predict the behavior of a system accurately, we need a detailed understanding of that system, i.e., a model. Since the nature of a complex system is the result of the relationships distributed all over the system, such a model will have to reflect all these relationships. Since they are nonlinear, no set of interactions can be represented by a set smaller than the set itself—superposition does not hold. This is one way of saying that complexity is not compressible. Moreover, we cannot accurately determine the boundaries of the system, because it is open. In order to model a system precisely, we therefore have to model each and every interaction in the system, each and every interaction with the environment—which is of course also complex—as well as each and every interaction in the history of the system. In short, we will have to model life, the universe and everything. There is no practical way of doing this.
This paragraph’s validity hinges on its notion of “precision”, which, as I’ve explained, is not something any statistically informed modeler is aiming for. A few comments on the terminology here:
- In formal algorithmic information theory (which has, through Solomonoff induction and the minimum description length principle, a close underlying connection with statistical inference), “compressibility” refers to whether a representation of some kind, such as a written description or a set of data, can be represented in a briefer, more compressed form, given some language of interpretation or computation. When one writes a computational description of a model, that is likely its most compressed form. (If not, it is not written very well.) When the model is executed, it simulates the relations of objects within the system dynamically. These relations may well be nonlinear. All of this is very conventional. If the model is stochastic, meaning it contains randomized behavior, as many do, then there will of course be a broad distribution of outcomes. And it’s true that any particular run will not be compressible to the source code alone: the compression will also need to include all the random draws used in making the stochastic decisions of the simulation. However, only the source code and the random draws are needed. So it is still quite possible for the specific state of the model to be compressible to something much less than a full description of that state! (See the sketch after this list.)
- Typically, a model of a complex system will designate some objects and relations as endogenous, meaning driven by the internals of the model, and other factors as exogenous, meaning coming from outside the boundary of the model. If the exogenous factors are unknown, and they almost always are, then they will be modeled as a set of all possible inputs, possibly with a probability distribution over them. This distribution can be obtained through, for example, random sampling, and adjusted to take into account what is unknown. (Probability distributions that express our ignorance about something are sometimes called “maximum entropy distributions”, for reasons that are clear from information theory.)
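To illustrate both points, here is a minimal sketch built around a toy model of my own invention (a noisy logistic map with arbitrary parameters). Its full state trajectory is exactly reproducible from the source code plus the random seed, and the exogenous shocks are modeled as draws from an assumed distribution:

```python
import random

def simulate(seed: int, steps: int = 1000) -> list:
    """A toy nonlinear stochastic model: a noisy logistic map. The state is
    driven by its own history (endogenous) plus random shocks (exogenous),
    drawn here from a normal distribution standing in for what is unknown."""
    rng = random.Random(seed)
    state, trajectory = 0.5, []
    for _ in range(steps):
        shock = rng.gauss(0.0, 0.01)                 # exogenous input, modeled as a distribution
        state = 3.7 * state * (1.0 - state) + shock  # nonlinear update rule
        state = min(max(state, 0.0), 1.0)            # clamp the state to [0, 1]
        trajectory.append(state)
    return trajectory

# The full 1000-step trajectory is exactly reproducible from the source code
# plus one integer seed: the system's state is compressible to far less than
# a verbatim description of the state itself.
assert simulate(seed=7) == simulate(seed=7)
```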
So in this paragraph, which seems to reflect the core of Cilliers’s argument, it is, frankly, not clear that he knows what he’s talking about. The most charitable interpretation, I believe, is this: Cilliers is not satisfied with probabilistic prediction, as almost anybody else doing computational modeling of complex systems is bound to be. Rather, he believes the kind of prediction that matters is the prediction of a specific outcome with absolute certainty. This, truly, is not possible for a complex enough system. Indeed, even simple stochastic systems cannot be predicted in this way.
Let’s call this Cilliers’s Best Negative Argument. What are the implications of it?
What does this amount to in practice? It means that we have to make decisions without having a model or a method that can predict the exact outcome of those decisions. … Does this mean we should avoid decisions, hoping that they will make themselves? Most definitely not. … Not to make a decision is of course also a decision. What, then, are the nature of our decisions? Because we cannot base them on calculation only—calculation would eliminate the need for choice—we have to acknowledge that our decisions have an ethical nature.
This argument is, to a scientist, weird. Suppose, as is likely the case, that we can never make exact predictions but only, at best, probabilistic predictions. In other words, suppose all of our decisions are decisions under uncertainty, which is a claim anybody trained in, say, machine learning is bound to agree with for reasons that have nothing to do with Cilliers. Does this mean that: (a) these decisions cannot be based on calculation, and (b) these decisions have an ethical nature?
Prima facie, (a) is false. Decisions under uncertainty are based on calculation all the time. This is what decision theory, a well-developed branch of mathematics and philosophy used in economics, for example, is all about. Simply: one chooses the action that maximizes expected value, meaning the value of the possible outcomes weighted according to their probability.
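For concreteness, here is a minimal sketch of such a calculation; the actions, payoffs, and probabilities are hypothetical, chosen only to illustrate expected-value maximization:

```python
def expected_value(action: dict) -> float:
    """Expected value of an action: each outcome's value weighted by the
    probability of that outcome."""
    return sum(value * prob for value, prob in action.items())

# Hypothetical actions, each a mapping from outcome value to its probability.
actions = {
    "safe":  {1.0: 1.0},             # a certain, modest payoff
    "risky": {10.0: 0.2, 0.0: 0.8},  # uncertain, but higher expected payoff
}

# Decision under uncertainty, by calculation: choose the action that
# maximizes expected value.
best = max(actions, key=lambda name: expected_value(actions[name]))
print(best, {name: expected_value(a) for name, a in actions.items()})
```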
It is surprising that somebody writing about “complex systems” in the year 2000, working in the field of management science, would not address this point, as the von Neumann-Morgenstern theory of utility was developed in 1947 and is not at all a secret.
Perhaps, then, Cilliers is downplaying this because his real mission is to revitalize the ethical. So far, it seems he is saying: decision-making under uncertainty is always, unlike decision-making under conditions of certainty, ethical in some sense. Is that what he’s saying?
I do not take it to mean being nice or being altruistic. It has nothing to do with middleclass values, nor can it be reduced to some interpretation of current social norms. I use the word in a rather lean sense: it refers to the inevitability of choices that cannot be backed up scientifically or objectively.
…. What?
Why call it ethics? First, because the nature of the system or organization in question is determined by the collection of choices made in it.
Ok, this looks fine.
Secondly, since there is no final objective or calculable ground for our decisions, we cannot shift the responsibility for the decision on to something else—“Don’t blame me, the genetic algorithm said we should sell!” We know that all of our choices to some extent, even if only in a small way, incorporate a step in the dark. Therefore we cannot but be responsible for them.
We are getting to the crux of the argument. Decision-making under uncertainty, Cilliers argues, carries responsibility.
There are two parts to this argument. The first is, I find, the most interesting. Decision-making within a complex system is much more difficult and existentially defining than decision-making about a complex system. And precisely because it is existentially defining, I could see how it would carry special responsibility, or ethical weight.
However, for the aforementioned reasons, hinging this argument on calculability is confusing and unconvincing. There may be many situations where the most responsible or ethical decision is one based on calculated expected results. For example, consider the decision to implement an economic lockdown policy in response to a pandemic. One could listen to political interests of various stripes and appease one or the other. But perhaps in such a situation it is most responsible to calculate, to the best of one’s ability, the probable outcomes of one’s choices before implementing them.
And it seems like Cilliers would agree with this:
It may appear at this stage as if I am arguing against any kind of calculation, that I am dismissing the importance of modeling complex systems. Nothing is further from the truth. The important point I want to make is that calculation will never be sufficient. The last thing this could mean is that calculation is unnecessary. On the contrary, we have to do all the calculation we possibly can. That is the first part of our responsibility as scientists and managers. Calculation and modeling will provide us with a great deal of vital information.
This is a point of happy agreement!
It will just not provide us with all the information.
This is a truism nobody doing computational modeling work would argue with.
The problem would remain, however, that this information has to be interpreted.
All the models we construct—whether they are formal, mathematical models, or qualitative, descriptive models—have to be limited. We cannot model life, the universe, and everything. There may not be any explicit ethical component contained within the model itself, but ethics (in the sense in which I use the term) has already played its part when the limits of the model were determined, when the selection was made of what would be included in the frame of the investigation. The results produced by the model can never be interpreted independently of that frame. This is no revelation, it is something every scientist knows, or at least should know. Unfortunately, less scrupulous people, often the popularizers of some scientific idea or technique, extend the field of applicability of that idea way beyond the framework that gives it sense and meaning.
Well, this is quite frustrating. It turns out Cilliers is not writing this article for scientists working on computational modeling of complex systems. He’s writing this article, I guess, to warn people off of listening to charlatans. This is a worthy goal. But then why would he write in a way that is so misleading about the nature of computational decision-making? Once again, the insight Cilliers is missing is that the difference between a deterministic model and a probabilistic model is not a difference that makes the latter less “calculable” or “objective” or “scientific”, even though it may (quantitatively) have less information about the system it describes.
Cilliers goes on:
My position could be interpreted as an argument that contains some mystical or metaphysical component, slipped in under the name “ethics.” In order to forestall such an interpretation, I will digress briefly. It is often useful to distinguish between the notions “complex” and “complicated.” A jumbo jet is complicated, a mayonnaise is complex (at least for the French). A complicated system is something we can model accurately (at least in principle). Following this line of thought, one may argue that the notion “complex” is merely a term we use for something we cannot yet model. I have much sympathy for this argument. If one maintains that there is nothing metaphysical about a complex system, and that the notion of causality has to be retained, then perhaps a complex system is ultimately nothing more than extremely complicated. It should therefore be possible to model complex systems in principle, even though it may not be practical.
In conversations about this material, it seems that some are under the impression that the difference between a “complicated” and a “complex” system is a difference in kind. It is clear from this paragraph that for Cilliers, this is not the case. This accords with the mathematical theory of complexity, which identifies how levels of complexity can be quantitatively measured. Still missing from this paragraph is any notion of probability or statistical accuracy. Which is too bad.
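As one concrete example of such a measurement, here is a sketch that uses off-the-shelf compression as a crude proxy for descriptive complexity. The toy byte strings are my own, and since Kolmogorov complexity is uncomputable, any practical measure is an approximation of this sort:

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size over original size: a crude, computable proxy for
    descriptive complexity (true Kolmogorov complexity is uncomputable)."""
    return len(zlib.compress(data)) / len(data)

rng = random.Random(0)
ordered = b"ab" * 5000                                         # highly regular data
disordered = bytes(rng.randrange(256) for _ in range(10_000))  # random noise

print(compression_ratio(ordered))     # near 0: very compressible, low complexity
print(compression_ratio(disordered))  # near 1: barely compressible, high complexity
```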
In the end, my assessment is that Cilliers is making a good try here, and if he’s influential, as I suppose he might be in South Africa, then he’s done so by popularizing some of the work of mathematicians, physicists, etc. But because of some key omissions, his argument is confusing if not misleading. In particular, it is prone to misinterpretation, as it does not treat the underlying technical material with precision. I would not rely on it to teach students about complex systems and the limitations of modeling them. I would certainly not draw any clear, ontological lines between “complicated” and “complex” systems, as Cilliers does not do this himself.
References
Cilliers, Paul (2000). “What can we learn from a theory of complexity?” Emergence 2(1): 23–33. doi:10.1207/S15327000EM0201_03.