data processing inequality

September 22, 2019

Ashby’s Law and AI control

I’ve recently discovered Ashby’s Law, also know as the First Law of Cybernetics, by reading Stafford Beer’s “Designing Freedom” lectures. Ashby’s Law is a powerful idea, one I’ve been grasping at intuitively for some time. For example, here I was looking for something like it and thought I could get it from the Data Processing Inequality in information theory. I have not yet grokked the mathematical definition of Ashby’s Law, which I gather is in Ross Ashby’s An Introduction to Cybernetics. Though I am not sure yet, I expect the formulation there can use an update. But if I am right about its main claims, I think the argument of this post will stand.

Ashby’s Law is framed in terms of ‘variety’, which is the number of states that it is possible for a system to be in. A six-sided die has six possible states (if you’re just looking at the top of it). A laptop has many more. A brain has many more even than that. A complex organization with many people in it, all with laptops, has even more. And so on.

The law can be stated in many ways. One of them is that:

When the variety or complexity of the environment exceeds the capacity of a system (natural or artificial) the environment will dominate and ultimately destroy that system.

The law is about the relationship between a system and its environment. Or, in another sense, it is about a system to be controlled and a different system that tries to control that system. The claim is that the control unit needs to have at least as much variety as the system to be controlled for it to be effective.

This reminds me of an argument I had with a superintelligence theorist back when I was thinking about such things. The Superintelligence people, recall, worry about an AI getting the ability to improve itself recursively and causing an “intelligence explosion”. Its own intelligence, so to speak, explodes, surpassing all other intelligent life and giving it total domination over the fate of humanity.

Here is the argument that I posed a few years ago, reframed in terms of Ashby’s Law:

The AI in question is a control unit, C, and the world it would control is the system, S.
For the AI to have effective domination over S, C would need at least as much variety as S.
But S includes C within it. The control unit is part of the larger world.
Hence, no C can perfectly control S.

Superintelligence people will no doubt be unsatisfied by this argument. The AI need not be effective in the sense dictated by Ashby’s Law. It need only be capable of outmaneuvering humans. And so on.

However, I believe the argument gets at why it is difficult for complex control systems to ever truly master the world around them. It is very difficult for a control system to have effective control over itself, let alone itself in a larger systemic context, without some kind of order constraining the behavior of the total system (the system including the control unit) imposed from without. The idea that it is possible to gain total mastery or domination through an AI or better data systems is a fantasy because the technical controls adds their own complexity to the world that is to be controlled.

This is a bit of a paradox, as it raises the question of how any control unites work at all. I’ll leave this for another day.

January 15, 2018

social structure and the private sector

The Human Cell

Academic social scientists leaning towards the public intellectual end of the spectrum love to talk about social norms.

This is perhaps motivated by the fact that these intellectual figures are prominent in the public sphere. The public sphere is where these norms are supposed to solidify, and these intellectuals would like to emphasize their own importance.

I don’t exclude myself from this category of persons. A lot of my work has been about social norms and technology design (Benthall, 2014; Benthall, Gürses and Nissenbaum, 2017)

But I also work in the private sector, and it’s striking how differently things look from that perspective. It’s natural for academics who participate more in the public sphere than the private sector to be biased in their view of social structure. From the perspective of being able to accurately understand what’s going on, you have to think about both at once.

That’s challenging for a lot of reasons, one of which is that the private sector is a lot less transparent than the public sphere. In general the internals of actors in the private sector are not open to the scrutiny of commentariat onlookers. Information is one of the many resources traded in pairwise interactions; when it is divulged, it is divulged strategically, introducing bias. So it’s hard to get a general picture of the private sector, even though accounts for a much larger proportion of the social structure that’s available than the public sphere. In other words, public spheres are highly over-represented in analysis of social structure due to the available of public data about them. That is worrisome from an analytic perspective.

It’s well worth making the point that the public/private dichotomy is problematic. Contextual integrity theory (Nissenbaum, 2009) argues that modern society is differentiated among many distinct spheres, each bound by its own social norms. Nissenbaum actually has a quite different notion of norm formation from, say, Habermas. For Nissenbaum, norms evolve over social history, but may be implicit. Contrast this with Habermas’s view that norms are the result of communicative rationality, which is an explicit and linguistically mediated process. The public sphere is a big deal for Habermas. Nissenbaum, a scholar of privacy, reject’s the idea of the ‘public sphere’ simpliciter. Rather, social spheres self-regulate and privacy, which she defines as appropriate information flow, is maintained when information flows according to these multiple self-regulatory regimes.

I believe Nissenbaum is correct on this point of societal differentiation and norm formation. This nuanced understanding of privacy as the differentiated management of information flow challenges any simplistic notion of the public sphere. Does it challenge a simplistic notion of the private sector?

Naturally, the private sector doesn’t exist in a vacuum. In the modern economy, companies are accountable to the law, especially contract law. They have to pay their taxes. They have to deal with public relations and are regulated as to how they manage information flows internally. Employees can sue their employers, etc. So just as the ‘public sphere’ doesn’t permit a total free-for-all of information flow (some kinds of information flow in public are against social norms!), so too does the ‘private sector’ not involve complete secrecy from the public.

As a hypothesis, we can posit that what makes the private sector different is that the relevant social structures are less open in their relations with each other than they are in the public sphere. We can imagine an autonomous social entity like a biological cell. Internally it may have a lot of interesting structure and organelles. Its membrane prevents this complexity leaking out into the aether, or plasma, or whatever it is that human cells float around in. Indeed, this membrane is necessary for the proper functioning of the organelles, which in turn allows the cell to interact properly with other cells to form a larger organism. Echoes of Francisco Varela.

It’s interesting that this may actually be a quantifiable difference. One way of modeling the difference between the internal and external-facing complexity of an entity is using information theory. The more complex internal state of the entity has higher entropy than the membrane. The fact that the membrane causally mediates interactions between the internals and the environment prevents information flow between them; this is captured by the Data Processing Inequality. The lack of information flow between the system internals and externals is quantified as lower mutual information between the two domains. At zero mutual information, the two domains are statistically independent of each other.

I haven’t worked out all the implications of this.

References

Benthall, Sebastian. (2015) Designing Networked Publics for Communicative Action. Jenny Davis & Nathan Jurgenson (eds.) Theorizing the Web 2014 [Special Issue]. Interface 1.1. (link)

Sebastian Benthall, Seda Gürses and Helen Nissenbaum (2017), “Contextual Integrity through the Lens of Computer Science”, Foundations and Trends® in Privacy and Security: Vol. 2: No. 1, pp 1-69. http://dx.doi.org/10.1561/3300000016

Nissenbaum, H. (2009). Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press.

December 18, 2017

The Data Processing Inequality and bounded rationality

I have long harbored the hunch that information theory, in the classic Shannon sense, and social theory are deeply linked. It has proven to be very difficult to find an audience for this point of view or an opportunity to work on it seriously. Shannon’s information theory is widely respected in engineering disciplines; many social theorists who are unfamiliar with it are loathe to admit that something from engineering should carry essential insights for their own field. Meanwhile, engineers are rarely interested in modeling social systems.

I’ve recently discovered an opportunity to work on this problem through my dissertation work, which is about privacy engineering. Privacy is a subtle social concept but also one that has been rigorously formalized. I’m working on formal privacy theory now and have been reminded of a theorem from information theory: the Data Processing Theorem. What strikes me about this theorem is that is captures an point that comes up again and again in social and political problems, though it’s a point that’s almost never addressed head on.

The Data Processing Inequality (DPI) states that for three random variables, X, Y, and Z, arranged in Markov Chain such that $X \rightarrow Y \rightarrow Z$ , then $I(X,Z) \leq I(X,Y)$ , where here $I$ stands for mutual information. Mutual information is a measure of how much two random variables carry information about each other. If $I(X,Y) = 0$, that means the variables are independent. $I(X,Y) \geq 0$ always–that’s just a mathematical fact about how it’s defined.

The implications of this for psychology, social theory, and artificial intelligence are I think rather profound. It provides a way of thinking about bounded rationality in a simple and generalizable way–something I’ve been struggling to figure out for a long time.

Suppose that there’s a big world out the, $W$ and there’s am organism, or a person, or a sociotechnical organization within it, $Y$ . The world is big and complex, which implies that it has a lot of informational entropy, $H(W)$ . Through whatever sensory apparatus is available to $Y$ , it acquires some kind of internal sensory state. Because this organism is much small than the world, its entropy is much lower. There are many fewer possible states that the organism can be in, relative to the number of states of the world. $H(W) >> H(Y)$ . This in turn bounds the mutual information between the organism and the world: $I(W,Y) \leq H(Y)$

Now let’s suppose the actions that the organism takes, $Z$ depend only on its internal state. It is an agent, reacting to its environment. Well whatever these actions are, they can only be so calibrated to the world as the agent had capacity to absorb the world’s information. I.e., $I(W,Z) \leq H(Y) << H(W)$ . The implication is that the more limited the mental capacity of the organism, the more its actions will be approximately independent of the state of the world that precedes it.

There are a lot of interesting implications of this for social theory. Here are a few cases that come to mind.

I've written quite a bit here (blog links) and here (arXiv) about Bostrom’s superintelligence argument and why I’m generally not concerned with the prospect of an artificial intelligence taking over the world. My argument is that there are limits to how much an algorithm can improve itself, and these limits put a stop to exponential intelligence explosions. I’ve been criticized on the grounds that I don’t specify what the limits are, and that if the limits are high enough then maybe relative superintelligence is possible. The Data Processing Inequality gives us another tool for estimating the bounds of an intelligence based on the range of physical states it can possibly be in. How calibrated can a hegemonic agent be to the complexity of the world? It depends on the capacity of that agent to absorb information about the world; that can be measured in information entropy.

A related case is a rendering of Scott’s Seeing Like a State arguments. Why is it that “high modernist” governments failed to successfully control society through scientific intervention? One reason is that the complexity of the system they were trying to manage vastly outsized the complexity of the centralized control mechanisms. Centralized control was very blunt, causing many social problems. Arguably, behavioral targeting and big data centers today equip controlling organizations with more informational capacity (more entropy), but they
still get it wrong sometimes, causing privacy violations, because they can’t model the entirety of the messy world we’re in.

The Data Processing Inequality is also helpful for explaining why the world is so messy. There are a lot of different agents in the world, and each one only has so much bandwidth for taking in information. This means that most agents are acting almost independently from each other. The guiding principle of society isn’t signal, it’s noise. That explains why there are so many disorganized heavy tail distributions in social phenomena.

Importantly, if we let the world at any time slice be informed by the actions of many agents acting nearly independently from each other in the slice before, then that increases the entropy of the world. This increases the challenge for any particular agent to develop an effective controlling strategy. For this reason, we would expect the world to get more out of control the more intelligence agents are on average. The popularity of the personal computer perhaps introduced a lot more entropy into the world, distributed in an agent-by-agent way. Moreover, powerful controlling data centers may increase the world’s entropy, rather than redtucing it. So even if, for example, Amazon were to try to take over the world, the existence of Baidu would be a major obstacle to its plans.

There are a lot of assumptions built into these informal arguments and I’m not wedded to any of them. But my point here is that information theory provides useful tools for thinking about agents in a complex world. There’s potential for using it for modeling sociotechnical systems and their limitations.

Digifesto

Tag: data processing inequality