Digifesto

Tag: superintelligence

Existentialism in Design: Comparison with “Friendly AI” research

Turing Test [xkcd]

I made a few references to Friendly AI research in my last post on Existentialism in Design. I positioned existentialism as an ethical perspective that contrasts with the perspective taken by the Friendly AI research community, among others. This prompted a response by a pseudonymous commenter (in a sadly condescending way, I must say) who linked me to a post, “Complexity of Value”, on what I suppose you might call the elite rationalist forum Arbital. I’ll take this as an invitation to elaborate on how I think existentialism offers an alternative to the Friendly AI perspective of ethics in technology, and particularly the ethics of artificial intelligence.

The first and most significant point of departure between my work on this subject and Friendly AI research is that I emphatically don’t believe the most productive way to approach the problem of ethics in AI is to consider the problem of how to program a benign Superintelligence. This is for reasons I’ve written up in “Don’t Fear the Reaper: Refuting Bostrom’s Superintelligence Argument”, which sums up arguments made in several blog posts about Nick Bostrom’s book on the subject. This post goes beyond the argument in the paper to address further objections I’ve heard from Friendly AI and X-risk enthusiasts.

What superintelligence gives researchers is a simplified problem. Rather than deal with many of the inconvenient contingencies of humanity’s technically mediated existence, superintelligence makes these irrelevant in comparison to the limiting case where technology not only mediates, but dominates. The question asked by Friendly AI researchers is how an omnipotent computer should be programmed so that it creates a utopia and not a dystopia. It is precisely because the computer is omnipotent that it is capable of producing a utopia and is in danger of creating a dystopia.

If you don’t think superintelligences are likely (perhaps because you think there are limits to the ability of algorithms to improve themselves autonomously), then you get a world that looks a lot more like the one we have now. In our world, artificial intelligence has been incrementally advancing for maybe a century now, starting with the foundations of computing in mathematical logic and electrical engineering. It proceeds through theoretical and engineering advances in fits and starts, often through the application of technology to solve particular problems, such as natural language processing, robotic control, and recommendation systems. This is the world of “weak AI”, as opposed to “strong AI”.

It is also a world where AI is not the great source of human bounty or human disaster. Rather, it is a form of economic capital with disparate effects throughout the total population of humanity. It can be a source of inspiring serendipity, banal frustration, and humor.

Let me be more specific, using the post that I was linked to. In it, Eliezer Yudkowsky posits that a (presumably superintelligent) AI will be directed to achieve something, which he calls “value”. The post outlines a “Complexity of Value” thesis. Roughly, this means that the things that we want AI to do cannot be easily compressed into a brief description. For an AI to not be very bad, it will need to either contain a lot of information about what people really want (more than can be easily described) or collect that information as it runs.

That sounds reasonable to me. There are plenty of good reasons to think that even a single person’s valuations are complex, hard to articulate, and contingent on their circumstances. The values appropriate for a world-dominating supercomputer could well be at least as complex.

But so what? Yudkowsky argues that this thesis, if true, has implications for other theoretical issues in superintelligence theory. But does it address any practical questions of artificial intelligence problem solving or design? That it is difficult to mathematically specify all of value or normativity, and that to attempt to do so one would need to have a lot of data about humanity in its particularity, is a point that has been apparent to ethical philosophy for a long time. It’s a surprise, or perhaps a disappointment, only to those who must mathematize everything. Articulating this point in terms of Kolmogorov complexity does not particularly add to the insight so much as translate it into an idiom used by particular researchers.

Where am I departing from this with “Existentialism in Design”?

Rather than treat “value” as a wholly abstract metasyntactic variable representing the goals of a superintelligent, omniscient machine, I’m approaching the problem more practically. First, I’m limiting myself to big sociotechnical complexes wherein a large number of people have some portion of their interactions mediated by digital networks and data centers and, why not, smartphones and even the imminent dystopia of IoT devices. This may be setting my work up for obsolescence, but it also grounds the work in potential action. Since these practical problems rely on much of the same mathematical apparatus as the more far-reaching problems, there is a chance that a fundamental theorem may arise from even this applied work.

That restriction on hardware may seem banal, but it serves a particular philosophical question that I am interested in. The motivation for considering existentialist ethics in particular is that it suggests new kinds of problems that are relevant to ethics but which have not been considered carefully or solved.

As I outlined in a previous post, many ethical positions are framed either in terms of consequentialism, evaluating the utility of a variety of outcomes, or deontology, concerned with the consistency of behavior with more or less objectively construed duties. Consequentialism is attractive to superintelligence theorists because they imagine their AIs to have the ability to cause any consequence. The critical question is how to give such an AI a specification that leads to the best or adequate consequences for humanity. This is a hard problem, under their assumptions.

Deontology is, as far as I can tell, less interesting to superintelligence theorists. This may be because deontology tends to be an ethics of human behavior, and for superintelligence theorists human behavior is rendered virtually insignificant by superintelligent agency. But deontology is attractive as an ethics precisely because it is relevant to people’s actions. It is intended as a way of prescribing duties to a person like you and me.

With Existentialism in Design (a term I may go back and change in all these posts at some point; I’m not sure I love the phrase), I am trying to do something different.

I am trying to propose an agenda for creating a more specific goal function for a limited but still broad-reaching AI, assigning something to its ‘value’ variable, if you will. Because the power of the AI to bring about consequences is limited, its potential for success and failure is also more limited. Catastrophic and utopian outcomes are not particularly relevant; performance can be evaluated in a much more pedestrian way.

Moreover, the valuations internalized by the AI are not to be done in a directly consequentialist way. I have suggested that an AI could be programmed to maximize the meaningfulness of its choices for its users. This is introducing a new variable, one that is more semantically loaded than “value”, though perhaps just as complex and amorphous.

Particular to this variable, “meaningfulness”, is that it is a feature of the subjective experience of the user, or human interacting with the system. It is only secondarily or derivatively an objective state of the world that can be evaluated for utility. To unpack it into a technical specification, we will require a model (perhaps a provisional one) of the human condition and what makes life meaningful. This may very well include such things as autonomy, or the ability to make one’s own choices.

I can anticipate some objections along the lines that what I am proposing still looks like a special case of more general AI ethics research. Is what I’m proposing really fundamentally any different than a consequentialist approach?

I will punt on this for now. I’m not sure of the answer, to be honest. I could see it going one of two different ways.

The first is that yes, what I’m proposing can be thought of as a narrow special case of a more broadly consequentialist approach to AI design. However, I would argue that the specificity matters because of the potency of existentialist moral theory. The project of specifying existentialist ethics as a kind of utility function suitable for programming into an AI is in itself a difficult and interesting problem, even if it does not overturn the foundations of AI theory itself. It is worth pursuing at the very least as an exercise and beyond that as an ethical intervention.

The second case is that there may be something particular about existentialism that makes encoding it different from encoding a consequentialist utility function. I suspect, but leave to be shown, that this is the case. Why? Because existentialism (which I haven’t yet gone into much detail describing) is largely a philosophy about how we (individually, as beings thrown into existence) come to have values in the first place and what we do when those values or the absurdity of circumstances lead us to despair. Existentialism is really a kind of phenomenological metaethics in its own right, one that is quite fluid and resists encapsulation in a utility calculus. Most existentialists would argue that at the point where one externalizes one’s values as a utility function as opposed to living as them and through them, one has lost something precious. The kinds of things that existentialism derives ethical imperatives from, such as the relationship between one’s facticity and transcendence, or one’s will to grow in one’s potential and the inevitability of death, are not the kinds of things a (limited, realistic) AI can have much effect on. They are part of what has been perhaps quaintly called the human condition.

To even try to describe this research problem, one has to shift linguistic registers. The existentialist and AI research traditions developed in very divergent contexts. This is one reason to believe that their ideas are new to each other, and that a synthesis may be productive. In order to accomplish this, one needs a charitably considered, working understanding of existentialism. I will try to provide one in my next post in this series.


More assessment of AI X-risk potential

I’ve been stimulated by Luciano Floridi’s recent article in Aeon, “Should we be afraid of AI?”. I’m surprised that this issue hasn’t been settled yet, since it seems like “we” have the formal tools necessary to solve the problem decisively. But nevertheless this appears to be the subject of debate.

I was referred to Kaj Sotala’s rebuttal of an earlier work by Floridi which his Aeon article was based on. The rebuttal appears in this APA Newsletter on Philosophy and Computers. It is worth reading.

The issue that I’m most interested in is whether or not AI risk research should constitute a special, independent branch of research, or whether it can be approached just as well by pursuing a number of other more mainstream artificial intelligence research agendas. My primary engagement with these debates has so far been an analysis of Nick Bostrom’s argument in his book Superintelligence, which tries to argue in particular that there is an existential risk (or X-risk) to humanity from artificial intelligence. “Existential risk” means a risk to the existence of something, in this case humanity. And the risk Bostrom has written about is the risk of eponymous superintelligence: an artificial intelligence that gets smart enough to improve its own intelligence, achieve omnipotence, and end the world as we know it.

I’ve posted my rebuttal to this argument on arXiv. The one-sentence summary of the argument is: algorithms can’t just modify themselves into omnipotence because they will hit performance bounds due to data and hardware.

A number of friends have pointed out to me that this is not a decisive argument. They say: don’t you just need the AI to advance fast enough and far enough to be an existential threat?

There are a number of reasons why I don’t believe this is likely. In fact, I believe that it is provably vanishingly unlikely. This is not to say that I have a proof, per se. I suppose it’s incumbent on me to work it out and see if the proof is really there.

So: Herewith is my Sketch Of A Proof of why there’s no significant artificial intelligence existential risk.

Lemma: Intelligence advances due to purely algorithmic self-modification will always plateau due to data and hardware constraints, which advance more slowly.

Proof: This paper.

As a consequence, all artificial intelligence explosions will be sigmoid. That is, they start slow, accelerate, then decelerate, and finally grow so slowly as to be asymptotic. Let’s call the level of intelligence at which an explosion asymptotes the explosion bound.
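To make the sigmoid shape concrete, here is a minimal sketch that uses the logistic function as a stand-in for a bounded explosion. The logistic form and the parameter names (`bound`, `rate`, `midpoint`) are illustrative assumptions; the lemma only claims that explosions plateau, not that they follow this particular curve.

```python
import math


def bounded_explosion(t, bound, rate=1.0, midpoint=0.0):
    """Logistic (sigmoid) trajectory: slow start, acceleration, then
    deceleration toward `bound`, which plays the role of the explosion bound.
    `rate` and `midpoint` are illustrative parameters, not measured values."""
    return bound / (1.0 + math.exp(-rate * (t - midpoint)))


# Intelligence level at integer times t = -6, ..., 6.
levels = [bounded_explosion(t, bound=100.0) for t in range(-6, 7)]

# Gain per unit time: rises toward the midpoint, then falls off.
increments = [b - a for a, b in zip(levels, levels[1:])]
```

The per-step gains rise and then fall symmetrically around `midpoint`, and the trajectory never exceeds `bound`, which is the property the lemma cares about.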

There’s empirical support for this claim. Basically, we have never had a really big intelligence explosion due to algorithmic improvement alone. Looking at the impressive results of the last seventy years, most of the impressiveness can be attributed to advances in hardware and data collection. Notoriously, Deep Learning is largely just decades-old artificial neural network technology repurposed to GPUs on the cloud. Which is awesome and a little scary. But it’s not an algorithmic intelligence explosion. It’s a consolidation of material computing power and sensor technology by organizations. The algorithmic advances fill those material shoes really quickly, it’s true. This is precisely the point: it’s not the algorithms that are the bottleneck.

Observation: Intelligence explosions are happening all the time. Most of them are small.

Once we accept the idea that intelligence explosions are all bounded, it becomes rather arbitrary where we draw the line between an intelligence explosion and some lesser algorithmic intelligence advance. There is a real sense in which any significant intelligence advance is a sigmoid expansion in intelligence. This would include run-of-the-mill scientific discoveries and good ideas.

If intelligence explosions are anything like virtually every other interesting empirical phenomenon, then they are distributed according to a heavy-tailed distribution. This means a distribution with a lot of very small values and a diminishing probability of higher values that nevertheless assigns some probability to very high values. Assuming intelligence is something that can be quantified and observed empirically (a huge ‘if’ taken for granted in this discussion), we can (theoretically) take a good hard look at the ways intelligence has advanced. Look around you. Do you see people and computers getting smarter all the time, sometimes in leaps and bounds but most of the time minutely? That’s a confirmation of this hypothesis!

The big idea here is really just to assert that there is a probability distribution over intelligence explosion bounds from which all actual intelligence explosions are drawn. This follows more or less directly from the conclusion that all intelligence explosions are bounded. Once we posit such a distribution, it becomes possible to take expected values of functions of its values.
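Here is a sketch of what it means to take expected values over such a distribution, assuming (purely for illustration) that explosion bounds follow a Pareto distribution with shape parameter 1.5. Neither the Pareto family nor the parameter comes from the argument above; any heavy-tailed family would make the same qualitative point.

```python
import random


def sample_explosion_bounds(n, alpha=1.5, seed=0):
    """Draw n hypothetical explosion bounds from a Pareto distribution
    (values >= 1). The Pareto family and alpha=1.5 are assumptions; the
    text only claims the distribution is heavy-tailed."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]


def expected_value(f, samples):
    """Monte Carlo estimate of E[f(X)] over the sampled bounds."""
    return sum(f(x) for x in samples) / len(samples)


bounds = sample_explosion_bounds(100_000)

# Expected value of an indicator function is a probability.
share_small = expected_value(lambda x: x < 2.0, bounds)    # most explosions are small
share_large = expected_value(lambda x: x > 100.0, bounds)  # rare, but not zero
```

Under these assumptions roughly 65% of sampled bounds fall below 2, while about one in a thousand exceeds 100: many small explosions, rare large ones, and no hard ceiling on the distribution itself.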

Empirical claim: Hardware and sensing advances diffuse rapidly relative to their contribution to intelligence gains.

There’s a material, socio-technical analog to Bostrom’s explosive superintelligence. We could imagine a corporation that is working in secret on new computing infrastructure. Whenever it has an advance in computing infrastructure, the AI people (or increasingly, the AI-writing-AI) develop programming that maximizes its use of this new technology. Then it uses that technology to enrich its own computer-improving facilities. When it needs more…minerals…or whatever it needs to further its research efforts, it finds a way to get them. It proceeds to take over the world.

This may presently be happening. But evidence suggests that this isn’t how the technology economy really works. No doubt Amazon (for example) is using Amazon Web Services internally to do its business analytics. But it also makes its business out of selling its computing infrastructure to other organizations as a commodity. That’s actually the best way it can enrich itself.

What’s happening here is the diffusion of innovation, which is a well-studied phenomenon in economics and other fields. Ideas spread. Technological designs spread. I’d go so far as to say that it is often (perhaps always?) the best strategy for some agent that has locally discovered a way to advance its own intelligence to figure out how to trade that intelligence to other agents. Almost always that trade involves the diffusion of the basis of that intelligence itself.

Why? Since independent intelligence advances of varying sizes are happening all the time, there’s actually a very competitive market for innovation that quickly devalues any particular gain. A discovery, if hoarded, will likely be discovered by somebody else. The race to get credit for any technological advance at all motivates diffusion and disclosure.

The result is that the distribution of innovation, rather than concentrating into very tall spikes, is constantly flattening and fattening itself. That’s important because…

Claim: Intelligence risk is not due to absolute levels of intelligence, but relative intelligence advantage.

The idea here is that since humanity is composed of lots of interacting intelligent sociotechnical organizations, any hostile intelligence is going to have a lot of intelligent adversaries. If the game of life can be won through intelligence alone, then it can only be won with a really big intelligence advantage over other intelligent beings. It’s not absolute intelligence but intelligence inequality that we need to worry about.

Consequently, the more intelligence advances (i.e., technologies) diffuse, the less risk there is.
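The diffusion and relative-advantage claims can be illustrated with a toy model. Everything in it is a hypothetical simplification: ten agents, one of which has just had a bounded local explosion, and a `rate` at which the others close the gap to the leader each round through imitation or trade. The quantity tracked is the leader's advantage over the runner-up, not its absolute level.

```python
def relative_advantage(levels):
    """Ratio of the leader's level to the runner-up's: the quantity the
    text argues matters for risk, rather than the leader's absolute level."""
    top, second = sorted(levels, reverse=True)[:2]
    return top / second


def diffuse(levels, rate):
    """One round of diffusion: every agent closes a fraction `rate` of its
    gap to the current leader (a hypothetical imitation/trade model)."""
    leader = max(levels)
    return [x + rate * (leader - x) for x in levels]


def advantage_over_time(levels, rate, rounds):
    """Relative advantage before diffusion and after each round."""
    history = [relative_advantage(levels)]
    for _ in range(rounds):
        levels = diffuse(levels, rate)
        history.append(relative_advantage(levels))
    return history


# One agent has just had a local (bounded) explosion to 5x its peers.
start = [5.0] + [1.0] * 9
no_diffusion = advantage_over_time(start, rate=0.0, rounds=10)
with_diffusion = advantage_over_time(start, rate=0.3, rounds=10)
```

Without diffusion the leader's 5x advantage persists indefinitely; with a 30% per-round diffusion rate it falls to about 1.02x after ten rounds, even though the leader's absolute level never drops.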

Conclusion: The chance of an existential risk from an intelligence explosion is small and decreasing all the time.

So consider this: globally, there’s tons of investment in technologies that, when discovered, allow for local algorithmic intelligence explosions.

But even if we assume these algorithmic advances are nearly instantaneous, they are still bounded.

Lots of independent bounded explosions are happening all the time. But they are also diffusing all the time.

Since the global intelligence distribution is always fattening, that means that the chance of any particular technological advance granting a decisive advantage over others is decreasing.

There is always the possibility of a fluke, of course. But if there was going to be a humanity destroying technological discovery, it would probably have already been invented and destroyed us. Since it hasn’t, we have a lot more resilience to threats from intelligence explosions, not to mention a lot of other threats.

This doesn’t mean that it isn’t worth trying to figure out how to make AI better for people. But it does diminish the need to think about artificial intelligence as an existential risk. It makes AI much more comparable to a biological threat. Biological threats could be really bad for humanity. But there’s also the organic reality that life is very resilient and human life in general is very secure precisely because it has developed so much intelligence.

I believe that thinking about the risks of artificial intelligence as analogous to the risks from biological threats is helpful for prioritizing where research effort in artificial intelligence should go. Just because AI doesn’t present an existential risk to all of humanity doesn’t mean it doesn’t kill a lot of people or make their lives miserable. On the contrary, we are in a world with both a lot of artificial and non-artificial intelligence and a lot of miserable and dying people. These phenomena are not causally disconnected. A good research agenda for AI could start with an investigation of these actually miserable people and what their problems are, and how AI is causing that suffering or alternatively what it could do to improve things. That would be an enormously more productive research agenda than one that aims primarily to reduce the impact of potential explosions which are diminishingly unlikely to occur.

arXiv preprint of Refutation of Bostrom’s Superintelligence Argument released

I’ve written a lot of blog posts about Nick Bostrom’s book Superintelligence, in which I presented what I think is a refutation of his core argument.

Today I’ve released an arXiv preprint with a more concise and readable version of this argument. Here’s the abstract:

Don’t Fear the Reaper: Refuting Bostrom’s Superintelligence Argument

In recent years prominent intellectuals have raised ethical concerns about the consequences of artificial intelligence. One concern is that an autonomous agent might modify itself to become “superintelligent” and, in supremely effective pursuit of poorly specified goals, destroy all of humanity. This paper considers and rejects the possibility of this outcome. We argue that this scenario depends on an agent’s ability to rapidly improve its ability to predict its environment through self-modification. Using a Bayesian model of a reasoning agent, we show that there are important limitations to how an agent may improve its predictive ability through self-modification alone. We conclude that concern about this artificial intelligence outcome is misplaced and better directed at policy questions around data access and storage.

I invite any feedback on this work.

The recalcitrance of prediction

We have identified how Bostrom’s core argument for superintelligence explosion depends on a crucial assumption. An intelligence explosion will happen only if the kinds of cognitive capacities involved in instrumental reason are not recalcitrant to recursive self-improvement. If recalcitrance rises comparably with the system’s ability to improve itself, then the takeoff will not be fast. This significantly decreases the probability of decisively strategic singleton outcomes.

In this section I will consider the recalcitrance of intelligent prediction, which is one of the capacities that is involved in instrumental reason (another being planning). Prediction is a very well-studied problem in artificial intelligence and statistics and so is easy to characterize and evaluate formally.

Recalcitrance is difficult to formalize. Recall that in Bostrom’s formulation:

\frac{dI}{dt} = \frac{O(I)}{R(I)}

One difficulty in analyzing this formula is that the units are not specified precisely. What is a “unit” of intelligence? What kind of “effort” is the unit of optimization power? And how could one measure recalcitrance?
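One way to see why the shape of R(I) decides everything is to integrate the formula numerically in arbitrary units. This is a sketch under stated assumptions: optimization power is taken to be O(I) = I, the bound of 10 is hypothetical, and forward Euler with a small step stands in for an exact solution.

```python
def integrate(optimization, recalcitrance, i0=1.0, dt=0.01, steps=1000):
    """Forward-Euler integration of dI/dt = O(I) / R(I) from I(0) = i0.
    Units are arbitrary, which is the ambiguity noted in the text."""
    i = i0
    for _ in range(steps):
        i += dt * optimization(i) / recalcitrance(i)
    return i


def optimization_power(i):
    # Assumption: the system applies optimization power proportional to its own intelligence.
    return i


BOUND = 10.0  # hypothetical performance ceiling at which recalcitrance diverges

constant_R = integrate(optimization_power, lambda i: 1.0)  # roughly exponential takeoff
linear_R = integrate(optimization_power, lambda i: i)      # recalcitrance keeps pace: linear growth
bounded_R = integrate(optimization_power, lambda i: 1.0 / (1.0 - i / BOUND))  # R -> infinity as I -> BOUND
```

Over the same simulated interval (t = 10), constant recalcitrance yields roughly exponential growth (about 2 × 10^4 from a starting level of 1), recalcitrance that keeps pace with I yields linear growth to 11, and recalcitrance that diverges at the bound yields a curve that levels off just below 10. The last case reduces to dI/dt = I(1 - I/B), the logistic equation.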

A benefit of looking at a particular intelligent task is that it allows us to think more concretely about what these terms mean. If we can specify which tasks are important to consider, then we can take the level of performance on that well-specified class of problems as a measure of intelligence.

Prediction is one such problem. In a nutshell, prediction comes down to estimating a probability distribution over hypotheses. Using the Bayesian formulation of statistical inference, we can represent the problem as:

P(H|D) = \frac{P(D|H) P(H)}{P(D)}

Here, P(H|D) is the posterior probability of a hypothesis H given observed data D. If one is following statistically optimal procedure, one can compute this value by taking the prior probability of the hypothesis P(H), multiplying it by the likelihood of the data given the hypothesis P(D|H), and then normalizing this result by dividing by the probability of the data over all models, P(D) = \sum_{i}P(D|H_i)P(H_i).
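A minimal discrete version of this computation, using a hypothetical example of three candidate coin biases under a uniform prior (the coin example is illustrative and not from the text):

```python
def posterior(prior, likelihood, data):
    """Discrete Bayes' rule: P(H|D) = P(D|H) P(H) / P(D),
    where P(D) = sum_i P(D|H_i) P(H_i).

    prior: dict mapping hypothesis -> P(H)
    likelihood: function (data, hypothesis) -> P(D|H)
    """
    unnormalized = {h: likelihood(data, h) * p for h, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(D)
    return {h: u / evidence for h, u in unnormalized.items()}


def coin_likelihood(flips, p_heads):
    """P(D|H) for a sequence of independent 'H'/'T' flips."""
    heads = flips.count("H")
    tails = len(flips) - heads
    return p_heads ** heads * (1 - p_heads) ** tails


# Hypotheses: the coin's probability of heads.
prior = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}
post = posterior(prior, coin_likelihood, "HHTH")
```

After observing three heads and one tail, the posterior is roughly 0.07, 0.35, and 0.59 for biases 0.25, 0.5, and 0.75; the normalizer `evidence` is exactly the sum over hypotheses in the formula for P(D).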

Statisticians will justifiably argue whether this is the best formulation of prediction. And depending on the specifics of the task, the target value may well be some function of the posterior (such as the hypothesis with maximum posterior probability) and the overall distribution may be secondary. These are valid objections that I would like to put to one side in order to get across the intuition of an argument.

What I want to point out is that if we look at the factors that affect performance on prediction problems, there are very few that could be subject to algorithmic self-improvement. If we think that part of what it means for an intelligent system to get more intelligent is to improve its ability to predict (which Bostrom appears to believe), but improving predictive ability is not something that a system can do via self-modification, then that implies that the recalcitrance of prediction, far from being constant or lower, actually approaches infinity with respect to an autonomous system’s capacity for algorithmic self-improvement.

So, given the formula above, in what ways can an intelligent system improve its capacity to predict? We can enumerate them:

  • Computational accuracy. An intelligent system could be better or worse at computing the posterior probabilities. Since most of the algorithms that do this kind of computation do so with numerical approximation, there is the possibility of an intelligent system finding ways to improve the accuracy of this calculation.
  • Computational speed. There are faster and slower ways to compute the inference formula. An intelligent system could come up with a way to make itself compute the answer faster.
  • Better data. The success of inference is clearly dependent on what kind of data the system has access to. Note that “better data” is not necessarily the same as “more data”. If the data that the system learns from is from a biased sample of the phenomenon in question, then a successful Bayesian update could make its predictions worse, not better. Better data is data that is informative with respect to the true process that generated the data.
  • Better prior. The success of inference depends crucially on the prior probability assigned to hypotheses or models. A prior is better when it assigns higher probability to the true process that generates observable data, or models that are ‘close’ to that true process. An important point is that priors can be bad in more than one way. The bias/variance tradeoff is a well-studied way of discussing this. Choosing a prior in machine learning involves a tradeoff between:
    1. Bias. The assignment of probability to models that skew away from the true distribution. An example of a biased prior would be one that gives positive probability to only linear models, when the true phenomenon is quadratic. Biased priors lead to underfitting in inference.
    2. Variance. The assignment of probability to models that are more complex than are needed to reflect the true distribution. An example of a high-variance prior would be one that assigns high probability to cubic functions when the data was generated by a quadratic function. The problem with high-variance priors is that they will overfit data by inferring from noise, which could be the result of measurement error or something else less significant than the true generative process.

    In short, the best prior is the correct prior, and any deviation from that increases error.
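The "better data" and "better prior" factors can be illustrated with a discrete coin model. This sketch assumes a true heads probability of 0.5, then compares a prior that includes that value with a biased prior that gives it zero probability, and a representative sample with one skewed toward heads. The hypotheses and samples are invented for illustration.

```python
def posterior(prior, flips):
    """Posterior over hypothetical coin biases given a string of 'H'/'T' flips."""
    heads = flips.count("H")
    tails = len(flips) - heads
    unnormalized = {p: w * p**heads * (1 - p)**tails for p, w in prior.items()}
    evidence = sum(unnormalized.values())
    return {p: u / evidence for p, u in unnormalized.items()}


def map_hypothesis(post):
    """Hypothesis with the highest posterior probability."""
    return max(post, key=post.get)


# Assumed ground truth: the coin is fair (heads probability 0.5).
honest_flips = "HT" * 50     # representative sample: 50 heads, 50 tails
skewed_flips = "HHHT" * 25   # biased sample: 75 heads, 25 tails

good_prior = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}  # includes the true value
biased_prior = {0.3: 0.5, 0.7: 0.5}                # assigns zero probability to the truth

good_prior_honest = posterior(good_prior, honest_flips)
biased_prior_honest = posterior(biased_prior, honest_flips)
good_prior_skewed = posterior(good_prior, skewed_flips)
```

With a good prior and representative data, the posterior mode is the true value 0.5. The biased prior can never put mass on 0.5, however much data arrives, so on balanced data its posterior just splits evenly between 0.3 and 0.7. And a correct Bayesian update on the skewed sample confidently favors 0.7: more data from a biased sample makes predictions worse, as the list above notes.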

Now that we have enumerated the ways in which an intelligent system may improve its power of prediction, which is one of the things that’s necessary for instrumental reason, we can ask: how recalcitrant are these factors to recursive self-improvement? How much can an intelligent system, by virtue of its own intelligence, improve on any of these factors?

Let’s start with computational accuracy and speed. An intelligent system could, for example, use some previously collected data and try variations of its statistical inference algorithm, benchmark their performance, and then choose to use the most accurate and fastest ones at a future time. Perhaps the faster and more accurate the system is at prediction generally, the faster and more accurately it would be able to engage in this process of self-improvement.

Critically, however, there is a maximum amount of performance that one can get from improvements to computational accuracy if you hold the other factors constant. You can’t be more accurate than perfectly accurate. Therefore, at some point recalcitrance of computational accuracy rises to infinity. Moreover, we would expect that effort made at improving computational accuracy would exhibit diminishing returns. In other words, recalcitrance of computational accuracy climbs (probably close to exponentially) with performance.
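To illustrate diminishing returns in computational accuracy, here is a sketch that approximates the evidence P(D) for a coin under a uniform prior over its bias, using an n-point midpoint rule and doubling n as a stand-in for "effort". The choice of quadrature and the data (three heads, one tail) are illustrative assumptions; the exact value is available from the Beta-function identity, which lets us measure the error directly.

```python
from math import factorial


def likelihood(theta, heads=3, tails=1):
    """P(D|theta) for a fixed sequence with the given numbers of heads and tails."""
    return theta**heads * (1 - theta) ** tails


def evidence_midpoint(n, heads=3, tails=1):
    """Approximate P(D) = integral over [0, 1] of P(D|theta) d(theta)
    (uniform prior) with an n-point midpoint rule."""
    width = 1.0 / n
    return width * sum(likelihood((k + 0.5) * width, heads, tails) for k in range(n))


def evidence_exact(heads=3, tails=1):
    # Beta-function identity: integral of theta^h (1 - theta)^t = h! t! / (h + t + 1)!
    return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)


exact = evidence_exact()
efforts = [2**k for k in range(1, 11)]  # 2, 4, ..., 1024 likelihood evaluations
errors = [abs(evidence_midpoint(n) - exact) for n in efforts]
gains = [a - b for a, b in zip(errors, errors[1:])]  # accuracy bought by each doubling
```

Each doubling of effort cuts the error by roughly a factor of four, but the absolute gain shrinks every time, and no amount of effort can push the error below zero. Once the error is smaller than anything that matters for downstream predictions, further self-improvement on this factor buys essentially nothing.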

What is the recalcitrance of computational speed at inference? Here, performance is limited primarily by the hardware on which the intelligent system is implemented. In Bostrom’s account of superintelligence explosion, he is ambiguous about whether and when hardware development counts as part of a system’s intelligence. What we can say with confidence, however, is that for any particular piece of hardware there will be a maximum computational speed attainable with it, and that recursive self-improvement to computational speed can at best approach and attain this maximum. At that maximum, further improvement is impossible and recalcitrance is again infinite.

What about getting better data?

Assuming an adequate prior and the computational speed and accuracy needed to process it, better data will always improve prediction. But it’s arguable whether acquiring better data is something that can be done by an intelligent system working to improve itself. Data collection isn’t something that the intelligent system can do autonomously, since it has to interact with the phenomenon of interest to get more data.

If we acknowledge that data collection is a critical part of what it takes for an intelligent system to become more intelligent, then that means we should shift some of our focus away from “artificial intelligence” per se and onto ways in which data flows through society and the world. Regulations about data locality may well have more impact on the arrival of “superintelligence” than research into machine learning algorithms, now that we have very fast, very accurate algorithms already. I would argue that the recent rise in interest in artificial intelligence is due mainly to the availability of vast amounts of new data through sensors and the Internet. Advances in computational accuracy and speed (such as Deep Learning) have to catch up to this new availability of data and use new hardware, but data is the rate-limiting factor.

Lastly, we have to ask: can a system improve its own prior, if data, computational speed, and computational accuracy are constant?

I have to argue that it can’t do this in any systematic way, if we are looking at the performance of the system at the right level of abstraction. Potentially a machine learning algorithm could modify its prior if it sees itself as underperforming in some ways. But there is a sense in which any modification to the prior made by the system that is not a result of a Bayesian update is just part of the computational basis of the original prior. So recalcitrance of the prior is also infinite.

We have examined the problem of statistical inference and ways that an intelligent system could improve its performance on this task. We identified four potential factors on which it could improve: computational accuracy, computational speed, better data, and a better prior. We determined that contrary to the assumption of Bostrom’s hard takeoff argument, the recalcitrance of prediction is quite high, approaching infinity in the cases of computational accuracy, computational speed, and the prior. Only data collection appears to be flexibly recalcitrant. But data collection is not a feature of the intelligent system alone but also depends on its context.

As a result, we conclude that the recalcitrance of prediction is too high for an intelligence explosion that depends on it to be fast. We also note that those concerned about superintelligent outcomes should shift their attention to questions about data sourcing and storage policy.

The relationship between Bostrom’s argument and AI X-Risk

One reason why I have been writing about Bostrom’s superintelligence argument is because I am acquainted with what could be called the AI X-Risk social movement. I think it is fair to say that this movement is a subset of Effective Altruism (EA), a laudable movement whose members attempt to maximize their marginal positive impact on the world.

The AI X-Risk subset, which is a vocal group within EA, sees the emergence of a superintelligent AI as one of several risks that are notable because they could ruin everything. AI is considered to be a “global catastrophic risk”, unlike more mundane risks like tsunamis and bird flu. AI X-Risk researchers argue that because of the magnitude of the consequences of the risk they are trying to anticipate, they must raise more funding and recruit more researchers.

While I think this is noble, I think it is misguided for reasons that I have been outlining in this blog. I am motivated to make these arguments because I believe that there are urgent problems/risks that are conceptually adjacent (if you will) to the problem AI X-Risk researchers study, but that the focus on AI X-Risk in particular diverts interest away from them. In my estimation, as more funding has been put into evaluating potential risks from AI many more “mainstream” researchers have benefited and taken on projects with practical value. To some extent these researchers benefit from the alarmism of the AI X-Risk community. But I fear that their research trajectory is thereby distorted from where it could truly provide maximal marginal value.

My reason for targeting Bostrom’s argument for the existential threat of superintelligent AI is that I believe it’s the best defense of the AI X-Risk thesis out there. In particular, if valid the argument should significantly raise the expected probability of an existentially risky AI outcome. For Bostrom, it is likely a natural consequence of advancement in AI research more generally because of recursive self-improvement and convergent instrumental values.

As I’ve informally workshopped this argument, I’ve come upon this objection: even if it is true that a superintelligent system would not, for systematic reasons, become an existentially risky singleton, that does not mean that somebody couldn’t develop such a superintelligent system in an unsystematic way. There is still an existential risk, even if it is much lower. And because existential risks are so important, surely we should prepare ourselves for even this low-probability event.

There is something inescapable about this logic. However, the argument applies equally well to all kinds of potential apocalypses, such as enormous meteors crashing into the earth and zombies produced by biowarfare. Without some kind of accounting of the likelihood of these outcomes, it’s impossible to do rational budgeting.

Moreover, I have to call into question the rationality of this counterargument. If Bostrom’s arguments are used in defense of the AI X-Risk position but then the argument is dismissed as unnecessary when it is challenged, that suggests that the AI X-Risk community is committed to their cause for reasons besides Bostrom’s argument. Perhaps these reasons are unarticulated. One could come up with all kinds of conspiratorial hypotheses about why a group of people would want to disingenuously spread the idea that superintelligent AI poses an existential threat to humanity.

The position I’m defending on this blog (until somebody convinces me otherwise–I welcome all comments) is that a superintelligent AI singleton is not a significantly likely X-Risk. Other outcomes that might be either very bad or very good, such as ones with many competing and cooperating superintelligences, are much more likely. I’d argue that it’s more or less what we have today, if you consider sociotechnical organizations as a form of collective superintelligence. This makes research into this topic not only impactful in the long run, but also relevant to problems faced by people now and in the near future.

Bostrom and Habermas: technical and political moralities, and the God’s eye view

An intriguing chapter that follows naturally from Nick Bostrom’s core argument is his discussion of machine ethics writ large. He asks: suppose one could install into an omnipotent machine ethical principles, trusting it with the future of humanity. What principles should we install?

What Bostrom accomplishes by positing his Superintelligence (which begins with something simply smarter than humans, and evolves over the course of the book into something that takes over the galaxy) is a return to what has been called “the God’s eye view”. Philosophers once attempted to define truth and morality according to the perspective of an omnipotent–often both transcendent and immanent–god. Through the scope of his work, Bostrom has recovered some of these old themes. He does this not only through his discussion of Superintelligence (and positing its existence in other solar systems already) but also through his simulation arguments.

The way I see it, one thing I am doing by challenging the idea of an intelligence explosion and its resulting in a superintelligent singleton is problematizing this recovery of the God’s Eye view. If your future world is governed by many sovereign intelligent systems instead of just one, then ethics are something that have to emerge from political reality. There is something irreducibly difficult about interacting with other intelligences and it’s from this difficulty that we get values, not the other way around. This sort of thinking is much more like Habermas’s mature ethical philosophy.

I’ve written about how to apply Habermas to the design of networked publics that mediate political interactions between citizens. What I built and offer as a toy example in that paper, @TheTweetserve, is simplistic and intended only as a proof of concept.

As I continue to read Bostrom, I expect a convergence on principles. “Coherent extrapolated volition” sounds a lot like a democratic governance structure with elected experts at first pass. The question of how to design a governance structure or institution that leverages artificial intelligence appropriately while legitimately serving its users motivates my dissertation research. My research so far has only scratched the surface of this problem.

Recalcitrance examined: an analysis of the potential for superintelligence explosion

To recap:

  • We have examined the core argument from Nick Bostrom’s Superintelligence: Paths, Dangers, Strategies regarding the possibility of a decisively strategic superintelligent singleton–or, more glibly, an artificial intelligence that takes over the world.
  • With an eye to evaluating whether this outcome is particularly likely relative to other futurist outcomes, we have distilled the argument and in so doing have reduced it to a simpler problem.
  • That problem is to identify bounds on the recalcitrance of the capacities that are critical for instrumental reasoning. Recalcitrance is defined as the inverse of the rate of increase in intelligence per unit time per unit of effort put into increasing that intelligence. It is meant to capture how hard it is to make an intelligent system smarter, and in particular how hard it is for an intelligent system to make itself smarter. Bostrom’s argument is that if an intelligent system’s recalcitrance is constant or lower, then it is possible for the system to undergo an “intelligence explosion” and take over the world.
  • By analyzing how Bostrom’s argument depends only on the recalcitrance of instrumentality, and not on the recalcitrance of intelligence in general, we can get a firmer grip on the problem. In particular, we can focus on such tasks as prediction and planning. If we discover that these tasks are in fact significantly recalcitrant, that should reduce our expected probability of an AI singleton and consequently cause us to divert research funds to problems that anticipate other outcomes.

In this section I will look in further depth at the parts of Bostrom’s intelligence explosion argument about optimization power and recalcitrance. How recalcitrant must a system be for it to not be susceptible to an intelligence explosion?

This section contains some formalism. For readers uncomfortable with that, trust me: if the system’s recalcitrance is roughly proportional to the amount that the system is able to invest in its own intelligence, then the system’s intelligence will not explode. Rather, it will climb linearly. If the system’s recalcitrance is significantly greater than the amount that the system can invest in its own intelligence, then the system’s intelligence won’t even climb steadily. Rather, it will plateau.

To see why, recall from our core argument and definitions that:

Rate of change in intelligence = Optimization power / Recalcitrance.

Optimization power is the amount of effort that is put into improving the intelligence of the system. Recalcitrance is the resistance of that system to improvement. Bostrom presents this as a qualitative formula and then expands it more formally in subsequent analysis.

\frac{dI}{dt} = \frac{O(I)}{R}

Bostrom’s claim is that for instrumental reasons an intelligent system is likely to invest some portion of its intelligence back into improving its intelligence. So, by assumption we can model O(I) = \alpha I + \beta for some parameters \alpha and \beta, where 0 < \alpha < 1 and \beta represents the contribution of optimization power by external forces (such as a team of researchers). If recalcitrance is constant, e.g. R = k, then we can compute:

\frac{dI}{dt} = \frac{\alpha I + \beta}{k}

Under these conditions, I will be exponentially increasing in time t. This is the “intelligence explosion” that gives Bostrom’s argument so much momentum. The explosion only gets worse if recalcitrance decreases rather than staying constant.

In order to illustrate how quickly the “superintelligence takeoff” occurs under this model, I’ve plotted the above function plugging in a number of values for the parameters \alpha, \beta and k. Keep in mind that the y-axis is plotted on a log scale, which means that a roughly linear increase indicates exponential growth.

Modeled superintelligence takeoff where the rate of intelligence gain is linear in current intelligence and recalcitrance is constant. Slope on the log scale is determined by the alpha and k values.

It is true that in all the above cases, the intelligence function is exponentially increasing over time. The astute reader will notice that by my earlier claim \alpha cannot be greater than 1, and so one of the modeled functions is invalid. It’s a good point, but one that doesn’t matter. We are fundamentally just modeling intelligence expansion as something that is linear on the log scale here.
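For readers who want to check the exponential claim without the plot: when recalcitrance is a constant k, the equation above has the closed-form solution I(t) = (I_0 + \beta/\alpha) e^{\alpha t / k} - \beta/\alpha, so the slope of \log I tends to \alpha / k. The Python sketch below is only an illustration under assumed parameter values (it is not the code behind the figure); it integrates the model with a simple forward-Euler step and compares the result against that closed form.

```python
import math

def simulate_constant_recalcitrance(alpha, beta, k, i0, t_end, dt=1e-4):
    """Forward-Euler integration of dI/dt = (alpha*I + beta) / k from I(0) = i0."""
    i = i0
    for _ in range(int(round(t_end / dt))):
        i += dt * (alpha * i + beta) / k
    return i

def closed_form(alpha, beta, k, i0, t):
    """Exact solution: I(t) = (i0 + beta/alpha) * exp(alpha*t/k) - beta/alpha."""
    shift = beta / alpha
    return (i0 + shift) * math.exp(alpha * t / k) - shift

# Assumed, illustrative parameters: half of current intelligence is reinvested.
alpha, beta, k, i0 = 0.5, 1.0, 1.0, 1.0
for t in (5, 10, 20):
    numeric = simulate_constant_recalcitrance(alpha, beta, k, i0, t)
    exact = closed_form(alpha, beta, k, i0, t)
    print(f"t={t:>2}  numeric={numeric:.4g}  exact={exact:.4g}")

# Slope of log I between t=10 and t=20; it should be close to alpha / k.
slope = (math.log(closed_form(alpha, beta, k, i0, 20))
         - math.log(closed_form(alpha, beta, k, i0, 10))) / 10
print(f"slope of log I: {slope:.3f}")
```

A constant slope on the log scale is exactly what "exponential takeoff" means here; changing \alpha or k only changes that slope, which matches the caption's remark.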

However, it’s important to remember that recalcitrance may also be a function of intelligence. Bostrom does not mention the possibility that recalcitrance increases with intelligence. How sensitive to intelligence would recalcitrance need to be in order to prevent exponential growth in intelligence?

Consider the following model where recalcitrance is, like optimization power, linearly increasing in intelligence.

\frac{dI}{dt} = \frac{\alpha_o I + \beta_o}{\alpha_r I + \beta_r}

Now there are four parameters instead of three. Note that this model is identical to the one above it when \alpha_r = 0 (with \beta_r playing the role of k). Plugging in several values for these parameters and plotting again with the y-axis on a log scale, we get:

Plot of takeoff when both optimization power and recalcitrance are linearly increasing in intelligence. Only when recalcitrance is unaffected by intelligence level is there an exponential takeoff. In the other cases, intelligence quickly plateaus on the log scale. No matter how much the system can invest in its own optimization power as a proportion of its total intelligence, it still only takes off at a linear rate.

The point of this plot is to illustrate how easily exponential superintelligence takeoff might be stymied by a dependence of recalcitrance on intelligence. Even in the absurd case where the system is able to invest a thousand times as much intelligence as it already has back into its own advancement, and a large team steadily commits a million “units” of optimization power (whatever that means–Bostrom is never particularly clear on the definition of this), a minute linear dependence of recalcitrance on intelligence limits the takeoff to linear speed. The reason is that as I grows large, the ratio \frac{\alpha_o I + \beta_o}{\alpha_r I + \beta_r} approaches the constant \alpha_o / \alpha_r, so intelligence ends up growing at a roughly constant rate.
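The claim that linear recalcitrance caps growth at a linear rate can be checked numerically. The sketch below takes the "absurd" values from the paragraph above (\alpha_o = 1000, \beta_o = 10^6) and pairs them with an arbitrary small recalcitrance slope \alpha_r = 0.01 and \beta_r = 1; these are assumptions for illustration, not the values behind the plot. It integrates the model with forward Euler and reports the average growth per unit time late in the run, which should sit near \alpha_o / \alpha_r.

```python
def rate(i, a_o, b_o, a_r, b_r):
    """dI/dt = O(I) / R(I) with O(I) = a_o*I + b_o and R(I) = a_r*I + b_r."""
    return (a_o * i + b_o) / (a_r * i + b_r)

def integrate(i0, t_end, params, dt=1e-4):
    """Forward-Euler integration; returns I sampled at t = 0, 1, ..., t_end."""
    i = i0
    samples = [i0]
    steps_per_unit = int(round(1 / dt))
    for _ in range(t_end):
        for _ in range(steps_per_unit):
            i += dt * rate(i, *params)
        samples.append(i)
    return samples

# Assumed parameters: enormous optimization power, tiny linear recalcitrance slope.
a_o, b_o, a_r, b_r = 1000.0, 1e6, 0.01, 1.0
samples = integrate(1.0, 20, (a_o, b_o, a_r, b_r))

late_rate = (samples[20] - samples[10]) / 10  # average dI/dt over t in [10, 20]
print(f"late-time growth per unit time: {late_rate:.4g}")
print(f"asymptote a_o / a_r:            {a_o / a_r:.4g}")
print(f"I(20) / I(10) = {samples[20] / samples[10]:.3f}")  # about 2: linear, not exponential
```

Under the constant-recalcitrance model the same ratio I(20) / I(10) would be astronomically large; here doubling the elapsed time roughly doubles intelligence, which is the signature of linear growth. Raising \alpha_o only raises the asymptotic slope \alpha_o / \alpha_r; it does not restore exponential growth.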

Are there reasons to think that recalcitrance might increase as intelligence increases? Prima facie, yes. Here’s a simple thought experiment: suppose there is some distribution of advances in intelligence algorithms available in nature, and some of them are harder to achieve than others. A system that dedicates itself to advancing its own intelligence, knowing that it gets more optimization power as it gets more intelligent, might start by finding the “low hanging fruit” of cognitive enhancement. But as it picks the low hanging fruit, it is left with only the harder discoveries. Therefore, recalcitrance increases as the system grows more intelligent.
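The low-hanging-fruit thought experiment can be made concrete with a toy model. Everything in it is a hypothetical assumption for illustration: a fixed pool of available advances, each worth one unit of intelligence, with effort costs drawn from an exponential distribution. A system that always takes the cheapest remaining advance faces a cost for its next unit of gain (its recalcitrance, in Bostrom's sense) that can only stay level or rise.

```python
import random

def low_hanging_fruit(n_advances=1000, seed=0):
    """Hypothetical model: each available advance adds one unit of intelligence
    but costs a random amount of effort (exponential distribution, mean 1).
    A system that always takes the cheapest remaining advance works through
    the pool in ascending order of cost."""
    rng = random.Random(seed)
    costs = [rng.expovariate(1.0) for _ in range(n_advances)]
    return sorted(costs)  # greedy order = ascending cost

path = low_hanging_fruit()

# Effort needed for the k-th unit of gain, i.e. recalcitrance at that stage.
for k in (0, 250, 500, 750, 999):
    print(f"improvement {k:>3}: effort {path[k]:.3f}")

early = sum(path[:100]) / 100
late = sum(path[-100:]) / 100
print(f"mean effort, first 100 advances: {early:.3f}; last 100: {late:.3f}")
```

The rising cost comes from the greedy choice alone; the assumed cost distribution only determines how steeply recalcitrance climbs.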

This is not a decisive argument against fast superintelligence takeoff and the possibility of a decisively strategic superintelligent singleton. The above is just an argument for why it is important to consider recalcitrance carefully when making claims about takeoff speed, and to counter what I believe is a bias in Bostrom’s work towards considering unrealistically low recalcitrance levels.

In future work, I will analyze the kinds of instrumental intelligence tasks, like prediction and planning, that we have identified as being at the core of Bostrom’s superintelligence argument. The question we need to ask is: does the recalcitrance of prediction tasks increase as the agent performing them becomes better at prediction? And likewise for planning. If prediction and planning are the two fundamental components of means-ends reasoning, and both have recalcitrance that increases significantly with the intelligence of the agent performing them, then we have reason to reject Bostrom’s core argument and assign a very low probability to the doomsday scenario that occupies much of Bostrom’s imagination in Superintelligence. If this is the case, that suggests we should instead be devoting resources to anticipating what he calls multipolar scenarios, where no intelligent system has a decisive strategic advantage.

Instrumentality run amok: Bostrom and Instrumentality

Narrowing our focus onto the crux of Bostrom’s argument, we can see how tightly it is bound to a much older philosophical notion of instrumental reason. This comes to the forefront in his discussion of the orthogonality thesis (p.107):

The orthogonality thesis
Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.

Bostrom goes on to clarify:

Note that the orthogonality thesis speaks not of rationality or reason, but of intelligence. By “intelligence” we here mean something like skill at prediction, planning, and means-ends reasoning in general. This sense of instrumental cognitive efficaciousness is most relevant when we are seeking to understand what the causal impact of a machine superintelligence might be.

Bostrom maintains that the generality of instrumental intelligence, which I would argue is evinced by the generality of computing, gives us a way to predict how intelligent systems will act. Specifically, he says that an intelligent system (and in particular a superintelligent one) might be predictable because of its design, because of its inheritance of goals from a less intelligent system, or because of convergent instrumental reasons. (p.108)

Return to the core logic of Bostrom’s argument. The existential threat posed by superintelligence is simply that the instrumental intelligence of an intelligent system will invest in itself and overwhelm any ability by us (its well-intentioned creators) to control its behavior through design or inheritance. Bostrom thinks this is likely because instrumental intelligence (“skill at prediction, planning, and means-ends reasoning in general”) is a kind of resource or capacity that can be accumulated and put to other uses more widely. You can use instrumental intelligence to get more instrumental intelligence; why wouldn’t you? The doomsday prophecy of a fast takeoff superintelligence achieving a decisive strategic advantage and becoming a universe-dominating singleton depends on this internal cycle: instrumental intelligence investing in itself and expanding exponentially, assuming low recalcitrance.

This analysis brings us to a significant focal point. The critical missing formula in Bostrom’s argument is (specifically) the recalcitrance function of instrumental intelligence. This is not the same as recalcitrance with respect to “general” intelligence or even “super” intelligence. Rather, what’s critical is how much a process dedicated to “prediction, planning, and means-ends reasoning in general” can improve its own capacities at those things autonomously. The values of this recalcitrance function will bound the speed of superintelligence takeoff. These bounds can then inform the optimal allocation of research funding towards anticipation of future scenarios.


In what I hope won’t distract from the logical analysis of Bostrom’s argument, I’d like to put it in a broader context.

Take a minute to think about the power of general purpose computing and the impact it has had on the past hundred years of human history. As the earliest digital computers were informed by notions of artificial intelligence (cf. Alan Turing), we can accurately say that the very machine I use to write this text, and the machine you use to read it, are the result of refined, formalized, and materialized instrumental reason. Every programming language is a level of abstraction over a machine that has no ends in itself, but which serves the ends of its programmer (when it’s working). There is a sense in which Bostrom’s argument is not about a near future scenario but rather is just a description of how things already are.

Our very concepts of “technology” and “instrument” are so related that it can be hard to see any distinction at all. (cf. Heidegger, “The Question Concerning Technology”) Bostrom’s equating of instrumentality with intelligence is a move that makes more sense as computing becomes ubiquitously part of our experience of technology. However, if any instrumental mechanism can be seen as a form of intelligence, that lends credence to panpsychist views of cognition as life. (cf. the Santiago theory)

Meanwhile, arguably the genius of the market is that it connects ends (through consumption or “demand”) with means (through manufacture and services, or “supply”) efficiently, bringing about the fruition of human desire. If you replace “instrumental intelligence” with “capital” or “money”, you get a familiar critique of capitalism as a system driven by capital accumulation at the expense of humanity. The analogy with capital accumulation is worthwhile here. Much as in Bostrom’s “takeoff” scenarios, we can see how capital (in the modern era, wealth) is reinvested in itself and grows at an exponential rate. Variable rates of return on investment lead to great disparities in wealth. We today have a “multipolar scenario” as far as the distribution of capital is concerned. At times people have advocated for an economic “singleton” through a planned economy.

It is striking that the contemporary analytic philosopher and futurist Nick Bostrom contemplates the same malevolent force in his apocalyptic scenario as does Max Horkheimer in his 1947 treatise “Eclipse of Reason”: instrumentality run amok. Whereas Bostrom concerns himself primarily with what is literally a machine dominating the world, Horkheimer sees the mechanism of self-reinforcing instrumentality as pervasive throughout the economic and social system. For example, he sees engineers as loci of active instrumentalism. Bostrom never cites Horkheimer, let alone Heidegger. That there is a convergence of different philosophical sub-disciplines on the same problem suggests that there are convergent ultimate reasons which may triumph over convergent instrumental reasons in the end. The question of what these convergent ultimate reasons are, and what their relationship to instrumental reasons is, is a mystery.

Further distillation of Bostrom’s Superintelligence argument

Following up on this outline of the definitions and core argument of Bostrom’s Superintelligence, I will try to narrow in on the key mechanisms the argument depends on.

At the heart of the argument are a number of claims about instrumentally convergent values and self-improvement. It’s important to distill these claims to their logical core because their validity affects the probability of outcomes for humanity and the way we should invest resources in anticipation of superintelligence.

There are a number of ways to tighten Bostrom’s argument:

Focus the definition of superintelligence. Bostrom leads with the provocative but fuzzy definition of superintelligence as “any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest.” But the overall logic of his argument makes it clear that the domain of interest does not necessarily include violin-playing or any number of other activities. Rather, the domains necessary for a Bostrom superintelligence explosion are those that pertain directly to improving one’s own intellectual capacity. Bostrom speculates about these capacities in two ways. In one section he discusses the “cognitive superpowers”, domains that would quicken a superintelligence takeoff. In another section he discusses convergent instrumental values, values that agents with a broad variety of goals would converge on instrumentally.

  • Cognitive Superpowers
    • Intelligence amplification
    • Strategizing
    • Social manipulation
    • Hacking
    • Technology research
    • Economic productivity
  • Convergent Instrumental Values
    • Self-preservation
    • Goal-content integrity
    • Cognitive enhancement
    • Technological perfection
    • Resource acquisition

By focusing on these traits, we can start to see that Bostrom is not really worried about what has been termed an “Artificial General Intelligence” (AGI). He is concerned with a very specific kind of intelligence with certain capacities to exert its will on the world and, most importantly, to increase its power over nature and other intelligent systems rapidly enough to attain a decisive strategic advantage. Which leads us to a second way we can refine Bostrom’s argument.

Closely analyze recalcitrance. Recall that Bostrom speculates that the condition for a fast takeoff superintelligence, assuming that the system engages in “intelligence amplification”, is constant or lower recalcitrance. A weakness in his argument is his lack of in-depth analysis of this recalcitrance function. I will argue that for many of the convergent instrumental values and cognitive superpowers at the core of Bostrom’s argument, it is possible to be much more precise about system recalcitrance. This analysis should allow us to determine to a greater extent the likelihood of singleton vs. multipolar superintelligence outcomes.

For example, it’s worth noting that a number of the “superpowers” are explicitly in the domain of the social sciences. “Social manipulation” and “economic productivity” are both vastly complex domains of research in their own right. Each may well have bounds on how effective an intelligent system can be at them, no matter how much “optimization power” is applied to the task. The capacity of those being manipulated to understand instructions is one such bound. The fragility or elasticity of markets could be another.

For intelligence amplification, strategizing, technological research/perfection, and cognitive enhancement in particular, there is a wealth of literature in artificial intelligence and cognitive science that addresses the technical limits of these domains. Such technical limitations are a natural source of recalcitrance and an impediment to fast takeoff.

Bostrom’s Superintelligence: Definitions and core argument

I wanted to take the opportunity to spell out what I see as the core definitions and argument of Bostrom’s Superintelligence as a point of departure for future work. First, some definitions:

  • Superintelligence. “We can tentatively define a superintelligence as any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest.” (p.22)
  • Speed superintelligence. “A system that can do all that a human intellect can do, but much faster.” (p.53)
  • Collective superintelligence. “A system composed of a large number of smaller intellects such that the system’s overall performance across many very general domains vastly outstrips that of any current cognitive system.” (p.54)
  • Quality superintelligence. “A system that is at least as fast as a human mind and vastly qualitatively smarter.” (p.56)
  • Takeoff. The event of the emergence of a superintelligence. The takeoff might be slow, moderate, or fast, depending on the conditions under which it occurs.
  • Optimization power and Recalcitrance. Bostrom proposes that we model the speed of superintelligence takeoff as: Rate of change in intelligence = Optimization power / Recalcitrance. Optimization power refers to the effort of improving the intelligence of the system. Recalcitrance refers to the resistance of the system to being optimized. (p.65, pp.75-77)
  • Decisive strategic advantage. The level of technological and other advantages sufficient to enable complete world domination. (p.78)
  • Singleton. A world order in which there is at the global level one decision-making agency. (p.78)
  • The wise-singleton sustainability threshold. “A capability set exceeds the wise-singleton threshold if and only if a patient and existential risk-savvy system with that capability set would, if it faced no intelligent opposition or competition, be able to colonize and re-engineer a large part of the accessible universe.” (p.100)
  • The orthogonality thesis. “Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.” (p.107)
  • The instrumental convergence thesis. “Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents.” (p.109)

Bostrom’s core argument in the first eight chapters of the book, as I read it, is this:

  1. Intelligent systems are already being built and expanded on.
  2. If some constant proportion of a system’s intelligence is turned into optimization power, then if the recalcitrance of the system is constant or lower, then the intelligence of the system will increase at an exponential rate. This will be a fast takeoff.
  3. Recalcitrance is likely to be lower for machine intelligence than human intelligence because of the physical properties of artificial computing systems.
  4. An intelligent system is likely to invest in its own intelligence because of the instrumental convergence thesis. Improving intelligence is an instrumental goal given a broad spectrum of other goals.
  5. In the event of a fast takeoff, it is likely that the superintelligence will get a decisive strategic advantage, because of a first-mover advantage.
  6. Because of the instrumental convergence thesis, we should expect a superintelligence with a decisive strategic advantage to become a singleton.
  7. Machine superintelligences, which are more likely to takeoff fast and become singletons, are not likely to create nice outcomes for humanity by default.
  8. A superintelligent singleton is likely to be above the wise-singleton threshold. Hence the fate of the universe and the potential of humanity is at stake.

Having made this argument, Bostrom goes on to discuss ways we might anticipate and control the superintelligence as it becomes a singleton, thereby securing humanity.