### The recalcitrance of prediction

We have identified how Bostrom’s core argument for superintelligence explosion depends on a crucial assumption. An intelligence explosion will happen only if the kinds of cognitive capacities involved in instrumental reason are not recalcitrant to recursive self-improvement. If recalcitrance rises comparably with the system’s ability to improve itself, then the takeoff will not be fast. This significantly decreases the probability of decisively strategic singleton outcomes.

In this section I will consider the recalcitrance of intelligent prediction, which is one of the capacities that is involved in instrumental reason (another being planning). Prediction is a very well-studied problem in artificial intelligence and statistics and so is easy to characterize and evaluate formally.

Recalcitrance is difficult to formalize. Recall that in Bostrom’s formulation:

$\frac{dI}{dt} = \frac{O(I)}{R(I)}$

One difficulty in analyzing this formula is that the units are not specified precisely. What is a “unit” of intelligence? What kind of “effort” is the unit of optimization power? And how could one measure recalcitrance?
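Even with the units left unspecified, the qualitative role of recalcitrance can be seen by integrating the formula numerically. The sketch below is illustrative only: the functional forms $O(I) = I$, $R(I) = 1$, and $R(I) = I$, the step size, and the starting value are all assumptions chosen for the example, not anything Bostrom specifies.

```python
def simulate(optimization, recalcitrance, i0=1.0, dt=0.01, steps=1000):
    """Integrate dI/dt = O(I) / R(I) with the forward Euler method."""
    intelligence = i0
    for _ in range(steps):
        rate = optimization(intelligence) / recalcitrance(intelligence)
        intelligence += dt * rate
    return intelligence


# Assumption: the system applies its own intelligence as optimization power.
def self_applied(i):
    return i


# Assumed case 1: recalcitrance stays constant, so growth compounds.
constant_r = simulate(self_applied, lambda i: 1.0)

# Assumed case 2: recalcitrance rises in step with intelligence,
# so dI/dt = I / I = 1 and growth is only linear.
rising_r = simulate(self_applied, lambda i: i)

print(f"constant recalcitrance: I(10) is about {constant_r:.0f}")
print(f"rising recalcitrance:   I(10) is about {rising_r:.2f}")
```

Under these assumed forms, constant recalcitrance yields exponential growth, while recalcitrance that keeps pace with the system's ability yields steady, non-explosive growth.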

A benefit of looking at a particular intelligent task is that it allows us to think more concretely about what these terms mean. If we can specify which tasks are important to consider, then we can take the level of performance on that well-specified class of problems as a measure of intelligence.

Prediction is one such problem. In a nutshell, prediction comes down to estimating a probability distribution over hypotheses. Using the Bayesian formulation of statistical inference, we can represent the problem as:

$P(H|D) = \frac{P(D|H) P(H)}{P(D)}$

Here, $P(H|D)$ is the posterior probability of a hypothesis $H$ given observed data $D$. If one is following a statistically optimal procedure, one can compute this value by taking the prior probability of the hypothesis $P(H)$, multiplying it by the likelihood of the data given the hypothesis $P(D|H)$, and then normalizing the result by dividing by the probability of the data over all hypotheses, $P(D) = \sum_{i}P(D|H_i)P(H_i)$.
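As a concrete illustration of this computation, here is a minimal sketch for a finite set of hypotheses. The coin example, the hypothesis names, and the uniform prior are assumptions made up for the illustration; exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Return P(H_i | D) for every hypothesis H_i using Bayes' rule."""
    # Numerator of Bayes' rule for each hypothesis: P(D | H_i) * P(H_i).
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Normalizer: P(D) = sum_i P(D | H_i) * P(H_i).
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the data has zero probability under every hypothesis")
    return {h: joint[h] / evidence for h in joint}


# Hypothetical hypotheses about a coin's probability of landing heads.
heads_prob = {
    "fair": Fraction(1, 2),
    "heads_biased": Fraction(3, 4),
    "tails_biased": Fraction(1, 4),
}
prior = {h: Fraction(1, 3) for h in heads_prob}

# Hypothetical data D: two heads followed by one tail.
likelihood = {h: p ** 2 * (1 - p) for h, p in heads_prob.items()}

post = posterior(prior, likelihood)
for name, value in post.items():
    print(name, value)
```

With this data the posterior shifts toward the heads-biased hypothesis (9/20) without ruling out the fair coin (2/5) or the tails-biased coin (3/20).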

Statisticians will justifiably argue about whether this is the best formulation of prediction. And depending on the specifics of the task, the target value may well be some function of the posterior (such as the hypothesis with maximum posterior probability), and the overall distribution may be secondary. These are valid objections that I would like to put to one side in order to get across the intuition of an argument.

What I want to point out is that if we look at the factors that affect performance on prediction problems, there are very few that could be subject to algorithmic self-improvement. If we think that part of what it means for an intelligent system to get more intelligent is to improve its predictive ability (which Bostrom appears to believe), but improving predictive ability is not something that a system can do via self-modification, then that implies that the recalcitrance of prediction, far from being constant or lower, actually approaches infinity with respect to an autonomous system’s capacity for algorithmic self-improvement.

So, given the formula above, in what ways can an intelligent system improve its capacity to predict? We can enumerate them:

• Computational accuracy. An intelligent system could be better or worse at computing the posterior probabilities. Since most of the algorithms that do this kind of computation do so with numerical approximation, there is the possibility of an intelligent system finding ways to improve the accuracy of this calculation.
• Computational speed. There are faster and slower ways to compute the inference formula. An intelligent system could come up with a way to make itself compute the answer faster.
• Better data. The success of inference is clearly dependent on what kind of data the system has access to. Note that “better data” is not necessarily the same as “more data”. If the data that the system learns from is from a biased sample of the phenomenon in question, then a successful Bayesian update could make its predictions worse, not better. Better data is data that is informative with respect to the true process that generated the data.
• Better prior. The success of inference depends crucially on the prior probability assigned to hypotheses or models. A prior is better when it assigns higher probability to the true process that generates observable data, or to models that are ‘close’ to that true process. An important point is that priors can be bad in more than one way. The bias/variance tradeoff is a well-studied way of discussing this. Choosing a prior in machine learning involves a tradeoff between:
1. Bias. The assignment of probability to models that skew away from the true distribution. An example of a biased prior would be one that gives positive probability to only linear models, when the true phenomenon is quadratic. Biased priors lead to underfitting in inference.
2. Variance. The assignment of probability to models that are more complex than is needed to reflect the true distribution. An example of a high-variance prior would be one that assigns high probability to cubic functions when the data was generated by a quadratic function. The problem with high-variance priors is that they will overfit data by inferring from noise, which could be the result of measurement error or something else less significant than the true generative process.

In short, the best prior is the correct prior, and any deviation from that increases error.
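The effect of a biased prior can be sketched with a toy case. The example below is an assumption-laden illustration, not a general result: it uses a hypothetical coin with a true heads probability of 0.75, hypothetical data of 75 heads in 100 flips, and three candidate hypotheses. It covers only the bias half of the tradeoff, comparing a prior that includes the true hypothesis with one that assigns it zero probability.

```python
def predictive_heads(prior, heads, tails):
    """Posterior predictive P(next flip is heads) over discrete hypotheses.

    `prior` maps each candidate heads probability to its prior weight.
    """
    # Unnormalized posterior weight of each hypothesis.
    joint = {p: (p ** heads) * ((1 - p) ** tails) * w for p, w in prior.items()}
    evidence = sum(joint.values())
    # Average the hypotheses' predictions, weighted by posterior probability.
    return sum(p * j / evidence for p, j in joint.items())


TRUE_P = 0.75          # assumed true process, for illustration only
heads, tails = 75, 25  # hypothetical observations

# A prior that gives the true hypothesis positive probability.
good_prior = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}
# A biased prior that excludes the true hypothesis entirely.
biased_prior = {0.25: 0.5, 0.5: 0.5, 0.75: 0.0}

good_error = abs(predictive_heads(good_prior, heads, tails) - TRUE_P)
biased_error = abs(predictive_heads(biased_prior, heads, tails) - TRUE_P)

print(f"error with good prior:   {good_error:.6f}")
print(f"error with biased prior: {biased_error:.6f}")
```

In this toy case no amount of data rescues the biased prior: the posterior concentrates on the best hypothesis the prior allows (0.5), which is not the true one.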

Now that we have enumerated the ways in which an intelligent system may improve its power of prediction, which is one of the things that’s necessary for instrumental reason, we can ask: how recalcitrant are these factors to recursive self-improvement? How much can an intelligent system, by virtue of its own intelligence, improve on any of these factors?

Let’s start with computational accuracy and speed. An intelligent system could, for example, use some previously collected data and try variations of its statistical inference algorithm, benchmark their performance, and then choose to use the most accurate and fastest ones at a future time. Perhaps the faster and more accurate the system is at prediction generally, the faster and more accurately it would be able to engage in this process of self-improvement.

Critically, however, there is a maximum amount of performance that one can get from improvements to computational accuracy if you hold the other factors constant. You can’t be more accurate than perfectly accurate. Therefore, at some point recalcitrance of computational accuracy rises to infinity. Moreover, we would expect that effort made at improving computational accuracy would exhibit diminishing returns. In other words, recalcitrance of computational accuracy climbs (probably close to exponentially) with performance.
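The ceiling on computational accuracy can be illustrated with a simple numerical approximation. In the sketch below, the coin model, the uniform prior, the data (7 heads, 3 tails), and the midpoint-rule grid are all assumptions picked for the example. It approximates a posterior mean whose exact value is known, and measures how much error each doubling of the grid size removes.

```python
def grid_posterior_mean(n, heads=7, tails=3):
    """Midpoint-rule estimate of the posterior mean of a coin's heads
    probability under a uniform prior, using n grid points on [0, 1]."""
    thetas = [(k + 0.5) / n for k in range(n)]
    # Unnormalized posterior density at each grid point.
    weights = [t ** heads * (1 - t) ** tails for t in thetas]
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)


# Exact answer: the posterior is Beta(heads + 1, tails + 1) = Beta(8, 4),
# whose mean is 8 / (8 + 4).
EXACT = 8 / 12

grid_sizes = [2, 4, 8, 16, 32, 64]
errors = [abs(grid_posterior_mean(n) - EXACT) for n in grid_sizes]
# Error removed by each doubling of the grid size.
gains = [before - after for before, after in zip(errors, errors[1:])]

for n, err in zip(grid_sizes, errors):
    print(f"n={n:3d}  absolute error={err:.2e}")
```

Because the error cannot fall below zero, each doubling of computational effort can remove at most the error that remains. In this example the first doubling removes far more error than the last one does.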

What is the recalcitrance of computational speed at inference? Here, performance is limited primarily by the hardware on which the intelligent system is implemented. In Bostrom’s account of superintelligence explosion, he is ambiguous about whether and when hardware development counts as part of a system’s intelligence. What we can say with confidence, however, is that for any particular piece of hardware there will be a maximum computational speed attainable with it, and that recursive self-improvement to computational speed can at best approach and attain this maximum. At that maximum, further improvement is impossible and recalcitrance is again infinite.