data science is not positivist, it’s power

by Sebastian Benthall

Naively, we might assume that contemporary ‘data science’ is a form of positivist or post-positivist science. The scientist gathers data and subsumes it under logical formulae–models with fitted parameters. Indeed this is the case when data science is applied to natural phenomena, such as stars or the human genome.

The question of what kind of science ‘data science’ is becomes much more complex when we start to look at its application to social phenomena. This includes its application to the management of industrial and commercial technology, the so-called “Internet of Things”. (Technology in general, and especially technology as situated socially, is itself a social phenomenon.)

There are (at least) two reasons why data science in these social domains is not strictly positivist.

The first is that, according to McKinsey’s Michael Chui, data science in the Internet of Things context is mainly about either real-time control or anomaly detection. Neither of these depends on the kind of nomothetic orientation that positivism requires. The former requires only an objective function over inputs to guide the steering of the dynamic system. The latter requires only the detection of deviation from historically observed patterns.
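To make the contrast concrete, here is a minimal sketch of both tasks. Everything in it is an illustrative assumption rather than anything drawn from Chui: the thermostat scenario, the names `choose_action` and `is_anomaly`, and the choice of a standard-deviation threshold as the measure of "deviation". The point is only that neither function states or tests a general law.

```python
from statistics import mean, stdev


def choose_action(state, actions, objective, step):
    """Real-time control: pick the action whose predicted next state
    scores lowest on the objective function. No law is discovered;
    the system is only steered."""
    return min(actions, key=lambda a: objective(step(state, a)))


def is_anomaly(history, value, threshold=3.0):
    """Anomaly detection: flag a value that lies more than `threshold`
    sample standard deviations from the mean of past observations."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant history: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical thermostat: the objective is distance from a target temperature.
target = 20.0
objective = lambda temp: abs(temp - target)
step = lambda temp, delta: temp + delta  # assumed one-step model of the system

action = choose_action(18.0, [-1.0, 0.0, 1.0], objective, step)

# Hypothetical sensor history used as the "historically observed pattern".
readings = [20.1, 19.9, 20.0, 20.2, 19.8]
spike_flagged = is_anomaly(readings, 25.0)
normal_flagged = is_anomaly(readings, 20.1)
```

Under these assumptions the controller chooses to heat (`action` is `1.0`), the reading of 25.0 is flagged, and the reading of 20.1 is not. What the code "knows" is an objective and a record of past values, not a proposition about how thermostats behave in general.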

‘Data science’ applied in this context isn’t actually about the discovery of knowledge at all. It is not, strictly speaking, a science. Rather, it is a process through which the operations of existing technologies are related and improved by further technological interventions. Robust positivist engineering knowledge is applied to these cases. But however much the machines may ‘learn’, what they learn is not propositional.

Perhaps the best we can say is that ‘data science’ in this context is the science of techniques for making these kinds of interventions. Since learning these techniques depends on mathematical rigor and empirical prototyping, we can perhaps say that data science in the limited, ‘pure’ (not applied) sense is a positivist science.

But the second reason why data science is not positivist comes about as a result of its application. The problem is that when systems controlled by complex computational processes interact, the result is a more complex system. In adversarial cases, the interacting complex systems become the subject matter of cybersecurity research, to which data science is itself applied. But as soon as one starts to study phenomena that are aware of the observer and can act in ways that respond to its presence, one leaves positivist territory.

A better way to think about data science might be to think of it in terms of perception. In the visual system, data that comes in through the eye goes through many steps of preprocessing before it becomes the subject of attention. Visual representations feed into the control mechanisms of movement.

If we see data science not as a positivist attempt to discover natural laws, but rather as an extension of agency by expanding powers of perception and training skillful control, then we can get a picture of data science that’s consistent with theories of situated and embodied cognition.

These theories of situated and embodied cognition are perhaps the best contenders for what can displace the dominant paradigm as imagined by critics of cognitive science, economics, etc. Rather than being a rejection of the explanatory power of naturalistic theories of information processing, these theories extend naive theories to embrace the complexity of how an agent’s cognition is situated in a body in time, space, and society.

If we start to think of ‘data science’ not as a kind of natural science but as the techniques and tools for extending the information processing that is involved in one’s individual or collective agency, then we can start to think about data science as what it really is: power.