programming and philosophy of science

Philosophy of science is a branch of philosophy largely devoted to the demarcation problem: what is science?

I’ve written elsewhere about why and how demarcation in the social sciences is highly politicized and often under attack. This is especially pertinent now, as computational methods become dominant across many fields and challenge the bases of disciplinary distinction. Today, a lot of energy (at UC Berkeley at least) goes into maintaining the disciplinary social sciences, even when this creates social fields that are less scientific than they could be, in order to preserve atavistic disciplinary traits.

Other energy (also at UC Berkeley, and elsewhere) goes into using computer programs to explore data about the social world in an undisciplinary way. This isn’t to say that specific theoretical lenses don’t inform these studies. Rather, the lenses are used provisionally and not in an exclusive way. This lack of disciplinary attachment is an important aspect of data science as applied to the social world.

One reason why disciplinary lenses are not very useful for the practicing data scientist is that, much like natural scientists, data scientists are more often than not engaged in technical inquiry whose purpose is prediction and control. This is very different from, for example, engaging an academic community in a conversation in a language it understands, or in a way that pays appropriate homage to a particular scholarly canon: the sort of thing one needs to do to be successful in an academic context. For much academic work, especially in the social sciences, the process of research publication, citation, and promotion is inherently political.

More often than not, these politics are not essential to scientific inquiry itself; rather, they have to do with the allocation of what Bourdieu calls temporal capital: grant funding, access, appointments, etc. within the academic field. Scientific capital, the symbolic capital awarded to scientists based on their contributions to trans-historical knowledge, is awarded more for the success of an idea than for, say, brown-nosing one’s superiors. However, since temporal capital in the academy is organized by disciplines as a function of university bureaucratic organization, academic researchers are required to contort themselves to disciplinary requirements in the presentation of their work.

Contrast this with the work of analysing social data using computers. The tools used by computational social scientists tend to be products of the exact sciences (mathematics, statistics, computer science) with no further disciplinary baggage. The intellectual work of scientifically devising and testing theories against the data happens in a language most academic communities would not recognize as a language at all, and certainly not their language. While this work depends on the work of thousands of others who have built vast libraries of functional code, these ubiquitous contributors are not included in a social science discipline’s scholarly canon. They are uncited, taken for granted.

However, when those libraries are made openly available (and they often are), they participate in a larger open source ecosystem of tools whose merits are judged by their practical value. Returning to our theme of the demarcation problem, the question is: is this science?

I would answer: emphatically yes. Programming is science because, as Peter Naur has argued, programming is theory building (hat tip the inimitable Spiros Eliopoulos for the reference). The more deeply we look into the demarcation problem, the more clearly software engineering practice comes into focus as an extension of a scientific method of hypothesis generation and testing. Software is an articulation of ideas, and the combined works of software engineers are a cumulative science that has extended far beyond the bounds of the university.