real talk
by Sebastian Benthall
So I am trying to write a dissertation prospectus. It is going…OK.
The dissertation is on Evaluating Data Science Environments.
But I’ve been getting very distracted by the politics of data science. I have been dealing with the politics by joking about them. But I think I’m in danger of being part of the problem, when I would rather be part of the solution.
So, where do I stand on this, really?
Here are some theses:
- There is a sense of “data science” that is importantly different from “data analytics”, though there is plenty of abuse of the term in an industrial context. That claim is awkward because industry can easily say they “own” the term. It would be useful to lay out specifically which computational methods constitute “data science” methods and which don’t.
- I think that it’s useful analytically to distinguish different kinds of truth claims because it sheds light on the value of different kinds of inquiry. There is definitely a place for rigorous interpretive inquiry and critical theory in addition to technical, predictive science. I think politicing around these divisions is lame and only talk about it to make fun of the situation.
- New computational science techniques have done and will continue to do amazing work in the physical and biological and increasingly environmental sciences. I am jealous of researchers in those fields because I think that work is awesome. For some reason I am a social scientist.
- The questions surrounding the application of data science to social systems (which can include environmental systems) are very, very interesting. Qualitative researchers get defensive about their role in “the age of data science” but I think this is unwarranted. I think it’s the quantitative social science researchers who are likely more threatened methodologically. But since I’m not well-trained as a quantitative social scientist really, I can’t be sure of that.
- The more I learn about research methods (which seems to be all I study these days, instead of actually doing research–I’m procrastinating), the more I’m getting a nuanced sense of how different methods are designed to address different problems. Jockeying about which method is better is useless. If there is a political battle I think is worth fighting any more, it’s the battle about whether or not transdisciplinary research is productive or possible. I hypothesize that it is. But I think this is an empirical question whose answer may be specific: how can different methods be combined effectively? I think this question gets quite deep and answering it requires getting into epistemology and statistics in a serious way.
- What is disruptive about data science is that some people have dug down into statistics in a serious way, come up with a valid general way of analyzing things, and then automated it. That makes it in theory cheaper to pick up and apply than the quantitative techniques used by other researchers, and usable at larger scale. On the whole this is pretty good, though it is bad when people don’t understand how the tools they are using work. Automating science is a pretty good thing over all.
- It’s really important for science, as it is automated, to be built on open tools and reproducible data because (a) otherwise there is no reason why it should have the public trust, (b) because it will remove barriers to training new scientists.
- All scientists are going to need to know how to program. I’m very fortunate to have a technical background. A technical background is not sufficient to do science well. One can use technical skills to assist in both qualitative (visualization) and quantitative work. The ability to use tools is orthogonal to the ability to study phenomena, despite the historic connection between mathematics and computer science.
- People conflate programming, which is increasingly a social and trade skill, with the ability to grasp high level mathematical concepts.
- Computers are awesome. The people that make them better deserve the credit they get.
- Sometimes I think: should I be in a computer science department? I think I would feel better about my work if I were in CS. I like the feeling of tangible progress and problem solving. I think there are a lot of really important problems to solve, and that the solutions will likely come from computer science related work. What I think I get from being in a more interdisciplinary department is a better understanding of what problems are worth solving. I don’t mean that in a way that diminishes the hard work of problem solving, which I think is really where the rubber hits the road. It is easy to complain. I don’t work as hard as computer science students. I also really like being around women. I think they are great and there aren’t enough of them in computer science departments.
- I’m interested in modeling and improving the cooperation around open scientific software because that’s where I see there some real potential value add. I’ve been and engineer and I’ve managed engineers. Managing engineers is a lot harder than engineering, IMO. That’s because management requires navigating a social system. Social systems are really absurdly complicated compared to even individual organisms.
- There are three reasons why it might be bad to apply data science to social systems. The first is that it could lead to extraordinarily terrible death robots. My karma is on the line. The second is that the scientific models might be too simplistic and lead to bad decisions that are insensitive to human needs. That is why it is very, very important that the existing wealth of social scientific understanding is not lost but rather translated into a more robust and reproducible form. The third reason is that social science might be in principle impossible due to its self-referential effects. This would make the whole enterprise a collosal waste of time. The first and third reasons frequently depress me. The second motivates me.
- Infrastructure and mechanism design are powerful means of social change, perhaps the most powerful. Movements are important but civil society is so paralyzed by the steering media now that it is more valuable to analyze movements as sociotechnical organizations alongside corporations etc. than to view them in isolation from the technical substrate. There are a variety of ideological framings of this position, each with different ideological baggage. I’m less concerned with that, ultimately, than the pragmatic application of knowledge. I wish people would stop having issues with “implications for design.”
- I said I wanted to get away from politics, but this is one other political point I actually really think is worth making, though it is generally very unpopular in academia for obvious reasons: the status differential between faculty and staff is an enormous part of the problem of the disfunction of universities. A lot of disciplinery politics are codifications of distaste for certain kinds of labor. In many disciplines, graduate students perform labor unexpertly in service of their lab’s principal investigators; this labor is a way of paying ones dues that has little to do with the intellectual work of their research expertise. Or is it? It’s entirely unclear, especially when what makes the difference between a good researcher and a great one are skills that have nothing to do with their intellectual pursuit, and when master new tools is so essential for success in ones field. But the PIs are often not able to teach these tools. What is the work of research? Who does it? Why do we consider science to be the reserve of a specialized medieval institution, and call it something else when it is done by private industry? Do academics really have a right to complain about the rise of the university administrative class?
Sorry, that got polemical again.
Sounds like you want to make data science environments instead of evaluating them…
I think you are right. Thanks for sounding that out.