By chance, last night I was at a social gathering with two STS scholars who are unaffiliated with BIDS. One of them is currently training in ethnographic methods. I explained to him some of my quandaries as a data scientist working with ethnographers studying data science. How can I be better in my role?
He talked about participant observation, and how hard it is in a scientific setting. An experienced STS ethnographer whom he respected has said: participant observation means being ready for an almost constant state of humiliation. Your competence is always being questioned; you are always acting inappropriately; your questions are considered annoying or off-base. You try to learn the tacit knowledge required in the science but will always be less good at it than the scientists themselves. This is all necessary for the ethnographic work.
To be a good informant (and perhaps this is my role, being a principal informant) means patiently explaining lots of things that normally go unexplained. One must make explicit that which is obvious and tacit to the experienced practitioner.
This sort of explanation accords well with my own training in qualitative methods and reading in this area, which I have pursued alongside my data science practice. This has been a deliberate blending in my graduate studies. In one semester I took both Statistical Learning Theory with Martin Wainwright and Qualitative Research Methods with Jenna Burrell. I’ve taken Behavioral Data Mining with John Canny and a seminar taught by Jean Lave on “What Theory Matters”.
I have been trying to cover my bases, methodologically. Part of this is informed by training in Bayesian methods as an undergraduate. If you are taught to see yourself as an information processing machine and take the principles of statistical learning seriously, then if you’re like me you may be concerned about bias in the way you take in information. If you get a sense that there’s some important body of knowledge or information to which you haven’t been adequately exposed, you seek it out in order to correct the bias.
This is not unlike what is called theoretical sampling in the qualitative methods literature. My sense, after being trained in both kinds of study, is that the principles that motivate them are the same or similar enough to make reconciliation between the approaches achievable.
I choose to identify as a data scientist, not as an ethnographer. One reason for this is that I believe I understand what ethnography is, that it is a long and arduous process of cultural immersion in which one attempts to articulate the emic experience of those under study, and that I am not primarily doing this kind of activity with my research. I have tried to do ethnographic work on an online community. I would argue that this was particularly bad ethnographic work. I concluded some time ago that I don’t have the right temperament to be an ethnographer per se.
Nevertheless, here I am participating in an Ethnography Group. It turns out that it is rather difficult to participate in an ethnographic context with ethnographers of science while still maintaining one’s identity as the kind of scientist that is being studied. Part of this has to do with conflicts over epistemic norms. Attempting to argue on the basis of scientific authority about the validity of the method of that science to a room of STS ethnographers is not taken as useful information from an informant nor as a creatively galvanizing rocking of the boat. It is seen as unproductive and potentially disrespectful.
Rather than treating this as an impasse, I have been pondering how to use these kinds of divisions productively. As a first pass, I’m finding that they help me understand what data science is by showing me, perhaps with a clarity that others might not have the privilege of, what it is not. In a sense the Ethnography and Evaluation Working Group of the Berkeley Institute for Data Science is really at the boundary of data science.
This is exciting, because as far as I can tell nobody knows what data science is. Alternative definitions of data science are a joke in industry. The other day our ethnography team was discussing a seminar about “what is data science” with a very open-minded scientist and engineer. He said he got a lot out of the seminar, but that it reached no conclusions as to what this nascent field is. “What is data science?” and even “is there such a thing as data science?” are still unanswered questions and may continue to be unanswered even after industry has stopped hyping the term and started calling it something else.
So, you might ask, what happens at the boundary of data science and ethnography?
The answer is: an epistemic conflict that’s deeply grounded in historical, cultural, institutional, and cognitive differences. It’s also a conflict that threatens the very project of an ethnography of data science itself.
The problem, I feel qualified to say as somebody with training on both sides of the fence and quite a bit of experience teaching both technical and non-technical subject matter, is this: learning the skills and principles behind good data science does not come easily to everybody and in any case takes a lot of hard work and time. These skills and principles pertain to many deep practices and literatures that are developed self-consciously in a cumulative way. Any one sub-field within the many technical disciplines that comprise “data science” could take years to master, and to do so is probably impossible without adequate prior mathematical training that many people don’t receive, perhaps because they lack the opportunity or don’t care.
In fewer words: there is a steep learning curve, and the earlier people start to climb it, the easier it is for them to practice data science.
My point is that this is bad news for the participant observer. Something I sometimes hear ethnographers in the data science space say of people is “I just can’t talk to that person; they think so differently from me.” Often the person in question is, to my mind, exactly the sort of person I would peg as an exemplary data scientist.
Often these are people with a depth of technical understanding that I don’t have and aspire to have. I recognize that they have made the difficult choice to study more of the foundations of what I believe to be an important field, despite the fact that this is (as evinced by the reaction of ‘softer’ social sciences) alienating to a lot of people. These are the people whom I can consult on methodological questions that are integral to my work as a data scientist. It is part of data science practice to discuss epistemic norms seriously with others in order to make sure that the integrity of the science is upheld. Knowledge about statistical norms and principles is taught in classes and reading groups and practiced in, for example, computational manipulation of data. But this knowledge is also expanded and maintained through informal, often passionate and even aggressive, conversations with colleagues.
I don’t know where this leaves the project of ethnography of data science. One possibility is that it can abandon participant observation as a method because participant observation is too difficult. That would be a shame but might simply be necessary.
Another strategy, which I think is potentially more interesting, is to ask seriously: why is this so difficult? What is difficult about data science? For whom is it most difficult? Do experts experience the same difficulties, or different ones? And so on.