I’ve recently come across an interesting paper published at SciPy 2019, Van Dusen et al.’s “Accelerating the Advancement of Data Science Education” (2019). It summarizes recent trends in data science education, as modeled by UC Berkeley’s Division of Data Science, which is now the Division of Computing, Data Science, and Society (CDSS). This piece strikes me because I worked at Berkeley on its data science capabilities several years ago, and I continue to be fascinated by my alma mater, the School of Information, as it navigates being part of CDSS.
Among other interesting points in the article, two are particularly noteworthy to me. The first is that the integration of data science into the social sciences appears to have continued apace. Economics, in particular, is well represented and supported in the extended data science curriculum.
The other interesting point is the emphasis on data science ethics as an essential pillar of the educational program. The writing in this piece is consistent with what I’ve come to expect from Berkeley on this topic, and I believe it’s indicative of broad trends in academia.
The authors of this piece are explicit about their “theory of change”. What is data science ethics education supposed to accomplish?
Including training in ethical considerations at all levels of society and all steps of the data science workflow in undergraduate data science curricula could play an important role in stimulating change in industry as our students enter the workforce, perhaps encouraging companies to add ethical standards to their mission statements or to hire chief ethics officers to oversee not only day-to-day operations but also the larger social consequences of their work.
The theory of change articulated by the paper is that industry will change if ethically educated students enter the workforce. They see a future where companies change their mission statements in accord with what has been taught in data science ethics courses, or hire oversight officials.
This is, it must be noted, broadly speculative, and it implies that the leadership of the firms that hire these Berkeley grads will be responsive to their employees. However, unlike some countries in Europe, the United States does not give employees much say in the governance of firms. Technology firms such as Amazon and Google have recently proven rather unfriendly to employees who attempt to organize in support of “ethics”. This is for highly conventional reasons: the management of these firms tends to be oriented toward maximizing shareholder profits, and organized employees advocating for ethical causes that interfere with business are an obstacle to that goal.
This would be plainly understood if economics, or economic history, were taught as part of “data science ethics”. But, for some reason, it is not. Information economics, which is presumably where one would start investigating how incentives drive data science institutions, may be too complex to include in the essential undergraduate curriculum, despite being arguably critical to understanding the “data intensive” social world we all now live in.
We often forget today that the early economists (Adam Smith, Alfred Marshall, etc.) were originally moral philosophers. Economics has come to be seen as a field in instrumental support of business practice or ideology rather than an investigation into the ethical consequences of social and material structure. That’s too bad.
Instead of teaching economic history, which would be a great way of showing students the ethical implications of technology, Berkeley is teaching Science and Technology Studies (STS) and algorithmic fairness! I’ll quote at length:
A recent trend in incorporating such ethical practices includes incorporating anti-bias algorithms in the workplace. Starting from the beginning of their undergraduate education, UC Berkeley students can take History 184D: Introduction to Science, Technology, and Society: Human Contexts and Ethics of Data, which covers the implications of computing, such as algorithmic bias. Additionally, students can take Computer Science 294: Fairness in Machine Learning, which spends a semester in resisting racial, political, and physical discrimination. Faculty have also come together to create the Algorithmic Fairness and Opacity Working Group at Berkeley’s School of Information that brainstorms methods to improve algorithms’ fairness, interpretability, and accountability. Implementing such courses and interdisciplinary groups is key to start the conversation within academic institutions, so students can mitigate such algorithmic bias when they work in industry or
Databases and algorithms are socio-technical objects; they emerge and evolve in tandem with the societies in which they operate [Latour90]. Understanding data science in this way and recognizing its social implications requires a different kind of critical thinking that is taught in data science courses. Issues such as computational agency [Tufekci15], the politics of data classification and statistical inference [Bowker08], [Desrosieres11], and the perpetuation of social injustice through algorithmic decision making [Eubanks19], [Noble18], [ONeil18] are well known to scholars in the interdisciplinary field of science and technology studies (STS), who should be invited to participate in the development of data science curricula. STS or other courses in the social sciences and humanities dealing specifically with topics related to data science may be included in data science programs.
This is all very typical. The authors are correct that algorithmic fairness and STS have been trendy ways of teaching data science ethics. It is perhaps too cynical to say that these approaches to “data science ethics” are trendy because they are the data science ethics that Microsoft will pay for. Let that pass as a joke.
However, it is unfortunate if students have no better intellectual equipment for dealing with “data science ethics” than this. Algorithmic fairness is a fascinating field of study with many interesting technical results. Yet, as STS scholars among others have broadly noted, the successful use of “algorithmic fairness” technology depends on the social context in which it is deployed. Often, “fairness” is achieved through greater scientific and technical integrity: for example, properly deducing cause and effect rather than lazily applying techniques that find correlation. But the ethical challenges in the workplace are often not technical challenges. They are the challenges of managing the economic incentives of the firm and how these affect the power structures within it (Metcalf & Moss, 2019). This material is apparently not being taught to Berkeley data science students.
This more careful look at the social context in which technology is used is supposed to be what STS teaches. But, all too often, this is not what it does. I’ve written elsewhere about why STS is not the solution to “tech ethics”. Part of (e.g., Latourian) STS training is a methodological, if not intellectual, relativistic skepticism about science and technology itself (Carroll, 2006). As a consequence, it holds itself to be a humanistic or anthropological field, using “interpretivist” methods with weak claims to generalizability. It is, first and foremost, an academic field, not an applied one. The purpose of STS is to generate fascinating critiques.
There are many other social sciences with different aims, such as building consensus about what social and economic conditions are in order to motivate political change. These social sciences have ethical import, but they are built around a different theory of change: they address the student as a citizen in a democracy, not as an employee at a company. And while I don’t underestimate the challenges of advocating for education designed to empower students as public citizens in this economic climate, it must nevertheless be acknowledged, as an ethical matter, that a “data science ethics” curriculum that does not address the politics behind those difficulties will be an anemic one, at best.
There is a productive way forward. It requires, however, interdisciplinary thinking that may be uncomfortable or, in the end, impossible for many established institutions. Suppose students were taught a properly historicized and politically substantive “data science ethics”: not in the mode of STS-based skepticism about technology and science, but as economic history informed by data science (computational and inferential thinking) as an intellectual foundation. Then ethical considerations would not need to be relegated to a hopeful afterthought, invested in a theory of corporate change that is ultimately a fantasy. Rather, such a curriculum would put “data science ethics” on a scientific foundation and help civic education justify itself as a matter of social fact.
Addendum: Since the social sciences aren’t doing this work, it looks like some computer scientists are doing it instead. This report by Narayanan provides a recent economic history of “dark patterns” since the 1970s, an example of how historical research can put “data science ethics” in context.
Carroll, P. (2006). Science of Science and Reflexivity. Social Forces, 85(1), 583-585.
Metcalf, J., & Moss, E. (2019). Owning Ethics: Corporate Logics, Silicon Valley, and the Institutionalization of Ethics. Social Research: An International Quarterly, 86(2), 449-476.
Van Dusen, E., Suen, A., Liang, A., & Bhatnagar, A. (2019). Accelerating the Advancement of Data Science Education. Proceedings of the 18th Python in Science Conference (SciPy 2019).