A cornerstone of all computer science research is the analysis of the difficulty of solving problems. As is well known, some problems, like sorting a list of numbers, are relatively easy. Other problems, like the knapsack problem, are hard. Here, “easy” and “hard” are defined by computational complexity classes: the amount of processing time it takes to solve the problem as a function of the size of the input.
Statistics has its own internal understanding of the difficulty of solving problems. When doing statistical inference properly, you cannot do better than your data and the validity of your assumptions allow (cf. the no free lunch theorem). You cannot solve a high-dimensional problem with low-dimensional data (cf. the curse of dimensionality).
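The curse of dimensionality can be made concrete with a small simulation (not from the original text; the function name and parameters are illustrative). One standard symptom is distance concentration: as dimension grows, the contrast between the nearest and farthest random point from a query shrinks, so neighborhood-based inference needs dramatically more data to say anything.

```python
# Sketch: distance concentration in high dimensions.
# For random points in [-1, 1]^d, measure the relative gap between the
# farthest and nearest point from the origin. In low dimension the gap is
# large; in high dimension all points sit at nearly the same distance.
import random
import math

def distance_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for random points around a query at the origin."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_points):
        p = [rng.uniform(-1, 1) for _ in range(dim)]
        dists.append(math.sqrt(sum(x * x for x in p)))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```

Running this shows the contrast collapsing as dimension increases: in 1000 dimensions the "nearest neighbor" is barely nearer than anything else, which is one way the data requirements of high-dimensional problems blow up.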
“AI”, or machine learning, or data science, in its current form is the combination of statistics and computer science. Serious researchers in either domain know that the problems they are solving are often hard. (Deep learning has perhaps allowed the AI research community to suspend its disbelief for a time.)
Consider two problems:
- A: The problem of predicting Y from input data X well enough that a decision whose value depends on the accuracy of the estimate of Y can be made well.
- A’: The problem of predicting the consequences of deploying the system that solves A in a complex sociotechnical world.
Which problem is harder?
However hard problem A is, A’ will be harder. To solve A, you need training data for X and Y, and sound inference and optimization algorithms. To solve A’, you need not only training data for X and Y (in order to understand the behavior of the system that solves A), but also training data from which to learn the structure of the sociotechnical world in which that system is deployed. These data will be of much higher dimension than those used to solve A. (Simulating the total system and getting a distribution over its outcomes may also prove complex in terms of runtime, more complex than the original optimization problem involved in solving A.)
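The runtime point in the parenthetical can be sketched with a toy cost model (all names and numbers here are illustrative assumptions, not anything from the original argument): if each evaluation of the system that solves A has some cost, then estimating A’ by simulating deployment means many rollouts of many steps, each of which invokes A, so the cost multiplies.

```python
# Toy cost model: estimating the consequences of deploying A (problem A')
# by Monte Carlo simulation of the sociotechnical environment.
def cost_of_A(n_train):
    """Assumed cost of fitting/evaluating the predictor for A (linear in data size)."""
    return n_train

def cost_of_A_prime(n_train, n_rollouts, steps_per_rollout):
    """Every step of every simulated deployment trajectory queries A."""
    return n_rollouts * steps_per_rollout * cost_of_A(n_train)

base = cost_of_A(10_000)
deployed = cost_of_A_prime(10_000, n_rollouts=1_000, steps_per_rollout=100)
print(deployed // base)  # multiplicative blow-up factor over solving A alone
```

Under these (made-up) settings, evaluating A’ costs five orders of magnitude more than evaluating A, before accounting for the extra data needed to model the environment at all.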
Considering this argument, it's clear that the difficulty computer scientists have with AI ethics problems is not their use of abstraction as a disciplinary practice (see Selbst et al., 2019). Rather, it's because the AI ethics problem (A’) is, for abstractly understandable reasons, much harder than the AI problem (A).
There is a great deal of humanistic discussion of AI ethics coming from law, anthropology, and so on. Qualitative research and humanistic understanding are wonderful in part because they allow for a high-dimensional understanding of their phenomena. But they are not free from the laws of logic; rather, their powers and limitations can be better understood by showing how they fit within the formally understood mathematics of learning (Benthall, 2016). When “interpretivist” researchers write about AI ethics, they are often doing important work of raising awareness about the consequences of technical systems. This is, it must be said, somewhat easier to do after the fact. They are not solving the AI ethics problem as it confronts the technology designer originally. For that problem, the principles of computer science apply.
One last point: any model of a sociotechnical system, internalized within an AI component of that system, will be yet another AI with potentially undesirable social consequences. We have discussed problem A, and also problem A’. But we can equally consider problem A”: the problem of predicting the consequences of deploying the system that solves A’. And A”’, A””, A^(n), on into an infinite regress. It’s an interesting question whether the complexity of the problem leaps or plateaus after repeated applications of this operation.
Benthall, S. (2016). The Human is the Data Science. Workshop on Developing a Research Agenda for Human-Centered Data Science, CSCW 2016.
Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019, January). Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 59-68). ACM.