The Crevasse: a meditation on accountability of firms in the face of opacity as the complexity of scale
by Sebastian Benthall
(A1) Beneath corporate secrecy and user technical illiteracy, a fundamental source of opacity in “algorithms” and “machine learning” is the complexity of scale, especially scale of data inputs. (Burrell, 2016)
(A2) The opacity of the operation of companies using consumer data makes those consumers unable to engage with them as informed market actors. The consequence has been a “free fall” of market failure (Strandburg, 2013).
(A3) Ironically, this “free” fall has been “free” (zero price) for consumers; they appear to get something for nothing without knowing what has been given up or changed as a consequence (Hoofnagle and Whittington, 2013).
(B1) The above line of argument conflates “algorithms”, “machine learning”, “data”, and “tech companies”, as is common in the broad discourse. That this conflation is possible speaks to the ignorance of the scholarly position on these topics, and ignorance that is implied by corporate secrecy, technical illiteracy, and complexity of scale simultaneously. We can, if we choose, distinguish between these factors analytically. But because, from the standpoint of the discourse, the internals are unknown, the general indication of a ‘black box’ organization is intuitively compelling.
(B1a) Giving in to the lazy conflation is an error because it prevents informed and effective praxis. If we do not distinguish between a corporate entity and its multiple internal human departments and technical subsystems, then we may confuse ourselves into thinking that a fair and interpretable algorithm can give us a fair and interpretable tech company. Nothing about the former guarantees the latter because tech companies operate in a larger operational field.
(B2) The opacity as the complexity of scale, a property of the functioning of machine learning algorithms, is also a property of the functioning of sociotechnical organizations more broadly. Universities, for example, are often opaque to themselves, because of their own internal complexity and scale. This is because the mathematics governing opacity as a function of complexity and scale are the same in both technical and sociotechnical systems (Benthall, 2016).
(B3) If we discuss the complexity of firms, as opposed the the complexity of algorithms, we should conclude that firms that are complex due to scale of operations and data inputs (including number of customers) will be opaque and therefore have strategic advantage in the market against less complex market actors (consumers) with stiffer bounds on rationality.
(B4) In other words, big, complex, data rich firms will be smarter than individual consumers and outmaneuver them in the market. That’s not just “tech companies”. It’s part of the MO of every firm to do this. Corporate entities are “artificial general intelligences” and they compete in a complex ecosystem in which consumers are a small and vulnerable part.
(C1) Another source of opacity in data is that the meaning of data come from the causal context that generates it. (Benthall, 2018)
(C2) Learning causal structure from observational data is hard, both in terms of being data-intensive and being computationally complex (NP). (c.f. Friedman et al., 1998)
(C3) Internal complexity, for a firm, is not sufficient to be “all-knowing” about the data that is coming it; the firm has epistemic challenges of secrecy, illiteracy, and scale with respect to external complexity.
(C4) This is why many applications of machine learning are overrated and so many “AI” products kind of suck.
(C5) There is, in fact, an epistemic crevasse between all autonomous entities, each containing its own complexity and constituting a larger ecological field that is the external/being/environment for any other autonomy.
The most promising direction based on this analysis is a deeper read into transaction cost economics as a ‘theory of the firm’. This is where the formalization of the idea that what the Internet changed most are search costs (a kind of transaction cost) should be.
It would be nice if those insights could be expressed in the mathematics of “AI”.
There’s still a deep idea in here that I haven’t yet found the articulation for, something to do with autopoeisis.
Benthall, Sebastian. (2016) The Human is the Data Science. Workshop on Developing a Research Agenda for Human-Centered Data Science. Computer Supported Cooperative Work 2016. (link)
Sebastian Benthall. Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics. Ph.D. dissertation. Advisors: John Chuang and Deirdre Mulligan. University of California, Berkeley. 2018.
Burrell, Jenna. “How the machine ‘thinks’: Understanding opacity in machine learning algorithms.” Big Data & Society 3.1 (2016): 2053951715622512.
Friedman, Nir, Kevin Murphy, and Stuart Russell. “Learning the structure of dynamic probabilistic networks.” Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 1998.
Hoofnagle, Chris Jay, and Jan Whittington. “Free: accounting for the costs of the internet’s most popular price.” UCLA L. Rev. 61 (2013): 606.
Strandburg, Katherine J. “Free fall: The online market’s consumer preference disconnect.” U. Chi. Legal F. (2013): 95.