Digifesto

We need a theory of collective agency to guide data intermediary design

Last week Jake Goldenfein and I presented some work-in-progress to the Centre for Artificial Intelligence and Digital Ethics (CAIDE) at the University of Melbourne. The title of the event was “Data science and the need for collective law and ethics”; perhaps masked by that title is the shift we’re taking to dive into the problem of data intermediaries. I wanted to write a bit about how we’re thinking about these issues.

This work builds on our paper “Data Science and the Decline of Liberal Law and Ethics”, which was accepted by a conference that was then canceled due to COVID-19. In retrospect, it’s perhaps for the best that the conference was canceled. The “decline of liberalism” theme fit the political moment when we wrote the piece, when Trump and Sanders were contenders for the presidency of the U.S., and authoritarian regimes appeared to be providing a new paradigm for governance. Now, Biden is the victor and it doesn’t look like liberalism is going anywhere. We must suppose that our project will take place in a (neo)liberal context.

Our argument in that work was that many of the ideas animating the (especially Anglophone) liberalism embedded in the legal systems of the U.S., the U.K., and Australia have been inadequate to meaningfully regulate artificial intelligence. This is because liberalism imagines a society of rational individuals appropriating private property through exchanges on a public market and acting autonomously, whereas today we have a wide range of agents with varying levels of bounded rationality, many of which are “artificial” in Herbert Simon’s sense of being computer-enabled firms, tied together in networks of control, not least of these being privately owned markets (the platforms). Essentially, loopholes in liberalism have allowed a quite different form of sociotechnical ordering to emerge because that political theory did not take into account a number of rather recently discovered scientific truths about information, computing, and control. Our project is to tackle this disconnect between theory and actuality, and to try to discover what comes next: a properly cybernetic political theory that advances the goal of human emancipation.

Picking up where our first paper left off, this has gotten us looking at data intermediaries. This is an area where there has been a lot of work! We were particularly inspired by Mozilla’s Data Futures review of different forms of data intermediary institutions, including data coops, data trusts, data marketplaces, and so on. There is a wide range of ongoing experiments with alternative forms of “data stewardship” or “data governance”.

Our approach has been to try to frame and narrow down the options based on normative principles, legal options, and technical expertise. Rather than asking empirically what forms of data governance have been attempted, we are wondering: what ought the goals of a data intermediary be, given the facts about cybernetic agency in the world we live in? How could such an institution accomplish what has been lost by the inadequacies of liberalism?

Our thinking has led us to the position that what has prevented liberalism from regulating the digital economy is its emphasis on individual autonomy. We draw on the new consensus in privacy scholarship that individual “notice and choice” is an ineffective way to guarantee consumer protection in the digital economy. Not only do bounded rationality constraints prevent consumers from understanding what they are agreeing to, but the ability of firms to control consumers’ choice architecture has dwarfed the meaningfulness of whatever rationality individuals do have. Meanwhile, it is now well understood (perhaps most recently by Pistor (2020)) that personal data is valuable only when it is cleaned and aggregated. This makes the locus of economic agency around personal data necessarily a collective one.

This line of inquiry leads us to a deep question to which we do not yet have a ready answer, which is “What is collective emancipation in the paradigm of control?” Meaning, given what we know about the “sciences of the artificial”, control theory, theory of computation and information, etc., with all of its challenges to the historical idea of the autonomous liberal agent, what does it mean for a collective of individuals to be free and autonomous?

We got a lot of good feedback on our talk, especially from discussant Seth Lazar, who pointed out that there are many communitarian strands of liberalism that we could look to for normative guides. He mentioned, for example, Elizabeth Anderson’s relational egalitarianism. We asked Seth whether he thought that the kind of institution that guaranteed the collective autonomy of its members would have to be a state, and he pointed out that that was a question of whether or not such a system would be entitled to use coercion.

There’s a lot to do on this project. While it is quite heady and philosophical, I do not think that it is necessarily only an abstract or speculative project. In a recent presentation by Vincent Southerland, he proposed that one solution to the problematic use of algorithms in criminal sentencing would be if “the community” of those advocating for equity in the criminal justice system operated their own automated decision systems. This raises an important question: how could and should a community govern its own technical systems, in order to support what, in Southerland’s case, is an abolitionist agenda? I see this as a very aligned project.

There is also a technical component to the problem. Because of economies of scale and the legal climate, more and more computation is moving onto proprietary cloud systems. Most software now is provided “as a service”. It’s unclear what this means for organizations that would try to engage in self-governance, even when these organizations are autonomous state entities such as municipalities. In some conversations, we have considered what modifications of the technical ideas of the “user agent”, security firewalls and local networks, and hybrid cloud infrastructure would enable collective self-governance. This is the pragmatic “how?” that follows our normative “what?” and “why?” questions, but it is no less important to implementing a prototype solution.

References

Benthall, Sebastian and Goldenfein, Jake, Data Science and the Decline of Liberal Law and Ethics (June 22, 2020). Available at SSRN: https://ssrn.com/abstract=3632577 or http://dx.doi.org/10.2139/ssrn.3632577

Narayanan, A., Toubiana, V., Barocas, S., Nissenbaum, H., & Boneh, D. (2012). A critical look at decentralized personal data architectures. arXiv preprint arXiv:1202.4503.

Pistor, K. (2020). Rule by data: The end of markets?. Law & Contemp. Probs., 83, 101.

Sources of the interdisciplinary hierarchy

Lyotard’s 1979 treatise The Postmodern Condition tells a prescient story about the transformation of the university. He discusses two “metanarratives” used for the organization of universities: the German Humboldt model of philosophy as the central discipline, with all other fields of knowledge radiating out from it; and the French model of the university as the basis of education of the modern democratic citizen. Lyotard argues (perhaps speciously) that because of what the late Wittgenstein had to say about the autonomy of language games (there are no facts; there are only social rules) and because of cybernetics (the amalgamation of exact and applied sciences that had been turned so effectively towards control of human and machine), the metanarratives had lost their legitimacy. There was only “legitimation by performativity”, knowledge proving itself by virtue of its (technical) power, and “legitimation by paralogy”, knowledge legitimizing itself through semantic disruption, creating pools of confusion in which one could still exist while out of alignment with prevailing cybernetic logics.

This duality–between cybernetics and paralogy–excludes a middle term identified in Habermas’s 1968 Knowledge and Human Interests. Habermas identifies three “human interests” that motivate knowledge: the technical interest (corresponding to cybernetic performativity), the emancipatory interest (perhaps corresponding to the paralogic turn away from cybernetic performativity), and, thirdly, the hermeneutic interest. The latter is the interest in mutual understanding that makes shared, collective life possible. As Habermas’s work matures, this interest emerges as the deliberative, consensual basis of law.

These frameworks for understanding knowledge and the university share an underlying pragmatism. Both Lyotard and Habermas seem to agree about the death of the Humboldt model: knowledge for its own sake is a deceased metanarrative. Knowledge for democratic citizens, the purportedly French model in Lyotard, was knowledge of shared historical narratives and agreement about norms for Habermas. Lyotard was pessimistic about the resilience of these kinds of norms under the pressure of cybernetics. Indeed, this tension between “smart technology” and “rule of law” remains, expressed in the work of Hildebrandt. The question of whether technical knowledge threatens or delegitimizes legal/hermeneutic knowledge is still with us today.

These intellectual debates are perhaps ultimately about university politics and academic disciplines. If they are truly _ultimately_ about that, that marks their limitation. For what the pragmatist orientation towards knowledge implies is that knowledge does not exist for its own sake, but rather, in most cases, for its application. Philosophers can therefore only achieve so much by appealing to generalized interests. All real applications are contextualized.

Two questions unanswered by these sources (at least in this assuredly impoverished schematic of their arguments) are:

  • Whence the interests and applications that motivate the university as socially and economically situated?
  • What accounts for the tensions between the technical/performative disciplines and the hermeneutic and emancipatory ones?

In 1979, the same publication year as The Postmodern Condition, Pierre Bourdieu published Distinction: A Social Critique of the Judgement of Taste. While not in itself an epistemology, Bourdieu’s method and conclusions provide a foundation for later studies of science, journalism, and the university. Bourdieu’s insight is that aesthetic taste–in art, in design, in hobbies, etc.–is a manifestation of socioeconomic class understood in terms of a multidimensional matrix of forms of capital: economic wealth, but also social capital in status, prestige, and connections, and cultural capital in knowledge and skills. Those with lots of wealth and low cultural capital–the nouveau riche–will value expensive, conspicuous consumption. Those with low wealth and high cultural capital–academics, perhaps–will value intricate works that require time and training to understand, and so on. But these preferences exist to maintain the social structures of (multiply defined) capital accumulation.

A key figure in Bourdieu’s story is the petit bourgeoisie, the transitional middle class that has specialized its labor, perhaps created a small business, but has not accumulated capital in a way that secures it in the position to which it aspires. In today’s economy, these might include the entrepreneurs–those who would, by their labor, aspirationally transform themselves from laborers into capitalists. They would do this by creating technology–the means of production, capital. Unlike labor applied directly to the creation of goods and services as commodities, capital technologies, commodified through the institution of intellectual property, have the potential to scale in use well beyond the effort of their creation and, through Schumpeterian disruption, make their creators wealthy enough to change their class position. On the other hand, there are those who prefer the academic lifestyle, who luxuriate in the study of literature and critique. Through the institutions of critical academia, these are also jobs that can be won through the accumulation of capital–in this case, social and cultural capital. By design, these are fields of knowledge that exist for their own sake. There are also, of course, the legal and social scientific disciplines that serve the cultural formation of politicians, legislators, and government workers of various kinds.

Viewed in this way, we can start to see “human interests” not merely as transcendental features of the general human condition, but rather as the expression of class and capital interests. This makes sense given the practical reality of universities getting most of their income through tuition. Students attend universities in order to prepare themselves for careers. The promise of a professional career allows universities to charge higher tuition. Where in the upper classes people choose to compete on intangible cultural capital rather than economic capital, universities maintain specialized disciplinary tracks in the humanities.

Notably, the emancipatory role of the humanities, lauded by Habermas and subtly lampooned (perhaps) by Lyotard, is in other works more closely connected to leisure. As early as 1947, Horkheimer, in Eclipse of Reason, points out that the kind of objective reason he sees as essential to the moral grounding of society–a grounding otherwise derailed by capitalism–relies on leisure time, which is a difficult class attainment. In perhaps cynical Bourdieusian terms, the ability to reflect on the world and decide, beyond the restrictions of material demands, on an independent or transcendent system of values is itself a form of cultural accumulation of the most rarefied kind. However, as this form of cultural attainment is not connected directly to any means of production, it is perhaps a mystery what grounds it pragmatically.

There’s an answer. It’s philanthropy. The arts and humanities, the idealistic independent policy think tanks, and so on, are funded by those who, having accumulated economic capital and the capacity for leisurely thinking about the potential for a better world, have allocated some portion of their wealth towards “causes”. The competition for legitimacy between and among philanthropic causes is today a major site of politics and ideology. Most obviously, political parties and candidacies run on donations, which are in a sense a form of values-driven philanthropy. The appropriation of state funds, or not, for particular causes becomes, at the end of the day, a battlefield of all forms of capital.

This is all understandable from the perspective that is now truly at the center of the modern university: the perspective of business administration. Ever since Herbert Simon, it has been widely known that the managerialist discipline and computational and cybernetic sciences are closely aligned. The economic sociology of Bourdieu is notable in that it is a successor to the sociology of Marx, but also a successor to the phenomenological approach of Kant, and yet is ultimately consistent with the managerialist view of institutions relying on skilled capital management. Disciplines or sub-disciplines that are peripheral to these core skillsets by virtue of their position in the network of capital flows are marginal by definition.

This accounts for much of interdisciplinary politics and grievance. The social structures described here account for the teleological dependency structure of different forms of knowledge: what it is possible to motivate, and with what. To the extent that a discipline as a matter of methodological commitment is unable to account for this social structure, it will be dependent on its own ability to perpetuate itself autonomously through the stupefaction of its students.

There is another form of disciplinary dependency worth mentioning. It cuts the other way: it is the dependency that arises from the infrastructural needs of the knowledge institutions. This instrumental dependency is where this line of reasoning connects with Ihde’s instrumental realism as a philosophy of science. Here, too, there are disciplines that are blind to themselves. To the extent that a discipline is unable to account for the scientific advances necessary for its own work, it survives through the heroics of performative contradiction. There may be cases where an institution has developed enough teleological autonomy to reject the knowledge behind its own instrumentation, but in these cases we may be tempted to consider the knowledge claims of the former to be specious and pretentious. What purpose does fashionable nonsense have, if it rejects the authority of those that it depends on materially? “Those” here referring to those classes that embody the relevant infrastructural knowledge.

The answer is perhaps best given using the Bourdieusian insights already discussed: an autonomous field of discourse that denies its own infrastructure is a cultural market designed to establish a distinct form of capital, an expression of leisure. The rejection of performativity, or a tenuous and ambiguous connection to it, becomes a class marker; synecdochal with leisure itself, which can then be held up as an estimable goal. Through Lyotard’s analysis, we can see how a field so constructed might be successful through the rhetorical power of its own paralogic.

What has been lost, through this process, is the metanarrative of the university, most especially of the university as an anchor of knowledge in itself. The pragmatist cybernetic knowledge orientation entails that the university is subsumed to wider systems of capital flows, and the only true guarantee of its autonomy is philanthropic endowment which might perpetuate its ability to develop a form of capital that serves its own sake.

A philosophical puzzle: morality with complex rationality

There’s a recurring philosophical puzzle that keeps coming up as one drills into the foundational issues at the heart of technology policy. The most complete articulation of it that I know of is in a draft I’ve written with Jake Goldenfein, whose publication was delayed by COVID. But here is an abbreviated version of the philosophical problem, distilled perhaps from the tech policy context.

For some reason it all comes back to Kant. The categorical imperative has two versions that are supposed to imply each other:

  • Follow rules that would be agreed on as universal by rational beings.
  • Treat others as ends and not means.

This is elegant and worked quite well while the definitions of ‘rationality’ in play were simple enough that Man could stand at the top of the hierarchy.

Kant is outdated now of course but we can see the influence of this theory in Rawls’s account of liberal ethics (the ‘veil of ignorance’ being a proxy for the reasoning being who has transcended their empirical body), in Habermas’s account of democracy (communicative rationality involving the setting aside of individual interests), and so on. Social contract theories are more or less along these lines. This paradigm is still more or less the gold standard.

There are a couple of serious challenges to this moral paradigm. Both relate to how the original model of rationality it is based on is perhaps naive, or so rarefied as to be unrealistic. What happens if you deny that people are rational in any disinterested sense, or allow for different levels of rationality? It all breaks down.

On the one hand, there’s various forms of egoism. Sloterdijk argues that Nietzsche stood out partly because he argued for an ethics of self-advancement, which rejected deontological duty. Scandalous. The contemporary equivalent is the reputation of Ayn Rand and those inspired by her. The general idea here is the rejection of social contract. This is frustrating to those who see the social contract as serious and valuable. A key feature of this view is that reason is not, as it is for Kant, disinterested. Rather, it is self-interested. It’s instrumental reason with attendant Humean passions to steer it. The passions need not be too intellectually refined. Romanticism, blah blah.

On the other hand, the 20th century discovers scientifically the idea of bounded rationality. Herbert Simon is the pivotal figure here. Individuals, being bounded, form organizations to transcend their limits. Simon is the grand theorist of managerialism. As far as I know, Simon’s theories are amoral, strictly about the execution of instrumental reason.

Nevertheless, Simon poses a challenge to the universalist paradigm because he reveals the inadequacy of individual humans to self-determine anything of significance. It’s humbling; it also threatens the anthropocentrism that provided the grounds for humanity’s mutual self-respect.

So where does one go from here?

It’s a tough question. Some spitballing:

  • One option is to relocate the philosophical subject from the armchair (Kant) to the public sphere (Habermas) into a new kind of institution that was better equipped to support their cogitation about norms. A public sphere equipped with Bloomberg terminals? But then who provides the terminals? And what about actually existing disparities of access?
    • One implication of this option, following Habermas, is that the communications within it, which would have to include data collection and the application of machine learning, would be disciplined in ways that would prevent defections.
    • Another implication, which is the most difficult one, is that the institution that supports this kind of reasoning would have to acknowledge different roles. These roles would constitute each other relationally–there would need to be a division of labor. But those roles would need to each be able to legitimize their participation on the whole and trust the overall process. This seems most difficult to theorize let alone execute.
  • A different option, sort of the unfinished Nietzschean project, is to develop the individual’s choice to defect into something more magnanimous. Simone de Beauvoir’s widely underrated Ethics of Ambiguity is perhaps the best accomplishment along these lines. The individual, once they overcome their own solipsism and consider their true self-interests at an existential level, comes to understand how the success of their projects depends on society, because society will outlive them. In a way, this point echoes Simon’s in that it begins from an acknowledgment of human finitude. It reasons from there to a theory of how finite human projects can become infinite (achieving the goal of immortality for the one who initiates them) by being sufficiently prosocial.

Either of these approaches might be superior to “liberalism”, which arguably is stuck in the first paradigm (though I suppose there are many liberal theorists who would defend their position). As a thought experiment, I wonder what public policies motivated by either of these positions would look like.

some PLSC 2020 notes: one framing of the managerialism puzzle

PLSC 2020 was quite interesting.

There were a number of threads I’d like to follow up on. One of them has to do with managerialism and the ability of the state (U.S. in this context) to regulate industry.

I need to do some reading to fill some gaps in my understanding, but this is how I understand the puzzle so far.

Suppose the state wants to regulate industry. Congress passes a bill creating an agency with regulatory power with some broadly legislated mandate. The agency comes up with regulations. Businesses then implement policies to comply with the regulation. That’s how it’s supposed to go.

But in practice, there is a lot of translational work being done here. The broadly legislated mandate will be in a language that can get passed by Congress. It delegates elaboration on the specifics to the expert regulators in the agency; these regulators might be lawyers. But when the corporate bosses get the regulations (maybe from their policy staff, also lawyers?) they begin to work with it in a “managerialist” way. This means, I gather, that they manage the transition towards compliance, but in a way that minimizes the costs of compliance. If they can comply without adhering to the purpose of the regulation–which might be ever-so-clear to the lawyers who dreamed it up–so be it.

This seems all quite obvious. Of course it would work this way. If I gather correctly at this point (and maybe I don’t), the managerialist problem is: because of the translational work going on from legislative intent through administrative regulation into corporate policy into implementation, there’s a lot of potential to have information “lost in translation”, and this information loss works to the advantage of the regulated corporation, because it is using all that lost regulatory bandwidth to its advantage.

We should teach economic history (of data) as “data science ethics”.

I’ve recently come across an interesting paper published at SciPy 2019, Van Dusen et al.’s “Accelerating the Advancement of Data Science Education” (2019) (link). It summarizes recent trends in data science education, as modeled by UC Berkeley’s Division of Data Science, which is now the Division of Computing, Data Science, and Society (CDSS). This is a striking piece to me as I worked at Berkeley on its data science capabilities several years ago and continue to be fascinated by my alma mater, the School of Information, as it navigates being part of CDSS.

Among other interesting points in the article, two are particularly noteworthy to me. The first is that the integration of data science into the social sciences appears to have continued apace. Economics, in particular, is well represented and supported in the extended data science curriculum.

The other interesting point is the emphasis on data science ethics as an essential pillar of the educational program. The writing in this piece is consistent with what I’ve come to expect from Berkeley on this topic, and I believe it’s indicative of broad trends in academia.

The authors of this piece are explicit about their “theory of change”. What is data science ethics education supposed to accomplish?

Including training in ethical considerations at all levels of society and all steps of the data science workflow in undergraduate data science curricula could play an important role in stimulating change in industry as our students enter the workforce, perhaps encouraging companies to add ethical standards to their mission statements or to hire chief ethics officers to oversee not only day-to-day operations but also the larger social consequences of their work.

The theory of change articulated by the paper is that industry will change if ethically educated students enter the workforce. They see a future where companies change their mission statements in accord with what has been taught in data science ethics courses, or hire oversight officials.

This is, it must be noted, broadly speculative, and implies that the leadership of the firms who hire these Berkeley grads will be responsive to their employees. However, unlike in some countries in Europe, the United States does not give employees a lot of say in the governance of firms. Technology firms, such as Amazon and Google, have recently proven to be rather unfriendly to employees that attempt to organize in support of “ethics”. This is for highly conventional reasons: the management of these firms tends to be oriented towards the goal of maximizing shareholder profits, and having organized employees advocating for ethical issues that interfere with business is an obstacle to that goal.

This would be understood plainly if economics, or economic history, were taught as part of “data science ethics”. But for some reason it isn’t. Information economics, which would presumably be where one would start to investigate the way incentives drive data science institutions, is perhaps too complex to be included in the essential undergraduate curriculum, despite its being perhaps critical to understanding the “data intensive” social world we all live in now.

We forget today, often, that the original economists (Adam Smith, Alfred Marshall, etc.) were all originally moral philosophers. Economics has begun to be seen as a field designed to be in instrumental support of business practice or ideology rather than an investigation into the ethical consequences of social and material structure. That’s too bad.

Instead of teaching economic history, which would be a great way of showing students the ethical implications of technology, Berkeley is teaching Science and Technology Studies (STS) and algorithmic fairness! I’ll quote at length:

A recent trend in incorporating such ethical practices includes incorporating anti-bias algorithms in the workplace. Starting from the beginning of their undergraduate education, UC Berkeley students can take History 184D: Introduction to Science, Technology, and Society: Human Contexts and Ethics of Data, which covers the implications of computing, such as algorithmic bias. Additionally, students can take Computer Science 294: Fairness in Machine Learning, which spends a semester in resisting racial, political, and physical discrimination. Faculty have also come together to create the Algorithmic Fairness and Opacity Working Group at Berkeley’s School of Information that brainstorms methods to improve algorithms’ fairness, interpretability, and accountability. Implementing such courses and interdisciplinary groups is key to start the conversation within academic institutions, so students can mitigate such algorithmic bias when they work in industry or academia post-graduation.

Databases and algorithms are socio-technical objects; they emerge and evolve in tandem with the societies in which they operate [Latour90]. Understanding data science in this way and recognizing its social implications requires a different kind of critical thinking that is taught in data science courses. Issues such as computational agency [Tufekci15], the politics of data classification and statistical inference [Bowker08], [Desrosieres11], and the perpetuation of social injustice through algorithmic decision making [Eubanks19], [Noble18], [ONeil18] are well known to scholars in the interdisciplinary field of science and technology studies (STS), who should be invited to participate in the development of data science curricula. STS or other courses in the social sciences and humanities dealing specifically with topics related to data science may be included in data science programs.

This is all very typical. The authors are correct that algorithmic fairness and STS have been trendy ways of teaching data science ethics. It is perhaps too cynical to say that these are trendy approaches to “data science ethics” because they are the data science ethics that Microsoft will pay for. Let that slip as a joke.

However, it is unfortunate if students have no better intellectual equipment for dealing with “data science ethics” than this. Algorithmic fairness is a fascinating field of study with many interesting technical results. However, as has been broadly noted by STS scholars, among others, the successful use of “algorithmic fairness” technology depends on the social context in which it is deployed. Often, “fairness” is achieved through greater scientific and technical integrity: for example, properly deducing cause and effect rather than lazily applying techniques that find correlation. But the ethical challenges in the workplace are often not technical challenges. They are the challenges of managing the economic incentives of the firm, and how these affect the power structures within the firm (Metcalf et al., 2019). This is apparently not material that is being taught at Berkeley to data science students.

This more careful look at the social context in which technology is being used is supposed to be what STS is teaching. But, all too often, this is not what it’s doing. I’ve written elsewhere why STS is not the solution to “tech ethics”. Part of (e.g. Latourian) STS training is a methodological, if not intellectual, relativistic skepticism about science and technology itself (Carroll, 2006). As a consequence, it is by its nature a humanistic or anthropological field, using “interpretivist” methods, with weak claims to generalizability. It is, first and foremost, an academic field, not an applied one. The purpose of STS is to generate fascinating critiques.

There are many other social sciences that have different aims, such as the aim of building consensus around what social and economic conditions are, in order to motivate political change. These social sciences have ethical import. But they are built around a different theory of change. They are aimed at the student as a citizen in a democracy, not as an employee at a company. And while I don’t underestimate the challenges of advocating for designing education to empower students as public citizens in this economic climate, it must nevertheless be acknowledged, as an ethical matter, that a “data science ethics” curriculum that does not address the politics behind those difficulties will be an anemic one, at best.

There is a productive way forward. It requires, however, interdisciplinary thinking that may be uncomfortable or, in the end, impossible for many established institutions. If students are taught a properly historicized and politically substantive “data science ethics”, not in the mode of an STS-based skepticism about technology and science, but rather as economic history that is informed by data science (computational and inferential thinking) as an intellectual foundation, then ethical considerations would not need to be relegated to a hopeful afterthought invested in a theory of corporate change that is ultimately a fantasy. Rather, it would put “data science ethics” on a scientific foundation and help civic education justify itself as a matter of social fact.

Addendum: Since the social sciences aren’t doing this work, it looks like some computer scientists are doing it instead. This report by Narayanan provides a recent economic history of “dark patterns” since the 1970s–an example of how historical research can put “data science ethics” in context.

References

Carroll, P. (2006). Science of Science and Reflexivity. Social Forces, 85(1), 583-585.

Metcalf, J., & Moss, E. (2019). Owning Ethics: Corporate Logics, Silicon Valley, and the Institutionalization of Ethics. Social Research: An International Quarterly, 86(2), 449-476.

Van Dusen, E., Suen, A., Liang, A., & Bhatnagar, A. (2019). Accelerating the Advancement of Data Science Education. Proceedings of the 18th Python in Science Conference (SciPy 2019).

Internet service providers are utilities

On Sunday, New York State is closing all non-essential brick-and-mortar businesses and ordering everyone in the workforce who is able to work from home to do so. Zoom meetings from home are now the norm for people working for both the private sector and government.

One might reasonably want to know whether the internet service providers (ISPs) are operating normally during this period. I had occasion to call up Optimum yesterday and ask. I was told, very helpfully, “We’re doing business as usual because we are like a utility.”

It’s quite clear that the present humane and responsible approach to COVID-19 depends on broad and uninterrupted access to the Internet to homes. The government and businesses would cease to function without it. Zoom meetings are performing the role that simple audio telephony once did. And executive governments are recognizing this as they use their emergency powers.

There has been a strain of “technology policy” thought that some parts of “the tech sector” should be regulated as utilities. In 2015, the FCC reclassified broadband access as a utility as part of their Net Neutrality decision. In 2018, this position was reversed. This was broadly seen as a win for the telecom companies.

One plausible political consequence of COVID-19 is the reconsideration of the question of whether ISPs are utilities or not. They are.

Notes on Krusell & Smith, 1998 and macroeconomic theory

I’m orienting towards a new field through my work on HARK. A key paper in this field is Krusell and Smith, 1998 “Income and wealth heterogeneity in the macroeconomy.” The learning curve here is quite steep. These are, as usual, my notes as I work with this new material.

Krusell and Smith are approaching the problem of macroeconomic modeling on a broad foundation. Within this paradigm, the economy is imagined as a large collection of people/households/consumers/laborers. These exist at a high level of abstraction and are imagined to be intergenerationally linked. A household might be an immortal dynasty.

There is only one good: capital. Capital works in an interesting way in the model. It is produced every time period by a combination of labor and other capital. It is distributed to the households, apportioned as both a return on household capital and as a wage for labor. It is also consumed each period, for the utility of the households. So all the capital that exists does so because it was created by labor in a prior period, but then saved from immediate consumption, then reinvested.

In other words, capital in this case is essentially money. All other “goods” are abstracted way into this single form of capital. The key thing about money is that it can be saved and reinvested, or consumed for immediate utility.

Households can also labor, when they have a job. There is an unemployment rate, and in the model households are uniformly likely to be employed or not, no matter how much money they have. The wage return on labor is determined by an aggregate economic productivity function. There are good and bad economic periods, determined exogenously and randomly, and employment rates are determined accordingly. One major impetus for saving is insurance against bad times.
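To keep the moving parts straight, here is my shorthand for the household’s problem in this class of models (a reconstruction of the standard setup from memory, not quoted from the paper). Each period the household faces a budget constraint roughly like

k_{t+1} = (1 + r_t) k_t + w_t \epsilon_t \bar{l} - c_t, \qquad c_t \ge 0, \; k_{t+1} \ge 0

where k_t is the household’s capital holding, c_t is its consumption, \epsilon_t \in \{0, 1\} is its employment shock, \bar{l} is its labor endowment, and r_t and w_t are the rental rate and wage implied by an aggregate production function (Cobb-Douglas in the standard treatment, Y_t = z_t K_t^\alpha L_t^{1-\alpha}, with z_t the exogenous good/bad aggregate shock). The household chooses c_t to maximize expected discounted utility E \sum_t \beta^t u(c_t).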

The problem raised by Krusell and Smith in this, what they call their ‘baseline model’, is that because all households are the same, the equilibrium distribution of wealth is far too even compared with realistic data. It’s more normally distributed than log-normally distributed. This is implicitly a critique of all prior macroeconomics, which had used the “representative agent” assumption. All agents were represented by one agent. So all agents are approximately as wealthy as all others.

Obviously, this is not the case. This work was done in the late 90s, when the topic of wealth inequality was not nearly as front-and-center as it is in, say, today’s election cycle. It’s interesting that one reason why it might not have been front and center is that, prior to 1998, mainstream macroeconomic theory didn’t have an account of how there could be such inequality.

The Krusell-Smith model’s explanation for inequality is, it must be said, a politically conservative one. They introduce minute differences in the utility discount factor. The discount factor is how much weight you place on future utility compared to today’s utility. If you have a low discount factor, you discount the future heavily and will want to consume more today. If you have a high discount factor, you’re more patient and more willing to save for tomorrow.

Krusell and Smith show that teeny tiny differences in the discount factor, even if they are subject to a random walk around a mean with some persistence within households, lead to huge wealth disparities. Their conclusion is that “Poor households are poor because they’ve chosen to be poor”, by not saving more for the future.
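Here is a toy simulation of the mechanism, just to convince myself of the direction of the effect. This is emphatically not the Krusell-Smith model: they solve each household’s dynamic program (and show an “approximate aggregation” result), whereas the saving rule below is a made-up shortcut in which slightly more patient households save a slightly larger fraction of each period’s income. All numbers are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    n_households, n_periods = 10_000, 1_000
    r, wage = 0.01, 1.0  # per-period return on savings and wage; arbitrary values

    # A narrow band of discount factors; patience maps to a saving rate between
    # 0 and 0.10 (a made-up rule, not a solution to the household's problem).
    beta = rng.uniform(0.985, 0.995, n_households)
    saving_rate = 10 * (beta - 0.985)

    wealth = np.zeros(n_households)
    for _ in range(n_periods):
        employed = rng.random(n_households) < 0.9   # flat 10% unemployment risk
        income = r * wealth + wage * employed
        wealth += saving_rate * income              # the rest is consumed

    patient = beta >= np.quantile(beta, 0.9)
    impatient = beta <= np.quantile(beta, 0.1)
    print("mean wealth, most patient decile: ", round(float(wealth[patient].mean()), 1))
    print("mean wealth, least patient decile:", round(float(wealth[impatient].mean()), 1))

Even a one-percentage-point spread in patience compounds into a large gap between the most and least patient deciles. That is the qualitative point; the paper’s actual quantitative results come from a much more careful solution method.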

I’ve heard, like one does, all kinds of critiques of Economics as an ideological discipline. It’s striking to read a landmark paper in the field with this conclusion. It strikes directly against other mainstream political narratives. For example, there is no accounting of “privilege” or inter-generational transfer of social capital in this model. And while they acknowledge that other papers discuss whether having larger amounts of household capital leads to larger rates of return, Krusell and Smith sidestep this and make it about household saving.

The tools and methods in the paper are quite fascinating. I’m looking forward to more work in this domain.

References

Krusell, P., & Smith, Jr, A. A. (1998). Income and wealth heterogeneity in the macroeconomy. Journal of Political Economy, 106(5), 867-896.

ethnography is not the only social science tool for algorithmic impact assessment

Quickly responding to Selbst, Elish, and Latonero’s “Accountable Algorithmic Futures“, Data and Society’s response to the Algorithmic Accountability Act of 2019…

The bill would empower the FTC to do “automated decision systems impact assessment” (ADSIA) of automated decision-making systems. The article argues that the devil is in the details and that the way the FTC goes about these assessments will determine their effectiveness.

The point of their article, which I found notable, is to assert the appropriate intellectual discipline for these impact assessments.

This is where social science comes in. To effectively implement the regulations, we believe that engagement with empirical inquiry is critical. But unlike the environmental model, we argue that social sciences should be the primary source of information about impact. Ethnographic methods are key to getting the kind of contextual detail, also known as “thick description,” necessary to understand these dimensions of effective regulation.

I want to flag this as weird.

There is an elision here between “the social sciences” and “ethnographic methods”, as if there were no social sciences that were not ethnographic. And then “thick description” is implied to be the only source of contextual detail that might be relevant to impact assessments.

This is a familiar mantra, but it’s also plainly wrong. There are many disciplines and methods within “the social sciences” that aren’t ethnographic, and many ways to get at contextual detail that do not involve “thick description”. There is a worthwhile and interesting intellectual question here: what are the appropriate methods for algorithmic impact assessment? The authors of this piece assume an answer to that question without argument.

A few brief notes towards “Procuring Cybersecurity”

I’m shifting research focus a bit and wanted to jot down a few notes. The context for the shift is that I have the pleasure of organizing a roundtable discussion for NYU’s Center for Cybersecurity and Information Law Institute, working closely with Thomas Streinz of NYU’s Guarini Global Law and Tech.

The context for the workshop is the steady feed of news about global technology supply chains and how they are not just relevant to “cybersecurity”, but in some respects are constitutive of cyberinfrastructure and hence the field of its security.

I’m using “global technology supply chains” rather loosely here, but this includes:

  • Transborder personal data flows as used in e-commerce
  • Software- (and Infrastructure-) as-a-Service being marketed internationally (including, for example, Google services used abroad)
  • Enterprise software import/export
  • Electronics manufacturing and distribution.

Many concerns about cybersecurity as a global phenomenon circulate around the imagined or actual supply chain. These are sometimes national security concerns that result in real policy, as when Australia recently banned Huawei and ZTE from supplying 5G network equipment for fear that it would provide a vector of interference from the Chinese government.

But the nationalist framing is certainly not the whole story. I’ve heard anecdotally that after the Snowden revelations, Microsoft internally began to see the U.S. government as a cybersecurity “adversary”. Corporate tech vendors naturally don’t want to be known as being vectors for national surveillance, as this cuts down on their global market share.

Governments and corporations have different cybersecurity incentives and threat models. These models intersect and themselves create the dynamic cybersecurity field. For example, the Chinese government has viewed foreign software vendors as cybersecurity threats, and has responded by mandating source code disclosure. But as this is a vector of potential IP theft, foreign vendors have balked, seeing this mandate as a threat (Ahmed and Weber, 2018). Complicating things further, a defensive “cybersecurity” measure can also serve the goal of protecting domestic technology innovation–which can be framed as providing a nationalist “cybersecurity” edge in the long run.

What, if anything, prevents a total cyberwar of all against all? One answer is trade agreements that level the playing field, or at least establish rules for the game. Another is open technology and standards, which provide an alternative field driven by the benefits of interoperability rather than proprietary interest and secrecy. Is it possible to capture any of this in an accurate model or theory?

I love having the opportunity to explore these questions, as they are at the intersection of my empirical work on software supply chains (Benthall et al., 2016; Benthall, 2017) and also theoretical work on data economics in my dissertation. My hunch for some time has been that there’s a dearth of solid economics theory for the contemporary digital economy, and this is one way of getting at that.

References

Ahmed, S., & Weber, S. (2018). China’s long game in techno-nationalism. First Monday, 23(5). 

Benthall, S., Pinney, T., Herz, J. C., Plummer, K., Benthall, S., & Rostrup, S. (2016). An ecological approach to software supply chain risk management. In 15th Python in Science Conference.

Benthall, S. (2017, September). Assessing software supply chain risk using public data. In 2017 IEEE 28th Annual Software Technology Conference (STC) (pp. 1-5). IEEE.

Notes on O’Neil, Chapter 2, “Bomb Parts”

Continuing with O’Neil’s Weapons of Math Destruction on to Chapter 2, “Bomb Parts”. This is a popular book and these are quick chapters. But that’s no reason to underestimate them! This is some of the most lucid work I’ve read on algorithmic fairness.

This chapter talks about three kinds of “models” used in prediction and decision making, with three examples. O’Neil speaks highly of the kinds of models used in baseball to predict the trajectory of hits and determine the optimal placement of people in the field. (Ok, I’m not so good at baseball terms). These are good, O’Neil says, because they are transparent, they are consistently adjusted with new data, and the goals are well defined.

O’Neil then very charmingly writes about the model she uses mentally to determine how to feed her family. She juggles a lot of variables: the preferences of her kids, the nutrition and cost of ingredients, and time. This is all hugely relatable–everybody does something like this. Her point, it seems, is that this form of “model” encodes a lot of opinions or “ideology” because it reflects her values.

O’Neil then discusses recidivism prediction, specifically the LSI-R (Level of Service Inventory–Revised) tool. It asks questions like “How many previous convictions have you had?” and uses the answers to predict the likelihood of recidivism. The problem is that (a) this is sensitive to overpolicing in neighborhoods, which has little to do with actual recidivism rates (as opposed to rearrest rates), and (b) e.g. black neighborhoods are more likely to be overpoliced, meaning that the tool, which is not very good at predicting recidivism, has disparate impact. This is an example of what O’Neil calls an (eponymous) weapon of math destruction (WMD).

She argues that the three qualities of a WMD are Scale, Opacity, and Damage. Which makes sense.

As I’ve said, I think this is a better take on algorithmic ethics than almost anything I’ve read on the subject before. Why?

First, it doesn’t use the word “algorithm” at all. That is huge, because 95% of the time the use of the word “algorithmic” in the technology-and-society literature is stupid. People use “algorithm” when they really mean “software”. Now, they use “AI System” to mean “a company”. It’s ridiculous.

O’Neil makes it clear in this chapter that what she’s talking about are different kinds of models. Models can be in one’s head (as in her plan for feeding her family) or in a computer, and both kinds of models can be racist. That’s a helpful, sane view. It’s been the consensus of computer scientists, cognitive scientists, and AI types for decades.

The problem with WMDs, as opposed to other, better models, is that WMD models are unhinged from reality. O’Neil’s complaint is not with the use of models, but rather with models being used without being properly trained, using sound sampling and statistics, on data. WMDs are not artificial intelligences; they are artificial stupidities.

In more technical terms, it seems like the problem with WMDs is not that they don’t properly trade off predictive accuracy with fairness, as some computer science literature would suggest is necessary. It’s that the systems have high error rates in the first place because the training and calibration systems are poorly designed. What’s worse, this avoidable error is disparately distributed, causing more harm to some groups than others.

This is a wonderful and eye-opening account of unfairness in the models used by automated decision-making systems (note the language). Why? Because it shows that there is a connection between statistical bias, the kind of bias that creates distortions in a quantitative predictive process, and social bias, the kind of bias people worry about politically. The word “bias” is consistently used in both ways, and O’Neil shows the two senses are connected: if there is statistical bias that weighs against some social group, then that’s definitely, 100%, a form of bias in the social sense too.
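A toy simulation makes the connection concrete. This is not the LSI-R or any real instrument; the numbers and the “risk score” below are invented for illustration. Two groups reoffend at exactly the same rate, but one group is policed more heavily and so accumulates more prior arrests. A score that thresholds on prior arrests then flags non-reoffenders in the overpoliced group far more often.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Two groups with identical true reoffense rates.
    group = rng.integers(0, 2, n)           # 0 = less policed, 1 = more policed
    true_reoffend = rng.random(n) < 0.30    # same 30% base rate in both groups

    # Prior arrest counts reflect policing intensity, not underlying behavior.
    prior_arrests = rng.poisson(np.where(group == 1, 2.0, 1.0))

    # A crude "risk score": flag anyone with two or more prior arrests.
    flagged_high_risk = prior_arrests >= 2

    # False positive rate measured against *true* reoffense, by group.
    for g in (0, 1):
        non_reoffenders = (group == g) & ~true_reoffend
        fpr = flagged_high_risk[non_reoffenders].mean()
        print(f"group {g}: share of non-reoffenders flagged high risk = {fpr:.2f}")

The error here is statistical (the score measures contact with police, not behavior), and it falls disproportionately on one group. That is the sense in which the two meanings of “bias” coincide.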

Importantly, this kind of bias–statistical bias–is not something that every model must have. Only badly made models have it. It’s something that can be mitigated using scientific rigor and sound design. If we see the problem the way O’Neil sees it, then we can see clearly how better science, applied more rigorously, is also good for social justice.

As a scientist and technologist, it’s been terribly discouraging in recent years to be so consistently confronted with a false dichotomy between sound engineering and justice. At last, here’s a book that clearly outlines how the opposite is the case!