Digifesto

Category: Uncategorized

Values in design and mathematical impossibility

Under pressure from the public, and no doubt with sincere interest in the topic, computer scientists have taken up the difficult task of translating commonly held values into the mathematical forms that can be used for technical design. Commonly, what these researchers discover is some form of mathematical impossibility of achieving a number of desirable goals at the same time. This work has demonstrated the impossibility of having a classifier that is fair with respect to a social category without data about that very category (Dwork et al., 2012), of having a fair classifier that is both statistically well calibrated for predicting properties of persons and equal in its false positive and false negative rates across partitions of that population (Kleinberg et al., 2016), of preserving the privacy of individuals after an arbitrary number of queries to a database, however obscured (Dwork, 2008), and of a coherent notion of proxy variable use in privacy and fairness applications that is based on program semantics (as opposed to syntax) (Datta et al., 2017).

These are important results. An important thing about them is that they transcend the narrow discipline in which they originated. As mathematical theorems, they will be true whether or not they are implemented on machines or in human behavior. Therefore, these theorems have a role comparable to other core mathematical theorems in social science, such as Arrow’s Impossibility Theorem (Arrow, 1950), a theorem about the impossibility of having a voting system with reasonable desiderata for determining social welfare.
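Arrow's result can be made concrete with the classic Condorcet profile, in which pairwise majority voting yields a cyclic social preference. A minimal sketch (purely illustrative; the ballots are my own toy example, not from Arrow's paper):

```python
# Three voters, each ranking candidates A, B, C (best first).
# This is the classic Condorcet profile: pairwise majority
# preferences form a cycle, so no coherent social ranking exists.
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y, ballots):
    """True if a strict majority of voters rank x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2

# A beats B, B beats C, and yet C beats A: a cycle.
print(majority_prefers("A", "B", ballots))  # True
print(majority_prefers("B", "C", ballots))  # True
print(majority_prefers("C", "A", ballots))  # True
```

No tinkering with the aggregation rule escapes this; Arrow's theorem says the pathology is general, not an artifact of majority rule.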

There can be no question of the significance of this kind of work. It was significant a hundred years ago. It is perhaps of even more immediate, practical importance now that so much public infrastructure is computational. For computation just is the automation of mathematics, full stop.

There are some scholars, even some ethicists, for whom this is an unwelcome idea. I have been recently told by one ethics professor that to try to mathematize core concepts in ethics is to commit a “category mistake”. This is refuted by the clearly productive attempts to do this, some of which I’ve cited above. This belief that scientists and mathematicians are on a different plane than ethicists is quite old: Hannah Arendt argued that scientists should not be trusted because their mathematical language prevented them from engaging in normal political and ethical discourse (Arendt, 1959). But once again, this recent literature (as well as much older literature in such fields as theoretical economics) demonstrates that this view is incorrect.

There are many possible explanations for the persistence of the view that mathematics and the hard sciences do not concern themselves with ethics, are somehow lacking in ethical education, or that engineers require non-technical people to tell them how to engineer things more ethically.

One reason is that the sciences are much broader in scope than the ethical results mentioned here. It is indeed possible to get a specialist’s education in a technical field without much ethical training, even in the mathematical ethics results mentioned above.

Another reason is that whereas understanding the mathematical tradeoffs inherent in certain kinds of design is an important part of ethics, it can be argued by others that what’s most important about ethics is some substantive commitment that cannot be mathematically defended. For example, suppose half the population believes that it is most ethical for members of the other half to treat them with special dignity and consideration, at the expense of the other half. It may be difficult to arrive at this conclusion from mathematics alone, but this group may advocate for special treatment out of ethical consideration nonetheless.

These two reasons are similar. The first states that mathematics includes many things that are not ethics. The second states that ethics potentially (and certainly in the minds of some people) includes much that is not mathematical.

I want to bring up a third reason, perhaps more profound than the other two: what distinguishes mathematics as a field is its commitment to logical non-contradiction, which means that it is able to baldly claim when goals are impossible to achieve. Acknowledging tradeoffs is part of what mathematicians and scientists do.

Acknowledging tradeoffs is not something that everybody else is trained to do, and indeed many philosophers are apparently motivated by the ability to surpass limitations. Alain Badiou, who is one of the living philosophers that I find to be most inspiring and correct, maintains that mathematics is the science of pure Being, of all possibilities. Reality is just a subset of these possibilities, and much of Badiou’s philosophy is dedicated to the Event, those points where the logical constraints of our current worldview are defeated and new possibilities open up.

This is inspirational work, but it contradicts what many mathematicians do in fact, which is identify impossibility. Science forecloses possibilities where a poet may see infinite potential.

Other ethicists, especially existentialist ethicists, see the limitation and expansion of possibility, especially in the possibility of personal accomplishment, as fundamental to ethics. This work is inspiring precisely because it states so clearly what it is we hope for and aspire to.

What mathematical ethics often tells us is that these hopes are fruitless. The desiderata cannot be met. Somebody will always get the short stick. Engineers, unable to triumph against mathematics, will always disappoint somebody, and whoever that somebody is can always argue that the engineers have neglected ethics, and demand justice.

There may be good reasons for making everybody believe that they are qualified to comment on the subject of ethics. Indeed, in a sense everybody is required to act ethically even when they are not ethicists. But the preceding argument suggests that perhaps mathematical education is an essential part of ethical education, because without it one can have unrealistic expectations of the ethics of others. This is a scary thought because mathematics education is so often so poor. We live today, as we have lived before, in a culture with great mathophobia (Papert, 1980) and this mathophobia is perpetuated by those who try to equate mathematical training with immorality.

References

Arendt, Hannah. The Human Condition. Doubleday, 1959.

Arrow, Kenneth J. “A difficulty in the concept of social welfare.” Journal of political economy 58.4 (1950): 328-346.

Benthall, Sebastian. “Philosophy of computational social science.” Cosmos and History: The Journal of Natural and Social Philosophy 12.2 (2016): 13-30.

Datta, Anupam, et al. “Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs.” arXiv preprint arXiv:1705.07807 (2017).

Dwork, Cynthia. “Differential privacy: A survey of results.” International Conference on Theory and Applications of Models of Computation. Springer, Berlin, Heidelberg, 2008.

Dwork, Cynthia, et al. “Fairness through awareness.” Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ACM, 2012.

Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. “Inherent trade-offs in the fair determination of risk scores.” arXiv preprint arXiv:1609.05807 (2016).

Papert, Seymour. Mindstorms: Children, computers, and powerful ideas. Basic Books, Inc., 1980.

Pondering “use privacy”

I’ve been working carefully with Datta et al.’s “Use Privacy” work (link), which makes a clear case for how a programmatic, data-driven model may be statically analyzed for its use of a proxy of a protected variable, and repaired.

Their system has a number of interesting characteristics, among which are:

  1. The use of a normative oracle for determining which proxy uses are prohibited.
  2. A proof that there is no coherent definition of proxy use which has all of a set of very reasonable properties defined over function semantics.

Given (2), they continue with a compelling study of how a syntactic definition of proxy use, one based on the explicit contents of a function, can support a system of detecting and repairing proxies.

My question is to what extent the sources of normative restriction on proxies (those characterized by the oracle in (1)) are likely to favor syntactic proxy use restrictions, as opposed to semantic ones. Since ethicists and lawyers, who are the purported sources of these normative restrictions, are likely to consider any technical system a black box for the purpose of their evaluation, they will naturally be concerned with program semantics. It may be comforting for those responsible for a technical program to be able to, in a sense, avoid liability by assuring that their programs are not using a restricted proxy. But, truly, so what? Since these syntactic considerations do not make any semantic guarantees, will they really plausibly address normative concerns?
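The syntax/semantics gap can be illustrated with a toy example of my own (not drawn from the paper, and the variable names are hypothetical): two extensionally identical scoring functions, one of which computes an intermediate value that a syntactic analysis would flag as a proxy, while the other inlines it away.

```python
def score_with_proxy(features):
    # The intermediate `income_estimate` is what a syntactic analysis
    # flags: it is computed from the input, correlates with the
    # protected class, and influences the output.
    income_estimate = 1 if features["zip_code"] == "10001" else 0
    return "approve" if income_estimate == 1 else "deny"

def score_inlined(features):
    # Extensionally identical (same output on every input), but the
    # proxy never appears as a named intermediate in the syntax.
    return "approve" if features["zip_code"] == "10001" else "deny"

# The two programs are semantically indistinguishable:
assert all(
    score_with_proxy({"zip_code": z}) == score_inlined({"zip_code": z})
    for z in ("10001", "10002")
)
```

Since any syntactic proxy can be compiled away while preserving behavior, a purely semantic notion of proxy use has nothing left to grab onto, which is the intuition behind their impossibility result.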

A striking result from their analysis, one which perhaps has broader implications, is the incoherence of a semantic notion of proxy use. Perhaps sadly but also substantively, this result shows that a certain plausible normative requirement is impossible for a system to fulfill in general. Only restricted conditions make such a thing possible. This seems to be part of a pattern in these rigorous computer science evaluations of ethical problems; see also Kleinberg et al. (2016) on how it’s impossible to meet several plausible definitions of “fairness” in risk-assessment scores across social groups except under certain conditions.

The conclusion for me is that what this nobly motivated computer science work reveals is that what people are actually interested in normatively is not the functioning of any particular computational system. They are rather interested in social conditions more broadly, which are rarely aligned with our normative ideals. Computational systems, by making realities harshly concrete, are disappointing, but it’s a mistake to make that a disappointment with the computing systems themselves. Rather, there are mathematical facts that are disappointing regardless of what sorts of systems mediate our social world.

This is not merely a philosophical consideration or sociological observation. Since the interpretation of laws is part of the process of informing normative expectations (as in a normative oracle), it is an interesting and perhaps open question how lawyers and judges, in their task of legal interpretation, will make use of the mathematical conclusions about normative tradeoffs being offered up by computer scientists.

References

Datta, Anupam, et al. “Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs.” arXiv preprint arXiv:1705.07807 (2017).

Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. “Inherent trade-offs in the fair determination of risk scores.” arXiv preprint arXiv:1609.05807 (2016).

Notes on fairness and nondiscrimination in machine learning

There has been a lot of work done lately on “fairness in machine learning” and related topics. It cannot be a coincidence that this work has paralleled a rise in political intolerance that is sensitized to issues of gender, race, citizenship, and so on. I more or less stand by my initial reaction to this line of work. But very recently I’ve done a deeper and more responsible dive into this literature and it’s proven to be insightful beyond the narrow problems which it purports to solve. These are some notes on the subject, ordered so as to get to the point.

The subject of whether and to what extent computer systems can enact morally objectionable bias goes back at least as far as Friedman and Nissenbaum’s 1996 article, in which they define “bias” as systematic unfairness. They mean this very generally, not specifically in a political sense (though inclusive of it). Twenty years later, Kleinberg et al. (2016) prove that there are multiple, competing notions of fairness in machine classification which generally cannot be satisfied all at once; they must be traded off against each other. In particular, a classifier that uses all available information to optimize accuracy (one that achieves what these authors call calibration) cannot also have equal false positive and false negative rates across population groups (read: race, sex), properties that Hardt et al. (2016) call “equal opportunity”. This is no doubt inspired by a now very famous ProPublica article asserting that a particular kind of commercial recidivism prediction software was “biased against blacks” because it had a higher false positive rate for black suspects than for white ones. Because bail and parole rates are set according to predicted recidivism, this led to cases where a non-recidivist was denied bail because they were black, which sounds unfair to a lot of people, including myself.
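The tension is easy to exhibit with synthetic counts (a toy construction of my own, not data from any of the cited papers): a score that is perfectly calibrated within each group still produces unequal false positive rates when the groups’ base rates differ.

```python
# Each group is a list of score bins: (count, score, actual_positives).
# Within every bin, actual_positives / count == score, i.e. the score
# is perfectly calibrated for both groups.
groups = {
    "A": [(50, 0.8, 40), (50, 0.2, 10)],   # base rate 0.50
    "B": [(20, 0.8, 16), (80, 0.2, 16)],   # base rate 0.32
}

def false_positive_rate(bins, threshold=0.5):
    # Predict positive whenever score >= threshold.
    fp = sum(n - pos for n, s, pos in bins if s >= threshold)
    negatives = sum(n - pos for n, s, pos in bins)
    return fp / negatives

for g, bins in groups.items():
    # Sanity check: calibration holds in every bin.
    assert all(abs(pos / n - s) < 1e-9 for n, s, pos in bins)
    print(g, round(false_positive_rate(bins), 3))
# A → 0.2, B → 0.059: same calibrated score, same threshold,
# unequal false positive rates, purely because base rates differ.
```

Nothing about the classifier is "broken" here; the inequality is forced by the arithmetic of differing base rates, which is exactly Kleinberg et al.'s point.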

While I understand that there is a lot of high quality and well-intentioned research on this subject, I haven’t found anybody who could tell me why the solution to this problem wasn’t to stop using predicted recidivism to set bail, as opposed to futzing around with a recidivism prediction algorithm that seems to have been doing its job (Dieterich et al., 2016). Recidivism rates are in fact correlated with race (Hartney and Vuong, 2009), probably because of centuries of systematic racism. If you are serious about remediating historical inequality, the least you could do is cut black people some slack on bail.

This gets to what for me is the most baffling aspect of this whole research agenda, one that I didn’t have the words for before reading Barocas and Selbst (2016). A point well made by them is that the interpretation of anti-discrimination law, which motivates a lot of this research, is fraught with tensions that complicate its application to data mining.

“Two competing principles have always undergirded anti-discrimination law: nondiscrimination and antisubordination. Nondiscrimination is the narrower of the two, holding that the responsibility of the law is to eliminate the unfairness individuals experience at the hands of decisionmakers’ choices due to membership in certain protected classes. Antisubordination theory, in contrast, holds that the goal of antidiscrimination law is, or at least should be, to eliminate status-based inequality due to membership in those classes, not as a matter of procedure, but substance.” (Barocas and Selbst, 2016)

More specifically, these two principles motivate different interpretations of the two pillars of anti-discrimination law, disparate treatment and disparate impact. I draw on Barocas and Selbst for my understanding of each:

A judgment of disparate treatment requires either formal disparate treatment (across protected groups) of similarly situated people, or an intent to discriminate. Since in a large data mining application protected group membership will be proxied by many other factors, it’s not clear that the ‘formal’ requirement makes much sense here. And since machine learning applications only very rarely involve racist intent, that option seems challengeable as well. While there are interpretations of these criteria that are tougher on decision-makers (e.g., ones that count unconscious intent), these seem to be motivated by antisubordination rather than the weaker nondiscrimination principle.

A judgment of disparate impact is perhaps more straightforward, but it can be mitigated in cases of “business necessity”, which (to get to the point) is vague enough to plausibly include optimization in a technical sense. Once again, there is nothing to see here from a nondiscrimination standpoint, though an antisubordinationist would rather that these decision-makers have to take correcting for historical inequality into account.

I infer from their writing that Barocas and Selbst believe that antisubordination is an important principle for anti-discrimination law. In any case, they maintain that making the case for applying nondiscrimination laws to data mining effectively requires a commitment to “substantive remediation”. This is insightful!

Just to put my cards on the table: as much as I may like the idea of substantive remediation in principle, I personally don’t think that every application of nondiscrimination law needs to be animated by it. For many institutions, narrow nondiscrimination seems adequate if not preferable. I’d prefer remediation to occur through other specific policies, such as more public investment in schools in low-income districts. Perhaps for this reason, I’m not crazy about “fairness in machine learning” as a general technical practice. It seems to me to be trying to solve social problems with a technical fix, which, despite being quite technical myself, I don’t always see as a good idea. It seems like in most cases you could have a machine learning mechanism based on normal statistical principles (the learning step) and then separately use a decision procedure that achieves your political ends.

I wish that this research community (and here I mean the qualitative research community surrounding it more than the technical community, which tends to define its terms carefully) would be more careful about the ways it talks about “bias”, because often it seems to encourage a conflation between statistical or technical senses of bias and political senses. The latter carry so much political baggage that it can be intimidating to try to wade in and untangle the two senses. And it’s important to do this untangling, because statistical bias can, depending on the circumstances, lead to either “good” or “bad” political bias. But it’s important, for the sake of numeracy (mathematical literacy), to understand that even if a statistically bad process has a politically “good” outcome, that is still, statistically speaking, bad.

My sense is that there are interpretations of nondiscrimination law that make it illegal to base certain judgments on facts about sensitive properties like race and sex. There are also theorems showing that if you don’t take those sensitive properties into account, you will discriminate against the corresponding groups by accident, because sensitive variables are correlated with almost anything else you would use to judge people. As a general principle, while being ignorant may sometimes make things better when you are extremely lucky, in general it makes things worse! This should be a surprise to nobody.
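A quick synthetic simulation (illustrative only; the feature names and rates are invented) makes the point: a decision rule that never looks at the sensitive attribute still produces disparate impact through a correlated proxy feature.

```python
import random

random.seed(0)

# Synthetic population: `group` is the sensitive attribute, and
# `neighborhood` is a correlated proxy (80% vs 20% membership rates).
population = []
for _ in range(10000):
    group = random.random() < 0.5
    neighborhood = 1 if random.random() < (0.8 if group else 0.2) else 0
    population.append((group, neighborhood))

# A "group-blind" rule that selects purely on neighborhood...
selected = [g for g, n in population if n == 1]
rate_in_selection = sum(selected) / len(selected)
rate_overall = sum(g for g, _ in population) / len(population)

print(round(rate_in_selection, 2), round(rate_overall, 2))
# ...over-represents the group (~0.8 in the selected pool vs ~0.5
# overall): ignoring the sensitive variable did not prevent the
# disparate outcome, because the proxy carried the information anyway.
```

This is the mechanism behind the theorems alluded to above: blindness to a variable is not independence from it.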

References

Barocas, Solon, and Andrew D. Selbst. “Big data’s disparate impact.” (2016).

Dieterich, William, Christina Mendoza, and Tim Brennan. “COMPAS risk scales: Demonstrating accuracy equity and predictive parity.” Northpointe Inc. (2016).

Friedman, Batya, and Helen Nissenbaum. “Bias in computer systems.” ACM Transactions on Information Systems (TOIS) 14.3 (1996): 330-347.

Hardt, Moritz, Eric Price, and Nati Srebro. “Equality of opportunity in supervised learning.” Advances in Neural Information Processing Systems. 2016.

Hartney, Christopher, and Linh Vuong. “Created equal: Racial and ethnic disparities in the US criminal justice system.” (2009).

Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. “Inherent trade-offs in the fair determination of risk scores.” arXiv preprint arXiv:1609.05807 (2016).

Ulanowicz on thermodynamics as phenomenology

I’ve finally worked my way back to Ulanowicz, whose work so intrigued me when I first encountered it over four years ago. Reading a few of his papers on theoretical ecology gave me the impression that he is both a serious scientist and onto something profound. Now I’m reading Growth and Development: Ecosystems Phenomenology (1986), which looked to be the most straightforwardly mathematical introduction to his theory of ecosystem ascendency, an account of how ecosystems grow and develop over time.

I am eager to get to the hard stuff, where he cashes out the theory in terms of matrix multiplication representing networks of energy flows. I see several parallels to my own work and I’m hoping there are hints in here about how I can best proceed.

But first I must note a few interesting ways in which Ulanowicz positions his argument.

One important one is that he uses the word “phenomenology” in the title and in the opening argument about the nature of thermodynamics. Thermodynamics, he argues, is unlike many of the more reductionist parts of physics because it draws general statistical laws from macroscopically observed systems that can be realized by many different configurations of microphenomena. This gives it a kind of empirical weakness compared to the lower-level laws; nevertheless, there is a compelling universality to its descriptive power that informs the application of so many other more specialized sciences.

This resonates with many of the themes I’ve been exploring through my graduate study. Ulanowicz never cites Francisco Varela though the latter is almost a contemporary and similarly interested in combining principles of cybernetics with the life sciences (in Varela’s case, biology). Both Ulanowicz and Varela come to conclusions about the phenomenological nature of the life sciences which are unusual in the hard sciences.

Naturally, the case has been made that the social sciences are phenomenological as well, though generally these claims are made without a hope of making a phenomenological social science as empirically rigorous as ecology, let alone biology. Nevertheless Ulanowicz does hint, as does Varela, at the possibility of extending his models to social systems.

This is of course fascinating given the difficult problem of the “macro-micro link” (see Sawyer). Ecosystem size and the properties Ulanowicz derives about them are “emergent” properties of an ecosystem; his theory is I gather an attempt at a universal description of how these properties emerge.
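Reading ahead of myself a bit: assuming the standard information-theoretic definition of ascendency (total system throughput scaled by the average mutual information of the flow network), the matrix computation can be sketched as follows. The two-compartment network here is my own minimal example, not one of Ulanowicz’s.

```python
import math

def ascendency(T):
    """Ascendency of a flow matrix T, where T[i][j] is the flow from
    compartment i to compartment j, using the standard form
      A = sum_ij T_ij * log2( T_ij * TST / (T_i. * T_.j) )
    with TST the total system throughput and T_i., T_.j row/column sums."""
    total = sum(sum(row) for row in T)
    row_sums = [sum(r) for r in T]
    col_sums = [sum(T[i][j] for i in range(len(T))) for j in range(len(T[0]))]
    a = 0.0
    for i in range(len(T)):
        for j in range(len(T[0])):
            if T[i][j] > 0:
                a += T[i][j] * math.log2(T[i][j] * total / (row_sums[i] * col_sums[j]))
    return a

# A maximally articulated two-compartment loop: every unit of flow is
# fully determinate, so ascendency equals throughput times one bit.
flows = [[0.0, 1.0],
         [1.0, 0.0]]
print(ascendency(flows))  # 2.0
```

The appeal, as I understand it so far, is that growth (throughput) and development (articulation of the network) fall out of one quantity.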

Somehow, Ulanowicz manages to take on these problems without ever invoking the murky language of “complex adaptive systems”. This is, I suspect, a huge benefit to his work as he seems to write strictly as a scientist and does not mystify things by using undefined language of ‘complexity’.

It is a deeper technical dive than I’ve been used to for some time, but I’m grateful to be in a more technical academic milieu now than I have been for several years. More soon.

References

Ulanowicz, Robert E. “Growth and development: A phenomenological perspective.” (1986).

equilibrium representation

We must keep in mind not only the capacity of state simplifications to transform the world but also the capacity of the society to modify, subvert, block, and even overturn the categories imposed upon it. Here it is useful to distinguish what might be called facts on paper from facts on the ground…. Land invasions, squatting, and poaching, if successful, represent the exercise of de facto property rights which are not represented on paper. Certain land taxes and tithes have been evaded or defied to the point where they have become dead letters. The gulf between land tenure facts on paper and facts on the ground is probably greatest at moments of social turmoil and revolt. But even in more tranquil times, there will always be a shadow land-tenure system lurking beside and beneath the official account in the land-records office. We must never assume that local practice conforms with state theory. – Scott, Seeing Like a State, 1998

I’m continuing to read Seeing Like a State and am finding in it a compelling statement of a state of affairs that is coded elsewhere into the methodological differences between social science disciplines. In my experience, much of the tension between the social sciences can be explained in terms of the differently interested uses of social science. Among these uses are the development of what Scott calls “state theory” and the articulation, recognition, and transmission of “local practice”. Contrast neoclassical economics with the anthropology of Jean Lave as examples of what I’m talking about. Most scholars are willing to stop here: they choose their side and engage in a sophisticated form of class warfare.

This is disappointing from the perspective of science per se, as a pursuit of truth. To see that there’s a place for such work in the social sciences, we need only look to the very book in front of us, Seeing Like a State, which stands outside of both state theory and local practice to explain a perspective that is neither, but rather informed by a study of both.

In terms of the ways that knowledge is used in support of human interests, in the Habermasian sense (see some other blog posts), we can talk about Scott’s “state theory” as a form of technical knowledge, aimed at facilitating power over the social and natural world. What he discusses is the limitation of technical knowledge in mastering the social, due to complexity and differentiation in local practice. Much of this complexity is due to the politicization of language and representation that occurs in local practice. Standard units of measurement and standard terminology are tools of state power; efforts to guarantee them are confounded again and again by local interests. This disagreement is a rejection of the possibility of hermeneutic knowledge, which is to say linguistic agreement about norms.

In other words, Scott is pointing to a phenomenon where because of the interests of different parties at different levels of power, there’s a strategic local rejection of inter-subjective agreement. Implicitly, agreeing even on how to talk with somebody with power over you is conceding their power. The alternative is refusal in some sense. A second order effect of the complexity caused by this strategic disagreement is the confounding of technical mastery over the social. In Scott’s terminology, a society that is full of strategic lexical disagreement is not legible.

These are generalizations reflecting tendencies in society across history. Nevertheless, merely by asserting them I am arguing that they have a kind of special status that is not itself caught up in the strategic subversions of discourse that make other forms of expertise foolish. There must be some forms of representation that persist despite the verbal disagreements and differently motivated parties that use them.

I’d like to call these kinds of representations, the ones that are somehow technically valid enough to be useful and robust to disagreement, even politicized disagreement, equilibrium representations. The idea here is that despite a lot of cultural and epistemic churn, there are still attractor states in the complex system of knowledge production. At equilibrium, these representations will be stable and serve as the basis for communication between different parties.

I’ve posited equilibrium representations hypothetically, without yet having a proof or an example of one that actually exists. My point is to have a useful concept that acknowledges the kinds of epistemic complexities raised by Scott while also identifying the conditions under which a modernist epistemology could prevail despite those complexities.

 

appropriate information flow

Contextual integrity theory defines privacy as appropriate information flow.

Whether or not this is the right way to define privacy (which might, for example, be something much more limited), and whether or not contextual integrity as it is currently resourced as a theory is capable of capturing all considerations needed to determine the appropriateness of information flow, the very idea of appropriate information flow is a powerful one. It makes sense to strive to better our understanding of which information flows are appropriate, which others are inappropriate, to whom, and why.
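As a sketch of what the theory’s core move looks like operationally (all the names, contexts, and norms below are hypothetical illustrations of mine, not part of contextual integrity’s formal apparatus): a flow is described by its parameters, and appropriateness is judged against context-relative norms rather than by secrecy alone.

```python
from dataclasses import astuple, dataclass

@dataclass(frozen=True)
class Flow:
    """An information flow in contextual integrity's five parameters."""
    sender: str
    recipient: str
    subject: str
    attribute: str
    transmission_principle: str

# Hypothetical norms for a medical context: which parameter
# combinations count as appropriate flows.
MEDICAL_NORMS = {
    ("patient", "doctor", "patient", "diagnosis", "confidentiality"),
    ("doctor", "specialist", "patient", "diagnosis", "consent"),
}

def appropriate(flow: Flow, norms) -> bool:
    # Appropriateness is membership in the context's norm set,
    # not a property of the information itself.
    return astuple(flow) in norms

ok = Flow("patient", "doctor", "patient", "diagnosis", "confidentiality")
bad = Flow("doctor", "advertiser", "patient", "diagnosis", "sale")
print(appropriate(ok, MEDICAL_NORMS), appropriate(bad, MEDICAL_NORMS))
# True False
```

The same attribute (a diagnosis) flows in both cases; only the full parameter tuple, evaluated against the context, distinguishes them.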

 

Ohm and Post: Privacy as threats, privacy as dignity

I’m reading side by side two widely divergent law review articles about privacy.

One is Robert Post‘s “The Social Foundations of Privacy: Community and Self in Common Law Tort” (1989) (link)

The other is Paul Ohm‘s “Sensitive Information” (2014) (link)

They are very notably different. Post’s article diverges sharply from the intellectual milieu I’m used to. It starts with an exposition of Goffman’s view of the personal self as being constituted by ceremonies and rituals of human relationships. Privacy tort law is, in Post’s view, about repairing tears in the social fabric. The closest thing to this that I have ever encountered is Fingarette’s book on Confucianism.

Ohm’s article is much more recent and is in large part a reaction to the Snowden leaks. It’s an attempt to provide an account of privacy that can limit the problems associated with massive state (and corporate?) data collection. It attempts to provide a legally informed account of what information is sensitive, and then suggests that threat modeling strategies from computer security can be adapted to the privacy context. Privacy can be protected by identifying and mitigating privacy threats.

As I get deeper into the literature on Privacy by Design, and observe how privacy-related situations play out in the world and in my own life, I’m struck by the adaptability and indifference of the social world to shifting technological infrastructural conditions. A minority of scholars and journalists track major changes in it, but for the most part the social fabric adapts. Most people, probably necessarily, have no idea what the technological infrastructure is doing and don’t care to know. It can be coopted, or not, into social ritual.

If the swell of scholarship and other public activity on this topic was the result of surprising revelations or socially disruptive technological innovations, these same discomforts have also created an opportunity for the less technologically focused to reclaim spaces for purely social authority, based on all the classic ways that social power and significance play out.


For “Comments on Haraway”, see my “Philosophy of Computational Social Science”

One of my most frequently visited blog posts is titled “Comments on Haraway: Situated knowledge, bias, and code”.  I have decided to password protect it.

If you are looking for a reference with the most important ideas from that blog post, I refer you to my paper, “Philosophy of Computational Social Science”. In particular, its section on “situated epistemology” discusses how I think computational social scientists should think about feminist epistemology.

I have decided to hide the original post for a number of reasons.

  • I wrote it pointedly. I think all the points have now been made better elsewhere, either by me or by the greater political zeitgeist.
  • Because it was written pointedly (even a little trollishly), I am worried that it may be easy to misread my intention in writing it. I’m trying to clean up my act :)
  • I don’t know who keeps reading it, though it seems to consistently get around thirty or more hits a week. Who are these people? They won’t tell me! I think it matters who is reading it.

I’m willing to share the password with anybody who contacts me about it.