Digifesto

Trade secrecy, “an FDA for algorithms”, and software bills of materials (SBOMs) #SecretAlgos

At the Conference on Trade Secrets and Algorithmic Systems at NYU today, the target of most critiques is the use of trade secrecy by proprietary technology providers to prevent courts and the public from seeing the inner workings of algorithms that determine people’s credit scores, health care, criminal sentencing, and so on. The overarching theme is that companies sometimes use trade secrecy to hide the ways that their software is bad, and that this is a problem.

In one panel, the question of an “FDA for Algorithms” was put on the table, referring to the Food and Drug Administration’s approval process for pharmaceuticals. It was not dealt with in much depth, which is too bad, because the FDA is a nice example of how government oversight of potentially dangerous technology can be managed in a way that respects trade secrecy.

According to this article, when filing for FDA approval, a company can declare some of its ingredients to be trade secrets. The upshot is that those trade secrets are not subject to FOIA requests. However, the ingredients are still considered when the FDA grants approval.

It so happens that in the cybersecurity policy conversation (more so than in privacy) the question of openness of “ingredients” to inspection has been coming up in a serious way. NTIA has been hosting multistakeholder meetings about standards and policy around Software Component Transparency. In particular, they are encouraging standardization of Software Bills of Materials (SBOMs), like the Linux Foundation’s Software Package Data Exchange (SPDX). SPDX, and SBOMs more generally, describe the “ingredients” in a software package at a higher level of abstraction than the full source code, but at a level specific enough to be useful for security audits.

It’s possible that a similar method could be used for algorithmic audits with fairness (i.e., nondiscrimination compliance) and privacy (i.e., information sharing with third parties) in mind. Particular components could be audited (perhaps in a way that protects trade secrecy), and then those components could be listed as “ingredients” by other vendors.
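
To make the idea concrete, here is a rough sketch (in Python, with entirely hypothetical names) of what such an “ingredients” manifest might look like, with audit claims attached to a component whose internals remain a trade secret. This is not the SPDX format itself, just an illustration of the kind of disclosure an SBOM-style document could carry.

```python
# A rough, hypothetical illustration of the "ingredients" idea: a manifest that
# lists a software package's components along with third-party audit claims
# (e.g., for fairness or data-sharing behavior). This is NOT the SPDX schema,
# just a sketch of the disclosure an SBOM-style document could carry.
import json

manifest = {
    "package": "example-credit-scoring-service",   # hypothetical package name
    "version": "1.2.0",
    "components": [
        {
            "name": "risk-model-runtime",            # proprietary component; internals undisclosed
            "supplier": "Example Vendor Inc.",
            "license": "proprietary",
            "audits": [
                # the component stays a trade secret; only the audit claim is published
                {"type": "nondiscrimination", "auditor": "Example Audit Lab", "date": "2018-11-01"},
            ],
        },
        {
            "name": "openssl",                        # ordinary open-source dependency
            "version": "1.1.1",
            "license": "OpenSSL",
            "audits": [],
        },
    ],
}

print(json.dumps(manifest, indent=2))
```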

What proportion of data protection violations are due to “dark data” flows?

“Data protection” refers to the aspect of privacy concerned with the use and misuse of personal data by those that process it. Though the topic is widely debated, scholars continue to converge on an ideal of data protection consisting of alignment between the purposes for which the data processor will use the data and the expectations of the user, along with collection limitations that reduce exposure to misuse. Through its extraterritorial enforcement mechanism, the GDPR has threatened to make these standards global.

The implication of these trends is that there will be a global field of data flows regulated by these kinds of rules. Many of the large and important actors that process user data can be held accountable to the law. Privacy violations by these actors will be due to a failure to act within the bounds of the law that applies to them.

On the other hand, there is also cybercrime, an economy of data theft and information flows that exists “outside the law”.

I wonder what proportion of data protection violations are due to dark data flows–flows of personal data that are handled by organizations operating outside of any effective regulation.

I’m trying to draw an analogy to a global phenomenon that I know little about but which strikes me as perhaps more pressing than data protection: the interrelated problems of money laundering, off-shore finance, and dark money contributions to election campaigns. While surely oversimplifying the issue, my impression is that the network of financial flows can be divided into those that are more and less regulated by effective global law. Wealth seeks out these opportunities in the dark corners.

How much personal data flows in these dark networks? And how much is it responsible for privacy violations around the world? Versus how much is data protection effectively in the domain of accountable organizations (that may just make mistakes here and there)? Or is the dichotomy false, with truly no firm boundary between licit and illicit data flow networks?

the resilience of agonistic control centers of global trade

This post is merely notes; I’m fairly confident that I don’t know what I’m writing about. However, I want to learn more. Please recommend anything that could fill me in about this! I owe most of this to discussion with a colleague who I’m not sure would like to be acknowledged.

Following the logic of James Beniger, an increasingly integrated global economy requires more points of information integration and control.

Bourgeois (in the sense of ‘capitalist’) legal institutions exist precisely for the purpose of arbitrating between merchants.

Hence, on the one hand we would expect international trade law to be Habermasian. However, international trade need not rest on a foundation of German idealism (which increasingly strikes me as the core of European law). Rather, it is an evolved mechanism.

A key part of this mechanism, as I’ve heard, is that it is decentered. Multiple countries compete to be the sites of transnational arbitration, much like multiple nations compete to be tax havens. Sovereignty and discretion are factors of production in the economy of control.

This means, effectively, that one cannot defeat capitalism by chopping off its head. It is rather much more like a hydra: the “heads” are the creation of two-sided markets. These heads have no internalized sense of the public good. Rather, they are optimized to be attractive to the transnational corporations in bilateral negotiation. The plaintiffs and defendants in these cases are corporations and states–social forms and institutions of complexity far beyond that of any individual person. This is where, so to speak, the AI’s clash.

For a more ethical Silicon Valley, we need a wiser economics of data

Kara Swisher’s NYT op-ed about the dubious ethics of Silicon Valley and Nitasha Tiku’s WIRED article reviewing books with alternative (and perhaps more cynical than otherwise stated) stories about the rise of Silicon Valley have generated discussion and buzz among the tech commentariat.

One point of debate is whether the focus should be on “ethics” or on something more substantively defined, such as human rights. Another point is whether the emphasis should be on “ethics” or on something more substantively enforced, like laws that impose penalties of up to 4% of global annual turnover, referring of course to the GDPR.

While I’m sympathetic to the European approach (laws enforcing human rights with real teeth), I think there is something naive about it. We have not yet seen whether it’s ever really possible to comply with the GDPR; it could wind up being a kind of heavy tax on Big Tech companies operating in the EU, but one that doesn’t truly change how people’s data are used. In any case, the broad principles of European privacy are based on individual human dignity, and so they do not take into account the ways that corporations are social structures, i.e. sociotechnical organizations that transcend individual people. The European regulations address the problem of individual privacy while leaving mystified the question of why the current corporate organization of the world’s personal information is what it is. This sets up the fight over ‘technology ethics’ to be a political conflict between different kinds of actors whose positions are defined as much by their social habitus as by their intellectual reasons.

My own (unpopular!) view is that the solution to our problems of technology ethics is going to have to rely on a better adapted technology economics. We often forget today that economics was originally a branch of moral philosophy. Adam Smith wrote The Theory of Moral Sentiments (1759) before An Inquiry into the Nature and Causes of the Wealth of Nations (1776). Since then the main purpose of economics has been to intellectually grasp the major changes to society due to production, trade, markets, and so on in order to better steer policy and business strategy towards more fruitful equilibria. The discipline has a bad reputation among many “critical” scholars due to its role in supporting neoliberal ideology and policies, but it must be noted that this ideology and policy work is not entirely cynical; it was a successful centrist hegemony for some time. Now that it is under threat, partly due to the successes of the big tech companies that benefited under its regime, it’s worth considering what new lessons we have to learn to steer the economy in an improved direction.

The difference between an economic approach to the problems of the tech economy and either an ‘ethics’ or a ‘law’ based approach is that it inherently acknowledges that there are a wide variety of strategic actors co-creating social outcomes. Individual “ethics” will not be able to settle the outcomes of the economy because the outcomes depend on collective and uncoordinated actions. A fundamentally decent person may still do harm to others due to their own bounded rationality; “the road to hell is paved with good intentions”. Meanwhile, regulatory law is not the same as command; it is at best a way of setting the rules of a game that will be played, faithfully or not, by many others. Putting regulations in place without a good sense of how the game will play out differently because of them is just as irresponsible as implementing a sweeping business practice without thinking through the results, if not more so because the relationship between the state and citizens is coercive, not voluntary as the relationship between businesses and customers is.

Perhaps the biggest obstacle to shifting the debate about technology ethics to one about technology economics is that it requires a change in register. It drains the conversation of the pathos which is so instrumental in surfacing it as an important political topic. Sound analysis often ruins parties like this. Nevertheless, it must be done if we are to progress towards a more just solution to the crises technology gives us today.

The California Consumer Privacy Act of 2018: a deep dive

I have given the California Consumer Privacy Act of 2018 a close read.

In summary, the Act grants consumers a right to request that businesses disclose the categories of information about them that they collect and sell, and gives consumers the right to require businesses to delete their information and to opt out of its sale.

What follows are points I found particularly interesting. Quotations from the Act (that’s what I’ll call it) will be in bold. Questions (meaning, questions that I don’t have an answer to at the time of writing) will be in italics.

Privacy rights

SEC. 2. The Legislature finds and declares that:
(a) In 1972, California voters amended the California Constitution to include the right of privacy among the “inalienable” rights of all people. …

I did not know that. I was under the impression that in the United States, the ‘right to privacy’ was a matter of legal interpretation, derived from other more explicitly protected rights. A right to privacy is enumerated in Article 12 of the Universal Declaration of Human Rights, adopted in 1948 by the United Nations General Assembly. There’s something like a right to privacy in Article 8 of the 1950 European Convention on Human Rights. California appears to have followed their lead on this.

In several places in the Act, it specifies that exceptions may be made in order to be compliant with federal law. Is there an ideological or legal disconnect between privacy in California and privacy nationally? Consider the Snowden/Schrems/Privacy Shield issue: transfers of European data to the United States are given protections from federal surveillance practices. This presumably means that the U.S. federal government agrees to respect EU privacy rights. Can California negotiate for such treatment from the U.S. government?

These are the rights specifically granted by the Act:

[SEC. 2.] (i) Therefore, it is the intent of the Legislature to further Californians’ right to privacy by giving consumers an effective way to control their personal information, by ensuring the following rights:

(1) The right of Californians to know what personal information is being collected about them.

(2) The right of Californians to know whether their personal information is sold or disclosed and to whom.

(3) The right of Californians to say no to the sale of personal information.

(4) The right of Californians to access their personal information.

(5) The right of Californians to equal service and price, even if they exercise their privacy rights.

It is only recently that I have become attuned to the idea of privacy rights. Perhaps this is because I am from a place that apparently does not have them. A comparison that I believe should be made more often is the comparison of privacy rights to property rights. Clearly privacy rights have become as economically relevant as property rights. But currently, property rights enjoy a widespread acceptance and enforcement that privacy rights do not.

Personal information defined through example categories

“Information” is a notoriously difficult thing to define. The Act gets around the problem of defining “personal information” by repeatedly providing many examples of it. The examples are themselves rather abstract and are implicitly “categories” of personal information. Categorization of personal information is important to the law because under several conditions businesses must disclose the categories of personal information collected, sold, etc. to consumers.

SEC. 2. (e) Many businesses collect personal information from California consumers. They may know where a consumer lives and how many children a consumer has, how fast a consumer drives, a consumer’s personality, sleep habits, biometric and health information, financial information, precise geolocation information, and social networks, to name a few categories.

[1798.140.] (o) (1) “Personal information” means information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. Personal information includes, but is not limited to, the following:

(A) Identifiers such as a real name, alias, postal address, unique personal identifier, online identifier Internet Protocol address, email address, account name, social security number, driver’s license number, passport number, or other similar identifiers.

(B) Any categories of personal information described in subdivision (e) of Section 1798.80.

(C) Characteristics of protected classifications under California or federal law.

(D) Commercial information, including records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies.

Note that protected classifications (1798.140.(o)(1)(C)) include race, which is a socially constructed category (see Omi and Winant on racial formation). The Act appears to be saying that personal information includes the race of the consumer. Contrast this with information as identifiers (see 1798.140.(o)(1)(A)) and information as records (1798.140.(o)(1)(D)). So “personal information” is in one case a property of a person (and a socially constructed one at that); in another case a specific syntactic form; in another case a document representing some past action. The Act is very ontologically confused.

Other categories of personal information include (continuing this last section):


(E) Biometric information.

(F) Internet or other electronic network activity information, including, but not limited to, browsing history, search history, and information regarding a consumer’s interaction with an Internet Web site, application, or advertisement.

Devices and Internet activity will be discussed in more depth in the next section.


(G) Geolocation data.

(H) Audio, electronic, visual, thermal, olfactory, or similar information.

(I) Professional or employment-related information.

(J) Education information, defined as information that is not publicly available personally identifiable information as defined in the Family Educational Rights and Privacy Act (20 U.S.C. section 1232g, 34 C.F.R. Part 99).

(K) Inferences drawn from any of the information identified in this subdivision to create a profile about a consumer reflecting the consumer’s preferences, characteristics, psychological trends, preferences, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes.

Given that the main use of information is to support inferences, it is notable that inferences are dealt with here as a special category of information, and that sensitive inferences are those that pertain to behavior and psychology. This may be narrowly interpreted to exclude some kinds of inferences that may be relevant and valuable but not so immediately recognizable as ‘personal’. For example, one could infer from personal information the ‘position’ of a person in an arbitrary multi-dimensional space that compresses everything known about a consumer, and use this representation for targeted interventions (such as advertising). Or one could interpret it broadly: since almost all personal information is relevant to ‘behavior’ in a broad sense, inference from it is also ‘about behavior’ and therefore protected.
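
As an illustration of how a business might operationalize category-based disclosure, here is a hypothetical sketch (the field names and the mapping are invented, not taken from the Act) that tags stored fields with the Act’s example categories, treating inferences as their own category:

```python
# Hypothetical sketch: tagging stored fields with the Act's example categories
# (1798.140(o)(1)) so that a "categories collected" disclosure can be generated.
# The field names and the mapping here are invented for illustration.
CCPA_CATEGORIES = {
    "identifiers": "(A) Identifiers (name, alias, IP address, email, ...)",
    "commercial": "(D) Commercial information (purchase records, tendencies)",
    "network_activity": "(F) Internet or other electronic network activity",
    "geolocation": "(G) Geolocation data",
    "inferences": "(K) Inferences drawn to create a consumer profile",
}

# Which category each stored field falls under (illustrative only).
FIELD_TO_CATEGORY = {
    "email": "identifiers",
    "ip_address": "identifiers",
    "purchase_history": "commercial",
    "pages_viewed": "network_activity",
    "predicted_interests": "inferences",   # note: derived data is its own category
}

def categories_collected(stored_fields):
    """Return the category descriptions a disclosure response would list."""
    keys = {FIELD_TO_CATEGORY[f] for f in stored_fields if f in FIELD_TO_CATEGORY}
    return sorted(CCPA_CATEGORIES[k] for k in keys)

print(categories_collected(["email", "pages_viewed", "predicted_interests"]))
```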

Device behavior

The Act focuses on the rights of consumers and deals somewhat awkwardly with the fact that most information about consumers is collected indirectly, through machines. The Act acknowledges that sometimes devices are used by more than one person (for example, when they are used by a family), but it does not deal easily with other forms of sharing arrangements (e.g., an open Wi-Fi hotspot) and the problems associated with identifying which person a particular device’s activity is “about”.

[1798.140.] (g) “Consumer” means a natural person who is a California resident, as defined in Section 17014 of Title 18 of the California Code of Regulations, as that section read on September 1, 2017, however identified, including by any unique identifier. [SB: italics mine.]

[1798.140.] (x) “Unique identifier” or “Unique personal identifier” means a persistent identifier that can be used to recognize a consumer, a family, or a device that is linked to a consumer or family, over time and across different services, including, but not limited to, a device identifier; an Internet Protocol address; cookies, beacons, pixel tags, mobile ad identifiers, or similar technology; customer number, unique pseudonym, or user alias; telephone numbers, or other forms of persistent or probabilistic identifiers that can be used to identify a particular consumer or device. For purposes of this subdivision, “family” means a custodial parent or guardian and any minor children over which the parent or guardian has custody.

Suppose you are a business that collects traffic information and website behavior connected to IP addresses, but you don’t go through the effort of identifying the ‘consumer’ who is doing the behavior. In fact, you may collect a lot of traffic behavior that is not connected to any particular ‘consumer’ at all, but is rather the activity of a bot or crawler operated by a business. Are you on the hook to disclose personal information to consumers if they ask for their traffic activity? What if they do, or do not, provide their IP address?

Incidentally, while the Act seems comfortable defining a Consumer as a natural person identified by a machine address, it also happily defines a Person as “proprietorship, firm, partnership, joint venture, syndicate, business trust, company, corporation, …” etc. in addition to “an individual”. Note that “personal information” is specifically information about a consumer, not a Person (i.e., business).

This may make you wonder what a Business is, since these are the entities that are bound by the Act.

Businesses and California

The Act mainly details the rights that consumers have with respect to businesses that collect, sell, or lose their information. But what is a business?

[1798.140.] (c) “Business” means:
(1) A sole proprietorship, partnership, limited liability company, corporation, association, or other legal entity that is organized or operated for the profit or financial benefit of its shareholders or other owners, that collects consumers’ personal information, or on the behalf of which such information is collected and that alone, or jointly with others, determines the purposes and means of the processing of consumers’ personal information, that does business in the State of California, and that satisfies one or more of the following thresholds:

(A) Has annual gross revenues in excess of twenty-five million dollars ($25,000,000), as adjusted pursuant to paragraph (5) of subdivision (a) of Section 1798.185.

(B) Alone or in combination, annually buys, receives for the business’ commercial purposes, sells, or shares for commercial purposes, alone or in combination, the personal information of 50,000 or more consumers, households, or devices.

(C) Derives 50 percent or more of its annual revenues from selling consumers’ personal information.

This is not a generic definition of a business, just as the earlier definition of ‘consumer’ is not a generic definition of a consumer. It is a sui generis definition for the purposes of consumer privacy protection, as it defines businesses in terms of their collection and use of personal information. The definition also explicitly limits the applicability of the law to businesses above certain thresholds.
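
As a sketch, the three thresholds quoted above could be encoded roughly as follows (this is only an illustration; in particular, whether the 50,000 count is limited to Californian consumers, households, or devices is exactly the interpretive question raised below):

```python
# A sketch of the three applicability thresholds quoted above (1798.140(c)(1)).
# Note that this leaves open the interpretive question discussed below: whether
# the 50,000 count includes only Californian consumers, households, or devices.
def is_covered_business(annual_gross_revenue: float,
                        pi_records_bought_or_sold: int,
                        share_of_revenue_from_selling_pi: float) -> bool:
    meets_revenue_test = annual_gross_revenue > 25_000_000          # (A)
    meets_volume_test = pi_records_bought_or_sold >= 50_000         # (B)
    meets_share_test = share_of_revenue_from_selling_pi >= 0.50     # (C)
    return meets_revenue_test or meets_volume_test or meets_share_test

# A small site that suddenly logs millions of devices (the Mirai scenario below)
# would trip threshold (B) on volume alone:
print(is_covered_business(500_000, 2_500_000, 0.0))   # True
```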

There does appear to be a lot of wiggle room and potential for abuse here. Consider: the Mirai botnet had, by one estimate, 2.5 million devices compromised. Say you are a small business that collects site traffic. Suppose the Mirai botnet targets your site with a DDoS attack. Suddenly, your business collects information on millions of devices, and the Act comes into effect. Now you are obligated to disclose consumer information. Is that right?

An alternative reading of this section would recall that the definition (!) of consumer, in this law, is a California resident. So maybe the thresholds in 1798.140.(c)(1)(B) and 1798.140.(c)(1)(C) refer specifically to Californian consumers. Of course, for any particular device, information about where that device’s owner lives is itself personal information.

Having 50,000 California customers or users is a decent threshold for defining whether or not a business “does business in California”. Given the size and demographics of California, you would expect many of the major Chinese technology companies, Tencent for example, to have 50,000 Californian users. This brings up the question of extraterritorial enforcement, which gave the GDPR so much leverage.

Extraterritoriality and financing

In a nutshell, it looks like the Act is intended to allow Californians to sue foreign companies. How big a deal is this? The penalties for noncompliance are civil penalties, priced per violation (presumably per individual violation) rather than as a percentage of profits, but you could imagine them adding up:

[1798.155.] (b) Notwithstanding Section 17206 of the Business and Professions Code, any person, business, or service provider that intentionally violates this title may be liable for a civil penalty of up to seven thousand five hundred dollars ($7,500) for each violation.

(c) Notwithstanding Section 17206 of the Business and Professions Code, any civil penalty assessed pursuant to Section 17206 for a violation of this title, and the proceeds of any settlement of an action brought pursuant to subdivision (a), shall be allocated as follows:

(1) Twenty percent to the Consumer Privacy Fund, created within the General Fund pursuant to subdivision (a) of Section 1798.109, with the intent to fully offset any costs incurred by the state courts and the Attorney General in connection with this title.

(2) Eighty percent to the jurisdiction on whose behalf the action leading to the civil penalty was brought.

(d) It is the intent of the Legislature that the percentages specified in subdivision (c) be adjusted as necessary to ensure that any civil penalties assessed for a violation of this title fully offset any costs incurred by the state courts and the Attorney General in connection with this title, including a sufficient amount to cover any deficit from a prior fiscal year.

1798.160. (a) A special fund to be known as the “Consumer Privacy Fund” is hereby created within the General Fund in the State Treasury, and is available upon appropriation by the Legislature to offset any costs incurred by the state courts in connection with actions brought to enforce this title and any costs incurred by the Attorney General in carrying out the Attorney General’s duties under this title.

(b) Funds transferred to the Consumer Privacy Fund shall be used exclusively to offset any costs incurred by the state courts and the Attorney General in connection with this title. These funds shall not be subject to appropriation or transfer by the Legislature for any other purpose, unless the Director of Finance determines that the funds are in excess of the funding needed to fully offset the costs incurred by the state courts and the Attorney General in connection with this title, in which case the Legislature may appropriate excess funds for other purposes.

So, just to be concrete: suppose a business collects personal information on 50,000 Californians and does not disclose that information. California could then sue that business for $7,500 * 50,000 = $375 million in civil penalties, 20 percent of which then goes into the Consumer Privacy Fund, whose purpose is to cover the cost of further lawsuits (the other 80 percent goes to the jurisdiction that brought the action). The process funds itself. If it makes any extra money, it can be appropriated for other things.
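
Here is the same back-of-the-envelope calculation as a sketch, using only the figures quoted above (the per-violation cap for intentional violations and the 20/80 allocation):

```python
# Back-of-the-envelope version of the calculation above, using the quoted
# figures: up to $7,500 per intentional violation (1798.155(b)), split 20/80
# between the Consumer Privacy Fund and the jurisdiction (1798.155(c)).
PENALTY_PER_INTENTIONAL_VIOLATION = 7_500

def civil_penalty(num_violations: int) -> dict:
    total = num_violations * PENALTY_PER_INTENTIONAL_VIOLATION
    fund_share = total * 20 // 100            # (c)(1): 20% to the Consumer Privacy Fund
    jurisdiction_share = total - fund_share   # (c)(2): remaining 80% to the jurisdiction
    return {
        "total": total,
        "consumer_privacy_fund": fund_share,
        "jurisdiction": jurisdiction_share,
    }

print(civil_penalty(50_000))
# {'total': 375000000, 'consumer_privacy_fund': 75000000, 'jurisdiction': 300000000}
```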

Meaning, I guess, this Act basically funds a sustained program of investigations and fines. You could imagine that this starts out with just some lawyers responding to civil complaints. But consider the scope of the Act, and how it means that any business in the world not properly disclosing information about Californians is liable to be fined. Suppose that some kind of blockchain- or botnet-based entity starts committing surveillance in violation of this Act on a large scale. What kind of technical investigative capacity is necessary to enforce this kind of thing worldwide? Does this become a self-funding cybercrime investigative unit? How are foreign actors who are responsible for such things brought to justice?

This is where it’s totally clear that I am not a lawyer. I am still puzzling over the meaning of 1798.155.(c)(2), for example.

“Publicly available information”

There are more weird quirks to this Act than I can dig into in this post, but one that deserves mention (as homage to Helen Nissenbaum, among other reasons) is the stipulation about publicly available information, which does not mean what you think it means:

(2) “Personal information” does not include publicly available information. For these purposes, “publicly available” means information that is lawfully made available from federal, state, or local government records, if any conditions associated with such information. “Publicly available” does not mean biometric information collected by a business about a consumer without the consumer’s knowledge. Information is not “publicly available” if that data is used for a purpose that is not compatible with the purpose for which the data is maintained and made available in the government records or for which it is publicly maintained. “Publicly available” does not include consumer information that is deidentified or aggregate consumer information.

The grammatical error in the second sentence (the phrase beginning with “if any conditions” trails off into nowhere…) indicates that this paragraph was hastily written and never finished, as if in response to an afterthought. There’s a lot going on here.

First, the sense of ‘public’ used here is the sense of ‘public institutions’ or the res publica. Amazingly and a bit implausibly, government records are considered publicly available only when they are used for purposes compatible with their maintenance. So if a business takes a public record and uses it differently than was originally intended when it was ‘made available’, it becomes personal information that must be disclosed? As somebody who came out of the Open Data movement, I have to admit I find this baffling. On the other hand, it may be the brilliant solution to privacy in public on the Internet that society has been looking for.

Second, the stipulation that “publicly available” does not mean “biometric information collected by a business about a consumer without the consumer’s knowledge” is surprising. It appears to be written with particular cases in mind–perhaps IoT sensing. But why specifically biometric information, as opposed to other kinds of information collected without consumer knowledge?

Oddly, this paragraph is not one of the ones explicitly flagged for review and revision in the section soliciting public participation on changes before the Act goes into effect in 2020.

A work in progress

1798.185. (a) On or before January 1, 2020, the Attorney General shall solicit broad public participation to adopt regulations to further the purposes of this title, including, but not limited to, the following areas:

This is a weird law. I suppose it was written and passed to capitalize on a particular political moment and crisis (Sec. 2 specifically mentions Cambridge Analytica as a motivation), drafted to best express its purpose and intent, and given the horizon of 2020 to allow for revisions.

It must be said that there’s nothing in this Act that threatens the business models of any American Big Tech companies in any way, since storing consumer information in order to provide derivative ad targeting services is totally fine as long as businesses do the right disclosures, which they are now all doing because of GDPR anyway. There is a sense that this is California taking the opportunity to start the conversation about what U.S. data protection law post-GDPR will be like, which is of course commendable. As a statement of intent, it is great. Where it starts to get funky is in the definitions of its key terms and the underlying theory of privacy behind them. We can anticipate some rockiness there and try to unpack these assumptions before adopting similar policies in other states.

“Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics” <– My dissertation

In the last two weeks, I’ve completed, presented, and filed my dissertation, and commenced as a doctor of philosophy. In a word, I’ve PhinisheD!

The title of my dissertation is attention-grabbing, inviting, provocative, and impressive:

“Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics”

If you’re reading this, you are probably wondering, “How can I drop everything and start reading that hot dissertation right now?”

Look no further: here is a link to the PDF.

You can also check out this slide deck from my “defense”. It covers the highlights.

I’ll be blogging about this material as I break it out into more digestible forms over time. For now, I’m obviously honored by any interest anybody takes in this work and happy to answer questions about it.

Robert Post on Data vs. Dignitary Privacy

I was able to see Robert Post present his article, “Data Privacy and Dignitary Privacy: Google Spain, the Right to Be Forgotten, and the Construction of the Public Sphere”, today. My other encounter with Post’s work was quite positive, and I was very happy to learn more about his thinking at this talk.

Post’s argument was based on the facts of the Google Spain SL v. Agencia Española de Protección de Datos (“Google Spain”) case in the EU, which set off a lot of discussion about the right to be forgotten.

I’m not trained as a lawyer, and will leave the legal analysis to the verbatim text. There were some broader philosophical themes that resonate with topics I’ve discussed on this blog and in my other research. These I wanted to note.

If I follow Post’s argument correctly, it is something like this:

  • According to EU Directive 95/46/EC, there are two kinds of privacy. Data privacy rules over personal data, establishing control and limitations on its use. The emphasis is on the data itself, which is reasoned about analogously to property. Dignitary privacy is about maintaining appropriate communications between people and restricting those communications that may degrade, humiliate, or mortify them.
  • EU rules about data privacy specify the purposes for which data may be used, thereby implying that the use of this data must be governed by instrumental reason.
  • But there’s the public sphere, which must not be governed by instrumental reason, for Habermasian reasons. The public sphere is, by definition, the domain of communicative action, where actions must be taken with the ambiguous purpose of open dialogue. That is why free expression is constitutionally protected!
  • Data privacy, formulated as an expression of instrumental reason, is incompatible with the free expression of the public sphere.
  • The Google Spain case used data privacy rules to justify the right to be forgotten, and in this it developed an unconvincing and sloppy precedent.
  • Dignitary privacy is in tension with free expression, but not incompatible with it. This is because it is based not on instrumental reason, but rather on norms of communication (which are contextual).
  • Future right to be forgotten decisions should be made on the basis of dignitary privacy. This will result in more cogent decisions.

I found Post’s argument very appealing. I have a few notes.

First, I had never made the connection between what Hildebrandt (2013, 2014) calls “purpose binding” in EU data protection regulation and instrumental reason, but there it is. There is a sense in which these purpose clauses are about optimizing something that is externally and specifically defined before the privacy judgment is made (cf. Tschantz, Datta, and Wing, 2012, for a formalization).
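
For illustration only (this is not the Tschantz, Datta, and Wing formalization, just a toy sketch with invented names), purpose binding can be thought of as checking each use of a datum against the purposes declared when it was collected:

```python
# A toy illustration of "purpose binding" (not the Tschantz et al. formalism):
# each datum carries the purposes declared at collection time, and a use is
# permitted only if its purpose is among those declared.
DECLARED_PURPOSES = {
    "location_history": {"navigation", "traffic_estimation"},   # hypothetical declarations
    "email_address": {"account_recovery"},
}

def use_permitted(data_field: str, purpose: str) -> bool:
    """Instrumental-reason style check: is this use within the declared purposes?"""
    return purpose in DECLARED_PURPOSES.get(data_field, set())

print(use_permitted("location_history", "traffic_estimation"))  # True
print(use_permitted("location_history", "ad_targeting"))        # False
```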

This approach seems generally in line with the view of a government as a bureaucracy primarily involved in maintaining control over a territory or population. I don’t mean this in a bad way, but in a literal way of considering control as feedback into a system that steers it to some end. I’ve discussed the pervasive theme of ‘instrumentality run amok’ in questions of AI superintelligence here. It’s a Frankfurt School trope that appears to have subtly made its way into Post’s argument.

The public sphere is not, in Habermasian theory, supposed to be dictated by instrumental reason, but rather by communicative rationality. This has implications for the technical design of networked publics that I’ve scratched the surface of in this paper. By pointing to the tension between instrumental/purpose/control based data protection and the free expression of the public sphere, I believe Post is getting at a deep point about how we can’t have the public sphere be too controlled lest we lose the democratic property of self-governance. It’s a serious argument that probably should be addressed by those who would like to strengthen rights to be forgotten. A similar argument might be made for other contexts whose purposes seem to transcend circumscription, such as science.

Post’s point is not, I believe, to weaken these rights to be forgotten, but rather to put the arguments for them on firmer footing: dignitary privacy, or the norms of communication and the awareness of the costs of violating them. Indeed, the facts behind right to be forgotten cases I’ve heard of (there aren’t many) all seem to fall under these kinds of concerns (humiliation, etc.).

What’s very interesting to me is that the idea of dignitary privacy as consisting of appropriate communication according to contextually specific norms feels very close to Helen Nissenbaum’s theory of Contextual Integrity (2009), with which I’ve become very familiar in the past year through my work with Prof. Nissenbaum. Contextual integrity posits that privacy is about adherence to norms of appropriate information flow. Is there a difference between information flow and communication? Isn’t Shannon’s information theory a “mathematical theory of communication”?

The question of whether and under what conditions information flow is communication and/or data is quite deep, actually. More on that later.

For now though it must be noted that there’s a tension, perhaps a dialectical one, between purposes and norms. For Habermas, the public sphere needs to be a space of communicative action, as opposed to instrumental reason. This is because communicative action is how norms are created: through the agreement of people who bracket their individual interests to discuss collective reasons.

Nissenbaum also has a theory of norm formation, but it does not depend so tightly on the rejection of instrumental reason. In fact, it accepts the interests of stakeholders as among several factors that go into the determination of norms. Other factors include societal values, contextual purposes, and the differentiated roles associated with the context. Because contexts, for Nissenbaum, are defined in part by their purposes, this has led Hildebrandt (2013) to make direct comparisons between purpose binding and Contextual Integrity. They are similar, she concludes, but not the same.

It would be easy to say that the public sphere is a context in Nissenbaum’s sense, with a purpose, which is the formation of public opinion (which seems to be Post’s position). Properly speaking, social purposes may be broad or narrow, and specially defined social purposes may be self-referential (why not?), and indeed these self-referential social purposes may be the core of society’s “self-consciousness”. Why shouldn’t there be laws to ensure the freedom of expression within a certain context for the purpose of cultivating the kinds of public opinions that would legitimize laws and cause them to adapt democratically? We could possibly make these frameworks more precise if we could make them a little more formal and could lose some of the baggage; that would be useful theory building in line with Nissenbaum and Post’s broader agendas.

A test of this perhaps more nuanced but still teleological view (indeed instrumental, though maybe more properly pragmatic, à la Dewey, in that it can blend several different metaethical categories) would be to see whether one can motivate a right to be forgotten in the public sphere by appealing to the need for communicative action, to the especially appropriate communication norms around it, and to dignitary privacy.

This doesn’t seem like it should be hard to do at all.

References

Hildebrandt, Mireille. “Slaves to big data. Or are we?.” (2013).

Hildebrandt, Mireille. “Location Data, Purpose Binding and Contextual Integrity: What’s the Message?.” Protection of Information and the Right to Privacy-A New Equilibrium?. Springer International Publishing, 2014. 31-62.

Nissenbaum, Helen. Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press, 2009.

Post, Robert, Data Privacy and Dignitary Privacy: Google Spain, the Right to Be Forgotten, and the Construction of the Public Sphere (April 15, 2017). Duke Law Journal, Forthcoming; Yale Law School, Public Law Research Paper No. 598. Available at SSRN: https://ssrn.com/abstract=2953468 or http://dx.doi.org/10.2139/ssrn.2953468

Tschantz, Michael Carl, Anupam Datta, and Jeannette M. Wing. “Formalizing and enforcing purpose restrictions in privacy policies.” 2012 IEEE Symposium on Security and Privacy. IEEE, 2012.

Notes on Posner’s “The Economics of Privacy” (1981)

Lately my academic research focus has been privacy engineering, the design of information processing systems that preserve the privacy of their users. I have been looking at the problem particularly through the lens of Contextual Integrity, a theory of privacy developed by Helen Nissenbaum (2004, 2009). According to this theory, privacy is defined as appropriate information flow, where “appropriateness” is determined relative to social spheres (such as health, education, finance, etc.) that have evolved norms based on their purpose in society.

To my knowledge, most existing scholarship on Contextual Integrity consists of applications of a heuristic process, associated with Contextual Integrity, for evaluating the privacy impact of new technology. In this process, one starts by identifying a social sphere (or context, but I will use the term social sphere as I think it’s less ambiguous) and its normative structure. For example, if one is evaluating the role of a new kind of education technology, one would identify the roles of the education sphere (teachers, students, guardians of students, administrators, etc.), the norms of information flow that hold in the sphere, and the disruptions to these norms the technology is likely to cause.
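
As a rough sketch of the kind of analysis this heuristic walks through, here is a toy encoding of the education example. The five-parameter description of a flow (sender, recipient, subject, information type, transmission principle) follows Nissenbaum’s framework, but the particular norm set and the vendor scenario are invented for illustration:

```python
# A simplified encoding of the analysis the CI heuristic walks through, using
# the education example. The five-parameter flow follows Nissenbaum's
# framework, but this norm set is invented for illustration.
from typing import NamedTuple

class Flow(NamedTuple):
    sender: str
    recipient: str
    subject: str
    info_type: str
    transmission_principle: str

# Illustrative entrenched norms of the education sphere
# (roles: teacher, student, guardian, administrator, ...).
EDUCATION_NORMS = {
    Flow("teacher", "guardian", "student", "grades", "confidentiality"),
    Flow("teacher", "administrator", "student", "grades", "confidentiality"),
}

def norm_violated(flow: Flow) -> bool:
    """A flow matching no entrenched norm is flagged as a potential disruption."""
    return flow not in EDUCATION_NORMS

# A hypothetical ed-tech product selling grade data to advertisers:
new_flow = Flow("ed_tech_vendor", "advertiser", "student", "grades", "sale")
print(norm_violated(new_flow))  # True -> flagged for normative evaluation
```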

I’m coming at this from a slightly different direction. I have a background in enterprise software development, data science, and social theory. My concern is with the ways that technology is now part of the way social spheres are constituted. For technology to not just address existing norms but deal adequately with how it self-referentially changes how new norms develop, we need to focus on the parts of Contextual Integrity that have heretofore been in the background: the rich social and metaethical theory of how social spheres and their normative implications form.

Because the ultimate goal is the engineering of information systems, I am leaning towards mathematical modeling methods that trade well between social scientific inquiry and technical design. Mechanism design, in particular, is a powerful framework from mathematical economics that looks at how different kinds of structures change the outcomes for actors participating in “games” that involve strategic action and information flow. While mathematical economic modeling has been heavily critiqued over the years, for example on the basis that people do not act with the unbounded rationality such models can imply, these models can be a valuable first step in a technical context, especially as they establish the limits of a system’s manipulability by non-human actors such as AI. This latter standard makes this sort of model more relevant than it has ever been.
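
As a canonical illustration of what mechanism design studies (this example is standard textbook material, not something from my own modeling), consider the second-price auction, whose rules make truthful reporting of a private value a dominant strategy:

```python
# A canonical mechanism-design example: the second-price (Vickrey) auction.
# The mechanism's rules make truthful reporting of one's private value a
# dominant strategy, illustrating how the design of the "game" shapes what
# information participants are willing to reveal.
def second_price_auction(bids: dict) -> tuple:
    """Return (winner, price): highest bidder wins but pays the second-highest bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]]
    return winner, price

# With private values {"a": 10, "b": 7, "c": 4}, bidder "a" gains nothing by
# shading her bid below 10 and risks losing the item; truth-telling is dominant.
print(second_price_auction({"a": 10, "b": 7, "c": 4}))  # ('a', 7)
```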

This is my roundabout way of beginning to investigate the fascinating field of privacy economics. I am a new entrant. So I found what looks like one of the earliest highly cited articles on the subject written by the prolific and venerable Richard Posner, “The Economics of Privacy”, from 1981.

Wikipedia reminds me that Posner is politically conservative, though apparently he has recently changed his mind in support of gay marriage and, since the 2008 financial crisis, has soured on the laissez-faire rational choice economics that underlies his legal theory. As I have mainly learned about privacy scholarship from more left-wing sources, it was interesting to read an article that comes from a different perspective.

Posner’s opening position is that the most economically interesting aspect of privacy is the concealment of personal information, and that this is interesting mainly because privacy is bad for market efficiency. He raises examples of employers and employees searching for each other and of potential spouses searching for each other. In these cases, “efficient sorting” is facilitated by perfect information on all sides. Privacy is foremost a way of hiding disqualifying information–such as criminal records–from potential business associates and spouses, leading to a market inefficiency. I do not know why Posner does not cite Akerlof (1970) on the “market for ‘lemons'” in this article, but it seems to me that this is the economic theory most relevant to his argument. The essential question raised by this line of argument is whether there’s any compelling reason why the market for employees should be any different from the market for used cars.
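
For readers unfamiliar with Akerlof’s argument, here is the standard textbook version of the “lemons” unraveling as a small sketch (my own illustration, not Posner’s or Akerlof’s notation): qualities are uniform on [0, 1], sellers only sell at a price above their quality, and buyers value quality at one and a half times its level but can observe only the average quality of what is actually offered.

```python
# The standard textbook version of Akerlof's "lemons" argument (an illustration,
# not Posner's own model): with hidden quality, only low-quality goods are
# offered at any given price, so the price buyers are willing to pay keeps
# falling and the market unravels toward zero.
def lemons_unraveling(initial_price: float, rounds: int = 10) -> float:
    price = initial_price
    for _ in range(rounds):
        avg_quality_offered = price / 2    # only sellers with quality <= price sell
        price = 1.5 * avg_quality_offered  # buyers pay their value for the average quality
    return price

print(lemons_unraveling(1.0))  # shrinks toward 0: adverse selection destroys the market
```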

Posner raises and dismisses each objection he can find. One objection is that employers might heavily weight factors they should not, such as mental illness, gender, or homosexuality. He claims that there’s evidence showing that people are generally rational about these things and that there’s no reason to think the market can’t make these decisions efficiently despite fears of bias. I assume this point has been hotly contested from the left since the article was written.

Posner then looks at the objection that privacy provides a kind of social insurance to those with “adverse personal characteristics” who would otherwise not be hired. He doesn’t like this argument because he sees it as allocating the costs of that person’s adverse qualities to a small group that has to work with that person, rather than spreading the cost very widely across society.

Whatever one thinks about whose interests Posner seems to side with and why, it is refreshing to read an article that at the very least establishes the trade-offs around privacy somewhat clearly. Yes, discrimination of many kinds is economically inefficient. We can expect the best-performing companies to have progressive hiring policies because that would allow them to find the best talent. That’s especially true if there are large social biases otherwise unfairly skewing hiring.

On the other hand, the whole idea of “efficient sorting” assumes a policy-making interest that I’m pretty sure logically cannot serve the interests of everyone so sorted. It implies a somewhat brutally Darwinist stratification of personnel. It’s quite possible that this is not healthy for an economy in the long term. To his credit, in this article Posner seems open to other redistributive measures that would compensate for opportunities lost due to the revelation of personal information.

There’s an empirical part of the paper in which Posner shows that the percentage of black and Hispanic residents in a state is significantly correlated with the existence of state-level privacy statutes relating to credit, arrest, and employment history. He tries to spin this as an explanation of privacy statutes as the result of strongly organized black and Hispanic political organizations successfully lobbying in their interest on top of existing anti-discrimination laws. I would say that the article does not provide enough evidence to strongly support this causal theory. It would be a stronger argument if the regression had taken into account the racial differences in credit, arrest, and employment history state by state, rather than just assuming that this connection is so strong it supports this particular interpretation of the data. However, it is interesting that this variable was more strongly correlated with the existence of privacy statutes than several other variables of interest. It was probably my own ignorance that made me not consider how strongly privacy statutes are part of a social justice agenda, broadly speaking. Considering that disparities in credit, arrest, and employment history could well be the result of other unjust biases, privacy winds up mitigating the anti-signal that these injustices send in the employment market. In other words, it’s not hard to get from Posner’s arguments to a pro-privacy position based, of all things, on market efficiency.

It would be nice to model that more explicitly, if it hasn’t been done yet already.

Posner is quite bullish on privacy tort, thinking that it is generally not so offensive from an economic perspective largely because it’s about preventing misinformation.

Overall, the paper is a valuable starting point for further study in the economics of privacy. Posner’s economic lens swiftly and clearly brings the trade-offs around privacy statutes to light. It’s impressively lucid work that surely bears directly on arguments about privacy and information processing systems today.

References

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 488-500.

Nissenbaum, H. (2004). Privacy as contextual integrity. Wash. L. Rev., 79, 119.

Nissenbaum, H. (2009). Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press.

Posner, R. A. (1981). The economics of privacy. The American Economic Review, 71(2), 405-409. (jstor)