Digifesto

Tag: privacy

search engines and authoritarian threats

I've been intrigued by Daniel Griffin's tweets lately, which have been about situating some upcoming work of his and Deirdre Mulligan's regarding the experience of using search engines. There is a lively discussion about the experience of those searching for information and the way they respond to misinformation or extremism that they discover through organic use of search engines and media recommendation systems. This is apparently how the concern around "fake news" has developed in the HCI and STS world since it became an issue shortly after the 2016 election.

I do not have much to add to this discussion directly. Consumer misuse of search engines is, to me, analogous to consumer misuse of other forms of print media. I would assume the best solution to it is education in the complete sense, and the problems with the U.S. education system are, despite all good intentions, not HCI problems.

Wearing my privacy researcher hat, however, I have become interested in a different aspect of search engines and the politics around them, one that is less obvious to the consumer and therefore less popularly discussed, but that I fear is more pernicious precisely because it is not part of the general imaginary around search. This is the aspect concerning the tracking of search engine activity, and what it means for this activity to be in the hands of not just benevolent organizations such as Google, but also malevolent organizations such as Bizarro World Google*.

Here is the scenario, so to speak: for whatever reason, we begin to see ourselves in a more adversarial relationship with search engines. I mean "search engine" here in the broad sense, including Siri, Alexa, Google News, YouTube, Bing, Baidu, Yandex, and all the more minor search engines embedded in web services and appliances that do something more focused than crawl the whole web. By "search engine" I mean the entire UX paradigm of the query into the vast unknown of semantic and semiotic space that contemporary information access depends on. In all these cases, the user is at a systematic disadvantage in the sense that their query is a data point among many others. The task of the search engine is to predict the desired response to the query and provide it. In return, the search engine gets the query, tied to the identity of the user. That is one piece of a larger mosaic; to be a search engine is to have a picture of a population and their interests, and the mandate to categorize and understand those people.

In Western neoliberal political systems, the central function of the search engine is realized as a commercial transaction facilitating other commercial transactions. My "search" is a consumer service; I "pay" for this search by giving my query to the adjoined advertising function, which allows other commercial providers to "search" for me, indirectly, through the ad auction platform. It is a market with more than just two sides. There's the consumer who wants information and may be tempted by other information. There are the primary content providers, who satisfy consumer content demand directly. And there are secondary content providers who want to intrude on consumer attention in a systematic and successful way. The commercial, ad-enabled search engine reduces transaction costs for the consumer's search and sells a fraction of that attentional surplus to the advertisers. When the balance is struck right, the consumer is happy enough with the trade.

Part of the success of commercial search engines is the promise of privacy, in the sense that the consumer's queries are entrusted secretly to the engine, and this data is not leaked or sold. Wise people know not to write into email things that they would not want, in the worst case, exposed to the public. Unwise people are more common than wise people, and ill-considered emails are written all the time. Most unwise people come to no harm from this, because privacy in email is a de facto standard; it is the very security of email that makes the possibility of its being leaked alarming.

So too with search engine queries. "Ask me anything," suggests the search engine, "I won't tell". "Well, I will reveal your data in an aggregate way; I'll expose you to selective advertising. But I'm a trusted intermediary. You won't come to any harm besides exposure to a few ads."

That is all a safe assumption until it isn’t, at which point we must reconsider the role of the search engine. Suppose that, instead of living in a neoliberal democracy where the free search for information was sanctioned as necessary for the operation of a free market, we lived in an authoritarian country organized around the principle that disloyalty to the state should be crushed.

Under these conditions, the transition of a society into one that depends for its access to information on search engines is quite troubling. The act of looking for information is a political signal. Suppose you are looking for information about an extremist, subversive ideology. To do so is to flag yourself as a potential threat to the state. Suppose that you are looking for information about a morally dubious activity. To do so is to make yourself vulnerable to kompromat.

Under an authoritarian regime, curiosity and free thought are a problem, and a problem that is readily identified by one's search queries. Further, an authoritarian regime benefits if the risks of searching for the 'wrong' thing are widely known, since that suppresses inquiry. Hence, the very vaguely announced and, in fact, implausible-to-implement Social Credit System in China does not need to exist to be effective; people need only believe it exists for it to have a chilling and organizing effect on behavior. That is the lesson of the Foucauldian panopticon: it doesn't need a guard sitting in it to function.

Do we have a word for this function of search engines in an authoritarian system? We haven’t needed one in our liberal democracy, which perhaps we take for granted. “Censorship” does not apply, because what’s at stake is not speech but the ability to listen and learn. “Surveillance” is too general. It doesn’t capture the specific constraints on acquiring information, on being curious. What is the right term for this threat? What is the term for the corresponding liberty?

I’ll conclude with a chilling thought: when at war, all states are authoritarian, to somebody. Every state has an extremist, subversive ideology that it watches out for and tries in one way or another to suppress. Our search queries are always of strategic or tactical interest to somebody. Search engine policies are always an issue of national security, in one way or another.

The California Consumer Privacy Act of 2018: a deep dive

I have given the California Consumer Privacy Act of 2018 a close read.

In summary, the act grants consumers a right to request that businesses disclose the categories of information about them that they collect and sell, and gives consumers the right to require businesses to delete their information and to opt out of its sale.

What follows are points I found particularly interesting. Quotations from the Act (that’s what I’ll call it) will be in bold. Questions (meaning, questions that I don’t have an answer to at the time of writing) will be in italics.

Privacy rights

SEC. 2. The Legislature finds and declares that:
(a) In 1972, California voters amended the California Constitution to include the right of privacy among the “inalienable” rights of all people. …

I did not know that. I was under the impression that in the United States, the ‘right to privacy’ was a matter of legal interpretation, derived from other more explicitly protected rights. A right to privacy is enumerated in Article 12 of the Universal Declaration of Human Rights, adopted in 1948 by the United Nations General Assembly. There’s something like a right to privacy in Article 8 of the 1950 European Convention on Human Rights. California appears to have followed their lead on this.

In several places, the Act specifies that exceptions may be made in order to comply with federal law. Is there an ideological or legal disconnect between privacy in California and privacy nationally? Consider the Snowden/Schrems/Privacy Shield issue: transfers of European data to the United States are given protections from federal surveillance practices. This presumably means that the U.S. federal government agrees to respect EU privacy rights. Can California negotiate for such treatment from the U.S. government?

These are the rights specifically granted by the Act:

[SEC. 2.] (i) Therefore, it is the intent of the Legislature to further Californians’ right to privacy by giving consumers an effective way to control their personal information, by ensuring the following rights:

(1) The right of Californians to know what personal information is being collected about them.

(2) The right of Californians to know whether their personal information is sold or disclosed and to whom.

(3) The right of Californians to say no to the sale of personal information.

(4) The right of Californians to access their personal information.

(5) The right of Californians to equal service and price, even if they exercise their privacy rights.

It is only recently that I've become attuned to the idea of privacy rights. Perhaps this is because I am from a place that apparently does not have them. A comparison that I believe should be made more often is the comparison of privacy rights to property rights. Clearly privacy rights have become as economically relevant as property rights. But currently, property rights enjoy a widespread acceptance and enforcement that privacy rights do not.

Personal information defined through example categories

“Information” is a notoriously difficult thing to define. The Act gets around the problem of defining “personal information” by repeatedly providing many examples of it. The examples are themselves rather abstract and are implicitly “categories” of personal information. Categorization of personal information is important to the law because under several conditions businesses must disclose the categories of personal information collected, sold, etc. to consumers.

SEC. 2. (e) Many businesses collect personal information from California consumers. They may know where a consumer lives and how many children a consumer has, how fast a consumer drives, a consumer’s personality, sleep habits, biometric and health information, financial information, precise geolocation information, and social networks, to name a few categories.

[1798.140.] (o) (1) “Personal information” means information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. Personal information includes, but is not limited to, the following:

(A) Identifiers such as a real name, alias, postal address, unique personal identifier, online identifier Internet Protocol address, email address, account name, social security number, driver’s license number, passport number, or other similar identifiers.

(B) Any categories of personal information described in subdivision (e) of Section 1798.80.

(C) Characteristics of protected classifications under California or federal law.

(D) Commercial information, including records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies.

Note that protected classifications (1798.140.(o)(1)(C)) include race, which is a socially constructed category (see Omi and Winant on racial formation). The Act appears to be saying that personal information includes the race of the consumer. Contrast this with information as identifiers (see 1798.140.(o)(1)(A)) and information as records (1798.140.(o)(1)(D)). So "personal information" in one case is a property of a person (and a socially constructed one at that); in another case it is a specific syntactic form; in another case it is a document representing some past action. The Act is very ontologically confused.

Other categories of personal information include (continuing this last section):


(E) Biometric information.

(F) Internet or other electronic network activity information, including, but not limited to, browsing history, search history, and information regarding a consumer’s interaction with an Internet Web site, application, or advertisement.

Devices and Internet activity will be discussed in more depth in the next section.


(G) Geolocation data.

(H) Audio, electronic, visual, thermal, olfactory, or similar information.

(I) Professional or employment-related information.

(J) Education information, defined as information that is not publicly available personally identifiable information as defined in the Family Educational Rights and Privacy Act (20 U.S.C. section 1232g, 34 C.F.R. Part 99).

(K) Inferences drawn from any of the information identified in this subdivision to create a profile about a consumer reflecting the consumer’s preferences, characteristics, psychological trends, preferences, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes.

Given that the main use of information is to support inferences, it is notable that inferences are dealt with here as a special category of information, and that sensitive inferences are those that pertain to behavior and psychology. This may be narrowly interpreted to exclude some kinds of inferences that may be relevant and valuable but not so immediately recognizable as 'personal'. For example, one could infer from personal information the 'position' of a person in an arbitrary multi-dimensional space that compresses everything known about a consumer, and use this representation for targeted interventions (such as advertising). Or one could interpret it broadly: almost all personal information is relevant to 'behavior' in a broad sense, so any inference from it is also 'about behavior' and therefore protected.

Device behavior

The Act focuses on the rights of consumers and deals somewhat awkwardly with the fact that most information collected about consumers is done indirectly through machines. The Act acknowledges that sometimes devices are used by more than one person (for example, when they are used by a family), but it does not deal easily with other forms of sharing arrangements (e.g., an open Wifi hotspot) and the problems associated with identifying which person a particular device's activity is "about".

[1798.140.] (g) “Consumer” means a natural person who is a California resident, as defined in Section 17014 of Title 18 of the California Code of Regulations, as that section read on September 1, 2017, however identified, including by any unique identifier. [SB: italics mine.]

[1798.140.] (x) “Unique identifier” or “Unique personal identifier” means a persistent identifier that can be used to recognize a consumer, a family, or a device that is linked to a consumer or family, over time and across different services, including, but not limited to, a device identifier; an Internet Protocol address; cookies, beacons, pixel tags, mobile ad identifiers, or similar technology; customer number, unique pseudonym, or user alias; telephone numbers, or other forms of persistent or probabilistic identifiers that can be used to identify a particular consumer or device. For purposes of this subdivision, “family” means a custodial parent or guardian and any minor children over which the parent or guardian has custody.

Suppose you are a business that collects traffic information and website behavior connected to IP addresses, but you don't go through the effort of identifying the 'consumer' who is doing the behavior. In fact, you may collect a lot of traffic behavior that is not connected to any particular 'consumer' at all, but is rather the activity of a bot or crawler operated by a business. Are you on the hook to disclose personal information to consumers who ask for their traffic activity? Does it matter whether or not they provide their IP address?

Incidentally, while the Act seems comfortable defining a Consumer as a natural person identified by a machine address, it also happily defines a Person as “proprietorship, firm, partnership, joint venture, syndicate, business trust, company, corporation, …” etc. in addition to “an individual”. Note that “personal information” is specifically information about a consumer, not a Person (i.e., business).

This may make you wonder what a Business is, since these are the entities that are bound by the Act.

Businesses and California

The Act mainly details the rights that consumers have with respect to businesses that collect, sell, or lose their information. But what is a business?

[1798.140.] (c) “Business” means:
(1) A sole proprietorship, partnership, limited liability company, corporation, association, or other legal entity that is organized or operated for the profit or financial benefit of its shareholders or other owners, that collects consumers’ personal information, or on the behalf of which such information is collected and that alone, or jointly with others, determines the purposes and means of the processing of consumers’ personal information, that does business in the State of California, and that satisfies one or more of the following thresholds:

(A) Has annual gross revenues in excess of twenty-five million dollars ($25,000,000), as adjusted pursuant to paragraph (5) of subdivision (a) of Section 1798.185.

(B) Alone or in combination, annually buys, receives for the business’ commercial purposes, sells, or shares for commercial purposes, alone or in combination, the personal information of 50,000 or more consumers, households, or devices.

(C) Derives 50 percent or more of its annual revenues from selling consumers’ personal information.

This is not a generic definition of a business, just as the earlier definition of ‘consumer’ is not a generic definition of consumer. This definition of ‘business’ is a sui generis definition for the purposes of consumer privacy protection, as it defines businesses in terms of their collection and use of personal information. The definition explicitly thresholds the applicability of the law to businesses over certain limits.

There does appear to be a lot of wiggle room and potential for abuse here. Consider: the Mirai botnet had, by one estimate, 2.5 million devices compromised. Say you are a small business that collects site traffic. Suppose the Mirai botnet targets your site with a DDoS attack. Suddenly, your business collects information on millions of devices, and the Act comes into effect. Now you are obligated to disclose consumer information on request. Is that right?

An alternative reading of this section would recall that the definition (!) of consumer, in this law, is a California resident. So maybe the thresholds in 1798.140.(c)(B) and 1798.140.(c)(C) refer specifically to Californian consumers. Of course, for any particular device, information about where that device’s owner lives is personal information.

Having 50,000 California customers or users is a decent threshold for defining whether or not a business "does business in California". Given the size and demographics of California, you would expect many of the major Chinese technology companies, Tencent for example, to have 50,000 Californian users. This brings up the question of extraterritorial enforcement, which gave the GDPR so much leverage.

Extraterritoriality and financing

In a nutshell, it looks like the Act is intended to allow Californians to sue foreign companies. How big a deal is this? The penalties for noncompliance are civil penalties, a fixed price per violation (presumably per individual violation) rather than a proportion of profit, but you could imagine them adding up:

[1798.155.] (b) Notwithstanding Section 17206 of the Business and Professions Code, any person, business, or service provider that intentionally violates this title may be liable for a civil penalty of up to seven thousand five hundred dollars ($7,500) for each violation.

(c) Notwithstanding Section 17206 of the Business and Professions Code, any civil penalty assessed pursuant to Section 17206 for a violation of this title, and the proceeds of any settlement of an action brought pursuant to subdivision (a), shall be allocated as follows:

(1) Twenty percent to the Consumer Privacy Fund, created within the General Fund pursuant to subdivision (a) of Section 1798.109, with the intent to fully offset any costs incurred by the state courts and the Attorney General in connection with this title.

(2) Eighty percent to the jurisdiction on whose behalf the action leading to the civil penalty was brought.

(d) It is the intent of the Legislature that the percentages specified in subdivision (c) be adjusted as necessary to ensure that any civil penalties assessed for a violation of this title fully offset any costs incurred by the state courts and the Attorney General in connection with this title, including a sufficient amount to cover any deficit from a prior fiscal year.

1798.160. (a) A special fund to be known as the “Consumer Privacy Fund” is hereby created within the General Fund in the State Treasury, and is available upon appropriation by the Legislature to offset any costs incurred by the state courts in connection with actions brought to enforce this title and any costs incurred by the Attorney General in carrying out the Attorney General’s duties under this title.

(b) Funds transferred to the Consumer Privacy Fund shall be used exclusively to offset any costs incurred by the state courts and the Attorney General in connection with this title. These funds shall not be subject to appropriation or transfer by the Legislature for any other purpose, unless the Director of Finance determines that the funds are in excess of the funding needed to fully offset the costs incurred by the state courts and the Attorney General in connection with this title, in which case the Legislature may appropriate excess funds for other purposes.

So, just to be concrete: suppose a business collects personal information on 50,000 Californians and does not disclose that information. California could then sue that business for $7,500 * 50,000 = $375 million in civil penalties, which then goes into the Consumer Privacy Fund, whose purpose is to cover the cost of further lawsuits. The process funds itself. If it makes any extra money, it can be appropriated for other things.
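As a back-of-the-envelope sketch (my own assumption here is that each affected consumer counts as one intentional violation at the statutory maximum, which is a reading of the Act, not settled law):

```python
# Hypothetical worst-case exposure under 1798.155(b), assuming one
# intentional violation per affected Californian (my assumption).
MAX_PENALTY_PER_VIOLATION = 7_500  # dollars, for intentional violations

def worst_case_penalty(num_consumers: int) -> int:
    """Upper bound on aggregate civil penalties under this reading."""
    return MAX_PENALTY_PER_VIOLATION * num_consumers

print(worst_case_penalty(50_000))     # 375000000 -> the $375 million above
print(worst_case_penalty(2_500_000))  # 18750000000 -> Mirai-scale exposure
```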

In other words, I guess this Act basically sets up a self-sustaining stream of investigations and fines. You could imagine that this starts out with just some lawyers responding to civil complaints. But consider the scope of the Act, and how it means that any business in the world not properly disclosing information about Californians is liable to be fined. Suppose that some kind of blockchain- or botnet-based entity starts committing surveillance in violation of this Act on a large scale. What kind of technical investigative capacity is necessary to enforce this kind of thing worldwide? Does this become a self-funding cybercrime investigative unit? How are foreign actors who are responsible for such things brought to justice?

This is where it's totally clear that I am not a lawyer. I am still puzzling over the meaning of 1798.155.(c)(2), for example.

“Publicly available information”

There are more weird quirks to this Act than I can dig into in this post, but one that deserves mention (as homage to Helen Nissenbaum, among other reasons) is the stipulation about publicly available information, which does not mean what you think it means:

(2) “Personal information” does not include publicly available information. For these purposes, “publicly available” means information that is lawfully made available from federal, state, or local government records, if any conditions associated with such information. “Publicly available” does not mean biometric information collected by a business about a consumer without the consumer’s knowledge. Information is not “publicly available” if that data is used for a purpose that is not compatible with the purpose for which the data is maintained and made available in the government records or for which it is publicly maintained. “Publicly available” does not include consumer information that is deidentified or aggregate consumer information.

The grammatical error in the second sentence (the phrase beginning with “if any conditions” trails off into nowhere…) indicates that this paragraph was hastily written and never finished, as if in response to an afterthought. There’s a lot going on here.

First, the sense of 'public' used here is the sense of 'public institutions' or the res publica. Amazingly and a bit implausibly, government records are considered publicly available only when they are used for purposes compatible with their maintenance. So if a business takes a public record and uses it differently than was originally intended when it was 'made available', it becomes personal information that must be disclosed? As somebody who came out of the Open Data movement, I have to admit I find this baffling. On the other hand, it may be the brilliant solution to privacy in public on the Internet that society has been looking for.

Second, the stipulation that "publicly available" does not mean "biometric information collected by a business about a consumer without the consumer's knowledge" is surprising. It appears to be written with particular cases in mind, perhaps IoT sensing. But why specifically biometric information, as opposed to other kinds of information collected without consumer knowledge?

There is a lot going on in this paragraph. Oddly, it is not one of the ones explicitly flagged for review and revision in the section soliciting public participation on changes before the Act goes into effect in 2020.

A work in progress

1798.185. (a) On or before January 1, 2020, the Attorney General shall solicit broad public participation to adopt regulations to further the purposes of this title, including, but not limited to, the following areas:

This is a weird law. I suppose it was written and passed to capitalize on a particular political moment and crisis (Sec. 2 specifically mentions Cambridge Analytica as a motivation), drafted to best express its purpose and intent, and given the horizon of 2020 to allow for revisions.

It must be said that there’s nothing in this Act that threatens the business models of any American Big Tech companies in any way, since storing consumer information in order to provide derivative ad targeting services is totally fine as long as businesses do the right disclosures, which they are now all doing because of GDPR anyway. There is a sense that this is California taking the opportunity to start the conversation about what U.S. data protection law post-GDPR will be like, which is of course commendable. As a statement of intent, it is great. Where it starts to get funky is in the definitions of its key terms and the underlying theory of privacy behind them. We can anticipate some rockiness there and try to unpack these assumptions before adopting similar policies in other states.

Pondering “use privacy”

I’ve been working carefully with Datta et al.’s “Use Privacy” work (link), which makes a clear case for how a programmatic, data-driven model may be statically analyzed for its use of a proxy of a protected variable, and repaired.

Their system has a number of interesting characteristics, among which are:

  • The use of a normative oracle for determining which proxy uses are prohibited.
  • A proof that there is no coherent definition of proxy use which has all of a set of very reasonable properties defined over function semantics.

Given the second result, they continue with a compelling study of how a syntactic definition of proxy use, one based on the explicit contents of a function, can support a system for detecting and repairing proxies.
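As a toy sketch of what a purely syntactic check could look like (this is my own drastic simplification, not Datta et al.'s definition, which works over a richer program model and measures both association with the protected attribute and influence on the output; the feature names are invented):

```python
import ast
from typing import Set

def syntactic_proxy_uses(source: str, flagged_features: Set[str]) -> Set[str]:
    """Return flagged feature names that appear in the program text.

    This is the crudest possible 'syntactic' notion: it inspects the
    code itself rather than input-output behavior, so two extensionally
    equivalent programs can get different answers, which is the kind of
    flexibility the impossibility result for semantic definitions
    pushes us toward.
    """
    tree = ast.parse(source)
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return used & flagged_features

# Example: suppose a normative oracle (a lawyer, an ethicist) has
# flagged 'zip_code' as a prohibited proxy for a protected attribute.
program = """
def score(income, zip_code):
    penalty = 0.2 if zip_code in HIGH_RISK_ZIPS else 0.0
    return income / 100000 - penalty
"""
print(syntactic_proxy_uses(program, {"zip_code", "surname"}))  # {'zip_code'}
```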

My question is to what extent the sources of normative restriction on proxies (those characterized by the normative oracle) are likely to favor syntactic proxy use restrictions, as opposed to semantic ones. Since ethicists and lawyers, who are the purported sources of these normative restrictions, are likely to consider any technical system a black box for the purpose of their evaluation, they will naturally be concerned with program semantics. It may be comforting for those responsible for a technical program to be able to, in a sense, avoid liability by assuring that their programs are not syntactically using a restricted proxy. But, truly, so what? Since these syntactic considerations do not make any semantic guarantees, will they really plausibly address normative concerns?

A striking result from their analysis, which has perhaps broader implications, is the incoherence of a semantic notion of proxy use. Perhaps sadly but also substantively, this result shows that a certain plausible normative requirement is impossible for a system to fulfill in general. Only restricted conditions make such a thing possible. This seems to be part of a pattern in these rigorous computer science evaluations of ethical problems; see also Kleinberg et al. (2016) on how it's impossible to meet several plausible definitions of "fairness" in risk-assessment scores across social groups except under certain conditions.
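For reference, here is my rough paraphrase of the conditions at stake in Kleinberg et al. (2016), for a risk score s on individuals with true outcome Y in {0,1} and group membership G:

\[
\Pr[\,Y = 1 \mid s = p,\ G = g\,] = p \quad \text{for each group } g \quad \text{(calibration within groups)}
\]
\[
\mathbb{E}[\,s \mid Y = 0,\ G = A\,] = \mathbb{E}[\,s \mid Y = 0,\ G = B\,] \quad \text{(balance for the negative class)}
\]
\[
\mathbb{E}[\,s \mid Y = 1,\ G = A\,] = \mathbb{E}[\,s \mid Y = 1,\ G = B\,] \quad \text{(balance for the positive class)}
\]

Their result, roughly, is that all three can hold at once only in degenerate cases: when the groups have equal base rates, or when prediction is perfect.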

My conclusion is that this nobly motivated computer science work reveals that what people are actually interested in normatively is not the functioning of any particular computational system. They are rather interested in social conditions more broadly, which are rarely aligned with our normative ideals. Computational systems, by making realities harshly concrete, are disappointing, but it's a mistake to turn that into a disappointment with the computing systems themselves. Rather, there are mathematical facts that are disappointing regardless of what sorts of systems mediate our social world.

This is not merely a philosophical consideration or sociological observation. Since the interpretation of laws is part of the process of informing normative expectations (as in a normative oracle), it is an interesting and perhaps open question how lawyers and judges, in their task of legal interpretation, make use of the mathematical conclusions about normative tradeoffs being offered up by computer scientists.

References

Datta, Anupam, et al. “Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs.” arXiv preprint arXiv:1705.07807 (2017).

Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. “Inherent trade-offs in the fair determination of risk scores.” arXiv preprint arXiv:1609.05807 (2016).

Robert Post on Data vs. Dignitary Privacy

I was able to see Robert Post present his article, “Data Privacy and Dignitary Privacy: Google Spain, the Right to Be Forgotten, and the Construction of the Public Sphere”, today. My other encounter with Post’s work was quite positive, and I was very happy to learn more about his thinking at this talk.

Post's argument was based on the facts of the Google Spain SL v. Agencia Española de Protección de Datos ("Google Spain") case in the EU, which set off a lot of discussion about the right to be forgotten.

I'm not trained as a lawyer, and will leave the legal analysis to the verbatim text. There were some broader philosophical themes that resonate with topics I've discussed on this blog and in my other research. These I wanted to note.

If I follow Post’s argument correctly, it is something like this:

  • According to EU Directive 95/46/EC, there are two kinds of privacy. Data privacy rules over personal data, establishing control and limitations on its use. The emphasis is on the data itself, which is reasoned about analogously to property. Dignitary privacy is about maintaining appropriate communications between people and restricting those communications that may degrade, humiliate, or mortify them.
  • EU rules about data privacy are governed by rules specifying the purpose for which data is used, thereby implying that the use of this data must be governed by instrumental reason.
  • But there’s the public sphere, which must not be governed by instrumental reason, for Habermasian reasons. The public sphere is, by definition, the domain of communicative action, where actions must be taken with the ambiguous purpose of open dialogue. That is why free expression is constitutionally protected!
  • Data privacy, formulated as an expression of instrumental reason, is incompatible with the free expression of the public sphere.
  • The Google Spain case used data privacy rules to justify the right to be forgotten, and in this it developed an unconvincing and sloppy precedent.
  • Dignitary privacy is in tension with free expression, but not incompatible with it. This is because it is based not on instrumental reason, but rather on norms of communication (which are contextual).
  • Future right to be forgotten decisions should be made on the basis of dignitary privacy. This will result in more cogent decisions.

I found Post’s argument very appealing. I have a few notes.

First, I had never made the connection between what Hildebrandt (2013, 2014) calls “purpose binding” in EU data protection regulation and instrumental reason, but there it is. There is a sense in which these purpose clauses are about optimizing something that is externally and specifically defined before the privacy judgment is made (cf. Tschantz, Datta, and Wing, 2012, for a formalization).

This approach seems generally in line with the view of a government as a bureaucracy primarily involved in maintaining control over a territory or population. I don't mean this pejoratively, but literally: control as feedback into a system that steers it to some end. I've discussed the pervasive theme of 'instrumentality run amok' in questions of AI superintelligence here. It's a Frankfurt School trope that appears to have made its way, subtly, into Post's argument.

The public sphere is not, in Habermasian theory, supposed to be dictated by instrumental reason, but rather by communicative rationality. This has implications for the technical design of networked publics that I’ve scratched the surface of in this paper. By pointing to the tension between instrumental/purpose/control based data protection and the free expression of the public sphere, I believe Post is getting at a deep point about how we can’t have the public sphere be too controlled lest we lose the democratic property of self-governance. It’s a serious argument that probably should be addressed by those who would like to strengthen rights to be forgotten. A similar argument might be made for other contexts whose purposes seem to transcend circumscription, such as science.

Post’s point is not, I believe, to weaken these rights to be forgotten, but rather to put the arguments for them on firmer footing: dignitary privacy, or the norms of communication and the awareness of the costs of violating them. Indeed, the facts behind right to be forgotten cases I’ve heard of (there aren’t many) all seem to fall under these kinds of concerns (humiliation, etc.).

What's very interesting to me is that the idea of dignitary privacy as consisting of appropriate communication according to contextually specific norms feels very close to Helen Nissenbaum's theory of Contextual Integrity (2009), with which I've become very familiar in the past year through my work with Prof. Nissenbaum. Contextual integrity posits that privacy is about adherence to norms of appropriate information flow. Is there a difference between information flow and communication? Isn't Shannon's information theory a "mathematical theory of communication"?

The question of whether and under what conditions information flow is communication and/or data is quite deep, actually. More on that later.

For now though it must be noted that there’s a tension, perhaps a dialectical one, between purposes and norms. For Habermas, the public sphere needs to be a space of communicative action, as opposed to instrumental reason. This is because communicative action is how norms are created: through the agreement of people who bracket their individual interests to discuss collective reasons.

Nissenbaum also has a theory of norm formation, but it does not depend so tightly on the rejection of instrumental reason. In fact, it accepts the interests of stakeholders as among several factors that go into the determination of norms. Other factors include societal values, contextual purposes, and the differentiated roles associated with the context. Because contexts, for Nissenbaum, are defined in part by their purposes, this has led Hildebrandt (2013) to make direct comparisons between purpose binding and Contextual Integrity. They are similar, she concludes, but not the same.

It would be easy to say that the public sphere is a context in Nissenbaum's sense, with a purpose, which is the formation of public opinion (which seems to be Post's position). Properly speaking, social purposes may be broad or narrow, and specially defined social purposes may be self-referential (why not?), and indeed these self-referential social purposes may be the core of society's "self-consciousness". Why shouldn't there be laws to ensure the freedom of expression within a certain context for the purpose of cultivating the kinds of public opinion that would legitimize laws and cause them to adapt democratically? These frameworks could be made more precise if they were a little more formal and shed some of their baggage; that would be useful theory building in line with Nissenbaum's and Post's broader agendas.

A test of this perhaps more nuanced but still teleological view (indeed instrumental, though maybe more properly pragmatic, à la Dewey, in that it can blend several different metaethical categories) is to see whether one can motivate a right to be forgotten in the public sphere by appealing to the need for communicative action, and thereby to especially appropriate communication norms around it, and to dignitary privacy.

This doesn’t seem like it should be hard to do at all.

References

Hildebrandt, Mireille. “Slaves to big data. Or are we?.” (2013).

Hildebrandt, Mireille. “Location Data, Purpose Binding and Contextual Integrity: What’s the Message?.” Protection of Information and the Right to Privacy-A New Equilibrium?. Springer International Publishing, 2014. 31-62.

Nissenbaum, Helen. Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press, 2009.

Post, Robert, Data Privacy and Dignitary Privacy: Google Spain, the Right to Be Forgotten, and the Construction of the Public Sphere (April 15, 2017). Duke Law Journal, Forthcoming; Yale Law School, Public Law Research Paper No. 598. Available at SSRN: https://ssrn.com/abstract=2953468 or http://dx.doi.org/10.2139/ssrn.2953468

Tschantz, Michael Carl, Anupam Datta, and Jeannette M. Wing. “Formalizing and enforcing purpose restrictions in privacy policies.” Security and Privacy (SP), 2012 IEEE Symposium on. IEEE, 2012.

Ohm and Post: Privacy as threats, privacy as dignity

I’m reading side by side two widely divergent law review articles about privacy.

One is Robert Post's "The Social Foundations of Privacy: Community and Self in Common Law Tort" (1989) (link)

The other is Paul Ohm's "Sensitive Information" (2014) (link)

They are very notably different. Post's article diverges sharply from the intellectual milieu I'm used to. It starts with an exposition of Goffman's view of the personal self as being constituted by ceremonies and rituals of human relationships. Privacy tort law is, in Post's view, about repairing tears in the social fabric. The closest thing to this that I have ever encountered is Fingarette's book on Confucianism.

Ohm's article is much more recent and is in large part a reaction to the Snowden leaks. It is an attempt to provide an account of privacy that can limit the problems associated with massive state (and corporate?) data collection. It offers a legally informed account of what information is sensitive, and then suggests that threat modeling strategies from computer security can be adapted to the privacy context. Privacy can be protected by identifying and mitigating privacy threats.

As I get deeper into the literature on Privacy by Design, and observe how privacy-related situations play out in the world and in my own life, I’m struck by the adaptability and indifference of the social world to shifting technological infrastructural conditions. A minority of scholars and journalists track major changes in it, but for the most part the social fabric adapts. Most people, probably necessarily, have no idea what the technological infrastructure is doing and don’t care to know. It can be coopted, or not, into social ritual.

If the swell of scholarship and other public activity on this topic was the result of surprising revelations or socially disruptive technological innovations, these same discomforts have also created an opportunity for the less technologically focused to reclaim spaces for purely social authority, based on all the classic ways that social power and significance play out.

organizational secrecy and personal privacy as false dichotomy cf @FrankPasquale

I've turned from page 2 to page 3 of The Black Box Society (I can be a slow reader). Pasquale sets up the dichotomy around which the drama of the book hinges like so:

But while powerful businesses, financial institutions, and government agencies hide their actions behind nondisclosure agreements, “proprietary methods”, and gag rules, our own lives are increasingly open books. Everything we do online is recorded; the only questions left are to whom the data will be available, and for how long. Anonymizing software may shield us for a little while, but who knows whether trying to hide isn’t the ultimate red flag for watchful authorities? Surveillance cameras, data brokers, sensor networks, and “supercookies” record how fast we drive, what pills we take, what books we read, what websites we visit. The law, so aggressively protective of secrecy in the world of commerce, is increasingly silent when it comes to the privacy of persons.

That incongruity is the focus of this book.

This is a rhetorically powerful paragraph, and it captures a lot of the trepidation people have about the power of large organizations relative to themselves.

I have been inclined to agree with this perspective for a lot of my life. I used to be the kind of person who thought Everything Should Be Open. Since then, I’ve developed what I think is a more nuanced view of transparency: some secrecy is necessary. It can be especially necessary for powerful organizations and people.

Why?

Well, it depends on the physical properties of information. (Here is an example of how a proper understanding of the mechanics of information can support the transcendent project as opposed to a merely critical project).

Any time you interact with something or somebody else in a meaningful way, you affect each other's states in probabilistic space. That means there has been some kind of flow of information. If an organization interacts with a lot of people, it is going to absorb information about a lot of people. Recording this information as 'data' is something that has been done for a long time because that is what allows organizations to do intelligent things vis-à-vis the people they interact with. So businesses, financial institutions, and governments recording information about people is nothing new.
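One way to make 'flow of information' precise here (my gloss, using standard information theory rather than anything specific to Pasquale's argument): if an interaction leaves the states of two parties statistically dependent, their mutual information is positive, and that is the minimal sense in which information has flowed.

\[
I(X;Y) \;=\; \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \;>\; 0
\quad\text{whenever } X \text{ and } Y \text{ are not independent.}
\]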

Pasquale suggests that this recording is a threat to our privacy, and that the secrecy of the organizations that do the recording gives them power over us. But this is surely a false dichotomy. Why? Because if an organization records information about a lot of people, and then doesn’t maintain some kind of secrecy, then that information is no longer private! To, like, everybody else. In other words, maintaining secrecy is one way of ensuring confidentiality, which is surely an important part of privacy.

I wonder what happens if we continue to read The Black Box Society with this link between secrecy, confidentiality, and privacy in mind.

Nissenbaum the functionalist

Today in Classics we discussed Helen Nissenbaum’s Privacy in Context.

Most striking to me is that Nissenbaum’s privacy framework, contextual integrity theory, depends critically on a functionalist sociological view. A context is defined by its information norms and violations of those norms are judged according to their (non)accordance with the purposes and values of the context. So, for example, the purposes of an educational institution determine what are appropriate information norms within it, and what departures from those norms constitute privacy violations.

I used to think teleology was dead in the sciences. But recently I learned that it is commonplace in biology and popular in ecology. Today I learned that what amounts to a State Philosopher in the U.S. (Nissenbaum’s framework has been more or less adopted by the FTC) maintains a teleological view of social institutions. Fascinating! Even more fascinating that this philosophy corresponds well enough to American law as to be informative of it.

From a “pure” philosophy perspective (which, I will admit, is simply a vice of mine), it's interesting to contrast Nissenbaum with…oh, Horkheimer again. Nissenbaum sees ethical behavior (around privacy at least) as behavior that is in accord with the purpose of one's context. Morality is given by the system. For Horkheimer, the problem is that the system's purposes subsume the interests of the individual, who is alone the agent able to determine what is right and wrong. Horkheimer is a founder of the Frankfurt School, arguably the intellectual ancestor of progressivism. Nissenbaum grounds her work in Burke, and her theory is admittedly conservative. Privacy is violated when people's expectations of privacy are violated–this is coming from U.S. law–and that means people's contextual expectations carry more weight than an individual's free-minded beliefs.

The tension could be resolved if free individuals determined the purposes of the systems they participate in. Indeed, Nissenbaum quotes Burke's approval of established conventions as the result of the accreted wisdom and rationale of past generations. The system is the way it is because it was chosen. (Or, perhaps, because it survived.)

Since Horkheimer's objection to "the system" is that he believes instrumentality has run amok, thereby causing the system to serve a purpose nobody intended for it, his view is not inconsistent with Nissenbaum's. Nissenbaum, building on Dworkin, sees contextual legitimacy as depending on some kind of political legitimacy.

The crux of the problem is the question of what information norms comprise the context in which political legitimacy is formed, and what purpose does this context or system serve?

Privacy, trust, context, and legitimate peripheral participation

Privacy is important. For Nissenbaum, what’s essential to privacy is control over context. But what is context?

Using Luhmann’s framework of social systems–ignoring for a moment e.g. Habermas’ criticism and accepting the naturalized, systems theoretic understanding of society–we would have to see a context as a subsystem of the total social system. In so far as the social system is constituted by many acts of communication–let’s visualize this as a network of agents, whose edges are acts of communication–then a context is something preserved by configurations of agents and the way they interact.

Some of the forces that shape a social system will be exogenous. A river dividing two cities or, more abstractly, distance. In the digital domain, the barriers of interoperability between one virtual community infrastructure and another.

But others will be endogenous, formed from the social interactions themselves. An example is the gradual deepening of trust between agents based on a history of communication. Perhaps early conversations are formal, stilted. Later, an agent takes a risk, sharing something more personal–more private? It is reciprocated. Slowly, a trust bond, an evinced sharing of interests and mutual investment, becomes the foundation of cooperation. The Prisoner’s Dilemma is solved the old fashioned way.
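The 'old fashioned way' can be sketched as an iterated game. Here is a minimal toy model (standard textbook payoffs; nothing here is specific to the privacy setting) in which repetition plus reciprocity makes that early risk-taking pay off:

```python
# Minimal iterated Prisoner's Dilemma sketch: repeated interaction
# plus reciprocity ("tit for tat") sustains cooperation that a
# one-shot game would not. Payoffs are the standard textbook ones.
PAYOFF = {  # (my move, their move) -> my payoff; C = cooperate, D = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Cooperate first; afterwards copy the partner's last move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    history_a, history_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(history_a), strategy_b(history_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (30, 30): sustained trust pays off
print(play(tit_for_tat, always_defect))  # (9, 14): betrayal is punished after round one
```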

Following Carey's logic that communication as mere transmission, when sustained over time, becomes communication as ritual and the foundation of community, we can look at this slow process of trust formation as one of the ways that a context, in Nissenbaum's sense, perhaps, forms. If Anne and Betsy have mutually internalized each other's interests, then information flow between them will by and large support the interests of the pair, and Betsy will have low incentives to reveal private information in a way that would be detrimental to Anne.

Of course this is a huge oversimplification in lots of ways. One way is that it does not take into account the way the same agent may participate in many social roles or contexts. Communication is not a single edge from one agent to another in many circumstances. Perhaps the situation is better represented as a hypergraph. One reason why this whole domain may be so difficult to reason about is the sheer representational complexity of modeling the situation. It may require the kind of mathematical sophistication used by quantum physicists. Why not?
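Far short of that sophistication, here is a crude sketch of the hypergraph idea: a context as a set of agents plus its information norms, rather than as a property of any single edge between two agents. The names, topics, and the 'permitted topics' stand-in for norms are all invented for illustration:

```python
# Toy sketch: contexts as hyperedges (sets of agents) carrying norms,
# rather than as properties of a single agent-to-agent edge.
from dataclasses import dataclass, field

@dataclass
class Context:
    name: str
    members: set = field(default_factory=set)
    permitted_topics: set = field(default_factory=set)  # crude stand-in for information norms

contexts = [
    Context("workplace", {"anne", "betsy", "carol"}, {"schedules", "projects"}),
    Context("close_friends", {"anne", "betsy"}, {"schedules", "health", "finances"}),
]

def flow_is_appropriate(sender: str, receiver: str, topic: str) -> bool:
    """Crude contextual check: some shared context permits this topic."""
    return any(
        {sender, receiver} <= c.members and topic in c.permitted_topics
        for c in contexts
    )

print(flow_is_appropriate("anne", "betsy", "health"))  # True: shared 'close_friends' context
print(flow_is_appropriate("anne", "carol", "health"))  # False: only 'workplace' is shared
```

Note that the same agent sits in several contexts at once, which is exactly what a single-edge picture of communication obscures.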

Not having that kind of insight into the problem yet, I will continue to sling what the social scientists call 'theory'. Let's talk about an existing community of practice, where the practice is a certain kind of communication. A community of scholars. A community of software developers. Weird Twitter. A backchannel mailing list coordinating a political campaign. A church.

According to Lave and Wenger, the way newcomers gradually become members and oldtimers of a community of practice is legitimate peripheral participation. This is consistent with the model described above characterizing the growth of trust through gradually deepening communication. Peripheral participation is low-risk. In an open source context, this might be as simple as writing a question to the mailing list or filing a bug report. Over time, the agent displays good faith and competence. (I'm disappointed to read just now that Wenger ultimately abandoned this model in favor of a theory of dualities. Is that a Hail Mary for empirical content for the theory? I'm also interested to follow links on this topic to a citation of von Krogh 1998, whose later work found its way onto my Open Collaboration and Peer Production syllabus. It's a small world.

I've begun reading, as I write this, a fascinating paper by Hildreth and Kimble (2002), and I have now lost my thread. Can I recover?)

Some questions:

  • Can this process of context-formation be characterized empirically through an analysis of e.g. the timing dynamics of communication (c.f. Thomas Maillart’s work)? If so, what does that tell us about the design of information systems for privacy?
  • What about illegitimate peripheral participation? Arguably, this blog is that kind of participation–it participates in a form of informal, unendorsed quasi-scholarship. It is a tool of context and disciplinary collapse. Is that a kind of violation of privacy? Why not?

developing a nuanced view on transparency

I’m a little late to the party, but I think I may at last be developing a nuanced view on transparency. This is a personal breakthrough about the importance of privacy that I owe largely to the education I’m getting at Berkeley’s School of Information.

When I was an undergrad, I also was a student activist around campaign finance reform. Money in politics was the root of all evil. We were told by our older, wiser activist mentors that we were supposed to lay the groundwork for our policy recommendation and then wait for journalists to expose a scandal. That way we could move in to reform.

Then I worked on projects involving open source, open government, open data, open science, etc. The goal of those activities is to make things more open/transparent.

My ideas about transparency as a political, organizational, and personal issue originated in those experiences with those movements and tactics.

There is a “radically open” wing of these movements which thinks that everything should be open. This has been debunked. The primary way to debunk this is to point out that less privileged groups often need privacy for reasons that more privileged advocates of openness have trouble understanding. Classic cases of this include women who are trying to evade stalkers.

This has been expanded to a general critique of "big data" practices. Data is collected from people who are less powerful than the people who process that data and act on it. There has been a call to make the data processing practices more transparent to prevent discrimination.

A conclusion I have found it easy to draw until relatively recently is: ok, this is not so hard. What’s important is that we guarantee privacy for those with less power, and enforce transparency on those with more power so that they can be held accountable. Let’s call this “openness for accountability.” Proponents of this view are in my opinion very well-intended, motivated by values like justice, democracy, and equity. This tends to be the perspective of many journalists and open government types especially.

Openness for accountability is not a nuanced view on transparency.

Here are some examples of cases where an openness for accountability view can go wrong:

  • Arguably, the “Gawker Stalker” platform for reporting the location of celebrities was justified by an 'openness for accountability' logic. Jimmy Kimmel's browbeating of Emily Gould indicates how this can be a problem. Celebrity status is a form of power, but it also raises one's level of risk, because there is a small percentage of the population that for unfathomable reasons goes crazy and threatens and even attacks people. There is a vicious cycle here. If one is perceived to be powerful, then people will feel more comfortable exposing and attacking that person, which increases their celebrity, increasing their perceived power.
  • There are good reasons to be concerned about stereotypes and representation of underprivileged groups. There are also cases where members of those groups do things that conform to those stereotypes. When these are behaviors that are ethically questionable or manipulative, it’s often important organizationally for somebody to know about them and act on them. But transparency about that information would feed the stereotypes that are being socially combated on a larger scale for equity reasons.
  • Members of powerful groups can have aesthetic taste and senses of humor that are offensive or even triggering to less powerful groups. More generally, different social groups will have different and sometimes mutually offensive senses of humor. A certain amount of public effort goes into regulating “good taste” and that is fine. But also, as is well known, art that is in good taste is often bland and fails to probe the depths of the human condition. Understanding the depths of the human condition is important for everybody but especially for powerful people who have to take more responsibility for other humans.
  • This one is based on anecdotal information from a close friend: one reason why Congress is so dysfunctional now is that it is so much more transparent. That transparency means that politicians have to be more wary of how they act so that they don’t alienate their constituencies. But bipartisan negotiation is exactly the sort of thing that alienates partisan constituencies.

If you asked me maybe two years ago, I wouldn’t have been able to come up with these cases. That was partly because of my positionality in society. Though I am a very privileged man, I still perceived myself as an outsider to important systems of power. I wanted to know more about what was going on inside important organizations and was frustrated by my lack of access to it. I was very idealistic about wanting a more fair society.

Now I am getting older, reading more, experiencing more. As I mature, people are trusting me with more sensitive information, and I am beginning to anticipate the kinds of positions I may have later in my career. I have begun to see how my best intentions for making the world a better place are at odds with my earlier belief in openness for accountability.

I’m not sure what to do with this realization. I put a lot of thought into my political beliefs and for a long time they have been oriented around ideas of transparency, openness, and equity. Now I’m starting to see the social necessity of power that maintains its privacy, unaccountable to the public. I’m starting to see how “Public Relations” is important work. A lot of what I had a kneejerk reaction against now makes more sense.

I am in many ways a slow learner. These ideas are not meant to impress anybody. I’m not a privacy scholar or expert. I expect these thoughts are obvious to those with less of an ideological background in this sort of thing. I’m writing this here because I see my current role as a graduate student as participating in the education system. Education requires a certain amount of openness because you can’t learn unless you have access to information and people who are willing to teach you from their experience, especially their mistakes and revisions.

I am also perhaps writing this now because, who knows, maybe one day I will be an unaccountable, secretive, powerful old man. Nobody would believe me if I said all this then.

The link between computation asymmetry and openness

I want to jot something down while it is on my mind. It’s rather speculative, but may wind up being the theme of my thesis work.

I've written here about computational asymmetry in the economy. The idea is that when different agents are endowed with different capacities to compute (or are differently boundedly rational), that difference can become an extreme inequality (power law distributed, as income is) as computational power is stockpiled as a kind of capital accumulation.

Whereas a solution to unequal income is redistribution, and a solution to unequal physical power is regulation against violence, for computational asymmetry there is a simpler solution: "openness" in the products of computation. In particular, high quality data goods–data that is computationally rich (has more logical depth)–can be made available as public goods.

There are several challenges to this idea. One is the problem of funding. How do you encourage the production of costly public goods? The classic answer is state funding. Today we have another viable option, crowdfunding.

Another involves questions of security and privacy. Can a policy of 'openness' lead to problematic invasions of privacy? Viewing the problem in light of computational asymmetry sheds light on this dynamic. Privacy should be a privilege of the disempowered, openness a requirement of the powerful.

In an ideal economy, agents are rewarded for their contribution to social welfare. For high quality data goods, openness leads to the maximum social welfare. So in theory, agents should be willingly adopting an open policy of their own volition. What has prevented them in the past is transaction costs and the problem of incurred risk. As institutions that reduce transaction costs and absorb risks get better, the remaining problems will be ones of regulation of noncompetitive practices.