Digifesto

Category: privacy

Towards a Synthesis of Differential Privacy and Contextual Integrity

At last week’s 3rd Annual Symposium on Applications of Contextual Integrity, there was a lively discussion of a top-of-mind concern for computers scientists seeking to work with Contextual Integrity (CI): how does CI relate to differential privacy (DP)? Yan Shvartzshnaider encouraged me to write up my own comments as a blog post.

Differential Privacy (DP)

Differential privacy (Dwork, 2006) is a widely studied paradigm of computational privacy. It is a mathematical property of an algorithm or database A which dictates that the output of the mechanism depends only slightly on any one individual data subject’s data. This is most often expressed mathematically as

Pr[A(D_1) \in S] \leq e^\epsilon \cdot Pr[A(D_2) \in S]

Where D_1 and D_2 differ only by the contents on one data point corresponding to a single individual, and S is any arbitrary set of outputs of the mechanism.

A key motivation for DP is that each individual should, in principle, be indifferent to whether or not they are included in the DP database, because their impact on the result is bounded by a small value, \epsilon.

There are many, many variations of DP that differ based on assumptions about the generative model of the data set, the privacy threat model, and others ways of relaxing the indifference constraint. However, the technical research of DP is often silent on some key implementation details, such as how to choose the privacy budget \epsilon. There are some noteworthy industrial applications of DP, but they may use egregiously high values of \epsilon. There are also several reasons to believe the DP is not getting at a socially meaningful sense of privacy, but rather is merely a computationally convenient thing to research and implement.

Contextual Integrity (CI)

Contextual Integrity (Nissenbaum, 2009/2020) aims to capture what is socially meaningful about privacy. It defined privacy as appropriate information flow, where appropriateness means alignment with norms based in social context. Following Walzer (2008)’s vision of society divided into separate social spheres, CI recognizes that society is differentiated into many contexts, such as education, healthcare, the workplace, and the family, and that each context has different norms about personal information flow that are adapted to that context’s purpose. For example, the broadly understood rules that doctors keep their patient’s medical information confidential, but can share records with patient’s consent to other medical specialists, are examples of information norms that adhere in the context of healthcare. CI provides a template for understanding information norms, parameterized in terms of:

  • Sender of the personal information
  • Receiver of the personal information
  • Subject of the personal information — the “data subject” in legal terms
  • The attribute of the data subject that is referred to or described in the personal information.
  • The transmission principle, the normative rule governing the conditions under which the above parameterized information flow is (in)appropriate. Examples of transmission principles include reciprocity, confidentiality, and consent.

Though CI is a theory based in social, philosophical, and legal theories of privacy, it has had uptake in other disciplines, including computer science. These computer science applications have engaged CI deeply and contributed to it by clarifying the terms and limits of the theory (Benthall et al., 2017).

CI has perhaps been best used by computer scientists thus far as a way of conceptualizing the privacy rules of sectoral regulations such as HIPAA, GLBA, and COPPA (Barth et al., 2006) and commercial privacy polices (Shvartzshnaider et al., 2019). However, a promise of CI is that is can address social expectations that have not yet been codified into legal language, helping to bridge between technical design, social expectation, and legal regulation in new and emerging contexts.

Bridging Between DP and CI

I believe it’s safe to say that whereas DP has been widely understood and implemented by computer scientists, it has not sufficed as either a theory or practice to meet the complex and nuanced requirements that socially meaningful privacy entails. On the other hand, while CI does a better job of capturing socially meaningful privacy, it has not yet been computationally operationalized in a way that makes it amenable to widespread implementation. The interest at the Symposium in bridging DP and CI was due to a recognition that CI has defined problems worth solving by privacy oriented computer scientists who would like to build on their deep expertise in DP.

What, then, are the challenges to be addressed by a synthesis of DP and CI? These are just a few conjectures.

Social choice of epsilon. DP is a mathematical theory that leaves open the key question of the choice of privacy budget \epsilon. DP researchers would love a socially well-grounded way to choose is numerical value. CI can theoretically provide that social expectation, except for the fact that social norms are generally not expressed with such mathematical sensitivity. Rather, social norms (and legal rules) use less granular terms like confidentiality and consent. A DP/CI synthesis might involve a mapping from natural language privacy rules to numerical values for tuning DP.

Being explicit about context. DP is attractive precisely because it is a property of a mechanism that does not depend on the system’s context (Tschantz et al., 2020). But this is also its weakness. Key assumptions behind the motivation of DP, such as that the data subjects’ qualities are independent from each other, are wrong in many important privacy contexts. Variations of DP have been developed to, for example, adapt to how genetically or socially related people will have similar data, but the choice of which variant to use should be tailored to the conditions of social context. CI can inform DP practice by clarifying which contextual conditions matter and how to map these to DP variations.

DP may only address a subset of CI’s transmission principles. The rather open concept of transmission principle in CI does a lot of work for the theory by making it extensible to almost any conceivable privacy norm. Computer scientists may need to accept that DP will only be able to address a subset of CI’s transmission principles — those related to negative rules of personal information flow. Indeed, some have argued that CI’s transmission principles include rules that will always be incompletely auditable from a computer science perspective. (Datta et al., 2001) DP scholars may need to accept the limits of DP and see CI as a new frontier.

Horizontal data relations and DP for data governance. Increasingly, legal privacy scholars are becoming skeptical that socially meaningful privacy can be guaranteed to individuals alone. Because any individual’s data can enable an inference that has an effect on others, even those who are not in the data set, privacy may not properly be an individual concern. Rather, as Viljoen argues, these horizontal relationships between individuals via their data make personal data a democratic concern properly addressed with a broader understanding of collective or institutional data governance. This democratic data approach is quite consistent with CI, which was among the first privacy theories to emphasize the importance of socially understood norms as opposed to privacy as individual “control” of data. DP can no longer rely on its motivating idea that individual indifference to inclusion in a data set is sufficient for normative, socially meaningful privacy. However, some DP scholars have already begun to expand their expertise and address how DP can play a role in data governance. (Zhang et al., 2020)


DP and CI are two significant lines of privacy research that have not yet been synthesized effectively. That presents an opportunity for researchers in either subfield to reach across the aisle and build new theoretical and computational tools for socially meaningful privacy. In many ways, CI has worked to understand the socially contextual aspects of privacy, preparing the way for more mathematically oriented DP scholars to operationalize them. However, DP scholars may need to relax some of their assumptions and open their minds to make the most of what CI has to offer computational privacy.

References

Barth, A., Datta, A., Mitchell, J. C., & Nissenbaum, H. (2006, May). Privacy and contextual integrity: Framework and applications. In 2006 IEEE symposium on security and privacy (S&P’06) (pp. 15-pp). IEEE.

Benthall, S., G├╝rses, S., & Nissenbaum, H. (2017). Contextual integrity through the lens of computer science. Now Publishers.

Datta, A., Blocki, J., Christin, N., DeYoung, H., Garg, D., Jia, L., Kaynar, D. and Sinha, A., 2011, December. Understanding and protecting privacy: Formal semantics and principled audit mechanisms. In International Conference on Information Systems Security (pp. 1-27). Springer, Berlin, Heidelberg.

Dwork, C. (2006, July). Differential privacy. In International Colloquium on Automata, Languages, and Programming (pp. 1-12). Springer, Berlin, Heidelberg.

Nissenbaum, H. (2020). Privacy in context. Stanford University Press.

Shvartzshnaider, Y., Pavlinovic, Z., Balashankar, A., Wies, T., Subramanian, L., Nissenbaum, H., & Mittal, P. (2019, May). Vaccine: Using contextual integrity for data leakage detection. In The World Wide Web Conference (pp. 1702-1712).

Shvartzshnaider, Y., Apthorpe, N., Feamster, N., & Nissenbaum, H. (2019, October). Going against the (appropriate) flow: A contextual integrity approach to privacy policy analysis. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing (Vol. 7, No. 1, pp. 162-170).

Tschantz, M. C., Sen, S., & Datta, A. (2020, May). Sok: Differential privacy as a causal property. In 2020 IEEE Symposium on Security and Privacy (SP) (pp. 354-371). IEEE.

Viljoen, S. (forthcoming). Democratic data: A relational theory for data governance. Yale Law Journal.

Walzer, M. (2008). Spheres of justice: A defense of pluralism and equality. Basic books.

Zhang, W., Ohrimenko, O., & Cummings, R. (2020). Attribute Privacy: Framework and Mechanisms. arXiv preprint arXiv:2009.04013.

Trade secrecy, “an FDA for algorithms”, a software bills of materials (SBOM) #SecretAlgos

At the Conference on Trade Secrets and Algorithmic Systems at NYU today, the target of most critiques is the use of trade secrecy by proprietary technology providers to prevent courts and the public from seeing the inner workings of algorithms that determine people’s credit scores, health care, criminal sentencing, and so on. The overarching theme is that sometimes companies will use trade secrecy to hide the ways that their software is bad, and that that is a problem.

In one panel, the question of whether an “FDA for Algorithms” is on the table–referring the Food and Drug Administration’s approval of pharmaceuticals. It was not dealt with in too much depth, which is too bad, because it is a nice example of how government oversight of potentially dangerous technology is managed in a way that respects trade secrecy.

According to this article, when filing for FDA approval, a company can declare some of their ingredients to be trade secrets. The upshot of that is that those trade secrets are not subject to FOIA requests. However, these ingredients are still considered when approval is granted by the FDA.

It so happens that in the cybersecurity policy conversation (more so than in privacy) the question of openness of “ingredients” to inspection has been coming up in a serious way. NTIA has been hosting multistakeholder meetings about standards and policy around Software Component Transparency. In particular they are encouraging standardizations of Software Bills of Materials (SBOM) like the Linux Foundation’s Software Package Data Exchange (SPDX). SPDX (and SBOM’s more generally) describe the “ingredients” in a software package at a higher level of resolution than exposing the full source code, but at a level specific enough useful for security audits.

It’s possible that a similar method could be used for algorithmic audits with fairness (i.e., nondiscrimination compliance) and privacy (i.e., information sharing to third-parties) in mind. Particular components could be audited (perhaps in a way that protects trade secrecy), and then those components could be listed as “ingredients” by other vendors.

Privacy of practicing high-level martial artists (BJJ, CI)

Continuing my somewhat lazy “ethnographic” study of Brazilian Jiu Jitsu, an interesting occurrence happened the other day that illustrates something interesting about BJJ that is reflective of privacy as contextual integrity.

Spencer (2016) has accounted for the changes in martial arts culture, and especially Brazilian Jiu Jitsu, due to the proliferation of video on-line. Social media is now a major vector for the skill acquisition in BJJ. It is also, in my gym, part of the social experience. A few dedicated accounts on social media platforms that share images and video from the practice. There is a group chat where gym members cheer each other on, share BJJ culture (memes, tips), and communicate with the instructors.

Several members have been taking pictures and videos of others in practice and sharing them to the group chat. These are generally met with enthusiastic acclaim and acceptance. The instructors have also been inviting in very experienced (black belt) players for one-off classes. These classes are opportunities for the less experienced folks to see another perspective on the game. Because it is a complex sport, there are a wide variety of styles and in general it is exciting and beneficial to see moves and attitudes of masters besides the ones we normally train with.

After some videos of a new guest instructor were posted to the group chat, one of the permanent instructors (“A”) asked not to do this:

A: “As a general rule of etiquette, you need permission from a black belt and esp if two black belts are rolling to record them training, be it drilling not [sic] rolling live.”

A: “Whether you post it somewhere or not, you need permission from both to record then [sic] training.”

B: “Heard”

C: “That’s totally fine by me, but im not really sure why…?

B: “I’m thinking it’s a respect thing.”

A: “Black belt may not want footage of him rolling or training. as a general rule if two black belts are training together it’s not to be recorded unless expressly asked. if they’re teaching, that’s how they pay their bills so you need permission to record them teaching. So either way, you need permission to record a black belt.”

A: “I’m just clarifying for everyone in class on etiquette, and for visiting other schools. Unless told by X, Y, [other gym staff], etc., or given permission at a school you’re visiting, you’re not to record black belts and visiting upper belts while rolling and potentially even just regular training or class. Some schools take it very seriously.”

C: “OK! Totally fine!”

D: “[thumbs up emoji] gots it :)”

D: “totally makes sense”

A few observations on this exchange.

First, there is the intriguing point that for martial arts black belts teaching, their instruction is part of their livelihood. The knowledge of the expert martial arts practitioner is hard-earned and valuable “intellectual property”, and it is exchanged through being observed. Training at a gym with high-rank players is a privilege that lower ranks pay for. The use of video recording has changed the economy of martial arts training. This has in many ways opened up the sport; it also opens up potential opportunities for the black belt in producing training videos.

Second, this is framed as etiquette, not as a legal obligation. I’m not sure what the law would say about recordings in this case. It’s interesting that as a point of etiquette, it applies only to videos of high belt players. Recording low belt players doesn’t seem to be a problem according to the agreement in the discussion. (I personally have asked not to be recorded at one point at the gym when an instructor explicitly asked to be recorded in order to create demo videos. This was out of embarrassment at my own poor skills; I was also feeling badly because I was injured at the time. This sort of consideration does not, it seem, currently operate as privacy etiquette within the BJJ community. Perhaps these norms are currently being negotiated or are otherwise in flux.)

Third, there is a sense in which high rank in BJJ comes with authority and privileges that do not require any justification. The “trainings are livelihood” argument does apply directly to general practice roles; the argument is not airtight. There is something else about the authority and gravitas of the black belt that is being preserved here. There is a sense of earned respect. Somehow this translates into a different form of privacy (information flow) norm.

References

Spencer, D. C. (2016). From many masters to many Students: YouTube, Brazilian Jiu Jitsu, and communities of practice. Jomec Journal, (5).