Digifesto

Tag: data economics

Research update: to study the economy of personal data

I have not been writing here for some time because of strokes of good luck that have been keeping me busy.

I’ve been awarded a Social, Behavioral and Economic Sciences (SBE) Post-Doctoral Research Fellowship (“SPRF” for short) by the National Science Foundation.

This is a lot of words to write out, but they sum up to a significant change in my research and role that I’m still adjusting to.

First, I believe this means that I am a social scientist of some kind. What kind? It’s not clear. If I had my choice, it would be “economist”. But since Economics is a field widely known for gatekeeping, and I do not have an Economics degree, I’m not sure I can get away with this.

Nevertheless, my SPRF research project is an investigation into the economics of data (especially personal data) using methods built on those used in orthodox and heterodox economics.

The study of the economics of personal data grows out of my dissertation work and the ongoing policy research I’ve done at NYU School of Law’s Information Law Institute. Though my work has touched on many other fields (computer science and the design of information systems; sociology and the study of race and networked publics; philosophy and law), at the end of the day the drivers of “technology’s” impact on society are businesses operating according to an economic logic. This is something that everybody knows, but that few academic researchers are in a position to admit, because many of the scholars who think seriously about these issues come from other disciplines.

For better or for worse, I have trouble staying in a tunnel when I can’t see intellectual daylight at the end of it.

So how can we study the economy of personal data?

I would argue, and this is something most Economists will balk at, that the tools currently available to study this economy are insufficient for the task. Who am I to say such a thing? Nobody special.

But weighing in my favor is the argument that even the tools Economists use to study the macroeconomy are insufficient for the task. This point was made decisively by the 2008 Financial Crisis, which blindsided the economic establishment. One reason Economics failed was that the discipline had entrenched oversimplified assumptions in its models. One of these was representative agent modeling, which presumed to model the entire economy with a single “representative agent” for each sector or domain. This makes the economist’s calculations easier but is clearly unrealistic; indeed, it is the differences between agents that create much of the dynamism and many of the pitfalls of the economy. Hence the rise of heterogeneous agent modeling (HAM), which is explicit about the differences between agents with respect to things like wealth, risk aversion, discount factor, level of education, and so on.
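To make the representative agent problem concrete, here is a toy sketch in Python (the language HARK is written in). It is not a HARK model: the square-root consumption rule and the four wealth levels are hypothetical, chosen only for illustration. Because the rule is concave, feeding it the average wealth (the representative agent) overstates average consumption relative to computing each household’s consumption first and then averaging (the heterogeneous agents). This is Jensen’s inequality at work.

```python
import math
from statistics import mean


def consumption(wealth: float) -> float:
    """Hypothetical concave consumption rule: spend the square root of wealth.

    Chosen only for illustration; this is not a HARK consumption function.
    """
    return math.sqrt(wealth)


# Hypothetical wealth levels for four households.
wealth = [1.0, 4.0, 16.0, 25.0]

# Representative agent: a single household holding the average wealth.
representative = consumption(mean(wealth))

# Heterogeneous agents: compute each household's consumption, then average.
heterogeneous = mean(consumption(w) for w in wealth)

print(f"representative agent: {representative:.3f}")
print(f"heterogeneous agents: {heterogeneous:.3f}")
```

With these numbers the representative agent consumes about 3.39 while the heterogeneous households average 3.00. The gap only shows up when the distribution of wealth is modeled explicitly, which is what HAM is designed to do.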

It was my extraordinary good fortune to find an entry into the world of HAM via the Econ-ARK software project (Carroll et al., 2018; Benthall and Seth, 2020), which needed a software engineer enthusiastic about open source scientific tools at a moment when I was searching for a job. Econ-ARK’s HAM toolkit, HARK, has come a long way since I joined the project in late 2019. And it still has quite a ways to go. But it’s been a tremendously rewarding project to be involved with, in no small part because it has been a hands-on introduction to the nitty-gritty of contemporary Economics methods.

It’s these tools that I will be extending with insights from my other work, which is grounded more in computer science and legal scholarship, in order to model the data economy. Naturally, the economy for personal data depends on the heterogeneity of consumers: it is the differences between consumers that make the trade in personal information possible and relevant. And while there are many notational and conventional differences between orthodox Economics methods and the causal Bayesian frameworks I’ve worked in before, these methods in fact share a logical core that makes them commensurable.

I’ve mentioned both orthodox and heterodox economics. By this I mean to draw a distinction between the core of the Economics discipline, which in my understanding is still tied to rational expectations and general equilibria (meaning the idea that agents know what to expect from the market and act accordingly), and heterodox views that find these assumptions dangerously unrealistic. This is truly a sore spot for Economics. As the trenchant critiques of Mirowski and Nik-Khah (2017) reveal, these core assumptions commit Economists to many absurd conclusions; however, Economists are loath to abandon them lest they lose the tight form of rigor they have demanded in order to maintain a kind of standardization within the discipline. Rational expectations aligns economics with engineering disciplines, like control theory and artificial intelligence, which makes Economists’ methods more in demand. Equilibrium theories give Economics normative force, and excuses when its predictions do not pan out. However, the 2008 Financial Crisis embarrassed these methods, and the emerging HAM techniques now include not only a broadened form of rational agent modeling but also the much looser paradigm of Agent-Based Modeling (ABM), which allows for more realistic dynamics with boundedly rational agents (Bookstaber, 2017).

Today, the biggest forces in the economy are precisely those that have marshaled information to their advantage in a world with heterogeneous agents (Benthall and Goldenfein, 2021). Economic agents differ both horizontally, as consumers of different demographic categories such as race and sex do, and vertically, as consumers and producers of information services have different relationships to personal data. As I explore in forthcoming work with Salome Viljoen (Benthall and Viljoen, 2021), the monetization of personal data has always been tied to the financial system, first via credit reporting and later through the financialization of consumer behavior by digital advertising networks. And yet the macroeconomic impact of the industries that profit from these information flows, which now include the largest companies in the world, is not understood, because of disciplinary blinders that Economics has worn for decades and is only now trying to shed.

I’m convinced the research is well motivated. The objection, which comes from my most well-meaning mentors, is that the work is too difficult or in fact impossible. Introducing heterogeneously bounded rationality into economic modeling creates a great deal of modeling and computational complexity. Calibrating, simulating, and testing such models is expensive, and progress requires a great deal of technical thinking about how to compute results efficiently. There are also many social and disciplinary obstacles to this kind of work: for the reasons discussed above, it’s not clear where this work belongs.

However, I consider myself immensely fortunate to have a real, substantive, difficult problem to work on, and enough confidence from the National Science Foundation that it is supporting my attempt to solve it. It’s the opportunity of a lifetime and, to be honest, as a researcher who has often felt at the fringes of a viable scholarly career, a real break. The next steps are exciting and I can’t wait to see what’s around the corner.

References

Benthall, S., & Goldenfein, J. (2021, May). Artificial Intelligence and the Purpose of Social Systems. In Proceedings of the 2021 AAAI/ACM Conference on AI Ethics and Society (AIES’21).

Benthall, S., & Seth, M. (2020). Software Engineering as Research Method: Aligning Roles in Econ-ARK.

Benthall, S., & Viljoen, S. (2021). Data Market Discipline: From Financial Regulation to Data Governance. J. Int’l & Comparative Law. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3774418

Bookstaber, R. (2017). The end of theory. Princeton University Press.

Carroll, C. D., Kaufman, A. M., Kazil, J. L., Palmer, N. M., & White, M. N. (2018). The Econ-ARK and HARK: Open Source Tools for Computational Economics. In Proceedings of the 17th Python in Science Conference (pp. 25-30).

Mirowski, P., & Nik-Khah, E. (2017). The knowledge we have lost in information: the history of information in modern economics. Oxford University Press.

Regulating infoglut?

In the 1920s, many people were attracted to investing in the stock market for the first time. It was a time when fortunes were made and lost, but more were made than lost, and so on average investors saw large returns. However, the growth in stock values was driven in part, especially in the latter half of the decade, by debt. The U.S. Federal Reserve chose to lower interest rates, making it easier to borrow money. When interest rates on loans were lower than rates of return on stocks, everybody from households to brokers began to take on debt to reinvest in the stock market (Brooks, 1999).

After the crash of ’29, which left the economy decimated, there was a reckoning that led to the Securities Act of 1933 and the Securities Exchange Act of 1934. The latter established the Securities and Exchange Commission (SEC) and laid the groundwork for the more trusted financial institutions we have today.

Cohen (2016) writes about a more current economic issue. As the economy changes from being centered on industrial capitalism to informational capitalism, the infrastructural affordances of modern computing and networking have invalidated the background logic of how many regulations are supposed to work. For example, anti-discrimination regulation is designed to prevent decisions from being made based on protected or sensitive attributes of individuals. However, those regulations made most sense when personal information was relatively scarce. Today, when individual activity is highly instrumented by pervasive computing infrastructure, we suffer from infoglut — more information than is good for us, either as individuals or as a society. As a consequence, proxies of protected attributes are readily available for decision-makers and indeed are difficult to weed out of a machine learning system even when market actors fully intend to do so (see Datta et al., 2017). In other words, the structural conditions that enable infoglut erode rights that we took for granted in the absence of today’s network and computing systems.
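The proxy problem can be made concrete with a toy example. The Python sketch below uses an entirely synthetic population in which 90% of each group lives in one of two hypothetical neighborhoods; nothing here is drawn from Datta et al. (2017). The decision rule never reads the protected attribute, yet its decisions line up with group membership for nine people out of ten.

```python
# Synthetic, hypothetical population of (protected_group, neighborhood) pairs.
# 90% of each group lives in the neighborhood "matching" that group.
population = (
    [(1, "north")] * 45 + [(1, "south")] * 5
    + [(0, "south")] * 45 + [(0, "north")] * 5
)

# The decision-maker drops the protected attribute and keeps only the proxy.
neighborhoods = [neighborhood for _, neighborhood in population]
groups = [group for group, _ in population]


def decide(neighborhood: str) -> int:
    """Hypothetical decision rule that reads only the neighborhood."""
    return 1 if neighborhood == "north" else 0


decisions = [decide(n) for n in neighborhoods]

# How often does the decision coincide with the protected attribute it never saw?
accuracy = sum(d == g for d, g in zip(decisions, groups)) / len(groups)
print(f"decisions aligned with protected group: {accuracy:.0%}")
```

Deleting the protected column accomplishes little here, because the neighborhood column carries most of the same information. With many correlated features produced by pervasive instrumentation, this kind of redundancy is what makes proxies so hard to weed out.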

In an ongoing project with Salome Viljoen, we are examining the parallels between the financial economy and the data economy. These economies are, of course, not fully distinct. However, they are distinguished in part by how they are regulated: the financial economy has over a century of matured regulations defining it and reducing systemic risks such as those resulting from a debt-financed speculative bubble; the data economy has emerged only recently as a major source of profit, with perhaps unforeseen systemic risks.

We have an intuition that we would like to pin down more carefully as we work through these comparisons: that there is something similar between the speculative bubbles that led to the Great Depression and today’s infoglut. In the vein of prior work that uses regulatory analogy to motivate new thinking about data regulation (Hirsch, 2013; Froomkin, 2015) and professional codes (Stark and Hoffmann, 2019), we are interested in how financial regulation may serve as a precedent for regulation of the data economy.

However, we have reason to believe that the connections between finance and personal data are not merely metaphorical. Indeed, finance is an area with well-developed sectoral privacy laws that guarantee the confidentiality of personal data (Swire, 2003); it is also the case that financial institutions are one of the many ways personal data originating from non-financial contexts is monetized. We do not have to get poetic to see how these assets are connected; they are related as a matter of fact.

What is more elusive, and at this point only a hypothesis, is that there is a valid sense in which the systemic risks of infoglut can be conceptually understood using tools similar to those used to understand financial risk. Here I maintain an ambition: that systemic risk due to infoglut may be understood using the tools of macroeconomics and hence internalized via technocratic regulatory mechanisms. This would be a departure from Cohen (2016), who gestures more favorably towards “uncertainty”-based regulation that does not attempt probabilistic expectation but instead relies on tools such as threat modeling, as used in some cybersecurity practices.

References

Brooks, J. (1999). Once in Golconda: A true drama of Wall Street 1920-1938. John Wiley & Sons.

Cohen, J. E. (2016). The regulatory state in the information age. Theoretical Inquiries in Law, 17(2), 369-414.

Datta, A., Fredrikson, M., Ko, G., Mardziel, P., & Sen, S. (2017, October). Use privacy in data-driven systems: Theory and experiments with machine learnt programs. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (pp. 1193-1210).

Froomkin, A. M. (2015). Regulating Mass Surveillance as Privacy Pollution: Learning from Environmental Impact Statements. U. Ill. L. Rev., 1713.

Hirsch, D. D. (2013). The glass house effect: Big Data, the new oil, and the power of analogy. Me. L. Rev., 66, 373.

Stark, L., & Hoffmann, A. L. (2019). Data is the new what? Popular metaphors & professional ethics in emerging data culture.

Swire, P. P. (2003). Efficient confidentiality for privacy, security, and confidential business information. Brookings-Wharton Papers on Financial Services, 2003(1), 273-310.

Surden, H. (2007). Structural rights in privacy. SMU L. Rev., 60, 1605.

For a more ethical Silicon Valley, we need a wiser economics of data

Kara Swisher’s NYT op-ed about the dubious ethics of Silicon Valley and Nitasha Tiku’s WIRED article reviewing books with alternative (and perhaps more cynical than otherwise stated) stories about the rise of Silicon Valley have generated discussion and buzz among the tech commentariat.

One point of debate is whether the focus should be on “ethics” or on something more substantively defined, such as human rights. Another is whether the emphasis should be on “ethics” or on something more substantively enforced, like laws that impose fines of up to 4% of a company’s worldwide annual turnover, referring of course to the GDPR.

While I’m sympathetic to the European approach (laws enforcing human rights with real teeth), I think there is something naive about it. We have not yet seen whether it is ever really possible to comply with the GDPR; it could wind up being a kind of heavy tax on Big Tech companies operating in the EU, one that doesn’t truly change how people’s data are used. In any case, the broad principles of European privacy law are based on individual human dignity, and so they do not take into account the ways that corporations are social structures, i.e. sociotechnical organizations that transcend individual people. The European regulations address the problem of individual privacy while leaving mystified the question of why the current corporate organization of the world’s personal information is what it is. This sets up the fight over ‘technology ethics’ as a political conflict between different kinds of actors whose positions are defined as much by their social habitus as by their intellectual reasons.

My own (unpopular!) view is that the solution to our problems of technology ethics is going to have to rely on a better adapted technology economics. We often forget today that economics was originally a branch of moral philosophy. Adam Smith wrote The Theory of Moral Sentiments (1759) before An Inquiry into the Nature and Causes of the Wealth of Nations (1776). Since then the main purpose of economics has been to intellectually grasp the major changes to society due to production, trade, markets, and so on, in order to better steer policy and business strategy towards more fruitful equilibria. The discipline has a bad reputation among many “critical” scholars due to its role in supporting neoliberal ideology and policies, but it must be noted that this ideology and policy work is not entirely cynical; it was a successful centrist hegemony for some time. Now that it is under threat, partly due to the successes of the big tech companies that benefited under its regime, it’s worth considering what new lessons we have to learn to steer the economy in a better direction.

The difference between an economic approach to the problems of the tech economy and either an ‘ethics’ or a ‘law’ based approach is that it inherently acknowledges that there are a wide variety of strategic actors co-creating social outcomes. Individual “ethics” will not be able to settle the outcomes of the economy because the outcomes depend on collective and uncoordinated actions. A fundamentally decent person may still do harm to others due to their own bounded rationality; “the road to hell is paved with good intentions”. Meanwhile, regulatory law is not the same as command; it is at best a way of setting the rules of a game that will be played, faithfully or not, by many others. Putting regulations in place without a good sense of how the game will play out differently because of them is just as irresponsible as implementing a sweeping business practice without thinking through the results, if not more so because the relationship between the state and citizens is coercive, not voluntary as the relationship between businesses and customers is.

Perhaps the biggest obstacle to shifting the debate about technology ethics to one about technology economics is that it requires a change in register. It drains the conversation of the pathos which is so instrumental in surfacing it as an important political topic. Sound analysis often ruins parties like this. Nevertheless, it must be done if we are to progress towards a more just solution to the crises technology gives us today.

“Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics” <– My dissertation

In the last two weeks, I’ve completed, presented, and filed my dissertation, and commenced as a doctor of philosophy. In a word, I’ve PhinisheD!

The title of my dissertation is attention-grabbing, inviting, provocative, and impressive:

“Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics”

If you’re reading this, you are probably wondering, “How can I drop everything and start reading that hot dissertation right now?”

Look no further: here is a link to the PDF.

You can also check out this slide deck from my “defense”. It covers the highlights.

I’ll be blogging about this material as I break it out into more digestible forms over time. For now, I’m obviously honored by any interest anybody takes in this work and happy to answer questions about it.