Know-how is not interpretable so algorithms are not interpretable

I happened upon Hildreth and Kimble’s “The duality of knowledge” (2002) earlier this morning while writing this and have found it thought-provoking through to lunch.

What’s interesting is that it is (a) 12 years old, (b) a rather straightforward analysis of information technology, expert systems, ‘knowledge management’, etc. in light of solid post-Enlightenment thinking about the nature of knowledge, and (c) an anticipation of the problems of ‘interpretability’ that were, a couple of months ago at least, an active topic of academic discussion. Or so I hear.

This is the paper’s abstract:

Knowledge Management (KM) is a field that has attracted much attention both in academic and practitioner circles. Most KM projects appear to be primarily concerned with knowledge that can be quantified and can be captured, codified and stored – an approach more deserving of the label Information Management.

Recently there has been recognition that some knowledge cannot be quantified and cannot be captured, codified or stored. However, the predominant approach to the management of this knowledge remains to try to convert it to a form that can be handled using the ‘traditional’ approach.

In this paper, we argue that this approach is flawed and some knowledge simply cannot be captured. A method is needed which recognises that knowledge resides in people: not in machines or documents. We will argue that KM is essentially about people and the earlier technology driven approaches, which failed to consider this, were bound to be limited in their success. One possible way forward is offered by Communities of Practice, which provide an environment for people to develop knowledge through interaction with others in an environment where knowledge is created, nurtured and sustained.

The authors point out that Knowledge Management (KM) is an extension of the earlier program of Artificial Intelligence, that it depends on a model of knowledge which maintains that knowledge can be explicitly represented and hence stored and transferred, and they propose an alternative way of thinking about things based on the Communities of Practice framework.

A lot of their analysis is about the failures of “expert systems”, which is a term that has fallen out of use but means basically the same thing as the contemporary uncomputational scholarly use of ‘algorithm’. An expert system was a computer program designed to make decisions about things. Broadly speaking, a search engine is a kind of expert system. What’s changed are the particular techniques and algorithms that such systems employ, and their relationship with computing and sensing hardware.

Here’s what Hildreth and Kimble have to say about expert systems in 2002:

Viewing knowledge as a duality can help to explain the failure of some KM initiatives. When the harder aspects are abstracted in isolation the representation is incomplete: the softer aspects of knowledge must also be taken into account. Hargadon (1998) gives the example of a server holding past projects, but developers do not look there for solutions. As they put it, ‘the important knowledge is all in people’s heads’, that is the solutions on the server only represent the harder aspects of the knowledge. For a complete picture, the softer aspects are also necessary. Similarly, the expert systems of the 1980s can be seen as failing because they concentrated solely on the harder aspects of knowledge. Ignoring the softer aspects meant the picture was incomplete and the system could not be moved from the environment in which it was developed.

However, even knowledge that is ‘in people’s heads’ is not sufficient – the interactive aspect of Cook and Seely Brown’s (1999) ‘knowing’ must also be taken into account. This is one of the key aspects to the management of the softer side to knowledge.

In 2002, this kind of argument was seen as a valuable critique of artificial intelligence and the practices based on it as a paradigm. But already by 2002 this paradigm was falling away. Statistical computing, reinforcement learning, decision tree bagging, etc. were already in use at this time. These methods are “softer” in that they don’t require the “hard” concrete representations of the earlier artificial intelligence program, which I believe by that time was already referred to as “Good Old Fashioned AI” or GOFAI by a number of practitioners.

(I should note: that’s a term I learned while studying AI as an undergraduate in 2005.)

So throughout the 90’s and the 00’s, if not earlier, ‘AI’ transformed into ‘machine learning’ and became the implementation of ‘soft’ forms of knowledge. These systems are built to learn to perform a task optimally, adjusting flexibly in response to feedback on past performance. They are in fact the cybernetic systems imagined by Norbert Wiener.

Perplexing, then, is the contemporary problem that the models created by these machine learning algorithms are opaque to their creators. These models were created using techniques designed precisely to solve the problems that systems based on explicit, communicable knowledge could not.

If you accept the thesis that contemporary ‘algorithms’-driven systems are well-designed implementations of ‘soft’ knowledge systems, then you get some interesting conclusions.

First, forget about interpreting the learned models of these systems and testing them for things like social discrimination, which is apparently in vogue. The right place to focus attention is on the function being optimized. All these feedback-based systems, whether they are based on evolutionary algorithms, convergence on local maxima, reinforcement learning, or whatever, are designed to optimize some goal function. That goal function is the closest thing you will get to an explicit representation of the purpose of the algorithm. It may change over time, but it should be coded there explicitly.
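To make that point concrete, here is a minimal sketch of a feedback-driven optimizer. The goal function and the toy “revenue” objective are entirely my own invention, not any particular system’s; the point is only that the purpose lives in one explicitly coded function, even when the search process itself is opaque.

```python
import random

def goal(params):
    # Explicit objective: a toy stand-in for something like
    # "expected ad revenue" as a function of tunable parameters.
    # This function is the system's whole "purpose", written down.
    x, y = params
    return -(x - 3.0) ** 2 - (y + 1.0) ** 2  # maximized at (3, -1)

def hill_climb(goal, start, steps=5000, scale=0.1, seed=0):
    # Generic feedback loop: propose a random perturbation, keep it
    # if the goal improves. The optimizer never "knows" what revenue
    # means; only the goal function encodes the purpose.
    rng = random.Random(seed)
    best = list(start)
    best_score = goal(best)
    for _ in range(steps):
        candidate = [p + rng.gauss(0, scale) for p in best]
        score = goal(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

params, score = hill_climb(goal, [0.0, 0.0])
```

Swapping in a different `goal` changes the system’s “purpose” without touching the learning loop at all, which is why the objective, not the learned parameters, is the natural object of scrutiny.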

Interestingly, this is exactly the sense of ‘purpose’ that Wiener proposed could be applied to physical systems in his landmark essay, published with Rosenblueth and Bigelow, “Behavior, Purpose and Teleology.” In 1943. Sly devil.

EDIT: An excellent analysis of how fairness can be represented as an explicit goal function can be found in Dwork et al. 2011.

Second, because what the algorithm is designed to optimize is generally going to be something like ‘maximize ad revenue’ and not anything explicitly pernicious like ‘screw over the disadvantaged people’, this line of inquiry will raise some interesting questions about, for example, the relationship between capitalism and social justice. By “raise some interesting questions”, I mean, “reveal some uncomfortable truths everyone is already aware of”. Once it becomes clear that the whole discussion of “algorithms” and their inscrutability is just a way of talking about societal problems and entrenched political interests without talking about them directly, it will probably be tabled due to its political infeasibility.

That is (and I guess this is the third point) unless somebody can figure out how to explicitly translate the social justice goals of activists/advocates into a goal function that could be implemented by one of these soft-touch expert systems. That would be rad. Whether anybody would be interested in using or investing in such a system is an important open question. Not a wide open question (the answer is probably “Not really”), but just open enough to let some air onto the embers of my idealism.
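One toy way to see what such a translation might look like: fold the social objective into the goal function as an explicit penalty term. This sketch is my own construction, not Dwork et al.’s formulation or any deployed system; all the names and numbers are hypothetical.

```python
def revenue(allocation):
    # Toy stand-in for the business objective: total value allocated.
    return sum(allocation.values())

def disparity(allocation, group_a, group_b):
    # Toy fairness measure: gap in average allocation between groups.
    avg_a = sum(allocation[u] for u in group_a) / len(group_a)
    avg_b = sum(allocation[u] for u in group_b) / len(group_b)
    return abs(avg_a - avg_b)

def goal(allocation, group_a, group_b, fairness_weight=10.0):
    # The trade-off between profit and fairness is stated explicitly,
    # which is exactly what makes it inspectable and contestable.
    return revenue(allocation) - fairness_weight * disparity(
        allocation, group_a, group_b
    )

alloc_unfair = {"u1": 8, "u2": 8, "u3": 1, "u4": 1}
alloc_fair = {"u1": 5, "u2": 5, "u3": 4, "u4": 4}
a, b = ["u1", "u2"], ["u3", "u4"]
```

Both allocations produce identical revenue, but once `fairness_weight` is in the objective, the optimizer is forced to prefer the fair one; the political content is concentrated in that one coefficient.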

Horkheimer and Wiener

[I began writing this weeks ago and never finished it. I’m posting it here in its unfinished form just because.]

I think I may be condemning myself to irrelevance by reading so many books. But as I make an effort to read up on the foundational literature of today’s major intellectual traditions, I can’t help but be impressed by the richness of their insight. Something has been lost.

I’m currently reading Norbert Wiener’s The Human Use of Human Beings (1950) and Max Horkheimer’s Eclipse of Reason (1947). The former I am reading for the Berkeley School of Information Classics reading group. Norbert Wiener was one of the foundational mathematicians of 20th century information technology, a colleague of Claude Shannon. Out of his own sense of social responsibility, he articulated his predictions for the consequences of the technology he developed in Human Use. This work was the foundation of cybernetics, an influential school of thought in the 20th century. Terrell Bynum, in his Stanford Encyclopedia of Philosophy article on “Computer and Information Ethics”, attributes to Wiener’s cybernetics the foundation of all future computer ethics. (I think that the threads go back earlier, at least through to Heidegger’s Question Concerning Technology. (EDIT: Actually, QCT was published, it seems, in 1954, after Wiener’s book.)) It is hard to find a straight answer to the question of what happened to cybernetics. By some reports, the artificial intelligence community cut off its NSF funding in the 60’s.

Horkheimer is one of the major thinkers of the very influential Frankfurt School, the postwar social theorists at the core of critical theory. Of the Frankfurt School, perhaps the most famous in the United States is Adorno. Adorno is also the most caustic and depressed, and unfortunately much of popular critical theory now takes on his character. Horkheimer is more level-headed. Eclipse of Reason is an argument about the ways that philosophical empiricism and pragmatism became complicit in fascism. Here is an interesting quotation.

It is very interesting to read them side by side. Published only a few years apart, Wiener and Horkheimer are giants of two very different intellectual traditions. There’s little reason to expect they ever communicated (a more thorough historian would know more). But each makes sweeping claims about society, language, and technology and contextualizes them in broader intellectual awareness of religion, history and science.

Horkheimer writes about how the collapse of the Enlightenment project of objective reason has opened the way for a society ruled by subjective reason, which he characterizes as the reason of formal mathematics and scientific thinking that is neutral to its content. It is instrumental thinking in its purest, most rigorous form. His descriptions of it sound like gestures to what we today call “data science”: a set of mechanical techniques that we can use to analyze and classify anything, refining our understanding of technical probabilities towards whatever ends one likes.

I find this a more powerful critique of data science than recent paranoia about “algorithms”. It is frustrating to read something over sixty years old that covers the same ground as we are going over again today but with more composure. Mathematized reasoning about the world is an early 20th century phenomenon and automated computation a mid-20th century phenomenon. The disparities in power that result from the deployment of these tools were thoroughly discussed at the time.

But today, at least in my own intellectual climate, it’s common to hear a mention of “logic” met with the rebuttal “whose logic?”. Multiculturalism and standpoint epistemology, profoundly important for sensitizing researchers to bias, are taken to an extreme that glorifies technical ignorance. If the foundation of knowledge is in one’s lived experience, as these ideologies purport, and one does not understand the technical logic used so effectively by dominant identity groups, then one can dismiss technical logic as merely the cultural logic of an opposing identity group. I experience the technically competent person as the Other and cannot perceive their actions as skill but only as power, and in particular power over me. Because my lived experience is my surest guide, what I experience must be so!

It is simply tragic that the education system has promoted this kind of thinking so much that it pervades even mainstream journalism. This is tragic for reasons I’ve expressed in “objectivity is powerful”. One solution is to provide more accessible accounts of the lived experience of technicality through qualitative reporting, which I have attempted in “technical work”.

But the real problem is that the kind of formal logic that is at the foundation of modern scientific thought, including its most recent manifestation ‘data science’, is at its heart perfectly abstract and so cannot be captured by accounts of observed practices or lived experience. It is reason, or thought. Is it disembodied? Not exactly. But at least according to constructivist accounts of mathematical knowledge, which occupy a fortunate dialectical position in this debate, mathematical insight is built from embodied phenomenological primitives but, through their psychological construction, becomes abstract. This process makes it possible for people to learn abstract principles such as the mathematical theory of information on which so much of the contemporary telecommunications and artificial intelligence apparatus depends. These are the abstract principles with which the mathematician Norbert Wiener was so intimately familiar.

Privacy, trust, context, and legitimate peripheral participation

Privacy is important. For Nissenbaum, what’s essential to privacy is control over context. But what is context?

Using Luhmann’s framework of social systems–ignoring for a moment e.g. Habermas’ criticism and accepting the naturalized, systems theoretic understanding of society–we would have to see a context as a subsystem of the total social system. In so far as the social system is constituted by many acts of communication–let’s visualize this as a network of agents, whose edges are acts of communication–then a context is something preserved by configurations of agents and the way they interact.

Some of the forces that shape a social system will be exogenous. A river dividing two cities or, more abstractly, distance. In the digital domain, the barriers of interoperability between one virtual community infrastructure and another.

But others will be endogenous, formed from the social interactions themselves. An example is the gradual deepening of trust between agents based on a history of communication. Perhaps early conversations are formal, stilted. Later, an agent takes a risk, sharing something more personal (more private?). It is reciprocated. Slowly, a trust bond, an evinced sharing of interests and mutual investment, becomes the foundation of cooperation. The Prisoner’s Dilemma is solved the old-fashioned way.
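That last remark can be made concrete with a toy iterated Prisoner’s Dilemma, where tit-for-tat stands in for the reciprocal risk-taking just described. This is a sketch under standard textbook payoffs, not a model of any real community.

```python
# Standard Prisoner's Dilemma payoffs: (T, R, P, S) = (5, 3, 1, 0).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(history):
    # Take the initial risk of cooperating; thereafter mirror the
    # partner's last move, reciprocating trust or betrayal.
    return history[-1] if history else "C"

def always_defect(history):
    return "D"

def play(strategy_a, strategy_b, rounds=20):
    # Each strategy sees only the *other* agent's past moves.
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

mutual_trust = play(tit_for_tat, tit_for_tat)
no_trust = play(always_defect, always_defect)
```

Two reciprocators lock into sustained cooperation and do far better over twenty rounds than two defectors, which is the game-theoretic shadow of the trust-formation story above.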

Following Carey’s logic that communication as mere transmission, when sustained over time, becomes communication as ritual and the foundation of community, we can look at this slow process of trust formation as one of the ways that a context, in Nissenbaum’s sense, perhaps, forms. If Anne and Betsy have mutually internalized each other’s interests, then information flow between them will by and large support the interests of the pair, and Betsy will have low incentives to reveal private information in a way that would be detrimental to Anne.

Of course this is a huge oversimplification in lots of ways. One way is that it does not take into account the way the same agent may participate in many social roles or contexts. In many circumstances, communication is not a single edge from one agent to another. Perhaps the situation is better represented as a hypergraph. One reason why this whole domain may be so difficult to reason about is the sheer representational complexity of modeling the situation. It may require the kind of mathematical sophistication used by quantum physicists. Why not?
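For what it’s worth, the hypergraph suggestion is easy to sketch: treat each act of communication as a hyperedge (the set of agents party to it) and read off candidate contexts from recurring co-occurrence. Everything here, the names, the data, and the threshold, is illustrative.

```python
from collections import Counter
from itertools import combinations

# Each act of communication is a hyperedge: the set of participants,
# not a single edge between two agents.
communications = [
    {"anne", "betsy"},           # a private exchange
    {"anne", "betsy"},
    {"anne", "betsy", "carol"},  # the same agents in a wider setting
    {"carol", "dmitri", "erin"}, # a separate circle
    {"carol", "dmitri", "erin"},
]

def recurring_pairs(comms, threshold=2):
    # Count how often each pair of agents co-occurs in an act of
    # communication; pairs at or above the threshold suggest a
    # shared, endogenously formed context.
    counts = Counter()
    for act in comms:
        for pair in combinations(sorted(act), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= threshold}

contexts = recurring_pairs(communications)
```

Note that `carol` shows up in two distinct recurring circles, which is precisely the multiple-contexts-per-agent complication a plain graph of dyadic edges would flatten out.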

Not having that kind of insight into the problem yet, I will continue to sling what the social scientists call ‘theory’. Let’s talk about an existing community of practice, where the practice is a certain kind of communication. A community of scholars. A community of software developers. Weird Twitter. A backchannel mailing list coordinating a political campaign. A church.

According to Lave and Wenger, the way newcomers gradually become members and oldtimers of a community of practice is legitimate peripheral participation. This is consistent with the model described above characterizing the growth of trust through gradually deepening communication. Peripheral participation is low-risk. In an open source context, this might be as simple as writing a question to the mailing list or filing a bug report. Over time, the agent displays good faith and competence. (I’m disappointed to read just now that Wenger ultimately abandoned this model in favor of a theory of dualities. Is that a Hail Mary for empirical content for the theory? Also interested to follow links on this topic to a citation of von Krogh 1998, whose later work found its way onto my Open Collaboration and Peer Production syllabus. It’s a small world.

I’ve begun reading, as I write this, the fascinating paper by Hildreth and Kimble (2002) and have now lost my thread. Can I recover?)

Some questions:

  • Can this process of context-formation be characterized empirically through an analysis of e.g. the timing dynamics of communication (cf. Thomas Maillart’s work)? If so, what does that tell us about the design of information systems for privacy?
  • What about illegitimate peripheral participation? Arguably, this blog is that kind of participation–it participates in a form of informal, unendorsed quasi-scholarship. It is a tool of context and disciplinary collapse. Is that a kind of violation of privacy? Why not?