judea pearl | Digifesto

causal inference in networks is hard

I am trying to make statistically valid inferences about the mechanisms underlying observational networked data and it is really hard.

Here’s what I’m up against:

Even though my data set is a complete ecologically valid data set representing a lot of real human communication over time, it (tautologically) leaves out everything that it leaves out. I can’t even count all the latent variables.
The best method for detecting causal mechanisms, the potential outcomes framework (the Rubin model), depends on the assumption that different members of the sample don’t interfere with one another. But I’m working with networked data. Everything interferes with everything else, at least indirectly. That’s why it’s a network.
Did I mention that I’m working with communications data? What’s interesting about human communication is that it’s not really generated at random at all. It’s very deliberately created by people acting more or less intelligently all the time. If the phenomenon I’m studying is not more complex than the models I’m using to study it, then there is something seriously wrong with the people I’m studying.

I think I can deal with the first point here by gracefully ignoring it. It may be true that any apparent causal effect in my data is spurious and due to a common latent cause upstream. It may be true that the variance in the data is largely due to exogenous factors. Fine. That’s noise. I’m looking for a reliable endogenous signal. If there isn’t one, that would suggest that my entire data set is epiphenomenal. But I know it’s not. So there’s got to be something there.

For the second point, there are apparently sophisticated methods for extending the potential outcomes framework to handle peer effects. These are gnarly, and though I figure I could work with them, I don’t think they are going to be what I need, because I’m not really looking for a causal relationship in the sense of a statistical relationship between treatment and outcome. I’m not after, in the first instance, what might be called type causation. I’m rather trying to demonstrate cases of token causation, where causation is literally the transfer of information from one object to another. And then I’m trying to show regularity in this underlying kind of causation in a layer of abstraction over it.
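To make the interference problem from the second point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the ring-shaped network, the linear outcome model, and the hypothetical parameters `tau` (the effect of a node's own treatment) and `gamma` (the spillover from treated neighbors). Under randomized assignment, the usual difference in means recovers roughly `tau`, but the effect of treating everyone versus no one is roughly `tau + gamma`, which the naive comparison never sees.

```python
import random

def ring_neighbors(n, k=2):
    """Toy network (an assumption): each node is tied to the k nodes on either side."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}

def simulate_outcomes(treated, neighbors, tau, gamma, rng, noise=1.0):
    """Outcome = tau * own treatment + gamma * share of treated neighbors + noise."""
    outcomes = {}
    for i, nbrs in neighbors.items():
        exposure = sum(treated[j] for j in nbrs) / len(nbrs)
        outcomes[i] = tau * treated[i] + gamma * exposure + rng.gauss(0, noise)
    return outcomes

def difference_in_means(outcomes, treated):
    """Naive estimator that pretends units do not interfere."""
    t = [y for i, y in outcomes.items() if treated[i]]
    c = [y for i, y in outcomes.items() if not treated[i]]
    return sum(t) / len(t) - sum(c) / len(c)

def global_effect(neighbors, tau, gamma, rng):
    """Mean outcome if everyone is treated minus mean outcome if no one is."""
    n = len(neighbors)
    all_on = simulate_outcomes([True] * n, neighbors, tau, gamma, rng)
    all_off = simulate_outcomes([False] * n, neighbors, tau, gamma, rng)
    return sum(all_on.values()) / n - sum(all_off.values()) / n

if __name__ == "__main__":
    rng = random.Random(1)
    n = 5000
    nbrs = ring_neighbors(n)
    treated = [rng.random() < 0.5 for _ in range(n)]  # coin-flip assignment
    y = simulate_outcomes(treated, nbrs, tau=1.0, gamma=2.0, rng=rng)
    print("naive difference in means:", difference_in_means(y, treated))
    print("treat-everyone vs. treat-no-one:", global_effect(nbrs, 1.0, 2.0, rng))
```

This is type causation through and through, so it only illustrates why the no-interference assumption matters. It says nothing yet about token causation.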

The best angle I can come up with on this so far is to use emergent properties of the network like degree assortativity to sort through potential mathematically defined graph generation algorithms. These algorithms can act as alternative hypotheses, and the observed emergent properties can theoretically be used to compute the likelihood of the observed data given the generation methods. Then all I need is a prior over graph generation methods! It’s perfectly Bayesian! I wonder if it is at all feasible to execute on. I will try.
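As a rough sketch of what this could look like, the code below compares two candidate generators (a uniform random graph and a preferential attachment process) by how often they reproduce an observed degree assortativity. Since the exact likelihood is not available in closed form, it swaps in rejection-style approximate Bayesian computation: the share of simulated graphs whose assortativity lands within a tolerance `tol` of the observed value stands in for the likelihood. The generators, the tolerance, the prior, and all function names are assumptions chosen for illustration, not a claim about which models are right for real communication data.

```python
import random

def erdos_renyi(n, m, rng):
    """Hypothesis A: m distinct edges placed uniformly at random among n nodes."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return list(edges)

def preferential_attachment(n, k, rng):
    """Hypothesis B: each new node links to k existing nodes, favoring high degree."""
    edges = [(i, j) for i in range(k + 1) for j in range(i + 1, k + 1)]  # seed clique
    stubs = [x for e in edges for x in e]  # each node appears once per edge it has
    for new in range(k + 1, n):
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(stubs))  # degree-proportional choice
        for t in targets:
            edges.append((t, new))
            stubs += [t, new]
    return edges

def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:  # count every edge in both directions
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    if var == 0:
        return 0.0  # convention used here for regular graphs
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    return cov / var

def abc_posterior(observed_r, generators, prior, n_sims=200, tol=0.05, seed=0):
    """Posterior over generators, with the likelihood approximated by simulation."""
    rng = random.Random(seed)
    weights = {}
    for name, gen in generators.items():
        hits = sum(abs(degree_assortativity(gen(rng)) - observed_r) <= tol
                   for _ in range(n_sims))
        weights[name] = prior[name] * hits / n_sims
    total = sum(weights.values())
    if total == 0:
        return None  # no generator came close; the hypotheses or tol need revisiting
    return {name: w / total for name, w in weights.items()}

if __name__ == "__main__":
    n, k = 200, 3
    # Stand-in for real data: a graph drawn from one of the two hypotheses.
    observed = preferential_attachment(n, k, random.Random(42))
    generators = {
        "random": lambda rng: erdos_renyi(n, len(observed), rng),
        "preferential": lambda rng: preferential_attachment(n, k, rng),
    }
    prior = {"random": 0.5, "preferential": 0.5}
    print(abc_posterior(degree_assortativity(observed), generators, prior))
```

A single summary statistic like assortativity may not separate the hypotheses well on its own, so in practice one would likely need several emergent properties at once, and the feasibility question above remains open.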

It’s not 100% clear how you can take an algorithmically defined process and turn that into a hypothesis about causal mechanisms. Theoretically, as long as a causal network has computable conditional dependencies, it can be represented by an algorithm. I believe that any algorithm (in the Church/Turing sense) can be represented as a causal network. Can this be done elegantly, so that the corresponding causal network represents something like what we’d expect from the scientific theory on the matter? This is unclear because, again, Pearl’s causal networks are great at representing type causation but not as expressive of token causation among a large population of uniquely positioned, generatively produced stuff. Pearl is not good at modeling life, I think.
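The easy direction, from causal network to algorithm, can be sketched in a few lines. Below, a causal network is just an ordered table of functions, one per variable, each computing its value from its parents plus some randomness. An intervention in Pearl's sense replaces a variable's function with a fixed value. The three variables (`interest`, `reads`, `replies`) and all the probabilities are hypothetical, made up to have a latent common cause in the picture.

```python
import random

def interest(rng):
    return rng.random() < 0.5

def reads(interest, rng):
    return rng.random() < (0.9 if interest else 0.2)

def replies(reads, interest, rng):
    return reads and rng.random() < (0.6 if interest else 0.1)

# Variable name -> (parents, mechanism), listed in topological order.
MODEL = {
    "interest": ((), interest),
    "reads": (("interest",), reads),
    "replies": (("reads", "interest"), replies),
}

def sample(model, rng, do=None):
    """Run the network as a program; `do` overrides mechanisms with fixed values."""
    do = do or {}
    values = {}
    for name, (parents, mechanism) in model.items():
        if name in do:
            values[name] = do[name]  # intervention: ignore the variable's parents
        else:
            values[name] = mechanism(*(values[p] for p in parents), rng)
    return values

def estimate(model, target, n, rng, do=None, given=None):
    """Monte Carlo estimate of P(target), optionally under do(...) or given evidence."""
    hits = total = 0
    for _ in range(n):
        v = sample(model, rng, do)
        if given and any(v[k] != val for k, val in given.items()):
            continue  # discard samples that do not match the evidence
        total += 1
        hits += bool(v[target])
    return hits / total if total else float("nan")

if __name__ == "__main__":
    rng = random.Random(0)
    print("P(replies | reads observed):",
          estimate(MODEL, "replies", 20000, rng, given={"reads": True}))
    print("P(replies | do(reads)):",
          estimate(MODEL, "replies", 20000, rng, do={"reads": True}))
```

Conditioning on `reads` and intervening on `reads` give different answers here because `interest` drives both variables. Note that this is still a model of type causation over interchangeable samples, which is exactly the limitation described above.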

The strategic activity of the actors is a modeling challenge but I think this is actually where there is substantive potential in this kind of research. If effective strategic actors are working in a way that is observably different from naive actors in some way that’s measurable in aggregate behavior, that’s a solid empirical result! I have some hypotheses around this that I think are worth checking. For example, probably the success of an open source community depends in part on whether members of the community act in ways that successfully bring new members in. Strategies that cultivate new members are going to look different from strategies that exclude newcomers or try to maintain a superior status. Based on some preliminary results, it looks like this difference between successful open source projects and most other social networks is observable in the data.

textual causation

A problem that’s coming up for me as a data scientist is the problem of textual causation.

There has been significant interesting research into the problem of extracting causal relationships between things in the world from text about those things. That’s an interesting problem but not the problem I am talking about.

I am talking about the problem of identifying when a piece of text has been the cause of some event in the world. So, did the State of the Union address affect the stock prices of U.S. companies? Specifically, did the text of the State of the Union address affect the stock price? Did my email cause my company to be more productive? Did specifically what I wrote in the email make a difference?

A trivial example of textual causation (if I have my facts right–maybe I don’t) is the calculation of Twitter trending topics. Millions of users write text. That text is algorithmically scanned and under certain conditions, Twitter determines a topic to be trending and displays it to more users through its user interface, which also uses text. The user interface text causes thousands more users to look at what people are saying about the topic, increasing the causal impact of the original text. And so on.

These are some challenges to understanding the causal impact of text:

Text is an extraordinarily high-dimensional space with tremendous irregularity in distribution of features.
Textual events are unique not just because the probability of any particular utterance is so low, but also because the context of an utterance is informed by all the text prior to it.
For the most part, text is generated by a process of unfathomable complexity and interpreted likewise.
A single ‘piece’ of text can appear and reappear in multiple contexts as distinct events.

I am interested in whether it is possible to get a grip on textual causation mathematically and with machine learning tools. Bayesian methods theoretically can help with the prediction of unique events. And the Pearl/Rubin model of causation is well integrated with Bayesian methods. But is it possible to use the Pearl/Rubin model to understand unique events? The methodological uses of Pearl/Rubin I’ve seen are all about establishing type causation between independent occurrences. Textual causation appears to be as a rule a kind of token causation in a deeply integrated contextual web.

Perhaps this is what makes the study of textual causation uninteresting. If it does not generalize, then it is difficult to monetize. It is a matter of historical or cultural interest.

But think about all the effort that goes into communication at, say, the operational level of an organization. How many jobs require “excellent communication skills”? A great deal of emphasis is placed not only on the fact that communication happens, but on how people communicate.

One way to approach this is using the tools of linguistics. Linguistics looks at speech and breaks it down into components and structures that can be scientifically analyzed. It can identify when there are differences in these components and structures, calling these differences dialects or languages.

April 8, 2015

causal inference in networks is hard

November 29, 2014

textual causation