Digifesto

Tag: causation

How to promote employees using machine learning without societal bias

Though it may at first read as being callous, a managerialist stance on inequality in statistical classification can help untangle some of the rhetoric around this tricky issue.

Consider the example that’s been in the news lately:

Suppose a company begins to use an algorithm to make decisions about which employees to promote. It uses a classifier trained on past data about who has been promoted. Because of societal bias, women are systematically under-promoted; this is reflected in the data set. The algorithm, naively trained on the historical data, reproduces the historical bias.
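A minimal sketch of this scenario, assuming made-up data: the `merit` score, the size of the historical penalty, and the "lowest merit ever promoted" learner are all hypothetical stand-ins chosen for illustration, not a real promotion model.

```python
import random

def simulate_history(n, rng, penalty=0.25):
    """Generate hypothetical past promotion records as (gender, merit, promoted).

    Merit is drawn from the same distribution for everyone; the historical
    decision simply demanded a higher merit score from women.
    """
    records = []
    for _ in range(n):
        gender = rng.choice(["F", "M"])
        merit = rng.random()
        threshold = 0.5 + (penalty if gender == "F" else 0.0)
        records.append((gender, merit, merit > threshold))
    return records

def train_naive(records):
    """Learn, per gender, the lowest merit score that was ever promoted."""
    cutoffs = {}
    for gender, merit, promoted in records:
        if promoted:
            cutoffs[gender] = min(merit, cutoffs.get(gender, 1.0))
    return cutoffs

def predict(cutoffs, gender, merit):
    """Recommend promotion if merit clears the cutoff learned for that gender."""
    return merit >= cutoffs[gender]

rng = random.Random(0)
model = train_naive(simulate_history(10_000, rng))

# Two candidates with identical merit get different recommendations,
# because the only thing the learner saw was the biased history.
same_merit = 0.6
recommend_man = predict(model, "M", same_merit)
recommend_woman = predict(model, "F", same_merit)
```

Nothing in the learner is malicious; it has no target other than the historical decisions, so it faithfully recovers the extra hurdle built into them.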

This example describes a bad situation. It is bad from a social justice perspective; by assumption, it would be better if men and women had equal opportunity in this workplace.

It is also bad from a managerialist perspective. Why? Because unless the point of using an algorithm is to correct for the societal biases that introduce irrelevancies into the promotion decision, it does not make managerial sense to change business practices over to using an algorithm. The whole point of using an algorithm is to improve on human decision-making. This is a poor match of an algorithm to a problem.

Unfortunately, what makes this example compelling is precisely what makes it a bad example of using an algorithm in this context. The only variables discussed in the example are the socially salient ones thick with political implications: gender, and promotion. What are more universal concerns than gender relations and socioeconomic status?!

But from a managerialist perspective, promotions should be issued based on a number of factors not mentioned in the example. What factors are these? That’s a great and difficult question. Promotions can reward hard work and loyalty. They can also be issued to those who demonstrate capacity for leadership, which can be a function of how well they get along with other members of the organization. There may be a number of features that predict these desirable qualities, most of which will have to do with working conditions within the company as opposed to qualities inherent in the employee (such as their past education, or their gender).

If one were to start to use machine learning intelligently to solve this problem, then one would go about solving it in a way entirely unlike the procedure in the problematic example. One would rather draw on soundly sourced domain expertise to develop a model of the relationship between relevant, work-related factors. For many of the key parts of the model, such as general relationships between personality type, leadership style, and cooperation with colleagues, one would look outside the organization for gold standard data that was sampled responsibly.

Once the organization has this model, then it can apply it to its own employees. For this to work, employees would need to provide significant detail about themselves, and the company would need to provide contextual information about the conditions under which employees work, as these may be confounding factors.

Part of the merit of building and fitting such a model would be that, because it is based on a lot of new and objective scientific considerations, it would produce novel results in recommending promotions. Again, if the algorithm merely reproduced past results, it would not be worth the investment in building the model.

When the algorithm is introduced, it is ideally used in a way that maintains traditional promotion processes in parallel so that the two kinds of results can be compared. Evaluation of the algorithm’s performance, relative to traditional methods, is a long, arduous process full of potential insights. Using the algorithm as an intervention at first allows the company to develop a causal understanding of its impact. Insights from the evaluation can be factored back into the algorithm, improving the latter.
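As a rough illustration of running the two processes side by side, the sketch below tabulates where they agree and where they diverge. The employee ids and the decisions are invented, and `compare_decisions` is a hypothetical helper, not part of any library.

```python
def compare_decisions(traditional, algorithmic):
    """Summarise where two promotion processes agree and disagree.

    Both arguments map a (hypothetical) employee id to True/False.
    Only employees evaluated by both processes are compared.
    """
    shared = sorted(set(traditional) & set(algorithmic))
    only_traditional = [e for e in shared if traditional[e] and not algorithmic[e]]
    only_algorithm = [e for e in shared if algorithmic[e] and not traditional[e]]
    agree = len(shared) - len(only_traditional) - len(only_algorithm)
    return {
        "agreement_rate": agree / len(shared) if shared else float("nan"),
        "only_traditional": only_traditional,
        "only_algorithm": only_algorithm,
    }

traditional = {"e1": True, "e2": False, "e3": True, "e4": False}
algorithmic = {"e1": True, "e2": True, "e3": False, "e4": False}
report = compare_decisions(traditional, algorithmic)
```

The disagreement lists are the interesting part: they are the cases worth following up on over time to learn which process was closer to right.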

In all these cases, the company must keep its business goals firmly in mind. If it does this, then the rest of the logic of its method falls out of data science best practices, which are grounded in mathematical principles of statistics. While the political implications of poorly managed machine learning are troubling, effective management of machine learning which takes the precautions necessary to develop objectivity is ultimately a corrective to social bias. This is a case where sound science, managerialist motives, and social justice are aligned.

causal inference in networks is hard

I am trying to make statistically valid inferences about the mechanisms underlying observational networked data and it is really hard.

Here’s what I’m up against:

  • Even though my data set is a complete ecologically valid data set representing a lot of real human communication over time, it (tautologically) leaves out everything that it leaves out. I can’t even count all the latent variables.
  • The best methods for detecting causal mechanism, the potential outcomes framework or Rubin model, depend on the assumption that different members of the sample don’t interfere. But I’m working with networked data. Everything interferes with everything else, at least indirectly. That’s why it’s a network.
  • Did I mention that I’m working with communications data? What’s interesting about human communication is that it’s not really generated at random at all. It’s very deliberately created by people acting more or less intelligently all the time. If the phenomenon I’m studying is not more complex than the models I’m using to study it, then there is something seriously wrong with the people I’m studying.

I think I can deal with the first point here by gracefully ignoring it. It may be true that any apparent causal effect in my data is spurious and due to a common latent cause upstream. It may be true that the variance in the data is largely due to exogenous factors. Fine. That’s noise. I’m looking for a reliable endogenous signal. If there isn’t one, that would suggest that my entire data set is epiphenomenal. But I know it’s not. So there’s got to be something there.

For the second point, there are apparently sophisticated methods for extending the potential outcomes framework to handling peer effects. These are gnarly and though I figure I could work with them, I don’t think they are going to be what I need because I’m not really looking for a causal relationship like a statistical relationship between treatment and outcome. I’m not after in the first instance what might be called type causation. I’m rather trying to demonstrate cases of token causation where causation is literally the transfer of information from one object to another. And then I’m trying to show regularity in this underlying kind of causation in a layer of abstraction over it.
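For reference, the no-interference assumption at issue can be written out in standard potential-outcomes notation. The symbols here (Y_i for unit i's outcome, z_i for its treatment) are the usual textbook ones, not notation taken from this post.

```latex
% No interference: unit i's outcome depends only on its own treatment z_i,
% not on the full assignment vector z = (z_1, ..., z_n).
Y_i(z_1, \dots, z_n) = Y_i(z_i)

% Under that assumption the unit-level effect is well defined as
\tau_i = Y_i(1) - Y_i(0)

% In a network, Y_i may depend on neighbours' treatments, so the
% left-hand side no longer collapses to a function of z_i alone.
```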

The best angle I can come up with on this so far is to use emergent properties of the network like degree assortativity to sort through potential mathematically defined graph generation algorithms. These algorithms can act as alternative hypotheses, and the observed emergent properties can theoretically be used to compute the likelihood of the observed data given the generation methods. Then all I need is a prior over graph generation methods! It’s perfectly Bayesian! I wonder if it is at all feasible to execute on. I will try.
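Here is a minimal sketch of what that could look like, under loud assumptions. The two generators (uniform random edges and preferential attachment) are illustrative picks, the "observed" network is a simulated stand-in for real data, and the likelihood is a Gaussian approximation fitted to simulated assortativity values, so it is the likelihood of one summary statistic rather than of the full graph. All function names are hypothetical.

```python
import math
import random
import statistics

def gnm_random_graph(n, m, rng):
    """Hypothesis A: m distinct edges chosen uniformly at random among n nodes."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)

def preferential_attachment_graph(n, k, rng):
    """Hypothesis B: each new node links to k existing nodes, favouring high degree."""
    edges = [(u, v) for u in range(k + 1) for v in range(u + 1, k + 1)]  # seed clique
    endpoints = [node for edge in edges for node in edge]  # one entry per unit of degree
    for new in range(k + 1, n):
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges

def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both directions so the measure is symmetric.
    xs = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    ys = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:  # regular graph: the correlation is undefined, report 0.0
        return 0.0
    return sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / var

def approx_likelihood(observed, generator, rng, draws=200):
    """Gaussian approximation to p(observed statistic | generator), fitted by simulation."""
    sims = [degree_assortativity(generator(rng)) for _ in range(draws)]
    mu = statistics.mean(sims)
    sigma = max(statistics.stdev(sims), 1e-6)
    z = (observed - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def posterior(observed, generators, prior, rng):
    """Bayes' rule over a finite set of candidate generators."""
    weights = {name: prior[name] * approx_likelihood(observed, gen, rng)
               for name, gen in generators.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("observed statistic is implausible under every generator")
    return {name: w / total for name, w in weights.items()}

rng = random.Random(42)
n, k = 150, 2
m = k * (k + 1) // 2 + (n - k - 1) * k  # edge count produced by the attachment model
generators = {
    "uniform": lambda r: gnm_random_graph(n, m, r),
    "preferential": lambda r: preferential_attachment_graph(n, k, r),
}
# Stand-in for real data: one draw from the attachment model.
observed = degree_assortativity(preferential_attachment_graph(n, k, rng))
post = posterior(observed, generators, {"uniform": 0.5, "preferential": 0.5}, rng)
```

A single statistic like assortativity may not separate the hypotheses well, so in practice one would probably want several emergent properties at once, which is where the feasibility question starts to bite.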

It’s not 100% clear how you can take an algorithmically defined process and turn that into a hypothesis about causal mechanisms. Theoretically, as long as a causal network has computable conditional dependencies it can be represented by an algorithm. I believe that any algorithm (in the Church/Turing sense) can be represented as a causal network. Can this be done elegantly, so that the corresponding causal network represents something like what we’d expect from the scientific theory on the matter? This is unclear because, again, Pearl’s causal networks are great at representing type causation but not as expressive of token causation among a large population of uniquely positioned, generatively produced stuff. Pearl is not good at modeling life, I think.

The strategic activity of the actors is a modeling challenge but I think this is actually where there is substantive potential in this kind of research. If effective strategic actors are working in a way that is observably different from naive actors in some way that’s measurable in aggregate behavior, that’s a solid empirical result! I have some hypotheses around this that I think are worth checking. For example, probably the success of an open source community depends in part on whether members of the community act in ways that successfully bring new members in. Strategies that cultivate new members are going to look different from strategies that exclude newcomers or try to maintain a superior status. Based on some preliminary results, it looks like this difference between successful open source projects and most other social networks is observable in the data.

textual causation

A problem that’s coming up for me as a data scientist is the problem of textual causation.

There has been significant interesting research into the problem of extracting causal relationships between things in the world from text about those things. That’s an interesting problem but not the problem I am talking about.

I am talking about the problem of identifying when a piece of text has been the cause of some event in the world. So, did the State of the Union address affect the stock prices of U.S. companies? Specifically, did the text of the State of the Union address affect the stock price? Did my email cause my company to be more productive? Did specifically what I wrote in the email make a difference?

A trivial example of textual causation (if I have my facts right–maybe I don’t) is the calculation of Twitter trending topics. Millions of users write text. That text is algorithmically scanned and under certain conditions, Twitter determines a topic to be trending and displays it to more users through its user interface, which also uses text. The user interface text causes thousands more users to look at what people are saying about the topic, increasing the causal impact of the original text. And so on.

These are some challenges to understanding the causal impact of text:

  • Text is an extraordinarily high-dimensional space with tremendous irregularity in distribution of features.
  • Textual events are unique not just because the probability of any particular utterance is so low, but also because the context of an utterance is informed by all the text prior to it.
  • For the most part, text is generated by a process of unfathomable complexity and interpreted likewise.
  • A single ‘piece’ of text can appear and reappear in multiple contexts as distinct events.

I am interested in whether it is possible to get a grip on textual causation mathematically and with machine learning tools. Bayesian methods theoretically can help with the prediction of unique events. And the Pearl/Rubin model of causation is well integrated with Bayesian methods. But is it possible to use the Pearl/Rubin model to understand unique events? The methodological uses of Pearl/Rubin I’ve seen are all about establishing type causation between independent occurrences. Textual causation appears to be, as a rule, a kind of token causation in a deeply integrated contextual web.

Perhaps this is what makes the study of textual causation uninteresting. If it does not generalize, then it is difficult to monetize. It is a matter of historical or cultural interest.

But think about all the effort that goes into communication at, say, the operational level of an organization. How many jobs require “excellent communication skills”? A great deal of emphasis is placed not only on whether communication happens, but on how people communicate.

One way to approach this is using the tools of linguistics. Linguistics looks at speech and breaks it down into components and structures that can be scientifically analyzed. It can identify when there are differences in these components and structures, calling these differences dialects or languages.