Digifesto

Tag: machine learning

How to promote employees using machine learning without societal bias

Though it may at first read as being callous, a managerialist stance on inequality in statistical classification can help untangle some of the rhetoric around this tricky issue.

Consider the example that’s been in the news lately:

Suppose a company begins to use an algorithm to make decisions about which employees to promote. It uses a classifier trained on past data about who has been promoted. Because of societal bias, women are systematically under-promoted; this is reflected in the data set. The algorithm, naively trained on the historical data, reproduces the historical bias.
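To make the failure mode concrete, here is a minimal sketch with synthetic data (the features, coefficients, and sample size are all invented for illustration): a classifier fit naively to biased historical promotion labels ends up assigning a lower promotion probability to women with identical performance.

```python
# A minimal sketch, on made-up synthetic data, of how a classifier trained
# naively on biased historical promotion decisions reproduces that bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical features: a performance score and gender (1 = woman, 0 = man).
performance = rng.normal(0, 1, n)
gender = rng.integers(0, 2, n)

# Historical promotions depended on performance *and* on gender
# (women systematically under-promoted), reflecting societal bias.
logit = 1.5 * performance - 1.0 * gender - 0.5
promoted = rng.random(n) < 1 / (1 + np.exp(-logit))

# Naive training on the historical labels, with gender included as a feature.
X = np.column_stack([performance, gender])
clf = LogisticRegression().fit(X, promoted)

# The learned model recommends promotion less often for women at the same
# performance level, i.e. it reproduces the historical bias.
probe = np.array([[0.0, 0], [0.0, 1]])  # identical performance, different gender
print(clf.predict_proba(probe)[:, 1])
```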

This example describes a bad situation. It is bad from a social justice perspective; by assumption, it would be better if men and women had equal opportunity in this workplace.

It is also bad from a managerialist perspective. Why? Because if the algorithm does not correct for the societal biases that introduce irrelevant factors into the promotion decision, then there is no managerial reason to change business practices over to using an algorithm at all. The whole point of using an algorithm is to improve on human decision-making. This is a poor match of an algorithm to a problem.

Unfortunately, what makes this example compelling is precisely what makes it a bad example of using an algorithm in this context. The only variables discussed in the example are the socially salient ones, thick with political implications: gender and promotion. What concerns are more universal than gender relations and socioeconomic status?!

But from a managerialist perspective, promotions should be issued based on a number of factors not mentioned in the example. What factors are these? That’s a great and difficult question. Promotions can reward hard work and loyalty. They can also be issued to those who demonstrate capacity for leadership, which can be a function of how well they get along with other members of the organization. There may be a number of features that predict these desirable qualities, most of which will have to do with working conditions within the company as opposed to qualities inherent in the employee (such as their past education, or their gender).

If one were to use machine learning intelligently to solve this problem, then one would go about it in a way entirely unlike the procedure in the problematic example. One would instead draw on soundly sourced domain expertise to develop a model of the relationships among relevant, work-related factors. For many of the key parts of the model, such as the general relationships between personality type, leadership style, and cooperation with colleagues, one would look outside the organization for gold-standard data that was sampled responsibly.

Once the organization has this model, then it can apply it to its own employees. For this to work, employees would need to provide significant detail about themselves, and the company would need to provide contextual information about the conditions under which employees work, as these may be confounding factors.
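A rough sketch of what that workflow might look like, assuming a hypothetical external gold-standard dataset and an internal employee table (every file name and column name below is invented for illustration):

```python
# A minimal sketch: fit a model on external, responsibly sampled data relating
# work-related factors to leadership outcomes, then apply it to the company's
# own employees. All file and column names here are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# External gold-standard data: work-related predictors only, no gender.
external = pd.read_csv("external_gold_standard.csv")   # hypothetical file
predictors = ["peer_cooperation", "initiative", "tenure_years", "team_size"]
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(external[predictors], external["demonstrated_leadership"])

# Internal data: the same predictors, gathered alongside contextual variables
# (team, manager, workload) that may confound the measurements and would need
# to be accounted for in a fuller treatment.
employees = pd.read_csv("employee_survey.csv")          # hypothetical file
employees["promotion_score"] = model.predict_proba(employees[predictors])[:, 1]
print(employees[["employee_id", "promotion_score"]].head())
```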

Part of the merit of building and fitting such a model would be that, because it is based on a lot of new and objective scientific considerations, it would produce novel results in recommending promotions. Again, if the algorithm merely reproduced past results, it would not be worth the investment in building the model.

When the algorithm is introduced, it is ideally used in a way that maintains traditional promotion processes in parallel, so that the two kinds of results can be compared. Evaluating the algorithm’s performance relative to traditional methods is a long, arduous process full of potential insights. Using the algorithm as an intervention at first allows the company to develop a causal understanding of its impact. Insights from the evaluation can then be factored back into the algorithm, improving it.
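As a sketch of what that parallel evaluation could look like, assume promotion cases are randomly routed to either the algorithm or the traditional process and the results logged; the file, columns, and outcome measure below are hypothetical.

```python
# A minimal sketch of running the algorithm alongside the traditional process.
# Treating the routing as a randomized intervention lets the company estimate
# the algorithm's causal effect on an outcome of interest (here a hypothetical
# one-year retention flag), and also measure how novel its recommendations are.
import pandas as pd

decisions = pd.read_csv("promotion_rounds.csv")   # hypothetical decision log

# Each case was randomly routed to one arm: "algorithm" or "traditional".
algo = decisions[decisions["arm"] == "algorithm"]
trad = decisions[decisions["arm"] == "traditional"]

# How often do the two processes agree? Low agreement means the model is
# producing genuinely novel recommendations rather than reproducing history.
agreement = (decisions["algorithm_recommended"]
             == decisions["traditionally_recommended"]).mean()

# Difference in mean outcome between arms estimates the causal effect of
# using the algorithm, under the randomization assumption.
effect = algo["retained_one_year"].mean() - trad["retained_one_year"].mean()
print(f"agreement={agreement:.2f}, estimated effect on retention={effect:+.3f}")
```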

In all of these steps, the company must keep its business goals firmly in mind. If it does, then the rest of the logic of its method falls out of data science best practices, which are grounded in the mathematical principles of statistics. While the political implications of poorly managed machine learning are troubling, effective management of machine learning, taking the precautions necessary to develop objectivity, is ultimately a corrective to social bias. This is a case where sound science, managerialist motives, and social justice are aligned.

textual causation

A problem that’s coming up for me as a data scientist is the problem of textual causation.

There has been significant, interesting research into the problem of extracting causal relationships between things in the world from text about those things. That is an interesting problem, but not the problem I am talking about.

I am talking about the problem of identifying when a piece of text has been the cause of some event in the world. So, did the State of the Union address affect the stock prices of U.S. companies? Specifically, did the text of the State of the Union address affect the stock price? Did my email cause my company to be more productive? Did specifically what I wrote in the email make a difference?

A trivial example of textual causation (if I have my facts right; maybe I don’t) is the calculation of Twitter trending topics. Millions of users write text. That text is algorithmically scanned and under certain conditions, Twitter determines a topic to be trending and displays it to more users through its user interface, which also uses text. The user interface text causes thousands more users to look at what people are saying about the topic, increasing the causal impact of the original text. And so on.

These are some challenges to understanding the causal impact of text:

  • Text is an extraordinarily high-dimensional space with tremendous irregularity in distribution of features.
  • Textual events are unique not just because the probability of any particular utterance is so low, but also because the context of an utterance is informed by all the text prior to it.
  • For the most part, text is generated by a process of unfathomable complexity and interpreted likewise.
  • A single ‘piece’ of text can appear and reappear in multiple contexts as distinct events.

I am interested in whether it is possible to get a grip on textual causation mathematically and with machine learning tools. Bayesian methods can, in theory, help with the prediction of unique events. And the Pearl/Rubin model of causation is well integrated with Bayesian methods. But is it possible to use the Pearl/Rubin model to understand unique events? The methodological uses of Pearl/Rubin I have seen are all about establishing type causation between independent occurrences. Textual causation appears to be, as a rule, a kind of token causation in a deeply integrated contextual web.
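To see what the standard type-causation framing looks like, and what it throws away, here is a sketch of one common way to force text into the Pearl/Rubin mold: collapse each document to a coarse treatment indicator, adjust for observed confounders, and estimate an average effect. The corpus, columns, and topic are all hypothetical.

```python
# A minimal sketch of the usual reduction of text to a causal "treatment":
# each document becomes a single topic indicator, observed confounders are
# adjusted for with inverse propensity weighting, and a type-level average
# effect is estimated. The data and column names are invented; the point is
# how much token-level, contextual detail this throws away.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

docs = pd.read_csv("emails_with_outcomes.csv")   # hypothetical corpus

# Collapse each text to a binary treatment: does it mention the topic?
docs["treated"] = docs["text"].str.contains("deadline", case=False).astype(int)

# Adjust for observed confounders (sender seniority, team size, time of day).
confounders = docs[["sender_seniority", "team_size", "hour_sent"]]
propensity = (LogisticRegression()
              .fit(confounders, docs["treated"])
              .predict_proba(confounders)[:, 1])

# Inverse-propensity-weighted estimate of the average effect of mentioning
# the topic on a downstream outcome (e.g. whether the task got done on time).
w = docs["treated"] / propensity - (1 - docs["treated"]) / (1 - propensity)
ate = np.mean(w * docs["task_completed"])
print(f"estimated average effect of mentioning the topic: {ate:+.3f}")
```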

Perhaps this is what makes the study of textual causation commercially uninteresting. If it does not generalize, then it is difficult to monetize; it becomes a matter of historical or cultural interest.

But think about all the effort that goes into communication at, say, the operational level of an organization. How many jobs require “excellent communication skills”? A great deal of emphasis is placed not only on whether communication happens, but on how people communicate.

One way to approach this is using the tools of linguistics. Linguistics looks at speech and breaks it down into components and structures that can be scientifically analyzed. It can identify when there are differences in these components and structures, calling these differences dialects or languages.

prediction and computational complexity

To the extent that an agent is predictable, it must:

  • be observable, and
  • have a knowable internal structure

The first implies that the predictor has collected data emitted by the agent.

The second implies that the agent has internal structure and that the predictor has the capacity to represent that structure.
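A toy sketch of the two requirements together: an agent with a tiny, rigid internal structure, and a predictor whose representation (a simple transition table) is rich enough to capture that structure from the data the agent emits. The agent and predictor here are invented for illustration.

```python
# A minimal sketch: the predictor needs (1) observations emitted by the agent
# and (2) enough capacity to represent the agent's internal structure. Here
# the agent is a tiny deterministic automaton; a predictor that can hold a
# transition table captures it completely.
from collections import defaultdict, Counter

# The agent: a simple 3-state cycle that emits its state label.
def agent_emissions(steps):
    state = 0
    for _ in range(steps):
        yield state
        state = (state + 1) % 3   # rigid internal structure

# The predictor: learns a transition table from observed emissions.
observations = list(agent_emissions(30))
transitions = defaultdict(Counter)
for prev, nxt in zip(observations, observations[1:]):
    transitions[prev][nxt] += 1

def predict_next(current):
    return transitions[current].most_common(1)[0][0]

# Because the agent's structure fits inside the predictor's representation,
# the agent is perfectly predictable, in the sense of "utterly predictable"
# discussed below.
print(predict_next(observations[-1]))
```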

In general, we can say that people do not have the capacity to explicitly represent other people very well. People are unpredictable to each other. This is what makes us free. When somebody is utterly predictable to us, their rigidity is a sign of weakness or stupidity. They are following a simple algorithm.

We are able to model the internal structure of worms with available computing power.

As we build more and more powerful predictive systems, we can ask: is our internal structure in principle knowable by this powerful machine?

This is different from the question of whether or not the predictive machine has data from which to draw inferences. Though of course the questions are related in their implications.

I’ve tried to make progress on modeling this, with limited success. Spiros has just told me about binary decision diagrams, which are a promising lead.
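Since that lead is just a pointer, here is a toy sketch of what a reduced, ordered binary decision diagram looks like: identical sub-decisions are shared through a unique table and redundant tests are dropped, so a boolean function is stored as a compact DAG rather than a full truth table. The example function and the brute-force construction are for illustration only; real BDD packages build diagrams compositionally.

```python
# A toy reduced, ordered binary decision diagram, built by brute-force
# enumeration over a tiny example function (fine for a handful of variables).
VARS = ["x", "y", "z"]                      # fixed variable order

def f(x, y, z):
    return (x and y) or z                   # the boolean function to represent

unique = {}                                 # (var, low, high) -> shared node id

def node(var, low, high):
    if low == high:                         # redundant test: drop the node
        return low
    key = (var, low, high)
    if key not in unique:
        unique[key] = f"n{len(unique)}"     # hash-consing shares identical nodes
    return unique[key]

def build(i=0, assignment=()):
    if i == len(VARS):
        return f(*assignment)               # terminal value: True or False
    low = build(i + 1, assignment + (False,))
    high = build(i + 1, assignment + (True,))
    return node(VARS[i], low, high)

root = build()
print("root:", root)
print("nodes:", unique)                     # only 3 shared nodes, not 8 leaves
```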

Filtering feeds

About a week ago Subtraction made a long post complaining about the main problem of feed aggregators:

No matter how much I try to organize it, it’s always in disarray, overflowing with unread posts and encumbered with mothballed feeds. … The whole process frustrates me though, mostly because I feel like I shouldn’t have to do it at all. The software should just do it for me.

These are my reactions to this, roughly in order:

  • I feel the pain of feed bloat myself, and know many others that do. It’s another symptom of internet-enabled information explosion.
  • It’s amazing that we live in an era when a feeling of entitlement about our interactions with web technology isn’t seen as ridiculous outright. It’s true–it does feel surprising that somebody smart hasn’t solved this problem for everybody yet.
  • The reason why it hasn’t been solved yet is probably because it’s a tough problem. It’s not easy to program a computer to know What I Find Interesting…

…or is it? This is, after all, what various web services have fought to do well for us ever since the dawn of the search engine.  And the results are pretty good right now.  So there must be a good way to solve this problem.

As far as I can tell, there are two successful ways of doing smart filtering-for-people on the internet, both of which are being applied to feeds: automated analysis of the content itself, and filtering based on the judgments of other people.

The most interesting solutions to these kinds of problems are collaborative filtering algorithms that combine both methods. This is why Gmail’s spam filter is so good: it uses the input of its gillions of users to collaboratively train its algorithmic filter. StumbleUpon is probably my favorite implementation of this for general web content, although its closed-ness spooks me out.
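For a sense of the mechanics, here is a minimal collaborative filtering sketch for feeds: predict a reader’s interest in an unread feed from the ratings of similar readers. The tiny rating matrix is made up; a real system learns from vastly more signal.

```python
# A minimal sketch of user-based collaborative filtering applied to feeds:
# infer how much a reader would like an unread feed from readers who liked
# similar things. The rating matrix is invented for illustration.
import numpy as np

# Rows: readers, columns: feeds. 1 = liked, 0 = ignored, nan = never seen.
ratings = np.array([
    [1, 1, 0, np.nan],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

def predict(ratings, user, item):
    """Predict `user`'s interest in `item` from similar users' ratings."""
    def sim(a, b):
        # Cosine similarity between two users on the feeds both have rated.
        mask = ~np.isnan(ratings[a]) & ~np.isnan(ratings[b])
        va, vb = ratings[a, mask], ratings[b, mask]
        denom = np.linalg.norm(va) * np.linalg.norm(vb)
        return float(va @ vb / denom) if denom else 0.0

    seen = ~np.isnan(ratings[:, item])
    others = [u for u in range(ratings.shape[0]) if u != user and seen[u]]
    weights = np.array([sim(user, u) for u in others])
    if not weights.sum():
        return 0.0
    return float(weights @ ratings[others, item] / weights.sum())

# Would reader 0 like feed 3, which they haven't seen yet?
print(predict(ratings, user=0, item=3))
```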

We’re working on applying collaborative filtering methods to feeds at The Open Planning Project. Specifically, Luke Tucker has been developing Melkjug, an open source collaborative filtering feed aggregator. It’s currently in version 0.2.1. To get involved in the project, check out the Melkjug Project page on OpenPlans.org.