representation learning

It all comes back to Artificial Intelligence

I am blessed with many fascinating conversations every week. Because of the field I am in, these conversations are mainly about technology and people and where they intersect.

Sometimes they are about philosophical themes like how we know anything, or what is ethical. These topics are obviously relevant to an academic researcher, especially when one is interested in computational social science, a kind of science whose ethics have lately been called into question. Other times they are about the theoretical questions that such a science should or could address, like: how do we identify leaders? Or determine the ingredients for a thriving community? What is creativity, and how can we mathematically model how it arises from social interaction?

Sometimes the conversations are political. Is it a problem that algorithms are governing more of our political lives and culture? If so, what should we do about it?

The richest and most involved conversations, though, are about artificial intelligence (AI). As a term, it has fallen out of fashion. I was very surprised to see it as a central concept in Bengio et al.’s “Representation Learning: A Review and New Perspectives” [arXiv]. In most discussions of scientific computing or ‘data science’, people have for the most part abandoned the idea of intelligent machines. Perhaps this is because so many of the applications of this technology seem so prosaic now. Curating newsfeeds, for example. That can’t be done intelligently. That’s just an algorithm.

Never mind that everything we now call machine learning has its origins in the AI research program, which is as old as computer science itself and really has grown up with it. Marvin Minsky famously once defined artificial intelligence as ‘whatever humans still do better than computers.’ And this is the curse of the field. Every technological advance that is mind-blowingly powerful at the time, performing a task that used to require hundreds of people, very shortly becomes mere technology.

It’s appropriate then that representation learning, the problem of deriving and selecting features from a complex data set that are valuable for other kinds of statistical analysis in other tasks, is brought up in the context of AI. Because this is precisely the sort of thing that people still think they are comparatively good at. A couple of years ago, everyone was talking about the phenomenon of crowdsourced image tagging. People are better at seeing and recognizing objects in images than computers, so in order to, say, provide the data for Google’s Image search, you still need to mobilize lots of people. You just have to organize them as if they were computer functions so that you can properly aggregate their results.

One of the earliest tasks posed to AI, the Turing Test, proposed by and named after Alan Turing, the inventor of the fricking computer, is the task of engaging in conversation as if one were a human. This is harder than chess. It is harder than reading handwriting. Something about human communication is so subtle that it has withstood the test of time as an unsolved problem.

Until June of this year, when a program passed the Turing Test in the annual competition. Conversation is no longer something intelligent. It can be performed by a mere algorithm. Indeed, I have heard that a lot of call centers now use scripted dialog. An operator pushes buttons guiding the caller through a conversation that has already been written for them.

So what’s next?

I have a proposal: software engineering. We still don’t have an AI that can write its own source code.

How could we create such an AI? We could use machine learning, training it on data. What’s amazing is that we have vast amounts of data available on what it is like to be a functioning member of a software development team. Open source software communities have provided an enormous corpus of what we can guess is some of the most complex and interesting data ever created. Among other things, this software includes source code for all kinds of other algorithms that were once considered AI.

One reason I am building BigBang, a toolkit for the scientific analysis of software communities, is that I believe it’s the first step to a better understanding of this very complex and still intelligent process.

Above I have framed AI pessimistically, as what we delegate away from people to machines, but that is unnecessarily grim. In fact, with every advance in AI we have come to a better understanding of our world and how we see, hear, think, and do things. The task of trying to scientifically understand how we create together and the task of developing an AI to create with us are in many ways the same task. It’s just a matter of how you look at it.

Digifesto

Tag: representation learning

October 18, 2014
