Digifesto

Category: ideawork

The instability of adversarial roles

One topic I’m interested in researching is the automated detection of certain kinds of social (or anti-social) activity on the internet. The paper “Visualizing the Signatures of Social Roles in Online Discussion Groups”, by Welser et al., is a good example of a stab at the problem. Can we look at data from a mailing list and identify the most helpful person on it? Welser et al. think so, and they develop a preliminary model for detecting such people.
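To make the mailing list question concrete, here is a toy sketch in Python. It is not the model from the paper; it is a deliberately crude heuristic of my own, assuming that a "helpful" person replies to many different people and starts few threads themselves. The input format (pairs of author and replied-to author) and the function names are hypothetical.

```python
from collections import defaultdict


def reply_profile(messages):
    """Summarize each author's behavior.

    `messages` is an iterable of (author, parent_author) pairs, where
    parent_author is None for a message that starts a thread.
    Returns {author: (distinct_people_replied_to, threads_started)}.
    """
    started = defaultdict(int)
    replied_to = defaultdict(set)
    for author, parent_author in messages:
        if parent_author is None:
            started[author] += 1
        elif parent_author != author:  # ignore self-replies
            replied_to[author].add(parent_author)
    authors = set(started) | set(replied_to)
    return {a: (len(replied_to[a]), started[a]) for a in authors}


def most_helpful_candidate(messages):
    """Pick the author who replies to the most people relative to threads started.

    This scoring rule is an assumption made for illustration only.
    Ties are broken alphabetically so the result is deterministic.
    """
    profile = reply_profile(messages)
    return max(sorted(profile), key=lambda a: profile[a][0] - profile[a][1])


# Hypothetical data: bob answers three different people and asks nothing.
messages = [
    ("alice", None), ("bob", "alice"),
    ("carol", None), ("bob", "carol"),
    ("dave", None), ("bob", "dave"),
    ("alice", "dave"),
]
print(most_helpful_candidate(messages))
```

A heuristic this simple cannot tell a patient answerer from a prolific arguer, which is exactly where adversarial roles make the problem harder.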

That’s all well and good until the roles get more complicated. A great way to make things more complicated is to introduce an adversarial relationship into the mix. The internet is rife with adversity, in the form of flame warriors, trolls, and spammers. There is also much more benign disagreement, but this is probably comparatively rare. Or (this is a broad claim based in cynicism, not research) people are so likely to take disagreement and conflict personally or dismissively that much legitimate conflict on the Internet is probably seen as flaming, trolling, or spamming.

The problem is that it is very hard to pin down the definitions of these terms. This isn’t just a conceptual problem. It’s also a problem for engineering solutions around these roles. Spam filtering, for example, depends on a certain model of what counts as spam. While training a classifier based on a user’s subjective classification makes lots of sense in some circumstances (like a mail filter), in other cases the line may be less clear. Trolling, meanwhile, can be death to a web-based community. One could argue that Pinterest has been successful partly because it has been able to keep the trolls out. But in other contexts, where vigorous debate is encouraged, standards of ‘trolling’ may differ dramatically.
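To illustrate how a classifier trained on subjective labels bakes in one person’s definition, here is a minimal naive Bayes sketch in plain Python. It is an illustration under simple assumptions (whitespace tokenization, add-one smoothing, two labels), not how any particular mail filter works; the class name and the example messages are made up.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny two-class naive Bayes text classifier with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one message together with the user's own label for it."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        """Return the label with the higher smoothed log-probability."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed prior for the label.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in text.lower().split():
                # The extra +1 in the denominator leaves room for unseen words.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocab) + 1)
                )
            scores[label] = score
        return max(scores, key=scores.get)


# Two hypothetical users label the same heated message differently.
message = "you are all wrong about this"

strict_user = NaiveBayesFilter()
strict_user.train(message, "spam")

debate_fan = NaiveBayesFilter()
debate_fan.train(message, "ham")

print(strict_user.classify(message), debate_fan.classify(message))
```

The same message ends up on opposite sides of the line depending on who did the labeling, which is fine for a personal inbox and much less fine for a shared community.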

Horse_ebooks is spam content (or spam detection evasion content) that has turned into a viral meme. Trolls sometimes become accepted members of a web community, understood to be entertaining and serving as rites of passage for n00bs. So these roles are not necessarily fixed.

With so much data about such roles available for analysis, research into these questions could teach us a lot more about human communication and conflict.

Computing power

I’m working on a project analyzing Twitter data with Sean Chen for a class. I am learning one of the simple pleasures of scientific computing, which is watching your machine ramp up to use all its processing power because numpy is crunching some big arrays.

There’s a sense in which computing power is the limited resource for humanity these days.

We have the vast canon of recorded human thought available as digitized text. We have countless sensors, oceans of data, and very accurate models of the fundamental mechanics of our universe. What we lack is the ability to synthesize that data and learn as much as we could from it.

This isn’t new; brains are an important source of computing power, and in fact a remarkably efficient one. But digital processing and memory have accelerated human thought to such a degree that we have outpaced ourselves.

It doesn’t help that so much of this precious resource is used against itself. The processing power that spammers use to spread new spam is pitted against the processing needed to identify and block it. We revere projects like reCAPTCHA because they harness computing power that would otherwise go to waste for something good.

So, there is something heartwarming about seeing that my little lappy is running at full steam. It’s actualizing some potential. I hope I’m putting it to good use.

EDIT: Ironically, just an hour or so after I wrote this, my laptop shut down spontaneously and wouldn’t restart until I took the battery in and out. Maybe lappy couldn’t handle it after all. I’ll be doing more intensive computing on the cloud from now on.