Digifesto

Category: Uncategorized

thinking about computational social science

I’m facing a challenging paradox in how to approach my research.

On the one hand, we have the trend of increasing instrumentation of society. From quantified self to the Internet of things to Netflix clicks to the fully digitized archives of every newspaper, we have more data than we’ve ever had before to ask fundamental social scientific questions.

That should make it easier to research society and infer principles about how it works. But there is a long-standing counterpoint in the social sciences that claims that all social phenomena are sui generis and historically situated. If no social phenomenon generalizes, then it shouldn’t be possible to infer anything from the available data, no matter how much of it there is.

One view is that we should only be able to infer stuff that isn’t very interesting at all. One name for this view is “punctuated equilibrium.” The national borders of countries don’t move around…until they do. Regimes don’t change…until they do. It’s the ability to predict these kinds of political events that Philip Tetlock has called “expert political judgment.” The Good Judgment Project is a test to see what properties make a person or team of people good at this kind of task.

What now seems like many years ago, I wrote a book review of Tetlock’s book. In that review, I pointed out a facet of Tetlock’s research I found most compelling but underdeveloped: the best predictors he found were algorithmic ones that drew their conclusions from linear regressions over just the top three or so salient features in the data.

Six or so years later, Big Data is a powerful enough industrial and political phenomenon that academic social science feels it needs to catch up. But to a large extent industrial data science is still about using pretty basic statistical models drawn from physics (that assume that everything stands in Gaussian relations to everything else, say), or otherwise applying a broad range of modeling techniques and aggregating them under statistical boosting. This is great for edging out the competition on selling ads.

But it tells us nothing about the underlying structure of what’s going on in society. And it’s possible that the fact that we haven’t done any better is really a condemnation of the whole process of social science in general. The data we are getting, rather than making us understand what’s going on around us better, is perhaps just proving to us that it’s a complex chaotic system. If so, the better we understand it, the more we will lose our confidence in our ability to predict it.

Historically, we’ve been through all this before. The mid-20th century saw the expansion of the scope of Norbert Wiener’s cybernetics from the electrical engineering of homeostatic machines to the modeling of the political system and the economy as complex feedback systems. Indeed, cybernetics was intended as a theory of steering systems by thinking about their communications mechanisms. (Wikipedia: “The word “cybernetics” comes from the Greek word κυβερνητική (kyverni̱tikí̱, “government”), i.e. all that are pertinent to κυβερνώ (kyvernó̱), the latter meaning to “steer,” “navigate” or “govern,” hence κυβέρνησις (kyvérni̱sis, “government”) is the government while κυβερνήτης (kyverní̱ti̱s) is the governor or the captain.”) These models were on some level interesting and intuitive, even beautiful in their ambition. But they failed in their applications because social systems did not obey the kind of regularity that systems engineered for reliable equilibria did.

The difficulty with applying these theories that acknowledge the complexity of the social system to reality is that they are only explanatory in retrospect, because of the path dependence of history. That’s pretty close to rendering them pseudoscientific.

Nevertheless, there are countless pressing societal challenges–climate change, unfair crime laws, war, political crisis, public health policy–on which social scientific research must be brought to bear, because there is a dimension to them which is a problem of predicting social action.

It is possible (I wonder if it’s necessary) that there are laws–perhaps just local laws–of social activity. Most people certainly believe there are. Business strategy, for example, depends on a great deal of theorizing about the market and the relationships between different companies and their products. If these laws exist, they must be operationalizable and discoverable in the data itself.

But there is the problem of the researcher’s effect on the system being observed and, even more confounding, the result of the researcher’s discovery on the system itself. When a social system becomes self-aware through a particular theoretical lens, it can change its behavior. (I’ve heard that Milton Friedman’s monetarist economics are fantastically predictive of economic growth in the United States right up until he published them.)

If reflexivity contributes to social entropy, then it’s not clear what the point of any social research agenda is.

The one exception I can think of is if an empirical principle of social organization is robust under social reflection. The goal would be to define an equilibrium state worth striving for, so that the society in question can accept it harmoniously as a norm.

This looks like relevant prior work–a lucky Google hit.

Decentralization, democracy, and rationality

I got into a conversation involving @billmon1, Brian Keegan, and Nate Matias on Twitter the other day. I got so much out of it that I resolved to write up thoughts on it in a blog post. It’s appropriate that I get around to it on Martin Luther King Jr. day, since the conversation was about whether technology can advance social and democratic ideals, a topic I’ve been interested in for a long time.

Billmon (I don’t know his real name–not sure how I got into this conversation in the first place, to be honest) opened by expressing what’s becoming an increasingly popular cynicism towards those that look to social media technology for political solutions.

Now, I’m aware of the critiques of the idea that social media is inherently democratic. But I think this claim, as stated, is wrong in lots of interesting ways.

The problem is that while social media technologies can’t be guaranteed to foster liberal, democratic values, decentralization and inter-connectivity of communications infrastructure are precisely the material conditions for liberal democracy. State power depends on communications: if the state can control the means of communication, or if citizens are so disconnected that they cannot come together to deliberate, then there is no place for an independent public to legitimize the state institutions that control it.

This isn’t a novel or radical position. It’s a centrist argument about the conditions of liberal democracy. However, it’s a position that’s presently under attack from all sides. Will Wilkinson has recently written that “liberal” statist critiques cast Snowden and WikiLeaks, who were essentially acting in accord with the interests of an independent public media, as “libertarian” boogeymen for what are essentially liberal principles. Meanwhile, the left is eager to attack this position as not democratic or radical enough.

A number of these critiques came through in our conversation. And since I’m writing the blog post summarizing the conversation, naturally I won all the arguments. For example, Billmon pointed out that Facebook is itself a mega-corporation controlling and monetizing communication with political interests. But that’s just it: Facebook is not inherently democratic because it is under centralized control and does not promote inter-connectivity (EdgeRank is great for filter bubbles). Contrast this with Diaspora, and you have a technology that supports a very different politics.

Brian Keegan came through with a more conservative argument:

Keegan is way right about this. Because I’m a pedant, I pointed out that populism is in accordance with the democratic ideal. But I hate the tribalist mob as much as anybody. That’s why I’m looking for ways to design infrastructure to enable communicative rationality–the kind of principled communication that leads to legitimate consensus among its diverse constituents. Keegan pointed me to Living Voters Guide as an existing example of this. It’s a cool example, but I’m looking for something that could integrate better with communications infrastructure already used on a massive scale, like email or Twitter.

The problem with bringing up Habermasian rationality in today’s academic or quasi-academic environment is that you immediately get hit by the left-wing critique that came up in the late 80’s and early 90’s. Cue Nate Matias:

He’s right of course. He also pointed me to this excellent article by Nancy Fraser from 1990 articulating ways in which Habermas idealized bourgeois masculinist notions of the public sphere and ignored things like the exclusion of women and working-class counterpublics.

Reading the Fraser piece, I note that she doesn’t actually dismiss the idea of communicative rationality in its entirety. Rather, she simply doesn’t want it to be used in a way that falsely “brackets” (leaves out of the conversation) status differences:

Now, there is a remarkable irony here, one that Habermas’s account of the rise of the public sphere fails fully to appreciate. A discourse of publicity touting accessibility, rationality, and the suspension of status hierarchies is itself deployed as a strategy of distinction. Of course, in and of itself, this irony does not fatally compromise the discourse of publicity; that discourse can be, indeed has been, differently deployed in different circumstances and contexts. Nevertheless, it does suggest that the relationship between publicity and status is more complex than Habermas intimates, that declaring a deliberative arena to be a space where extant status distinctions are bracketed and neutralized is not sufficient to make it so.

(This is her only use of the word “rationality” in the linked piece, though poking around I gather that she has a more comprehensive critique elsewhere.)

So there is plenty of room for a moderate position that favors decentralized communications organized under a more inclusive principle of rationality–especially if that principle of rationality allows for discussion of status differences.

I’m personally happy with the idea of keeping irrational people out of my town hall. Otherwise, as Billmon points out, you can get people using decentralized communication channels to promote bad things like ethnic violence. This is already in fact the status quo, as every major social media host invests heavily in spam prevention, effectively excluding a class of actors who are presumed to be acting in bad faith. I’ve suggested elsewhere that we should extend our definition of spam to exclude more bad actors.

This opens up some really interesting questions. If we are willing to accept that there is an appropriate middle ground between centralized control of communications on the one hand and demagogue-prone chaos on the other, where should we draw the line? And how would we want to design, distribute, and organize our communications technology and our use of it to hit that sweet spot?

Protected: Dear Internet Ladies, please report harassment as spam

This content is password-protected. To view it, please enter the password below.

the social intelligence of spotted hyenas

The best thing I did today was stop by for the beginning of Kay Holekamp‘s talk on “Social Complexity and the Evolution of Intelligence.”

Her work involves researching spotted hyenas.

Spotted hyenas live in clans of about a hundred hyenas, each containing several matrilineal kinship groups. Female hyenas have an observable social hierarchy that is caused by and a cause of survival “fitness”. Male hyenas migrate to a different clan before reproducing.

This is very similar to the social structure of certain primates, like baboons. It is nothing like the social structure of cats and dogs (hyenas are somewhere in between the two, closer to cats).

What’s interesting about the research is that, without exception, results about the social cognitive capabilities of primates are reproducible in spotted hyenas.

That means that the same capacities for social intelligence have been achieved by multiple species through convergent evolution.

on courage in the face of failure developing bluestocking

It would be easy to be discouraged by early experiments with bluestocking.

sb@lebenswelt:~/dev/bluestocking$ python factchecker.py "Courage is what makes us. Courage is what divides us. Courage is what drives us. Courage is what stops us. Courage creates news. Courage demands more. Courage creates blame. Courage brings shame. Courage shows in school. Courage determines the cool. Courage divides the weak. Courage pours out like a leak. Courage puts us on a knee. Courage makes us free. Courage makes us plea. Courage helps us flee. Corey Fauchon"
Looking up Fauchon
Lookup failed
Looking up shame
Looking up news
Looking up puts
Lookup failed
Looking up leak
Lookup failed
Looking up stops
Lookup failed
Looking up Courage
Looking up helps
Lookup failed
Looking up divides
Lookup failed
Looking up shows
Lookup failed
Looking up demands
Lookup failed
Looking up pours
Lookup failed
Looking up brings
Lookup failed
Looking up weak
Lookup failed
Looking up drives
Lookup failed
Looking up free
Looking up blame
Lookup failed
Looking up Corey
Lookup failed
Looking up plea
Lookup failed
Looking up knee
Looking up flee
Lookup failed
Looking up cool
Looking up school
Looking up determines
Lookup failed
Looking up like
Looking up us
Lookup failed
Looking up creates
Lookup failed
Looking up makes
Lookup failed
Building knowledge base
Querying knowledge base with original document
Consistency: 0
Contradictions: []
Supported: []
Novel: [(True, 'helps', 'flee'), (True, 'helps', 'us'), (True, 'determines', 'cool'), (True, 'like', 'leak'), (True, 'puts', 'knee'), (True, 'puts', 'us'), (True, 'pours', 'leak'), (True, 'pours', 'like'), (True, 'brings', 'shame'), (True, 'drives', 'us'), (True, 'stops', 'us'), (True, 'creates', 'blame'), (True, 'creates', 'news'), (True, 'Courage', 'shame'), (True, 'Courage', 'news'), (True, 'Courage', 'puts'), (True, 'Courage', 'leak'), (True, 'Courage', 'stops'), (True, 'Courage', 'helps'), (True, 'Courage', 'divides'), (True, 'Courage', 'shows'), (True, 'Courage', 'demands'), (True, 'Courage', 'pours'), (True, 'Courage', 'brings'), (True, 'Courage', 'weak'), (True, 'Courage', 'drives'), (True, 'Courage', 'free'), (True, 'Courage', 'blame'), (True, 'Courage', 'plea'), (True, 'Courage', 'knee'), (True, 'Courage', 'flee'), (True, 'Courage', 'cool'), (True, 'Courage', 'school'), (True, 'Courage', 'determines'), (True, 'Courage', 'like'), (True, 'Courage', 'us'), (True, 'Courage', 'creates'), (True, 'Courage', 'makes'), (True, 'us', 'knee'), (True, 'us', 'flee'), (True, 'us', 'plea'), (True, 'us', 'free'), (True, 'Corey', 'Fauchon'), (True, 'makes', 'plea'), (True, 'makes', 'free'), (True, 'makes', 'us'), (True, 'divides', 'weak'), (True, 'divides', 'us'), (True, 'shows', 'school')]

But, then again, our ambitions are outlandish. Nevertheless, there is a silver lining:

sb@lebenswelt:~/dev/bluestocking$ python factchecker.py "The sky is not blue."
Looking up blue
Looking up sky
Building knowledge base
Querying knowledge base with original document
Consistency: -1
Contradictions: [(True, 'sky', 'blue')]
Supported: []
Novel: []

This is critical

Notes on using the neo4j-scala package, Part 1

Encouraged by the reception of last week’s hacking notes, I’ve decided to keep experimenting with Neo4j and Scala. Taking Michael Hunger’s advice, I’m looking into the neo4j-scala package. My goal is to port my earlier toy program to this library to take advantage of more Scala language features.

These are my notes from stumbling through it. I’m halfway through.

To start with, I had trouble wrangling the dependencies. Spoiled by scripting languages, I’ve been half-assing my way around Maven for years, so I got burned a bit.

What happened was that in earlier messing around in my project, I had installed an earlier version of neo4j-scala from a different github repository. Don’t use that one. At the time of this writing, FaKoD‘s version is much more up to date and featureful.

I was getting errors that looked like this:

> [error] error while loading Neo4jWrapper, Scala signature Neo4jWrapper has
> wrong version
> [error]  expected: 5.0
> [error]  found: 4.1 in
> /home/sb/.ivy2/cache/org.neo4j/neo4j-scala/bundles/neo4j-scala-0.9.9-SNAPSHOT.jar(org/neo4j/scala/Neo4jWrapper.class)

The only relevant web pages I could find on this suggested that the problem had to do with having compiled the dependency in a different version of Scala. Since I had the Ubuntu package installed, which is pegged at 2.7.7, this seemed plausible. I went through a lot of flailing to reinstall Scala and rebuild the package, but to no avail.

That wasn’t the problem. Rather, when I asked him about it, FaKoD patiently pointed out that the older library has version 0.9.9-SNAPSHOT, whereas the newer one is version 0.1.0-SNAPSHOT. So, my sbt build configuration file has this line now:

libraryDependencies += "org.neo4j" % "neo4j-scala" % "0.1.0-SNAPSHOT"

Thanks to FaKoD’s walking me through these problems, I stopped getting cryptic errors and could start hacking.

Here’s what I had to start with, copying out of one of the neo4j-scala’s tests:

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._
import org.neo4j.scala.{EmbeddedGraphDatabaseServiceProvider, Neo4jWrapper}

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
}

Running this in sbt, I get this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:6: class Krow needs to be abstract, /
since method neo4jStoreDir in trait EmbeddedGraphDatabaseServiceProvider /
of type => String is not defined

That’s because EmbeddedGraphDatabaseServiceProvider (this code is written by a German, I gather) has an abstract method that I haven’t defined.

What I find neat is that neo4jStoreDir is an abstract method–it’s a function that takes no arguments and returns a String. But Scala is smart enough to allow it to be defined either by a method or, more naturally, by a value. So, this compiles:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  val neo4jStoreDir = "var/graphdb"
}

but so does this:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  def neo4jStoreDir = {
    var a = "var/"
    var b = "graphdb"

    a + b
  }
}

(Functions in Scala can be defined by a block of code in curly braces, with the last line evaluated and returned.)

Next, I worked on rewriting my toy app, using this unittest as a guide.

Here was the code from my original experiment:

    var first : Node = null
    var second : Node = null

    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()

    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")

      second = neo.createNode()
      second.setProperty("name","second")

      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)

      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally   

      println("finished transaction 1")
    }

You can see why I would like it to be more concise. Here’s a first pass on what neo4j-scala let me whittle it down to:

    var first : Node = null
    var second : Node = null

    withTx {
      neo =>
          first = neo.gds.createNode()  
          first.setProperty("name","first")

          second = neo.gds.createNode()
          second.setProperty("name","second")

          first --> "isRelatedTo" --> second
    }

There is a lot of magic going on and it took me a while to get my head around it.

The point of withTx is to wrap around the try/success/finally pattern needed for most Neo4j transactions. Here’s the code for it:

  def withTx[T <: Any](operation: DatabaseService => T): T = {
    val tx = synchronized {
      ds.gds.beginTx
    }
    try {
      val ret = operation(ds)
      tx.success
      return ret
    } finally {
      tx.finish
    }
  }

Coming from years of JavaScript and Python, it was tough getting my head around this type signature. The syntax alone is daunting. But what I think it comes down to is this:

  • withTx takes a type parameter, T, which can be any subtype (<:) of Any.
  • It takes an argument, operation, which must be a function from something of type DatabaseService to something of type T.
  • It returns type T.

In practice, this means that the function can be called in a way that’s agnostic to the return type of its argument. But what is this DatabaseService argument?

In neo4j-scala, DatabaseService is a trait that wraps a Neo4j GraphDatabaseService. Then a GraphDatabaseServiceProvider wraps the DatabaseService. Application code is, as far as I can tell, expected to inherit from both Neo4jWrapper, which handles the syntactic sugar, and a GraphDatabaseServiceProvider, which provides the context for the sugar.

Which means that somewhere deep in the structure of our main object there is a DatabaseService that has real ultimate power over the database. withTx will find it for us, but we need to send it an operation that binds to it.
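Stripped of the Neo4j machinery, the pattern is just a generic higher-order function. Here is a toy analogue (all the Fake* names are mine, not the library’s) showing how T flows from the operation’s return value out through withTx:

```scala
object TxDemo {
  class FakeTx {
    private var ok = false
    def success(): Unit = { ok = true }
    // in the real library this is tx.finish; here we just report the outcome
    def finish(): Unit = println(if (ok) "committed" else "rolled back")
  }

  class FakeDb {
    def beginTx: FakeTx = new FakeTx
  }

  // Same shape as withTx: generic in T, takes an operation from the
  // "database service" to T, and returns whatever the operation returned.
  def withTx[T](db: FakeDb)(operation: FakeDb => T): T = {
    val tx = db.beginTx
    try {
      val ret = operation(db)
      tx.success()
      ret
    } finally {
      tx.finish()
    }
  }

  def main(args: Array[String]): Unit = {
    // T is inferred as Int from the block's last expression
    val n = withTx(new FakeDb) { db => 40 + 2 }
    println(n)
  }
}
```

The caller never names T; the compiler infers it from the block, which is what makes withTx agnostic to its operation’s return type.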

neo4j-scala also provides this helpful method, which operates in the context where that DatabaseService is available:

  def createNode(implicit ds: DatabaseService): Node = ds.gds.createNode

createNode‘s argument is implicit and so is plucked otherwise unbidden from its environment. And since Scala lets you call methods that have no arguments without parentheses, we can shorten the code further.

    withTx {
      implicit neo =>
          first = createNode
          first.setProperty("name","first")

          second = createNode
          second.setProperty("name","second")

          println("added nodes")

          // uses neo4j-scala special syntax
          first --> "isRelatedTo" --> second
    }

Notice that I had to put an implicit before neo in this code. When I didn’t, I got this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:23: /
 could not find implicit value for parameter ds: org.neo4j.scala.DatabaseService
[error]           first = createNode

What I think is happening is that in order to make the DatabaseService, neo, available as an implicit argument of the createNode method, we have to mark it as available with the implicit keyword.

See this page for reference:

The actual arguments that are eligible to be passed to an implicit parameter fall into two categories:

* First, eligible are all identifiers x that can be accessed at the point of the method call without a prefix and that denote an implicit definition or an implicit parameter.
* Second, eligible are also all members of companion modules of the implicit parameter’s type that are labeled implicit.
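The first case is the one at work in this code. A minimal standalone illustration (Greeting and greet are hypothetical names of mine, not part of neo4j-scala):

```scala
object ImplicitDemo {
  case class Greeting(text: String)

  // greet has an implicit parameter; the compiler fills it in from
  // any implicit Greeting visible at the call site.
  def greet(name: String)(implicit g: Greeting): String =
    g.text + ", " + name

  def main(args: Array[String]): Unit = {
    // Marked implicit, so it is eligible (the first case above).
    implicit val hello: Greeting = Greeting("Hello")
    println(greet("world"))  // prints "Hello, world"
  }
}
```

Drop the implicit keyword from hello and you get exactly the “could not find implicit value for parameter” error above.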

The other interesting thing going on here is this line:

          first --> "isRelatedTo" --> second

This makes a Neo4j relationship between first and second of type “isRelatedTo.”

I have no idea how the code that makes this happen works. Looking at it hurts my head. I think there may be black magic involved.
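I won’t pretend to have read the library’s implementation, but one common way to build this kind of arrow syntax in Scala is an implicit conversion to a wrapper class whose --> method returns a half-built relationship. A toy sketch of the idea (my names, not neo4j-scala’s):

```scala
import scala.language.implicitConversions

object ArrowDemo {
  class Node(val name: String) {
    var edges: List[(String, Node)] = Nil
  }

  // Holds the left node and the relationship name until the
  // second --> supplies the right node.
  class HalfArrow(from: Node, rel: String) {
    def -->(to: Node): Node = {
      from.edges ::= ((rel, to))
      to
    }
  }

  // Gives Node a --> method via implicit conversion.
  class NodeOps(from: Node) {
    def -->(rel: String): HalfArrow = new HalfArrow(from, rel)
  }
  implicit def nodeToOps(n: Node): NodeOps = new NodeOps(n)

  def main(args: Array[String]): Unit = {
    val first = new Node("first")
    val second = new Node("second")
    first --> "isRelatedTo" --> second
    println(first.edges.map { case (r, n) => (r, n.name) })
  }
}
```

Because --> returns the right-hand node, the arrows chain: first --> "isRelatedTo" --> second could keep going with another --> off of second.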

This has been slow going, since I’m learning as I’m going. I’m not done yet, though. The code I’m converting had some code to do a short traversal between my two nodes, printing their names. I’m going to leave that to Part 2.

Goodbye, and Hello

This summer I have been transitioning to a new life chapter.

For the past four years I have worked at OpenPlans, or what was once The Open Planning Project, and specifically for OpenGeo. I couldn’t say enough about the range and depth of experiences or the wildness of the ride with that organization. For now, suffice it to say that I entered with a lot of enthusiasm for ideas of open source, technology, and good governance. Working at OpenPlans was a way to experience the confrontation of those ideas with reality.

What has been beautiful to witness is that those ideas are alive and victorious. While everything has been harder than it looked on paper, the dreams shared by so many about the promise of technology for justice are becoming real. They aren’t just dreams. They are truth.

I’m so proud to have worked in this field. What’s more, I’ve been able to be a part of OpenGeo growing as a company to become a vibrant force in the world of geospatial software. This is exciting not only because it is changing the way governments use open software and open data, but also because it proves its business model. Through its success, OpenGeo shows a path of economic transformation that’s available to public and private organizations alike.

I’ve now left OpenGeo so that I can begin working towards a PhD at the UC Berkeley School of Information. Here is how their website describes the program:

The School of Information is both UC Berkeley’s newest and its smallest school. Located in the center of campus, the I School is a graduate research and education community committed to expanding access to information and to improving its usability, reliability, and credibility while preserving security and privacy. This requires the insights of scholars from diverse fields—information and computer science, design, social sciences, management, law, and policy.

I can’t explain any better than this what the program is. And to be honest, I don’t yet know what I will be doing there. It is an open future.

That said, what I better be doing there is writing more. I expect that at least initially I’ll be writing a lot of my day to day discoveries here. So, if you’ve been following this blog, then Goodbye, and Hello.

GeoNode Jobs

Despite the economy, there are lots of jobs cropping up to work on GeoNode.

OpenGeo’s hiring a new GeoNode developer, along with several other positions (a UX developer and a Global Support Services Manager). See here for more details on careers at OpenGeo.

Robert Soden, GFDRR Lab’s new Geo Lead, recently posted this to the GeoNode developer’s list:

The World Bank’s GFDRR Labs team is seeking to hire a developer to work with us on GeoNode and several other exciting mapping projects. As part of the Open Data for Resilience Initiative (OpenDRI), the Global Facility for Disaster Reduction and Recovery collaborates with partners to deploy GeoNode in a number of countries around the world. The right candidate will assist the team in facilitating these deployments, contributing to the core software, and interfacing with the GeoNode community of users and developers.

If you’re interested, please contact me at: rsoden@worldbank.org. Deadline to apply is August 15, 2011.

It’s a growing community. Please consider becoming a (paid) part of it.

GitHub Unicorn

GitHub consistently amazes me with how quickly it adds new features. Here is the latest one that blows my mind.

David Winslow tells me that this is just part of a larger pattern of unicorns taking over the internet.