Digifesto

Category: open source software

on courage in the face of failure developing bluestocking

It would be easy to be discouraged by early experiments with bluestocking.

sb@lebenswelt:~/dev/bluestocking$ python factchecker.py "Courage is what makes us. Courage is what divides us. Courage is what drives us. Courage is what stops us. Courage creates news. Courage demands more. Courage creates blame. Courage brings shame. Courage shows in school. Courage determines the cool. Courage divides the weak. Courage pours out like a leak. Courage puts us on a knee. Courage makes us free. Courage makes us plea. Courage helps us flee. Corey Fauchon"
Looking up Fauchon
Lookup failed
Looking up shame
Looking up news
Looking up puts
Lookup failed
Looking up leak
Lookup failed
Looking up stops
Lookup failed
Looking up Courage
Looking up helps
Lookup failed
Looking up divides
Lookup failed
Looking up shows
Lookup failed
Looking up demands
Lookup failed
Looking up pours
Lookup failed
Looking up brings
Lookup failed
Looking up weak
Lookup failed
Looking up drives
Lookup failed
Looking up free
Looking up blame
Lookup failed
Looking up Corey
Lookup failed
Looking up plea
Lookup failed
Looking up knee
Looking up flee
Lookup failed
Looking up cool
Looking up school
Looking up determines
Lookup failed
Looking up like
Looking up us
Lookup failed
Looking up creates
Lookup failed
Looking up makes
Lookup failed
Building knowledge base
Querying knowledge base with original document
Consistency: 0
Contradictions: []
Supported: []
Novel: [(True, 'helps', 'flee'), (True, 'helps', 'us'), (True, 'determines', 'cool'), (True, 'like', 'leak'), (True, 'puts', 'knee'), (True, 'puts', 'us'), (True, 'pours', 'leak'), (True, 'pours', 'like'), (True, 'brings', 'shame'), (True, 'drives', 'us'), (True, 'stops', 'us'), (True, 'creates', 'blame'), (True, 'creates', 'news'), (True, 'Courage', 'shame'), (True, 'Courage', 'news'), (True, 'Courage', 'puts'), (True, 'Courage', 'leak'), (True, 'Courage', 'stops'), (True, 'Courage', 'helps'), (True, 'Courage', 'divides'), (True, 'Courage', 'shows'), (True, 'Courage', 'demands'), (True, 'Courage', 'pours'), (True, 'Courage', 'brings'), (True, 'Courage', 'weak'), (True, 'Courage', 'drives'), (True, 'Courage', 'free'), (True, 'Courage', 'blame'), (True, 'Courage', 'plea'), (True, 'Courage', 'knee'), (True, 'Courage', 'flee'), (True, 'Courage', 'cool'), (True, 'Courage', 'school'), (True, 'Courage', 'determines'), (True, 'Courage', 'like'), (True, 'Courage', 'us'), (True, 'Courage', 'creates'), (True, 'Courage', 'makes'), (True, 'us', 'knee'), (True, 'us', 'flee'), (True, 'us', 'plea'), (True, 'us', 'free'), (True, 'Corey', 'Fauchon'), (True, 'makes', 'plea'), (True, 'makes', 'free'), (True, 'makes', 'us'), (True, 'divides', 'weak'), (True, 'divides', 'us'), (True, 'shows', 'school')]

But, then again, our ambitions are outlandish. Nevertheless, there is a silver lining:

sb@lebenswelt:~/dev/bluestocking$ python factchecker.py "The sky is not blue."
Looking up blue
Looking up sky
Building knowledge base
Querying knowledge base with original document
Consistency: -1
Contradictions: [(True, 'sky', 'blue')]
Supported: []
Novel: []

Why federally funded software should be open source

Recently, open access to government funded research has gained attention and traction. Britain and Europe have both announced that they will make research they fund open access. In the United States, a community-driven effort has pushed a Whitehouse petition to the Obama administration for a similar policy. We may be experiencing a sea change.

Perhaps on the coattails of this movement, Open Source for America has launched a petition asking for a similar policy regarded federally funded software development: share all government-developed software under an open source license.

This is a really good idea.

Unfortunately, software development and the government IT procurement are so misunderstood that this is not likely to excite those who aren’t somehow directly by the issue. That is too bad, because every American stands to benefit from this sort of change. That makes it important for those of us who do understand to act.

I’ll try to illustrate why this is important with a story, or really a template of a story. This is a story told in countless cases of government software procurement:

ACRNM, a federal agency, has realized that its database management system and its user interface have not been updated since the late 90’s, because building it the last time was such a headache. It never really worked the way they wanted it, and the vendor who built it for them has since vanished off the face of the earth. Desperate and beleaguered, ACRNM finally gets the budget together to build a new system, and put out a bid.

Vendors that have navigated the prerequisite bureaucratic maze flock to this bid, knowing victory will be lucrative. Among them is FUBAR Enterprise Solutions. They know that whatever they build, they have a revenue stream for life. Not only does ACRNM have an enormous internal incentive to declare the new system a success to justify their budget, but they also have nobody to turn to for help with their software when it inevitably fails but FUBAR. FUBAR can continue extorting ACRNM for cash until ACRNM gives up, and the cycle continues.

What is wrong with this picture? Let’s count the problems:

  • FUBAR has ACRNM by the (pardon me, there’s really no other way to put this) balls. The term is vendor lock-in. The second ACRNM installs their system, FUBAR becomes a parasite on the government leeching taxpayer money. This is because the software is proprietary. No other company is legally allowed to fix or modify FUBAR’s proprietary system, so FUBAR faces no competition and so can charge through the nose. If the software were open source, ACRNM could turn to other contractors to repair their system, lowering total costs.
  • ACRNM has to do its work with worse software. Remember, this is a government agency that we pay taxes to for their services. With so much government activity boiling down to bureaucratic information processing, and so much innovation in software engineering and design, and so much budgetary pressure, you would think that the federal government would leap at technological innovation. But proprietary contracting causes the government to cripple itself at a tipping point.
  • Today, government agencies like ACRNM are wisening up and turning to open source solutions. But it’s a slow, slow process. This is partly because FUBAR and its buddy companies who, after so many years of this relationship with government, are now an entrenched lobby that will sow Fear, Uncertainty, and Doubt about open source alternatives if they can get away with it. In recent years, since open source has become more mainstream, these companies are admitting the viability of open source compatibility and mixed solutions. They see the writing on the wall. They will of course fight an open source purchasing mandate with everything they have.
  • Few governmental problems are unique. If ACRNM is paying for a new custom software solution, there likely many other agencies–at federal, state, or local level–with a similar problem. Civic Commons has already jumped on this opportunity by trying to facilitate technology reuse across city governments. If ACRNM invests in an open source solution, then other agencies can seek out that solution and adapt it to their needs, reducing government IT costs overall.
  • As we’ve discussed, open source software creates a competitive market for services. That makes an open source mandate a job creation program. Every new open technology is an opportunity for several small businesses to open. These are businesses that share fixed costs to market entry and add value through technologist consulting and custom development. Jobs customizing existing open source solutions can be well-paid with even an entry-level programming skill set, and are a good way to build a lasting career in the technology sector. Federal investment in open source software builds our national supply of technology skill faster than proprietary investment.
  • Lastly, but certainly not least, is the possible reuse of open source technology by the private sector. Just as federally funded research contributes to growth in America’s scientific industry, federal investment in software provides a foundation for stronger tech companies. Openness in both cases expands the impact of the funding.

So, to recap: if this sort of policy passes, the winners are government employees, taxpayers, entry-level workers with a minimum of technical skills, and the tech industry in general. The losers (in the short term) are those existing companies that have the federal government locked into custom proprietary software contracts.

I want to make a point clear: I am talking specifically about new software development in this post. Purchasing licenses for existing proprietary software is a different story.

Brian Carver, professor at UC Berkeley School of Information, has offered this clarification of what an open source mandate could look like:

  1. An unambiguous policy and awareness that all software created by
    federal employees as part of their job duties is not subject to copyright
    at all and is born in the public domain, and therefore not subject to any
    license terms at all, including a FOSS license.
  2. Given 1, the federal government should either just use github/bitbucket
    or set up a similar repository to share all such federal government
    software that is in the public domain.
  3. When the federal government contracts with developers for software,
    there should be an unambiguous policy that all such software must be
    licensed under a FOSS license unless subject to a specifically-requested
    exemption (national security, military, etc.)

A central election issue is the size and role of government in the economy. Politicians on the right advocate for smaller government and a strong private sector with competitive markets. Politicians on the left advocate for government’s active investment in the economy.

Proprietary government-developed software is the worst of both worlds: inefficient government spending to create parasitic, uncompetitive companies that don’t invest their technology back into the economy. An open source mandate would give us the best of both worlds: efficient government spending that shrinks government (by easing overhead) while investing in new technology and competitive businesses.

The movement for open access to government funded research is strong and winning victories around the world. Maybe we can do the same for government funded software development.

We need help naming a software project

Speaking of computational argumentation, Dave Kush and I are starting a software project and we need a name for it.

The purpose of the software is to extract information from a large number of documents, and then merge this information together into a knowledge base. We think this could be pretty great because it would support:

  • Conflict detection and resolution. In the process of combining information from many sources into a single knowledge base, the system should be able to mark conflicts of information. That would indicate an inconsistency or controversy between the documents, which could be flagged for further investigation.
  • Naturally queryable aggregate knowledge. We anticipate being able to build a query interface that is a natural extension of this system: just run the query through the extraction process and compare the result for consistency with the knowledge base. This would make the system into a “dissonance engine,” useful for opposition research or the popping of filter bubbles.

I should say that neither of us knows exactly what we are doing. But Dave’s almost got his PhD in human syntax so I think we’ve got a shot at building a sweet parser. What’s more, we’ve got the will and plan. It will be open source, of course, and we’re eager for collaborators.

We have one problem:

We don’t know what to call it.

I can’t even make the GitHub account for our code until we have a good name. And until then we’ll be sending Python scripts to each other as email attachments and that will never get anywhere.

Please help us. Tell us what to name our project. If we use your name, we’ll do something awesome for you some day.

Scratch that. We’re calling it Bluestocking. The GitHub repo is here.

Academia vs. FOSS: The Good, The Bad, and the Ugly

Mel Chua has been pushing forward on the theme of FOSS culture in academia, and has gotten a lot of wonderful comments, many about why it’s not so simple to just port one culture over to the other. I want to try to compile items from Mel, comments on that post, and a few other sources. The question is: what are the salient differences between FOSS and academia?

I will proceed using the now-standard Spaghetti Western classification schema.

The Good

  • Universities tend to be more proactive about identifying and aiding newcomers that are struggling, as opposed to many FOSS projects that have high failure-and-dropout rates due to poorly designed scaffolding.
  • Academia is much more demographically inclusive. FOSS communities are notoriously imbalanced in terms of gender and race.

The Bad

  • The academic fear of having ones results scooped or stolen results in redundant, secrecy, and lonely effort. FOSS communities get around this by having good systems for attribution of incremental progress.
  • Despite scientific ideals, academic scientific research is getting less reproducible, and therefore less robust, because of closed code and data. FOSS work is often more reproducible (though not if its poorly documented).
  • Closed access academic journals hold many disciplines hostage by holding a monopoly on prestige. This is changing with the push for open access research, but this is still a significant issue. FOSS communities may care about community prestige, but often that prestige comes from community helpfulness or stake in a project. If metrics are used, they are often implicit ones extractable from the code repository itself, like Ohloh. Altmetrics are a solution to this problem.

The Ugly

  • In both FOSS and academia, a community of collaborators needs to form around shared interests and skills. But FOSS has come to exemplify the power of the distributed collaboration towards pragmatic goals. One is judged more by ones contributions than by ones academic pedigree, which means that FOSS does not have as much institutional gatekeeping.
  • Tenure committees look at papers published, not software developed. So there is little incentive for making robust software as part of the research process, however much that might allow reproducibility and encourage collaboration.
  • Since academics are often focused on “the frontier”, they don’t pay much attention to “building blocks”. Academic research culture tends to encourage this because it’s a race for discovery. FOSS regards care of the building blocks as a virtue and rewards the effort with stronger communities built on top of those blocks.
  • One reason for the difference between academia and FOSS is bandwidth. Since publications have page limits and are also the main means of academic communication, one wants to dedicate as much space as possible to juicy results at the expense of process documentation that would aid reproducibility. Since FOSS developed using digital communication tools with fewer constraints, it doesn’t have this problem. But academia doesn’t yet value contributions to this amorphous digital wealth of knowledge.

Have I left anything out?

Don’t use Venn diagrams like this

Today I saw this whitepaper by Esri about their use of open source software. It’s old, but still kept my attention.

There’s several reasons why this paper is interesting. One reason is that it reflects the trend of companies that once used FUD tactics around open source software to singing a soothing song of compatibilism. It makes an admirable effort to explain the differences between open source, proprietary software, and open standards to its enterprise client audience. That is the good news.

The bad news is that since this new compatibilism is just bending to market pressure after the rise of successful open source software complements, it lacks an understanding of why the open source development process has caused those market successes. Of course, proprietary companies have good reason to blur these lines, because otherwise they would need to acknowledge the existence of open source substitutes. In Esri’s case, that would mean products like the OpenGeo Suite.

I probably wouldn’t have written this post if it were not for this Venn diagram, which is presented with the caption A hybrid relationship:

I don’t think there is a way to interpret this diagram in a way that makes sense. It correctly identifies that Closed Source, Open Source, and Open Standards are different. But what do the overlapping regions represent? Presumabely they are meant to indicate that a system may both be open source and use open standards, or have open standards and be closed, or…be both open and closed?

It’s a subtle point but the semantics of set containment implied by the Venn diagram really don’t apply here. A system that’s a ‘hybrid’ between a closed and open software is not “both” closed and open the same way closed software that uses open standards is “both” closed and open. Rather, the hybrid system is just that, a hybrid, which means that its architecture is going to suffer tradeoffs as different components have different properties.

I don’t think that the author of this whitepaper was trying to deliberately obscure this idea. But I think that they didn’t know or care about it. That’s a problem, because it’s marketing material like this that clouds the picture about the value of open source. At a pointy-haired managerial level, one can answer the question “why aren’t you using more open source software” with a glib, “oh, we’re using a hybrid model, tailored to our needs.” But unless you actually understand what you’re talking about, your technical stack may still be full of buggy and unaccountable software, without you even knowing it.

The open source acqui-hire

There’s some interesting commentary around Twitter’s recent acquisition, Whisper Systems:

Twitter has begun to open source the software built by Whisper Systems, the enterprise mobile security startup it acquired just three weeks ago. …This move confirms the, well, whipsers that the Whisper Systems deal was mostly made for acqui-hire purposes.

Another acquisition like this that comes to mind is Etherpad, which Google bought (presumably to get the Etherpad team working on Wave) then open sourced. The logic of these acquisitions is that the talent is what matters, the IP is incidental or perhaps better served by an open community.

When I talk to actual or aspiring entrepreneurs, they often make the assumption that it would spoil their business to start building out their product open source. For one thing, they argue, there will be competitors who launch their own startups off of the open innovation. Then, they will miss their chance at a big exit because there will be no IP to tempt Facebook or whoever else to buy them out.

These open source acqui-hires defy these concerns. Demonstrating talent is part of what makes one acquirable. Logically, then, starting a competing company based on technology in which you don’t have talent makes you less competitive, from the perspective of a market exit. It’s hard to see what kind of competitive advantage the copycat company would have, really, since it doesn’t have the expertise in technology that comes from building it. If they do find some competitive advantage (perhaps they speak a foreign language and so are able to target a different market), then they are natural partners, not natural competitors.

One can take this argument further. Making open and available software is one of the best ways for a developer to make others aware of their talents and increase the demand (and value) for their own labor. So the talent in an open source company should be on average more valuable in case of an acqui-hire.

This doesn’t seem like a bad way out for a talented entrepreneur. Why, then, is this not a more well-known model for startups?

One reason is that the real winners in the startup scene are not the entrepreneurs. It’s the funders, and to the funders it is more worthwhile to invest in several different technologies with the small chance of selling one off big than to invest in the market value of their entrepreneurs. Because, after all, venture capitalists are in the same war for engineering talent as Google, Facebook, etc.. This should become less of an issue, however, as crowdfunding becomes more viable.

To Google Reader users

A lot of friends of mine were avid Google Reader users. For some of them, it was their primary social media tool. They had built a strong community around it. Naturally, they were attached to its user interface, features, and workflows. It was home to them.

Google recently ‘redesigned’ Google Reader in a way that blatantly forced Reader users to adopt Google+ as their social media platform. A lot of Reader devotees are pissed about this. They want their old technology back.

My first response to this is: What did you expect? What made Reader so special? It was just the first of several experiments in social media that Google’s used to edge into the Facebook’s market. (Reader, Buzz, Wave, now Google+). Of course, the industry logic is that your community should be dumped onto the newer platform, so that Google can capture the network effects of your participation. Your community is what will make their new technology so valuable to them!

Still not happy?

The problem is that Google Reader was a corporately operated platform, not a community operated one. You may not have know that you had other options. There are a lot of social media communities that have a lot of self-control, Metafilter being a particularly great one. (incidentally, Ask Metafilter has a good guide to Reader alternatives) There is also a lot of energy going into open source social media tools.

The most prominent of these is Diaspora, which raised a ridiculous amount of funding on Kickstarter when the New York Times wrote about its being a project. I stopped following it after the first press buzz, but maybe it’s time to start paying attention to it again. Since its community has recently announced that it is not vaporware, I decided to go ahead and join the diasp.org pod.

To my surprise, it’s pretty great! Smooth, intuitive interface, fast enough, seems to have all the bells and whistles you’d want and not a lot of cruft–basically all the stuff I care about on Google+. I’ve got a public profile. Plus, it has great tools for data export in case I want to pick up and move to a different pod.

Looking into it, Diaspora does not yet work as an RSS reader, though there is an open issue for it. A bit of a missed opportunity, IMO. Some other people are build an open-source Reader clone in response, which could more directly solve the Reader problem. Whatever the current technical limitations, though, they can be surmounted by some creative piping between services.

The point that I hope stands is that there is a hidden cost to a community investing in a technical infrastructure when it is being maintained by those that do not value your community. People’s anger at the Reader redesign demonstrates the value of the open source alternatives.

Notes on using the neo4j-scala package, Part 1

Encouraged by the reception of last week’s hacking notes, I’ve decided to keep experimenting with Neo4j and Scala. Taking Michael Hunger’s advice, I’m looking into the neo4j-scala package. My goal is to port my earlier toy program to this library to take advantage of more Scala language features.

These my notes from stumbling through it. I’m halfway through.

To start with, I had trouble wrangling the dependencies. Spoiled by scripting languages, I’ve been half-assing my way around Maven for years, so I got burned a bit.

What happened was that in earlier messing around in my project, I had installed an earlier version of neo4j-scala from a different github repository. Don’t use that one. At the time of this writing, FaKoD‘s version is much more up to date and featureful.

I was getting errors that looked like this:

> [error] error while loading Neo4jWrapper, Scala signature Neo4jWrapper has
> wrong version
> [error]  expected: 5.0
> [error]  found: 4.1 in
> /home/sb/.ivy2/cache/org.neo4j/neo4j-scala/bundles/neo4j-scala-0.9.9-SNAPSHOT.jar(org/neo4j/scala/Neo4jWrapper.class)

The only relevant web pages I could find on this suggested that the problem had to due with having compiled the dependency in a different version of Scala. Since I had the Ubuntu package installed, which is pegged at 2.7.7, this seemed plausible. I went through a lot of flailing to reinstall Scala and rebuild the package, but to no avail.

That wasn’t the problem. Rather, when I asked him about it FaKoD patiently pointed out that older library has version 0.9.9-SNAPSHOT, whereas the newer one is version 0.1.0-SNAPSHOT. So, my sbt build configuration file has this line now:

libraryDependencies += "org.neo4j" % "neo4j-scala" % "0.1.0-SNAPSHOT"

Thanks to FaKoD’s walking me through these problems, I stopped getting cryptic errors and could start hacking.

Here’s what I had to start with, copying out of one of the neo4j-scala’s tests:

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._
import org.neo4j.scala.{EmbeddedGraphDatabaseServiceProvider, Neo4jWrapper}

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
}

Running this in sbt, I get this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:6: class Krow needs to be abstract, /
since method neo4jStoreDir in trait EmbeddedGraphDatabaseServiceProvider /
of type => String is not defined

That’s because EmbeddedGraphDatabaseServiceProvider (this code is written by a German, I gather) has an abstract method that I haven’t defined.

What I find neat is that this is an abstract method–it’s a function that takes no arguments and returns a String. But Scala seems smart enough to allow this to be defined by either methods or more naturally variables. So, this compiles:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  val neo4jStoreDir = "var/graphdb"
}

but so does this:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  def neo4jStoreDir = {
    var a = "var/"
    var b = "graphdb"

    a + b
  }

(Functions in Scala can be defined by a block of code in curly braces, with the last line evaluated and returned.)

Next, I worked on rewriting my toy app, using this unittest as a guide.

Here was the code from my original experiment:

    var first : Node = null
    var second : Node = null

    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()

    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")

      second = neo.createNode()
      second.setProperty("name","second")

      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)

      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally   

      println("finished transaction 1")
    }

You could see why I would like it to be more concise. Here’s a first pass on what neo4j-scala let me whittle it down to:

    var first : Node = null
    var second : Node = null

    withTx {
      neo =>
          first = neo.gds.createNode()  
          first.setProperty("name","first")

          second = neo.gds.createNode()
          second.setProperty("name","second")

          first --> "isRelatedTo" --> second
    }

There is a lot of magic going on and it took me a while to get my head around it.

The point of withTx is to wrap around the try/success/finally pattern needed for most Neo4j transactions. Here’s the code for it:

  def withTx[T <: Any](operation: DatabaseService => T): T = {
    val tx = synchronized {
      ds.gds.beginTx
    }
    try {
      val ret = operation(ds)
      tx.success
      return ret
    } finally {
      tx.finish
    }
  }

Coming from years of JavaScript and Python, it was tough getting my head around this type signature. The syntax alone is daunting. But what I think it comes down to is this:

  • withTx takes a type parameter, T, which can be a subclass (<:) of Any.
  • It takes an argument, operation, which must be a function from something of type DatabaseService to something of type T.
  • It returns type T.

In practice, this means that the function can be called in a way that’s agnostic to the return type of its argument. But what is this DatabaseService argument?

In neo4j-scala, DatabaseService is a trait that wraps a Neo4j GraphDatabaseService. Then a GraphDatabaseServiceProvider wraps the DatabaseService. Application code is as far as I can tell expected to doubly inherit from both Neo4jWrapper, which handles the syntactic sugar, and a GraphDatabaseServiceProvider that provides the context for the sugar.

Which means that somewhere deep in the structure of our main object there is a DatabaseService that has real ultimate power over the database. withTx will find it for us, but we need to send it an operation that binds to it.

neo4j-scala also provides this helpful method, which operates in the context where that DatabaseService is available:

  def createNode(implicit ds: DatabaseService): Node = ds.gds.createNode

createNode‘s argument is implicit and so is plucked otherwise unbidden from its environment. And since Scala lets you call methods that have no arguments without parentheses, we can shorten the code further.

    withTx {
      implicit neo =>
          first = createNode
          first.setProperty("name","first")

          second = createNode
          second.setProperty("name","second")

          println("added nodes")

          // uses neo4j-scala special syntax
          first --> "isRelatedTo" --> second
    }

Notice that I had to put an implicit before neo in this code. When I didn’t, I got this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:23: /
 could not find implicit value for parameter ds: org.neo4j.scala.DatabaseService
[error]           first = createNode

What I think is happening is that in order to make the DatabaseService, neo, available as an implicit argument of the createNode method, we have to mark it as available with the implicit keyword.

See this page for reference:

The actual arguments that are eligible to be passed to an implicit parameter fall into two categories:

* First, eligible are all identifiers x that can be accessed at the point of the method call without a prefix and that denote an implicit definition or an implicit parameter.
* Second, eligible are also all members of companion modules of the implicit parameter’s type that are labeled implicit.

The other interesting thing going on here is this line:

          first --> "isRelatedTo" --> second

This makes a Neo4j relationship between first and second of type “isRelatedTo.”

I have no idea how the code that makes this happen works. Looking at it hurts my head. I think there may be black magic involved.

This has been slow going, since I’m learning as I’m going. I’m not done yet, though. The code I’m converting had some code to do a short traversal between my two nodes, printing their names. I’m going to leave that to Part 2.

Neo4j and Scala hacking notes

This week FOSS4G, though it has nothing in particular to do with geospatial (…yet), I’ve started hacking around graph database Neo4j in Scala because I’m convinced both are the future. I’ve had almost no experience with either.

Dwins kindly held my hand through this process. He knows a hell of a lot about Scala and guided me through how some of the language features could help me work with the Neo4j API. In this post, I will try to describe the process and problems we ran into and parrot his explanations.

I wrote some graphical hello world code to test things out in a file called Krow.scala (don’t ask). I’ll walk you through it:

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._

I wanted to code against an embedded database, rather than code against the Neo4j server, because I have big dreams of combining Neo4j with some other web framework and don’t like have to start and stop databases. So I needed EmbeddedGraphDatabase, which implements the GraphDatabaseService interface, and persists its data to a directory of files.

I’ll talk about the JavaConversions bit later.

object Krow extends Application {

I am a lazy programmer who only bothers to encapsulate things into software architecture at the last minute. I’m also spoiled by Python and JavaScript and intimidated by the idea of code compilation. So initially I wanted to write this as an interpreted script so I wouldn’t have to think about it. But I’ve heard great things about sbt (simple-build-tool) so I figured I’d try it out.

Using sbt was definitely worth it, if only because it is really well documented and starting up my project with it got me back into the mindset of Java development enough to get Dwins to explain Maven repositories to me again. Adding dependencies to an sbt project involves writing Scala itself, which is a nice way to ease into the language.

But running my project in sbt meant I needed a main method on by lame-o script. Ugh. That sounded like too much work for me, and args: Array[String] looks ugly and typing it ruins my day.

Dwins recommended I try using Scala’s Application trait. He explained that this would take code from an object’s body and do some magic to turn it into a main method. Rock on!

Of course, I didn’t bother to check the documentation or anything. Otherwise, I would have seen this:

The Application trait can be used to quickly turn objects into executable programs, but is not recommended.

For a language that is so emphatically Correct in its design, I give a lot of credit to whoever it was that had the balls to include this language feature so that newbs could hang themselves on it. If they hadn’t, I wouldn’t have had to confront hard truths about threading. (That’s foreshadowing)

  println("start")

  val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")

Sweet, a database I don’t have to start and stop on the command line! This var/graphdb directory is made in the directory in which I run the program (for me, using sbt run).

Next:

  var tx: Transaction = neo.beginTx()

  var first : Node = null
  var second : Node = null

  try {
    first = neo.createNode()
    first.setProperty("name","first")
    
    second = neo.createNode()
    second.setProperty("name","second")

    first.createRelationshipTo(second, "isRelatedTo")

    tx.success()
  } finally {
    println("finished transaction 1")
  }

What I’m trying to do with this code is make two nodes and a relationship between them. Should be simple.

But it turns out that with Neo4j, all modifications to the database have to be done in a transaction context, and for that you have to do this business of creating a new Transaction:

A programmatically handled transaction. All modifying operations that work with the node space must be wrapped in a transaction. Transactions are thread confined. Transactions can either be handled programmatically, through this interface, or by a container through the Java Transaction API (JTA). The Transaction interface makes handling programmatic transactions easier than using JTA programmatically. Here’s the idiomatic use of programmatic transactions in Neo4j:

 Transaction tx = graphDb.beginTx();
 try
 {
        ... // any operation that works with the node space
     tx.success();
 }
 finally
 {
     tx.finish();
 }
 

No big deal.

This bit of code was a chance for Dwins to show me a Scala feature that makes the language shine. Check out this line:

first.createRelationshipTo(second, "isRelatedTo")

If you check the documentation for this method, you can see that I’m not using this method as expect. The Java type signature is:

Relationship createRelationshipTo(Node otherNode, RelationshipType type)

where RelationshipType is a Neo4j concept that’s what it sounds like. I suppose it is important to set apart from mere Properties for performance on traversals something. RelationshipTypes can be created dynamically and seem to more or less exist in the either, but you need to provide them when you create a relationship. All relationships are of a type.

In terms of their data content, though, RelationshipTypes are just wrappers around strings. Rather than doing this wrapping in the same line that I createRelationship, Scala lets me establish a conversion from strings to RelationshipTypes in an elegant way.

You see, I lied. The above code would not have compiled had I not also included this earlier in the object’s definition:

  implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

This code uses Scala’s implicit conversions to define a conversion between Strings and RelationshipTypes.

DynamicRelationshipType.withName(x) is one of Neo4j’s ways of making a new RelationshipType. Scala’s type inference means that the compiler knows that string2relationshipType returns a RelationshipType.

Since I used the implicit keyword, Scala knows that when a String is used in a method that expects a RelationshipType, it can use this function to convert it on the fly.

Check out all that majesty. Thanks, Scala!

Ok, so now I want to show that I was actually able to get something into the database. So here’s my node traversal and printing code.

  tx = neo.beginTx()

  try{
    
    val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
                                          StopEvaluator.END_OF_GRAPH,
                                          ReturnableEvaluator.ALL,
                                          "isRelatedTo",
                                          Direction.BOTH)

    for(node <- trav){
      println(node.getProperty("name"))
    }
    tx.success()
  } finally {
    tx.finish()
    println("finished transaction 2")
  }

  neo.shutdown()

  println("done")

}

Two observations:

  • traverse takes a lot of arguments, most of which seem to be these awkwardly specified static variables. I bet there’s a way to use Scala features to wrap that and make it more elegant.
  • Check out that for loop. Concise syntax that takes an iterator. There’s one catch: Traverser is a Java.lang.Iterable iterator, whereas the loop syntax requires a scala.collection.Iterable. Remember that import scala.collection.JavaConversions._ line? That imported an implicit conversion from Java to Scala iterables.

All in all, pretty straightforward stuff, I thought. Here’s what I got when I used sbt to run this project:

> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow 
start
finished transaction 1
finished transaction 2

That’s not what I wanted! Not only did I not get any printed acknowledgement of the nodes that I had made in the database, but program hangs and doesn’t finish.

What the hell?!

Asking Dwins about it, he tells me sagely about threads. Transactions need to be run in a single thread. The Application trait does a lot of bad stuff with threads. To be technically specific about it, it does…some really bad stuff with threads. I thought I had a handle on it when I started writing this blog post but instead I’m just going to copy/paste from the Application trait docs, which I should have read in the first place.

In practice the Application trait has a number of serious pitfalls:

* Threaded code that references the object will block until static initialization is complete. However, because the entire execution of an object extending Application takes place during static initialization, concurrent code will always deadlock if it must synchronize with the enclosing object.

Oh. Huh. That’s interesting.

It is recommended to use the App trait instead.

Now you’re talking. Let me just change that line to object Krow extends App { and I’ll be cooking in no…

> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow 
start
finished transaction 1
finished transaction 2

…time.

God dammit. There’s something else about App, which runs all the object code at initialization, which is causing a problem I guess. I asked Dwins what he thought.

Too much magic.

I guess I’m going to have to write a main method after all.


After some further messing around with the code, I have something that runs and prints the desired lines.

While the code would compile, I got I wound up having to explicitly name the RelationshipType type in the calls where I was trying to implicitly convert the strings; otherwise I got exceptions like this:

java.lang.IllegalArgumentException: Expected RelationshipType at var args pos 0, found isRelatedTo

Does that make it an explicit conversion?

Overall, hacking around with this makes me excited about both Scala and Neo4j despite the setbacks and wrangling.

Complete working code appended below.


import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._

object Krow {

  println("start")

  def main(args: Array[String]){

    var first : Node = null
    var second : Node = null

    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()
      
    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")
    
      second = neo.createNode()
      second.setProperty("name","second")

      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)
      
      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally   
      
      println("finished transaction 1")
    }

    tx = neo.beginTx()

    try{
    
      val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
                                            StopEvaluator.END_OF_GRAPH,
                                            ReturnableEvaluator.ALL,
                                            "isRelatedTo" : RelationshipType,
                                            Direction.BOTH)

      for(node <- trav){
        println(node.getProperty("name"))
      }
      tx.success()
    } finally {
      tx.finish()
      println("finished transaction 2")
    }

    neo.shutdown()

    println("done")
  }
}

Open Source is Software’s Labor Movement

Software developers often have high starting salaries, but wind up not getting paid as much as managers and others who work in software. Why is this? I’d argue that it’s not because software development is any less essential or important to the software business. Rather, the reason is that software developers that work for a proprietary company give away the rights to their work to their employer.

No matter how much the proprietary developer is getting paid, they gradually experience employer lock-in. Why is this? They are the natural experts of the software that they write. This expertise is valuable. So, developers get more and more valuable to a company the longer they work for it.

Employers give developers raises, but not all is well. If the propriety developer leaves their job, they will generally not be able to find as high-paying a job elsewhere, because they will not have the same relevant expertise. That means that there is no incentive on their current employer to pay them for all they contribute. They only need to pay them enough to keep them from quitting.

That means that if you have been working for a long time for a proprietary company, you probably aren’t getting paid enough or getting the benefits you deserve.

Now consider open source developers. By now it is clear that there is a large market for open source companies and freelance consulting. If you are a developer, you should get into that market.

If you are an open source developer, then you get to have the same access to the fruits of your labor as your employer. You are not alienated from your labor, in Marx’s sense of the word. Your employer can never take away the public record of your contributions to an open source project or your standing in the community. And, importantly, the skills you have learned from your hard work are transferable to any other job using the software you have developed.

The result is that even more so than other software developers, open source developers can have their pick of jobs. That means employers have to compete for them even more.

While many companies will shower their developers with perks to keep them on board, savvy open source developers will demand time to do core community work that provides them the intangible social relationships and technical skills that make them more valuable workers–for anyone using their software. And they will demand that new code be vetted through core community processes whenever possible, as opposed to being one off applications.

Proprietary software companies should fear these trends. More and more developers are going to choose open source jobs, so proprietary companies are going to have to pay more for worse programming talent. (This is one of many reasons why proprietary software is doomed.)

Open source companies, on the other hand, should embrace this trend and do their best to satisfy their developer’s demands. Ultimately, these company’s value will be in the superiority of their technology and their developers. The developers want their software to be excellent, because it is the software and their involvement in it, not the company employing them, that guarantees them employment. In the open source economy, this loyalty is not misplaced. Rather, employing companies need to align themselves with it. In so doing, they will achieve technological superiority and community fame. They may err from this path at their peril.

Follow

Get every new post delivered to your Inbox.

Join 938 other followers