Digifesto

Category: open source software

We need help naming a software project

Speaking of computational argumentation, Dave Kush and I are starting a software project and we need a name for it.

The purpose of the software is to extract information from a large number of documents, and then merge this information together into a knowledge base. We think this could be pretty great because it would support:

  • Conflict detection and resolution. In the process of combining information from many sources into a single knowledge base, the system should be able to mark conflicts of information. That would indicate an inconsistency or controversy between the documents, which could be flagged for further investigation. (A rough sketch of this idea follows after the list.)
  • Naturally queryable aggregate knowledge. We anticipate being able to build a query interface that is a natural extension of this system: just run the query through the extraction process and compare the result for consistency with the knowledge base. This would make the system into a “dissonance engine,” useful for opposition research or the popping of filter bubbles.
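
To make the conflict detection idea concrete, here’s a minimal sketch in Scala of what a merge step might look like. All the names are hypothetical (nothing like this exists yet), and the real thing would need entity resolution and much more:

case class Fact(subject: String, predicate: String, obj: String)

// Merge one extracted fact into the knowledge base: return either the
// conflicting facts (flagged for investigation) or the updated knowledge base.
// Hypothetical sketch, not project code.
def merge(kb: Set[Fact], incoming: Fact): Either[Set[Fact], Set[Fact]] = {
  val conflicts = kb.filter { f =>
    f.subject == incoming.subject &&
    f.predicate == incoming.predicate &&
    f.obj != incoming.obj
  }
  if (conflicts.nonEmpty) Left(conflicts)
  else Right(kb + incoming)
}

A query would then reuse the same machinery: run the query through extraction and check the resulting facts against the knowledge base for consistency.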

I should say that neither of us knows exactly what we are doing. But Dave’s almost got his PhD in human syntax, so I think we’ve got a shot at building a sweet parser. What’s more, we’ve got the will and a plan. It will be open source, of course, and we’re eager for collaborators.

We have one problem:

We don’t know what to call it.

I can’t even make the GitHub account for our code until we have a good name. And until then we’ll be sending Python scripts to each other as email attachments and that will never get anywhere.

Please help us. Tell us what to name our project. If we use your name, we’ll do something awesome for you some day.

Scratch that. We’re calling it Bluestocking. The GitHub repo is here.

Academia vs. FOSS: The Good, The Bad, and the Ugly

Mel Chua has been pushing forward on the theme of FOSS culture in academia, and has gotten a lot of wonderful comments, many about why it’s not so simple to just port one culture over to the other. I want to try to compile items from Mel, comments on that post, and a few other sources. The question is: what are the salient differences between FOSS and academia?

I will proceed using the now-standard Spaghetti Western classification schema.

The Good

  • Universities tend to be more proactive about identifying and aiding newcomers that are struggling, as opposed to many FOSS projects that have high failure-and-dropout rates due to poorly designed scaffolding.
  • Academia is much more demographically inclusive. FOSS communities are notoriously imbalanced in terms of gender and race.

The Bad

  • The academic fear of having one’s results scooped or stolen results in redundant, secretive, and lonely effort. FOSS communities get around this by having good systems for attributing incremental progress.
  • Despite scientific ideals, academic scientific research is getting less reproducible, and therefore less robust, because of closed code and data. FOSS work is often more reproducible (though not if it’s poorly documented).
  • Closed access academic journals hold many disciplines hostage by holding a monopoly on prestige. This is changing with the push for open access research, but it is still a significant issue. FOSS communities may care about community prestige, but often that prestige comes from helpfulness or stake in a project. If metrics are used, they are often implicit ones extractable from the code repository itself, like Ohloh’s. Altmetrics are one proposed solution to this problem.

The Ugly

  • In both FOSS and academia, a community of collaborators needs to form around shared interests and skills. But FOSS has come to exemplify the power of distributed collaboration toward pragmatic goals. One is judged more by one’s contributions than by one’s academic pedigree, which means that FOSS does not have as much institutional gatekeeping.
  • Tenure committees look at papers published, not software developed. So there is little incentive for making robust software as part of the research process, however much that might allow reproducibility and encourage collaboration.
  • Since academics are often focused on “the frontier”, they don’t pay much attention to “building blocks”. Academic research culture tends to encourage this because it’s a race for discovery. FOSS regards care of the building blocks as a virtue and rewards the effort with stronger communities built on top of those blocks.
  • One reason for the difference between academia and FOSS is bandwidth. Since publications have page limits and are also the main means of academic communication, one wants to dedicate as much space as possible to juicy results at the expense of process documentation that would aid reproducibility. Since FOSS developed using digital communication tools with fewer constraints, it doesn’t have this problem. But academia doesn’t yet value contributions to this amorphous digital wealth of knowledge.

Have I left anything out?

Don’t use Venn diagrams like this

Today I saw this whitepaper by Esri about their use of open source software. It’s old, but it still kept my attention.

There are several reasons why this paper is interesting. One is that it reflects the trend of companies that once used FUD tactics against open source software now singing a soothing song of compatibilism. It makes an admirable effort to explain the differences between open source, proprietary software, and open standards to its enterprise client audience. That is the good news.

The bad news is that since this new compatibilism is just bending to market pressure after the rise of successful open source software complements, it lacks an understanding of why the open source development process has caused those market successes. Of course, proprietary companies have good reason to blur these lines, because otherwise they would need to acknowledge the existence of open source substitutes. In Esri’s case, that would mean products like the OpenGeo Suite.

I probably wouldn’t have written this post if it were not for this Venn diagram, which is presented with the caption “A hybrid relationship”:

I don’t think there is a way to interpret this diagram that makes sense. It correctly identifies that Closed Source, Open Source, and Open Standards are different. But what do the overlapping regions represent? Presumably they are meant to indicate that a system may both be open source and use open standards, or have open standards and be closed, or…be both open and closed?

It’s a subtle point, but the semantics of set containment implied by the Venn diagram really don’t apply here. A system that’s a ‘hybrid’ of closed and open software is not “both” closed and open in the same way that closed software using open standards is “both” closed and open. Rather, the hybrid system is just that, a hybrid, which means that its architecture is going to suffer tradeoffs as different components have different properties.

I don’t think that the author of this whitepaper was trying to deliberately obscure this idea. But I think that they didn’t know or care about it. That’s a problem, because it’s marketing material like this that clouds the picture about the value of open source. At a pointy-haired managerial level, one can answer the question “why aren’t you using more open source software” with a glib, “oh, we’re using a hybrid model, tailored to our needs.” But unless you actually understand what you’re talking about, your technical stack may still be full of buggy and unaccountable software, without you even knowing it.

The open source acqui-hire

There’s some interesting commentary around Twitter’s recent acquisition, Whisper Systems:

Twitter has begun to open source the software built by Whisper Systems, the enterprise mobile security startup it acquired just three weeks ago. …This move confirms the, well, whispers that the Whisper Systems deal was mostly made for acqui-hire purposes.

Another acquisition like this that comes to mind is Etherpad, which Google bought (presumably to get the Etherpad team working on Wave) then open sourced. The logic of these acquisitions is that the talent is what matters, the IP is incidental or perhaps better served by an open community.

When I talk to actual or aspiring entrepreneurs, they often make the assumption that it would spoil their business to start building out their product open source. For one thing, they argue, there will be competitors who launch their own startups off of the open innovation. Then, they will miss their chance at a big exit because there will be no IP to tempt Facebook or whoever else to buy them out.

These open source acqui-hires defy these concerns. Demonstrating talent is part of what makes one acquirable. Logically, then, starting a competing company based on technology in which you don’t have talent makes you less competitive, from the perspective of a market exit. It’s hard to see what kind of competitive advantage the copycat company would have, really, since it doesn’t have the expertise in technology that comes from building it. If they do find some competitive advantage (perhaps they speak a foreign language and so are able to target a different market), then they are natural partners, not natural competitors.

One can take this argument further. Making open and available software is one of the best ways for a developer to make others aware of their talents and increase the demand (and value) for their own labor. So the talent in an open source company should be on average more valuable in case of an acqui-hire.

This doesn’t seem like a bad way out for a talented entrepreneur. Why, then, is this not a more well-known model for startups?

One reason is that the real winners in the startup scene are not the entrepreneurs. It’s the funders, and to the funders it is more worthwhile to invest in several different technologies with the small chance of selling one off big than to invest in the market value of their entrepreneurs. Because, after all, venture capitalists are in the same war for engineering talent as Google, Facebook, etc. This should become less of an issue, however, as crowdfunding becomes more viable.

To Google Reader users

A lot of friends of mine were avid Google Reader users. For some of them, it was their primary social media tool. They had built a strong community around it. Naturally, they were attached to its user interface, features, and workflows. It was home to them.

Google recently ‘redesigned’ Google Reader in a way that blatantly forced Reader users to adopt Google+ as their social media platform. A lot of Reader devotees are pissed about this. They want their old technology back.

My first response to this is: What did you expect? What made Reader so special? It was just the first of several experiments in social media that Google’s used to edge into Facebook’s market (Reader, Buzz, Wave, now Google+). Of course, the industry logic is that your community should be dumped onto the newer platform, so that Google can capture the network effects of your participation. Your community is what will make their new technology so valuable to them!

Still not happy?

The problem is that Google Reader was a corporately operated platform, not a community operated one. You may not have known that you had other options. There are a lot of social media communities that have a lot of self-control, Metafilter being a particularly great one. (Incidentally, Ask Metafilter has a good guide to Reader alternatives.) There is also a lot of energy going into open source social media tools.

The most prominent of these is Diaspora, which raised a ridiculous amount of funding on Kickstarter after the New York Times wrote about the project. I stopped following it after the first press buzz, but maybe it’s time to start paying attention to it again. Since its community has recently announced that it is not vaporware, I decided to go ahead and join the diasp.org pod.

To my surprise, it’s pretty great! Smooth, intuitive interface, fast enough, seems to have all the bells and whistles you’d want and not a lot of cruft–basically all the stuff I care about on Google+. I’ve got a public profile. Plus, it has great tools for data export in case I want to pick up and move to a different pod.

Looking into it, Diaspora does not yet work as an RSS reader, though there is an open issue for it. A bit of a missed opportunity, IMO. Some other people are building an open-source Reader clone in response, which could more directly solve the Reader problem. Whatever the current technical limitations, though, they can be surmounted by some creative piping between services.

The point that I hope stands is that there is a hidden cost to a community investing in a technical infrastructure that is maintained by those who do not value that community. People’s anger at the Reader redesign demonstrates the value of the open source alternatives.

Notes on using the neo4j-scala package, Part 1

Encouraged by the reception of last week’s hacking notes, I’ve decided to keep experimenting with Neo4j and Scala. Taking Michael Hunger’s advice, I’m looking into the neo4j-scala package. My goal is to port my earlier toy program to this library to take advantage of more Scala language features.

These are my notes from stumbling through it. I’m halfway through.

To start with, I had trouble wrangling the dependencies. Spoiled by scripting languages, I’ve been half-assing my way around Maven for years, so I got burned a bit.

What happened was that while messing around earlier in my project, I had installed an older version of neo4j-scala from a different GitHub repository. Don’t use that one. At the time of this writing, FaKoD’s version is much more up to date and featureful.

I was getting errors that looked like this:

> [error] error while loading Neo4jWrapper, Scala signature Neo4jWrapper has
> wrong version
> [error]  expected: 5.0
> [error]  found: 4.1 in
> /home/sb/.ivy2/cache/org.neo4j/neo4j-scala/bundles/neo4j-scala-0.9.9-SNAPSHOT.jar(org/neo4j/scala/Neo4jWrapper.class)

The only relevant web pages I could find on this suggested that the problem had to do with having compiled the dependency in a different version of Scala. Since I had the Ubuntu package installed, which is pegged at 2.7.7, this seemed plausible. I went through a lot of flailing to reinstall Scala and rebuild the package, but to no avail.

That wasn’t the problem. Rather, when I asked him about it, FaKoD patiently pointed out that the older library has version 0.9.9-SNAPSHOT, whereas the newer one is version 0.1.0-SNAPSHOT. So, my sbt build configuration file has this line now:

libraryDependencies += "org.neo4j" % "neo4j-scala" % "0.1.0-SNAPSHOT"

Thanks to FaKoD’s walking me through these problems, I stopped getting cryptic errors and could start hacking.

Here’s what I had to start with, copied out of one of neo4j-scala’s tests:

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._
import org.neo4j.scala.{EmbeddedGraphDatabaseServiceProvider, Neo4jWrapper}

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
}

Running this in sbt, I get this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:6: class Krow needs to be abstract, /
since method neo4jStoreDir in trait EmbeddedGraphDatabaseServiceProvider /
of type => String is not defined

That’s because EmbeddedGraphDatabaseServiceProvider (this code is written by a German, I gather) has an abstract method that I haven’t defined.

What I find neat is that this is an abstract method–it’s a function that takes no arguments and returns a String. But Scala is smart enough to allow it to be implemented either by a method or, more naturally, by a value. So, this compiles:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  val neo4jStoreDir = "var/graphdb"
}

but so does this:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  def neo4jStoreDir = {
    var a = "var/"
    var b = "graphdb"

    a + b
  }
}

(Functions in Scala can be defined by a block of code in curly braces, with the last line evaluated and returned.)

Next, I worked on rewriting my toy app, using this unittest as a guide.

Here was the code from my original experiment:

    var first : Node = null
    var second : Node = null

    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()

    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")

      second = neo.createNode()
      second.setProperty("name","second")

      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)

      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally   

      println("finished transaction 1")
    }

You can see why I would like it to be more concise. Here’s a first pass at what neo4j-scala let me whittle it down to:

    var first : Node = null
    var second : Node = null

    withTx {
      neo =>
          first = neo.gds.createNode()  
          first.setProperty("name","first")

          second = neo.gds.createNode()
          second.setProperty("name","second")

          first --> "isRelatedTo" --> second
    }

There is a lot of magic going on and it took me a while to get my head around it.

The point of withTx is to wrap around the try/success/finally pattern needed for most Neo4j transactions. Here’s the code for it:

  def withTx[T <: Any](operation: DatabaseService => T): T = {
    val tx = synchronized {
      ds.gds.beginTx
    }
    try {
      val ret = operation(ds)
      tx.success
      return ret
    } finally {
      tx.finish
    }
  }

Coming from years of JavaScript and Python, I found it tough to get my head around this type signature. The syntax alone is daunting. But what I think it comes down to is this:

  • withTx takes a type parameter, T, which can be a subclass (<:) of Any.
  • It takes an argument, operation, which must be a function from something of type DatabaseService to something of type T.
  • It returns type T.
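
To see the same shape in isolation, here’s a simplified method I made up (hypothetical, not part of neo4j-scala) with an analogous signature:

  def withTimer[T <: Any](operation: Long => T): T = {
    val start = System.currentTimeMillis
    try {
      operation(start)  // pass context in, pass the operation's result back out
    } finally {
      println("took " + (System.currentTimeMillis - start) + "ms")
    }
  }

  // T is inferred from the block, so callers keep their own return type:
  val answer: Int = withTimer { start => 40 + 2 }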

In practice, this means that the function can be called in a way that’s agnostic to the return type of its argument. But what is this DatabaseService argument?

In neo4j-scala, DatabaseService is a trait that wraps a Neo4j GraphDatabaseService. Then a GraphDatabaseServiceProvider wraps the DatabaseService. Application code is, as far as I can tell, expected to inherit from both Neo4jWrapper, which handles the syntactic sugar, and a GraphDatabaseServiceProvider, which provides the context for the sugar.

Which means that somewhere deep in the structure of our main object there is a DatabaseService that has real ultimate power over the database. withTx will find it for us, but we need to send it an operation that binds to it.

neo4j-scala also provides this helpful method, which operates in the context where that DatabaseService is available:

  def createNode(implicit ds: DatabaseService): Node = ds.gds.createNode

createNode‘s argument is implicit and so is plucked otherwise unbidden from its environment. And since Scala lets you call methods that have no arguments without parentheses, we can shorten the code further.

    withTx {
      implicit neo =>
          first = createNode
          first.setProperty("name","first")

          second = createNode
          second.setProperty("name","second")

          println("added nodes")

          // uses neo4j-scala special syntax
          first --> "isRelatedTo" --> second
    }

Notice that I had to put an implicit before neo in this code. When I didn’t, I got this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:23: /
 could not find implicit value for parameter ds: org.neo4j.scala.DatabaseService
[error]           first = createNode

What I think is happening is that in order to make the DatabaseService, neo, available as an implicit argument of the createNode method, we have to mark it as available with the implicit keyword.
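
Here’s a tiny made-up example of the same mechanism, outside of Neo4j entirely:

object ImplicitDemo {
  def greet(implicit name: String) = "hello, " + name

  def main(args: Array[String]) {
    implicit val n = "world"  // marked implicit, so it is eligible below
    println(greet)            // the compiler supplies n: prints "hello, world"
  }
}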

See this page for reference:

The actual arguments that are eligible to be passed to an implicit parameter fall into two categories:

* First, eligible are all identifiers x that can be accessed at the point of the method call without a prefix and that denote an implicit definition or an implicit parameter.
* Second, eligible are also all members of companion modules of the implicit parameter’s type that are labeled implicit.

The other interesting thing going on here is this line:

          first --> "isRelatedTo" --> second

This makes a Neo4j relationship between first and second of type “isRelatedTo.”

I have no idea how the code that makes this happen works. Looking at it hurts my head. I think there may be black magic involved.
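
If I had to guess, it’s implicit conversions again: convert the starting Node into some builder object whose --> method remembers the relationship type until the end node arrives. A hypothetical sketch (not the actual neo4j-scala source):

import org.neo4j.graphdb.{DynamicRelationshipType, Node, Relationship}

class RelationshipStart(start: Node) {
  def -->(relName: String) = new HalfMadeRelationship(start, relName)
}

class HalfMadeRelationship(start: Node, relName: String) {
  def -->(end: Node): Relationship =
    start.createRelationshipTo(end, DynamicRelationshipType.withName(relName))
}

// presumably defined somewhere like the Neo4jWrapper trait:
implicit def node2relationshipStart(n: Node) = new RelationshipStart(n)

// first --> "isRelatedTo" --> second  then desugars to
// node2relationshipStart(first).-->("isRelatedTo").-->(second)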

This has been slow going, since I’m learning as I go. I’m not done yet, though. The code I’m converting also did a short traversal between my two nodes, printing their names. I’m going to leave that to Part 2.

Neo4j and Scala hacking notes

This week at FOSS4G, though it has nothing in particular to do with geospatial (…yet), I’ve started hacking around with the graph database Neo4j in Scala, because I’m convinced both are the future. I’ve had almost no experience with either.

Dwins kindly held my hand through this process. He knows a hell of a lot about Scala and guided me through how some of the language features could help me work with the Neo4j API. In this post, I will try to describe the process and problems we ran into and parrot his explanations.

I wrote some graphical hello world code to test things out in a file called Krow.scala (don’t ask). I’ll walk you through it:

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._

I wanted to code against an embedded database, rather than against the Neo4j server, because I have big dreams of combining Neo4j with some other web framework and don’t like having to start and stop databases. So I needed EmbeddedGraphDatabase, which implements the GraphDatabaseService interface and persists its data to a directory of files.

I’ll talk about the JavaConversions bit later.

object Krow extends Application {

I am a lazy programmer who only bothers to encapsulate things into software architecture at the last minute. I’m also spoiled by Python and JavaScript and intimidated by the idea of code compilation. So initially I wanted to write this as an interpreted script so I wouldn’t have to think about it. But I’ve heard great things about sbt (simple-build-tool) so I figured I’d try it out.

Using sbt was definitely worth it, if only because it is really well documented and starting up my project with it got me back into the mindset of Java development enough to get Dwins to explain Maven repositories to me again. Adding dependencies to an sbt project involves writing Scala itself, which is a nice way to ease into the language.
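
For the record, the build configuration is just a few Scala expressions in a build.sbt file, something like this (version numbers illustrative, not necessarily what I used):

name := "krow"

scalaVersion := "2.9.1"

// each dependency is an ordinary Scala expression
libraryDependencies += "org.neo4j" % "neo4j" % "1.4"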

But running my project in sbt meant I needed a main method on my lame-o script. Ugh. That sounded like too much work for me, and args: Array[String] looks ugly and typing it ruins my day.

Dwins recommended I try using Scala’s Application trait. He explained that this would take code from an object’s body and do some magic to turn it into a main method. Rock on!

Of course, I didn’t bother to check the documentation or anything. Otherwise, I would have seen this:

The Application trait can be used to quickly turn objects into executable programs, but is not recommended.

For a language that is so emphatically Correct in its design, I give a lot of credit to whoever it was that had the balls to include this language feature so that newbs could hang themselves on it. If they hadn’t, I wouldn’t have had to confront hard truths about threading. (That’s foreshadowing)

  println("start")

  val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")

Sweet, a database I don’t have to start and stop on the command line! This var/graphdb directory is made in the directory in which I run the program (for me, using sbt run).

Next:

  var tx: Transaction = neo.beginTx()

  var first : Node = null
  var second : Node = null

  try {
    first = neo.createNode()
    first.setProperty("name","first")
    
    second = neo.createNode()
    second.setProperty("name","second")

    first.createRelationshipTo(second, "isRelatedTo")

    tx.success()
  } finally {
    println("finished transaction 1")
  }

What I’m trying to do with this code is make two nodes and a relationship between them. Should be simple.

But it turns out that with Neo4j, all modifications to the database have to be done in a transaction context, and for that you have to do this business of creating a new Transaction:

A programmatically handled transaction. All modifying operations that work with the node space must be wrapped in a transaction. Transactions are thread confined. Transactions can either be handled programmatically, through this interface, or by a container through the Java Transaction API (JTA). The Transaction interface makes handling programmatic transactions easier than using JTA programmatically. Here’s the idiomatic use of programmatic transactions in Neo4j:

 Transaction tx = graphDb.beginTx();
 try
 {
        ... // any operation that works with the node space
     tx.success();
 }
 finally
 {
     tx.finish();
 }
 

No big deal.

This bit of code was a chance for Dwins to show me a Scala feature that makes the language shine. Check out this line:

first.createRelationshipTo(second, "isRelatedTo")

If you check the documentation for this method, you can see that I’m not using it as expected. The Java type signature is:

Relationship createRelationshipTo(Node otherNode, RelationshipType type)

where RelationshipType is a Neo4j concept that’s what it sounds like. I suppose it is important to set them apart from mere Properties for performance on traversals or something. RelationshipTypes can be created dynamically and seem to more or less exist in the ether, but you need to provide them when you create a relationship. All relationships are of a type.

In terms of their data content, though, RelationshipTypes are just wrappers around strings. Rather than doing this wrapping in the same line where I create the relationship, Scala lets me establish a conversion from strings to RelationshipTypes in an elegant way.

You see, I lied. The above code would not have compiled had I not also included this earlier in the object’s definition:

  implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

This code uses Scala’s implicit conversions to define a conversion between Strings and RelationshipTypes.

DynamicRelationshipType.withName(x) is one of Neo4j’s ways of making a new RelationshipType. Scala’s type inference means that the compiler knows that string2relationshipType returns a RelationshipType.

Since I used the implicit keyword, Scala knows that when a String is used in a method that expects a RelationshipType, it can use this function to convert it on the fly.
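
In other words, as I understand it, the compiler effectively rewrites my call as:

first.createRelationshipTo(second, string2relationshipType("isRelatedTo"))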

Check out all that majesty. Thanks, Scala!

Ok, so now I want to show that I was actually able to get something into the database. So here’s my node traversal and printing code.

  tx = neo.beginTx()

  try{
    
    val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
                                          StopEvaluator.END_OF_GRAPH,
                                          ReturnableEvaluator.ALL,
                                          "isRelatedTo",
                                          Direction.BOTH)

    for(node <- trav){
      println(node.getProperty("name"))
    }
    tx.success()
  } finally {
    tx.finish()
    println("finished transaction 2")
  }

  neo.shutdown()

  println("done")

}

Two observations:

  • traverse takes a lot of arguments, most of which seem to be these awkwardly specified static variables. I bet there’s a way to use Scala features to wrap that and make it more elegant (see the sketch after this list).
  • Check out that for loop. Concise syntax that takes an iterator. There’s one catch: Traverser is a java.lang.Iterable, whereas the loop syntax requires a scala.collection.Iterable. Remember that import scala.collection.JavaConversions._ line? That imported an implicit conversion from Java to Scala iterables.
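
On the first point, Scala’s default arguments seem like the obvious candidate. A hypothetical wrapper (made up, not an existing API):

  def related(node: Node,
              relType: RelationshipType,
              order: Traverser.Order = Traverser.Order.BREADTH_FIRST,
              stop: StopEvaluator = StopEvaluator.END_OF_GRAPH,
              returnable: ReturnableEvaluator = ReturnableEvaluator.ALL,
              direction: Direction = Direction.BOTH): Traverser =
    node.traverse(order, stop, returnable, relType, direction)

With the String-to-RelationshipType conversion in scope, the traversal above would shrink to related(first, "isRelatedTo").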

All in all, pretty straightforward stuff, I thought. Here’s what I got when I used sbt to run this project:

> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow 
start
finished transaction 1
finished transaction 2

That’s not what I wanted! Not only did I not get any printed acknowledgement of the nodes that I had made in the database, but the program hangs and doesn’t finish.

What the hell?!

When I asked Dwins about it, he told me sagely about threads. Transactions need to be run in a single thread. The Application trait does a lot of bad stuff with threads. To be technically specific about it, it does…some really bad stuff with threads. I thought I had a handle on it when I started writing this blog post, but instead I’m just going to copy/paste from the Application trait docs, which I should have read in the first place.

In practice the Application trait has a number of serious pitfalls:

* Threaded code that references the object will block until static initialization is complete. However, because the entire execution of an object extending Application takes place during static initialization, concurrent code will always deadlock if it must synchronize with the enclosing object.

Oh. Huh. That’s interesting.

It is recommended to use the App trait instead.

Now you’re talking. Let me just change that line to object Krow extends App { and I’ll be cooking in no…

> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow 
start
finished transaction 1
finished transaction 2

…time.

God dammit. There’s something else about App–which runs all the object code at initialization–that is causing a problem, I guess. I asked Dwins what he thought.

Too much magic.

I guess I’m going to have to write a main method after all.


After some further messing around with the code, I have something that runs and prints the desired lines.

While the code would compile, I wound up having to explicitly name the RelationshipType in the calls where I was trying to implicitly convert the strings; otherwise I got exceptions like this:

java.lang.IllegalArgumentException: Expected RelationshipType at var args pos 0, found isRelatedTo

Does that make it an explicit conversion?

Overall, hacking around with this makes me excited about both Scala and Neo4j despite the setbacks and wrangling.

Complete working code appended below.


import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._

object Krow {

  println("start")

  def main(args: Array[String]){

    var first : Node = null
    var second : Node = null

    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()
      
    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")
    
      second = neo.createNode()
      second.setProperty("name","second")

      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)
      
      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally   
      
      println("finished transaction 1")
    }

    tx = neo.beginTx()

    try{
    
      val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
                                            StopEvaluator.END_OF_GRAPH,
                                            ReturnableEvaluator.ALL,
                                            "isRelatedTo" : RelationshipType,
                                            Direction.BOTH)

      for(node <- trav){
        println(node.getProperty("name"))
      }
      tx.success()
    } finally {
      tx.finish()
      println("finished transaction 2")
    }

    neo.shutdown()

    println("done")
  }
}

Open Source is Software’s Labor Movement

Software developers often have high starting salaries, but wind up not getting paid as much as managers and others who work in software. Why is this? I’d argue that it’s not because software development is any less essential or important to the software business. Rather, the reason is that software developers that work for a proprietary company give away the rights to their work to their employer.

No matter how much the proprietary developer is getting paid, they gradually experience employer lock-in. Why is this? They are the natural experts of the software that they write. This expertise is valuable. So, developers get more and more valuable to a company the longer they work for it.

Employers give developers raises, but not all is well. If the proprietary developer leaves their job, they will generally not be able to find as high-paying a job elsewhere, because they will not have the same relevant expertise. That means that there is no incentive for their current employer to pay them for all they contribute. The employer only needs to pay them enough to keep them from quitting.

That means that if you have been working for a long time for a proprietary company, you probably aren’t getting paid enough or getting the benefits you deserve.

Now consider open source developers. By now it is clear that there is a large market for open source companies and freelance consulting. If you are a developer, you should get into that market.

If you are an open source developer, then you get to have the same access to the fruits of your labor as your employer. You are not alienated from your labor, in Marx’s sense of the word. Your employer can never take away the public record of your contributions to an open source project or your standing in the community. And, importantly, the skills you have learned from your hard work are transferable to any other job using the software you have developed.

The result is that even more so than other software developers, open source developers can have their pick of jobs. That means employers have to compete for them even more.

While many companies will shower their developers with perks to keep them on board, savvy open source developers will demand time to do core community work that provides them the intangible social relationships and technical skills that make them more valuable workers–for anyone using their software. And they will demand that new code be vetted through core community processes whenever possible, as opposed to being one-off applications.

Proprietary software companies should fear these trends. More and more developers are going to choose open source jobs, so proprietary companies are going to have to pay more for worse programming talent. (This is one of many reasons why proprietary software is doomed.)

Open source companies, on the other hand, should embrace this trend and do their best to satisfy their developers’ demands. Ultimately, these companies’ value will be in the superiority of their technology and their developers. The developers want their software to be excellent, because it is the software and their involvement in it, not the company employing them, that guarantees them employment. In the open source economy, this loyalty is not misplaced. Rather, employing companies need to align themselves with it. In so doing, they will achieve technological superiority and community fame. They err from this path at their peril.

Oh but of course: comparison of Google and Open Source innovation models

What is below is mere speculation. I don’t know anything about Google.

I guess this just is an admission of naivete, but one of the most surprising things about Google+ to me right now is that it’s been a surprise. Wave was a huge technical investment that flopped. Buzz happened and was meek. For some reason, they became old news. But Google is super organized on the inside, and can actually afford to annoy people mildly or lose cool points for the sake of conducting an experiment. Does Google+ use Wave technology? Well who knows because it’s locked down behind the NDA. Was it inspired by Buzz usage and feedback? Is it powered by Buzz data? It’s safe to assume so, right?

Part of Google’s power is that it isn’t just a startup operating in the market. It is a whole bunch of startups operating in a coordinated collective. A lot of its advantage is in its efficiency as an incubator, which means its ability to recycle old technology choices, talent, and expertise. Its other advantage is a business model that scales up with the amount of data collected, which means it can focus on user experience.

The only comparable thing I know of in terms of efficient recycling of technology and content is the “open source world.” People build technology for their application or to scratch their itch, and it sticks around for reuse. If a particular funder drops a project that has reached half of its potential, the same team can find a new funder to pick it up and take it the rest of the way. And hackers who learn something on one project are under no legal restrictions against taking that human capital with them on to the next.

But there isn’t a unified business model for open source development. It’s still struggling for funding, fighting inherent free-rider problems, and figuring out how to find funders who are willing to take risks.

Interview on National Broadband Map architecture

Interview with Juan Marín Otero from Computech on development of the National Broadband Map, released by the National Telecommunications and Information Administration and the Federal Communications Commission.

The selection of open source software has been critical, in my opinion. The entire project has been constructed with open source tools, from the operating system to the last JavaScript library used in the presentation layer. The main reasons are the flexibility and ease of development that these solutions give us. When we evaluated the requirements of this project, some open source tools began with a clear advantage (functionality and ease of deployment) over most of the commercial tools.

Developing on open source has given us the flexibility needed in a project like this without having to worry about whether we had enough licenses per computing “core” or not, and choosing the right components for each requirement, rather than having to adopt monolithic suites with great functionality but poor performance and little chance of adaptation. One of the advantages that may seem trivial but in this work environment is very important is that I have the whole project on my laptop, and I can make changes from a Starbucks, if necessary (and I use this example because it happened to me once on the project).
