neo4j | Digifesto

Notes on using the neo4j-scala package, Part 1

Encouraged by the reception of last week’s hacking notes, I’ve decided to keep experimenting with Neo4j and Scala. Taking Michael Hunger’s advice, I’m looking into the neo4j-scala package. My goal is to port my earlier toy program to this library to take advantage of more Scala language features.

These are my notes from stumbling through it. I’m halfway through.

To start with, I had trouble wrangling the dependencies. Spoiled by scripting languages, I’ve been half-assing my way around Maven for years, so I got burned a bit.

What happened was that in earlier messing around in my project, I had installed an earlier version of neo4j-scala from a different GitHub repository. Don’t use that one. At the time of this writing, FaKoD’s version is much more up to date and featureful.

I was getting errors that looked like this:

> [error] error while loading Neo4jWrapper, Scala signature Neo4jWrapper has
> wrong version
> [error]  expected: 5.0
> [error]  found: 4.1 in
> /home/sb/.ivy2/cache/org.neo4j/neo4j-scala/bundles/neo4j-scala-0.9.9-SNAPSHOT.jar(org/neo4j/scala/Neo4jWrapper.class)

The only relevant web pages I could find on this suggested that the problem had to do with having compiled the dependency in a different version of Scala. Since I had the Ubuntu package installed, which is pegged at 2.7.7, this seemed plausible. I went through a lot of flailing to reinstall Scala and rebuild the package, but to no avail.

That wasn’t the problem. Rather, when I asked him about it, FaKoD patiently pointed out that the older library has version 0.9.9-SNAPSHOT, whereas the newer one is version 0.1.0-SNAPSHOT. So my sbt build configuration file has this line now:

libraryDependencies += "org.neo4j" % "neo4j-scala" % "0.1.0-SNAPSHOT"

Thanks to FaKoD’s walking me through these problems, I stopped getting cryptic errors and could start hacking.

Here’s what I had to start with, copied out of one of neo4j-scala’s tests:

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._
import org.neo4j.scala.{EmbeddedGraphDatabaseServiceProvider, Neo4jWrapper}

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
}

Running this in sbt, I get this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:6: class Krow needs to be abstract, /
since method neo4jStoreDir in trait EmbeddedGraphDatabaseServiceProvider /
of type => String is not defined

That’s because EmbeddedGraphDatabaseServiceProvider (this code is written by a German, I gather) has an abstract method that I haven’t defined.

What I find neat is that this is an abstract method: a function that takes no arguments and returns a String. But Scala seems smart enough to let this be defined either by a method or, more naturally, by a value. So, this compiles:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  val neo4jStoreDir = "var/graphdb"
}

but so does this:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  def neo4jStoreDir = {
    val a = "var/"
    val b = "graphdb"

    a + b
  }
}

(Functions in Scala can be defined by a block of code in curly braces, with the last expression evaluated and returned.)
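To see the val-versus-def choice without any Neo4j dependency, here is a minimal sketch. StoreDirProvider, ValStore and DefStore are made-up names standing in for EmbeddedGraphDatabaseServiceProvider and Krow; nothing here comes from neo4j-scala itself.

```scala
// Hypothetical stand-in for EmbeddedGraphDatabaseServiceProvider:
// a trait with one abstract, parameterless method.
trait StoreDirProvider {
  def storeDir: String
}

// Implemented with a val: evaluated once, when the object is constructed.
class ValStore extends StoreDirProvider {
  val storeDir = "var/graphdb"
}

// Implemented with a def: the body is re-evaluated on every call.
class DefStore extends StoreDirProvider {
  var calls = 0 // counts evaluations, for illustration only
  def storeDir = {
    calls += 1
    "var/" + "graphdb"
  }
}
```

Callers write store.storeDir in both cases and cannot tell which form is behind it. The practical difference is only when the body runs.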

Next, I worked on rewriting my toy app, using this unit test as a guide.

Here was the code from my original experiment:

    var first : Node = null
    var second : Node = null

    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()

    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")

      second = neo.createNode()
      second.setProperty("name","second")

      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)

      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally   

      println("finished transaction 1")
    }

You can see why I would like it to be more concise. Here’s a first pass on what neo4j-scala let me whittle it down to:

    var first : Node = null
    var second : Node = null

    withTx {
      neo =>
          first = neo.gds.createNode()  
          first.setProperty("name","first")

          second = neo.gds.createNode()
          second.setProperty("name","second")

          first --> "isRelatedTo" --> second
    }

There is a lot of magic going on and it took me a while to get my head around it.

The point of withTx is to wrap around the try/success/finally pattern needed for most Neo4j transactions. Here’s the code for it:

  def withTx[T <: Any](operation: DatabaseService => T): T = {
    val tx = synchronized {
      ds.gds.beginTx
    }
    try {
      val ret = operation(ds)
      tx.success
      return ret
    } finally {
      tx.finish
    }
  }

Coming from years of JavaScript and Python, it was tough getting my head around this type signature. The syntax alone is daunting. But what I think it comes down to is this:

withTx takes a type parameter, T, which must be a subtype (<:) of Any. Since every Scala type is a subtype of Any, T can be anything.
It takes an argument, operation, which must be a function from something of type DatabaseService to something of type T.
It returns type T.
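The same shape can be tried out without a database. The sketch below is not neo4j-scala code: FakeTx and withFakeTx are hypothetical names, and FakeTx only records which of its methods were called.

```scala
// FakeTx is a hypothetical stand-in for Neo4j's Transaction.
class FakeTx {
  var succeeded = false
  var finished = false
  def success(): Unit = { succeeded = true }
  def finish(): Unit = { finished = true }
}

object TxDemo {
  // Same shape as withTx: generic in the result type T of the operation.
  def withFakeTx[T](tx: FakeTx)(operation: FakeTx => T): T = {
    try {
      val ret = operation(tx)
      tx.success() // only reached if operation did not throw
      ret
    } finally {
      tx.finish() // always runs, whether or not operation threw
    }
  }
}
```

Calling TxDemo.withFakeTx(new FakeTx) { _ => 42 } gives back an Int, and passing a block that ends in a String gives back a String. The compiler infers T from the block each time.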

In practice, this means that the function can be called in a way that’s agnostic to the return type of its argument. But what is this DatabaseService argument?

In neo4j-scala, DatabaseService is a trait that wraps a Neo4j GraphDatabaseService. Then a GraphDatabaseServiceProvider wraps the DatabaseService. Application code is, as far as I can tell, expected to inherit from both Neo4jWrapper, which handles the syntactic sugar, and a GraphDatabaseServiceProvider, which provides the context for the sugar.

Which means that somewhere deep in the structure of our main object there is a DatabaseService that has real ultimate power over the database. withTx will find it for us, but we need to send it an operation that binds to it.

neo4j-scala also provides this helpful method, which operates in the context where that DatabaseService is available:

  def createNode(implicit ds: DatabaseService): Node = ds.gds.createNode

createNode’s argument is implicit and so is plucked otherwise unbidden from its environment. And since Scala lets you call methods that have no arguments without parentheses, we can shorten the code further.

    withTx {
      implicit neo =>
          first = createNode
          first.setProperty("name","first")

          second = createNode
          second.setProperty("name","second")

          println("added nodes")

          // uses neo4j-scala special syntax
          first --> "isRelatedTo" --> second
    }

Notice that I had to put an implicit before neo in this code. When I didn’t, I got this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:23: /
 could not find implicit value for parameter ds: org.neo4j.scala.DatabaseService
[error]           first = createNode

What I think is happening is that in order to make the DatabaseService, neo, available as an implicit argument of the createNode method, we have to mark it as available with the implicit keyword.

See this page for reference:

The actual arguments that are eligible to be passed to an implicit parameter fall into two categories:

* First, eligible are all identifiers x that can be accessed at the point of the method call without a prefix and that denote an implicit definition or an implicit parameter.
* Second, eligible are also all members of companion modules of the implicit parameter’s type that are labeled implicit.
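The first rule is the one at work here, and it can be reproduced with toy types. In this sketch Db, withDb and this createNode are hypothetical stand-ins, not the neo4j-scala definitions.

```scala
// Db is a hypothetical stand-in for neo4j-scala's DatabaseService.
class Db {
  var created = 0
  def newNode(): Int = { created += 1; created }
}

object ImplicitDemo {
  // Like createNode: its only parameter list is implicit.
  def createNode(implicit db: Db): Int = db.newNode()

  // Like withTx: hands a Db to the operation.
  def withDb[T](db: Db)(operation: Db => T): T = operation(db)

  def run(db: Db): Int =
    withDb(db) { implicit d =>
      // d is marked implicit, so the compiler passes it to createNode
      createNode
      createNode
    }
}
```

Dropping the implicit keyword in front of d should bring back the same kind of "could not find implicit value" error, because d would no longer be an eligible identifier.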

The other interesting thing going on here is this line:

          first --> "isRelatedTo" --> second

This makes a Neo4j relationship between first and second of type “isRelatedTo.”

I have no idea how the code that makes this happen works. Looking at it hurts my head. I think there may be black magic involved.
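For what it’s worth, an arrow syntax like this can be built from an implicit conversion plus two small builder classes. The following is a guess at the general technique using toy types (ToyNode, EdgeStart, EdgePending and ToyDsl are all made up); it is not the library’s actual code.

```scala
import scala.language.implicitConversions

// Toy stand-in; not Neo4j's Node.
class ToyNode(val name: String) {
  var edges: List[(String, ToyNode)] = Nil
}

// Holds the left-hand node and waits for the relationship name.
class EdgeStart(from: ToyNode) {
  def -->(relType: String): EdgePending = new EdgePending(from, relType)
}

// Holds node plus relationship name and waits for the right-hand node.
class EdgePending(from: ToyNode, relType: String) {
  def -->(to: ToyNode): ToyNode = {
    from.edges = (relType, to) :: from.edges
    to // returning the target lets chains continue
  }
}

object ToyDsl {
  // This conversion is what lets a plain ToyNode appear to have a --> method.
  implicit def node2edgeStart(n: ToyNode): EdgeStart = new EdgeStart(n)

  // Parsed as (a --> "isRelatedTo") --> b, i.e. two ordinary method calls.
  def link(a: ToyNode, b: ToyNode): ToyNode = a --> "isRelatedTo" --> b
}
```

Seen this way, the "magic" is two ordinary method calls named -->, with the compiler quietly wrapping the first node so that the first call type-checks.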

This has been slow going, since I’m learning as I’m going. I’m not done yet, though. The code I’m converting had some code to do a short traversal between my two nodes, printing their names. I’m going to leave that to Part 2.

Neo4j and Scala hacking notes

This week at FOSS4G, though it has nothing in particular to do with geospatial (…yet), I’ve started hacking around with the graph database Neo4j in Scala because I’m convinced both are the future. I’ve had almost no experience with either.

Dwins kindly held my hand through this process. He knows a hell of a lot about Scala and guided me through how some of the language features could help me work with the Neo4j API. In this post, I will try to describe the process and problems we ran into and parrot his explanations.

I wrote some graphical hello world code to test things out in a file called Krow.scala (don’t ask). I’ll walk you through it:

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._

I wanted to code against an embedded database, rather than code against the Neo4j server, because I have big dreams of combining Neo4j with some other web framework and don’t like having to start and stop databases. So I needed EmbeddedGraphDatabase, which implements the GraphDatabaseService interface, and persists its data to a directory of files.

I’ll talk about the JavaConversions bit later.

object Krow extends Application {

I am a lazy programmer who only bothers to encapsulate things into software architecture at the last minute. I’m also spoiled by Python and JavaScript and intimidated by the idea of code compilation. So initially I wanted to write this as an interpreted script so I wouldn’t have to think about it. But I’ve heard great things about sbt (simple-build-tool) so I figured I’d try it out.

Using sbt was definitely worth it, if only because it is really well documented and starting up my project with it got me back into the mindset of Java development enough to get Dwins to explain Maven repositories to me again. Adding dependencies to an sbt project involves writing Scala itself, which is a nice way to ease into the language.

But running my project in sbt meant I needed a main method on my lame-o script. Ugh. That sounded like too much work for me, and args: Array[String] looks ugly and typing it ruins my day.

Dwins recommended I try using Scala’s Application trait. He explained that this would take code from an object’s body and do some magic to turn it into a main method. Rock on!

Of course, I didn’t bother to check the documentation or anything. Otherwise, I would have seen this:

The Application trait can be used to quickly turn objects into executable programs, but is not recommended.

For a language that is so emphatically Correct in its design, I give a lot of credit to whoever it was that had the balls to include this language feature so that newbs could hang themselves on it. If they hadn’t, I wouldn’t have had to confront hard truths about threading. (That’s foreshadowing)

  println("start")

  val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")

Sweet, a database I don’t have to start and stop on the command line! This var/graphdb directory is made in the directory in which I run the program (for me, using sbt run).

  var tx: Transaction = neo.beginTx()

  var first : Node = null
  var second : Node = null

  try {
    first = neo.createNode()
    first.setProperty("name","first")
    
    second = neo.createNode()
    second.setProperty("name","second")

    first.createRelationshipTo(second, "isRelatedTo")

    tx.success()
  } finally {
    println("finished transaction 1")
  }

What I’m trying to do with this code is make two nodes and a relationship between them. Should be simple.

But it turns out that with Neo4j, all modifications to the database have to be done in a transaction context, and for that you have to do this business of creating a new Transaction:

A programmatically handled transaction. All modifying operations that work with the node space must be wrapped in a transaction. Transactions are thread confined. Transactions can either be handled programmatically, through this interface, or by a container through the Java Transaction API (JTA). The Transaction interface makes handling programmatic transactions easier than using JTA programmatically. Here’s the idiomatic use of programmatic transactions in Neo4j:
 Transaction tx = graphDb.beginTx();
 try
 {
        ... // any operation that works with the node space
     tx.success();
 }
 finally
 {
     tx.finish();
 }
 

No big deal.

This bit of code was a chance for Dwins to show me a Scala feature that makes the language shine. Check out this line:

first.createRelationshipTo(second, "isRelatedTo")

If you check the documentation for this method, you can see that I’m not using this method as expected. The Java type signature is:

Relationship createRelationshipTo(Node otherNode, RelationshipType type)

where RelationshipType is a Neo4j concept that’s what it sounds like. I suppose it is important to set it apart from mere Properties for performance on traversals or something. RelationshipTypes can be created dynamically and seem to more or less exist in the ether, but you need to provide them when you create a relationship. All relationships are of a type.

In terms of their data content, though, RelationshipTypes are just wrappers around strings. Rather than doing this wrapping in the same line where I call createRelationshipTo, Scala lets me establish a conversion from strings to RelationshipTypes in an elegant way.

You see, I lied. The above code would not have compiled had I not also included this earlier in the object’s definition:

  implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

This code uses Scala’s implicit conversions to define a conversion between Strings and RelationshipTypes.

DynamicRelationshipType.withName(x) is one of Neo4j’s ways of making a new RelationshipType. Scala’s type inference means that the compiler knows that string2relationshipType returns a RelationshipType.

Since I used the implicit keyword, Scala knows that when a String is used in a method that expects a RelationshipType, it can use this function to convert it on the fly.
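Here is the same trick in miniature, with no Neo4j on the classpath. RelType and relate are hypothetical stand-ins for RelationshipType and createRelationshipTo.

```scala
import scala.language.implicitConversions

// RelType is a hypothetical stand-in for Neo4j's RelationshipType.
case class RelType(name: String)

object ConversionDemo {
  implicit def string2relType(x: String): RelType = RelType(x)

  // Expects a RelType, like createRelationshipTo does.
  def relate(from: String, to: String, relType: RelType): String =
    from + " -[" + relType.name + "]-> " + to

  // A String is passed where a RelType is expected; the compiler
  // rewrites the call to relate(..., string2relType("isRelatedTo")).
  def demo: String = relate("first", "second", "isRelatedTo")
}
```

The conversion only kicks in when the compiler sees a type mismatch it can repair, which matters later in this post.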

Check out all that majesty. Thanks, Scala!

Ok, so now I want to show that I was actually able to get something into the database. So here’s my node traversal and printing code.

  tx = neo.beginTx()

  try{
    
    val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
                                          StopEvaluator.END_OF_GRAPH,
                                          ReturnableEvaluator.ALL,
                                          "isRelatedTo",
                                          Direction.BOTH)

    for(node <- trav){
      println(node.getProperty("name"))
    }
    tx.success()
  } finally {
    tx.finish()
    println("finished transaction 2")
  }

  neo.shutdown()

  println("done")

}

Two observations:

* traverse takes a lot of arguments, most of which seem to be these awkwardly specified static variables. I bet there’s a way to use Scala features to wrap that and make it more elegant.
* Check out that for loop. Concise syntax that takes an iterable. There’s one catch: Traverser is a java.lang.Iterable, whereas the loop syntax requires a scala.collection.Iterable. Remember that import collection.JavaConversions._ line? That imported an implicit conversion from Java to Scala iterables.
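The Java-to-Scala bridging can be tried on its own. This sketch swaps in scala.collection.JavaConverters, the explicit cousin of JavaConversions, where you call asScala yourself instead of relying on an implicit conversion. A java.util.List stands in for the Traverser, and IterableDemo is a made-up name.

```scala
import scala.collection.JavaConverters._

object IterableDemo {
  // A java.util.List stands in for the Traverser; both are java.lang.Iterable.
  def names: java.lang.Iterable[String] =
    java.util.Arrays.asList("first", "second")

  // asScala wraps the Java iterable so that for/map/filter work on it.
  def shout: List[String] =
    (for (name <- names.asScala) yield name.toUpperCase).toList
}
```

The explicit form is a little noisier, but it makes it obvious at the call site where the Java collection is being wrapped.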

All in all, pretty straightforward stuff, I thought. Here’s what I got when I used sbt to run this project:

> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow 
start
finished transaction 1
finished transaction 2

That’s not what I wanted! Not only did I not get any printed acknowledgement of the nodes that I had made in the database, but the program hangs and doesn’t finish.

What the hell?!

Asking Dwins about it, he tells me sagely about threads. Transactions need to be run in a single thread. The Application trait does a lot of bad stuff with threads. To be technically specific about it, it does…some really bad stuff with threads. I thought I had a handle on it when I started writing this blog post but instead I’m just going to copy/paste from the Application trait docs, which I should have read in the first place.

In practice the Application trait has a number of serious pitfalls:

* Threaded code that references the object will block until static initialization is complete. However, because the entire execution of an object extending Application takes place during static initialization, concurrent code will always deadlock if it must synchronize with the enclosing object.

Oh. Huh. That’s interesting.

It is recommended to use the App trait instead.

Now you’re talking. Let me just change that line to object Krow extends App { and I’ll be cooking in no…

> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow 
start
finished transaction 1
finished transaction 2

…time.

God dammit. There’s something else about App, which runs all the object code at initialization, which is causing a problem I guess. I asked Dwins what he thought.

Too much magic.

I guess I’m going to have to write a main method after all.

After some further messing around with the code, I have something that runs and prints the desired lines.

While the code would compile, I wound up having to explicitly name the RelationshipType type in the calls where I was trying to implicitly convert the strings; otherwise I got exceptions like this:

java.lang.IllegalArgumentException: Expected RelationshipType at var args pos 0, found isRelatedTo

Does that make it an explicit conversion?
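A plausible reading of that exception, assuming traverse declares its trailing arguments as Object varargs (which the "var args pos 0" in the message hints at): a String already is an Object, so the compiler sees no type mismatch and never inserts the conversion. The toy below reproduces that behaviour; Kind and checkAll are hypothetical stand-ins, not Neo4j code.

```scala
import scala.language.implicitConversions

// Kind is a hypothetical stand-in for RelationshipType.
case class Kind(name: String)

object VarargsDemo {
  implicit def string2kind(x: String): Kind = Kind(x)

  // Accepts anything, like a Java Object... parameter; checks types at runtime.
  def checkAll(args: Any*): Boolean = args.forall(_.isInstanceOf[Kind])

  // A String already conforms to Any, so no conversion is inserted.
  def withoutAscription: Boolean = checkAll("isRelatedTo")

  // The ascription asks for a Kind, which triggers string2kind.
  def withAscription: Boolean = checkAll("isRelatedTo": Kind)
}
```

So the conversion is still implicit; the type ascription just tells the compiler which type is wanted at that spot, and that gives it a reason to apply the conversion.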

Overall, hacking around with this makes me excited about both Scala and Neo4j despite the setbacks and wrangling.

Complete working code appended below.

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._

object Krow {

  println("start")

  def main(args: Array[String]){

    var first : Node = null
    var second : Node = null

    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()
      
    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")
    
      second = neo.createNode()
      second.setProperty("name","second")

      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)
      
      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally   
      
      println("finished transaction 1")
    }

    tx = neo.beginTx()

    try{
    
      val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
                                            StopEvaluator.END_OF_GRAPH,
                                            ReturnableEvaluator.ALL,
                                            "isRelatedTo" : RelationshipType,
                                            Direction.BOTH)

      for(node <- trav){
        println(node.getProperty("name"))
      }
      tx.success()
    } finally {
      tx.finish()
      println("finished transaction 2")
    }

    neo.shutdown()

    println("done")
  }
}

Digifesto

Tag: neo4j

September 21, 2011

Notes on using the neo4j-scala package, Part 1

September 15, 2011

Neo4j and Scala hacking notes