Notes on using the neo4j-scala package, Part 1

by Sebastian Benthall

Encouraged by the reception of last week’s hacking notes, I’ve decided to keep experimenting with Neo4j and Scala. Taking Michael Hunger’s advice, I’m looking into the neo4j-scala package. My goal is to port my earlier toy program to this library to take advantage of more Scala language features.

These my notes from stumbling through it. I’m halfway through.

To start with, I had trouble wrangling the dependencies. Spoiled by scripting languages, I’ve been half-assing my way around Maven for years, so I got burned a bit.

What happened was that in earlier messing around in my project, I had installed an earlier version of neo4j-scala from a different github repository. Don’t use that one. At the time of this writing, FaKoD‘s version is much more up to date and featureful.

I was getting errors that looked like this:

> [error] error while loading Neo4jWrapper, Scala signature Neo4jWrapper has
> wrong version
> [error]  expected: 5.0
> [error]  found: 4.1 in
> /home/sb/.ivy2/cache/org.neo4j/neo4j-scala/bundles/neo4j-scala-0.9.9-SNAPSHOT.jar(org/neo4j/scala/Neo4jWrapper.class)

The only relevant web pages I could find on this suggested that the problem had to due with having compiled the dependency in a different version of Scala. Since I had the Ubuntu package installed, which is pegged at 2.7.7, this seemed plausible. I went through a lot of flailing to reinstall Scala and rebuild the package, but to no avail.

That wasn’t the problem. Rather, when I asked him about it FaKoD patiently pointed out that older library has version 0.9.9-SNAPSHOT, whereas the newer one is version 0.1.0-SNAPSHOT. So, my sbt build configuration file has this line now:

libraryDependencies += "org.neo4j" % "neo4j-scala" % "0.1.0-SNAPSHOT"

Thanks to FaKoD’s walking me through these problems, I stopped getting cryptic errors and could start hacking.

Here’s what I had to start with, copying out of one of the neo4j-scala’s tests:

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._
import org.neo4j.scala.{EmbeddedGraphDatabaseServiceProvider, Neo4jWrapper}

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
}

Running this in sbt, I get this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:6: class Krow needs to be abstract, /
since method neo4jStoreDir in trait EmbeddedGraphDatabaseServiceProvider /
of type => String is not defined

That’s because EmbeddedGraphDatabaseServiceProvider (this code is written by a German, I gather) has an abstract method that I haven’t defined.

What I find neat is that this is an abstract method–it’s a function that takes no arguments and returns a String. But Scala seems smart enough to allow this to be defined by either methods or more naturally variables. So, this compiles:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  val neo4jStoreDir = "var/graphdb"
}

but so does this:

class Krow extends Neo4jWrapper with EmbeddedGraphDatabaseServiceProvider {
  def neo4jStoreDir = {
    var a = "var/"
    var b = "graphdb"

    a + b
  }

(Functions in Scala can be defined by a block of code in curly braces, with the last line evaluated and returned.)

Next, I worked on rewriting my toy app, using this unittest as a guide.

Here was the code from my original experiment:

    var first : Node = null
    var second : Node = null

    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()

    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")

      second = neo.createNode()
      second.setProperty("name","second")

      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)

      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally   

      println("finished transaction 1")
    }

You could see why I would like it to be more concise. Here’s a first pass on what neo4j-scala let me whittle it down to:

    var first : Node = null
    var second : Node = null

    withTx {
      neo =>
          first = neo.gds.createNode()  
          first.setProperty("name","first")

          second = neo.gds.createNode()
          second.setProperty("name","second")

          first --> "isRelatedTo" --> second
    }

There is a lot of magic going on and it took me a while to get my head around it.

The point of withTx is to wrap around the try/success/finally pattern needed for most Neo4j transactions. Here’s the code for it:

  def withTx[T <: Any](operation: DatabaseService => T): T = {
    val tx = synchronized {
      ds.gds.beginTx
    }
    try {
      val ret = operation(ds)
      tx.success
      return ret
    } finally {
      tx.finish
    }
  }

Coming from years of JavaScript and Python, it was tough getting my head around this type signature. The syntax alone is daunting. But what I think it comes down to is this:

  • withTx takes a type parameter, T, which can be a subclass (<:) of Any.
  • It takes an argument, operation, which must be a function from something of type DatabaseService to something of type T.
  • It returns type T.

In practice, this means that the function can be called in a way that’s agnostic to the return type of its argument. But what is this DatabaseService argument?

In neo4j-scala, DatabaseService is a trait that wraps a Neo4j GraphDatabaseService. Then a GraphDatabaseServiceProvider wraps the DatabaseService. Application code is as far as I can tell expected to doubly inherit from both Neo4jWrapper, which handles the syntactic sugar, and a GraphDatabaseServiceProvider that provides the context for the sugar.

Which means that somewhere deep in the structure of our main object there is a DatabaseService that has real ultimate power over the database. withTx will find it for us, but we need to send it an operation that binds to it.

neo4j-scala also provides this helpful method, which operates in the context where that DatabaseService is available:

  def createNode(implicit ds: DatabaseService): Node = ds.gds.createNode

createNode‘s argument is implicit and so is plucked otherwise unbidden from its environment. And since Scala lets you call methods that have no arguments without parentheses, we can shorten the code further.

    withTx {
      implicit neo =>
          first = createNode
          first.setProperty("name","first")

          second = createNode
          second.setProperty("name","second")

          println("added nodes")

          // uses neo4j-scala special syntax
          first --> "isRelatedTo" --> second
    }

Notice that I had to put an implicit before neo in this code. When I didn’t, I got this error:

[error] /home/sb/dev/krow/src/main/scala/Krow.scala:23: /
 could not find implicit value for parameter ds: org.neo4j.scala.DatabaseService
[error]           first = createNode

What I think is happening is that in order to make the DatabaseService, neo, available as an implicit argument of the createNode method, we have to mark it as available with the implicit keyword.

See this page for reference:

The actual arguments that are eligible to be passed to an implicit parameter fall into two categories:

* First, eligible are all identifiers x that can be accessed at the point of the method call without a prefix and that denote an implicit definition or an implicit parameter.
* Second, eligible are also all members of companion modules of the implicit parameter’s type that are labeled implicit.

The other interesting thing going on here is this line:

          first --> "isRelatedTo" --> second

This makes a Neo4j relationship between first and second of type “isRelatedTo.”

I have no idea how the code that makes this happen works. Looking at it hurts my head. I think there may be black magic involved.

This has been slow going, since I’m learning as I’m going. I’m not done yet, though. The code I’m converting had some code to do a short traversal between my two nodes, printing their names. I’m going to leave that to Part 2.