Neo4j and Scala hacking notes
by Sebastian Benthall
This week at FOSS4G, though it has nothing in particular to do with geospatial (…yet), I’ve started hacking around with the graph database Neo4j in Scala because I’m convinced both are the future. I’ve had almost no experience with either.
Dwins kindly held my hand through this process. He knows a hell of a lot about Scala and guided me through how some of the language features could help me work with the Neo4j API. In this post, I will try to describe the process and problems we ran into and parrot his explanations.
I wrote some graphical hello world code to test things out in a file called Krow.scala (don’t ask). I’ll walk you through it:
import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._
I wanted to code against an embedded database, rather than against the Neo4j server, because I have big dreams of combining Neo4j with some other web framework and don’t like having to start and stop databases. So I needed EmbeddedGraphDatabase, which implements the GraphDatabaseService interface and persists its data to a directory of files.
I’ll talk about the JavaConversions bit later.
object Krow extends Application {
I am a lazy programmer who only bothers to encapsulate things into software architecture at the last minute. I’m also spoiled by Python and JavaScript and intimidated by the idea of code compilation. So initially I wanted to write this as an interpreted script so I wouldn’t have to think about it. But I’ve heard great things about sbt (simple-build-tool) so I figured I’d try it out.
Using sbt was definitely worth it, if only because it is really well documented and starting up my project with it got me back into the mindset of Java development enough to get Dwins to explain Maven repositories to me again. Adding dependencies to an sbt project involves writing Scala itself, which is a nice way to ease into the language.
But running my project in sbt meant I needed a main method on my lame-o script. Ugh. That sounded like too much work for me, and args: Array[String] looks ugly and typing it ruins my day.
Dwins recommended I try using Scala’s Application trait. He explained that this would take code from an object’s body and do some magic to turn it into a main method. Rock on!
Of course, I didn’t bother to check the documentation or anything. Otherwise, I would have seen this:
The Application trait can be used to quickly turn objects into executable programs, but is not recommended.
For a language that is so emphatically Correct in its design, I give a lot of credit to whoever it was that had the balls to include this language feature so that newbs could hang themselves on it. If they hadn’t, I wouldn’t have had to confront hard truths about threading. (That’s foreshadowing)
println("start")
val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
Sweet, a database I don’t have to start and stop on the command line! This var/graphdb directory is made in the directory in which I run the program (for me, using sbt run).
Next:
var tx: Transaction = neo.beginTx()
var first : Node = null
var second : Node = null
try {
  first = neo.createNode()
  first.setProperty("name","first")
  second = neo.createNode()
  second.setProperty("name","second")
  first.createRelationshipTo(second, "isRelatedTo")
  tx.success()
} finally {
  println("finished transaction 1")
}
What I’m trying to do with this code is make two nodes and a relationship between them. Should be simple.
But it turns out that with Neo4j, all modifications to the database have to be done in a transaction context, and for that you have to do this business of creating a new Transaction:
A programmatically handled transaction. All modifying operations that work with the node space must be wrapped in a transaction. Transactions are thread confined. Transactions can either be handled programmatically, through this interface, or by a container through the Java Transaction API (JTA). The Transaction interface makes handling programmatic transactions easier than using JTA programmatically. Here’s the idiomatic use of programmatic transactions in Neo4j:
Transaction tx = graphDb.beginTx();
try {
    ... // any operation that works with the node space
    tx.success();
} finally {
    tx.finish();
}
No big deal.
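Since that try/finally dance has to be repeated for every transaction, here is a minimal sketch of how it could be wrapped once in a helper (the "loan pattern"). Everything in it is an assumption for illustration: Tx is a hypothetical stand-in for Neo4j's Transaction with only the two methods used above, and withTx and RecordingTx are made-up names, so the sketch compiles without Neo4j on the classpath.

```scala
object TxSketch {
  // Hypothetical stand-in for Neo4j's Transaction interface (assumption:
  // only the two methods used in this post are modelled).
  trait Tx {
    def success(): Unit
    def finish(): Unit
  }

  // Runs `body` inside a transaction. The transaction is marked successful
  // only if `body` returns normally, and it is always finished.
  def withTx[A](begin: () => Tx)(body: => A): A = {
    val tx = begin()
    try {
      val result = body
      tx.success()
      result
    } finally {
      tx.finish()
    }
  }

  // A fake transaction that just records which methods were called.
  class RecordingTx extends Tx {
    var calls: List[String] = Nil
    def success(): Unit = { calls = calls :+ "success" }
    def finish(): Unit = { calls = calls :+ "finish" }
  }

  def main(args: Array[String]): Unit = {
    val tx = new RecordingTx
    val answer = withTx(() => tx) { 1 + 1 }
    println(answer)   // 2
    println(tx.calls) // List(success, finish)
  }
}
```

If the body throws, finish() still runs but success() does not, which mirrors the idiom quoted from the Neo4j docs. Against the real API one would presumably swap Tx for Transaction and pass something like () => neo.beginTx(), but that part is untested here.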
This bit of code was a chance for Dwins to show me a Scala feature that makes the language shine. Check out this line:
first.createRelationshipTo(second, "isRelatedTo")
If you check the documentation for this method, you can see that I’m not using it as expected. The Java type signature is:
Relationship createRelationshipTo(Node otherNode, RelationshipType type)
where RelationshipType is a Neo4j concept that’s what it sounds like. I suppose it is important to set it apart from mere Properties for performance on traversals or something. RelationshipTypes can be created dynamically and seem to more or less exist in the ether, but you need to provide them when you create a relationship. All relationships are of a type.
In terms of their data content, though, RelationshipTypes are just wrappers around strings. Rather than doing this wrapping in the same line where I call createRelationshipTo, Scala lets me establish a conversion from strings to RelationshipTypes in an elegant way.
You see, I lied. The above code would not have compiled had I not also included this earlier in the object’s definition:
implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)
This code uses Scala’s implicit conversions to define a conversion between Strings and RelationshipTypes.
DynamicRelationshipType.withName(x) is one of Neo4j’s ways of making a new RelationshipType. Scala’s type inference means that the compiler knows that string2relationshipType returns a RelationshipType.
Since I used the implicit keyword, Scala knows that when a String is used in a method that expects a RelationshipType, it can use this function to convert it on the fly.
Check out all that majesty. Thanks, Scala!
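To see the mechanism in isolation, here is a small self-contained sketch that needs no Neo4j at all. RelType and describe are hypothetical stand-ins (for RelationshipType and for a method that insists on one), and the scala.language import assumes Scala 2.10 or later, where it silences the feature warning for implicit conversions.

```scala
import scala.language.implicitConversions

object ImplicitSketch {
  // Hypothetical stand-in for Neo4j's RelationshipType: a thin wrapper
  // around a name.
  case class RelType(name: String)

  // Hypothetical method that, like createRelationshipTo, insists on a
  // RelType for its last parameter.
  def describe(from: String, to: String, relType: RelType): String =
    from + " -[" + relType.name + "]-> " + to

  // The implicit conversion: wherever a RelType is expected and a String
  // is supplied, the compiler inserts a call to this function.
  implicit def string2relType(x: String): RelType = RelType(x)

  def main(args: Array[String]): Unit = {
    // Compiles only because string2relType is in scope.
    println(describe("first", "second", "isRelatedTo"))
    // prints: first -[isRelatedTo]-> second
  }
}
```

The conversion only kicks in when the expected type is known to be RelType; if a method accepts anything at all, the compiler has no reason to convert the String.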
Ok, so now I want to show that I was actually able to get something into the database. So here’s my node traversal and printing code.
tx = neo.beginTx()
try{
  val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_GRAPH,
    ReturnableEvaluator.ALL,
    "isRelatedTo",
    Direction.BOTH)
  for(node <- trav){
    println(node.getProperty("name"))
  }
  tx.success()
} finally {
  tx.finish()
  println("finished transaction 2")
}
neo.shutdown()
println("done")
}
Two observations:
- traverse takes a lot of arguments, most of which seem to be these awkwardly specified static variables. I bet there’s a way to use Scala features to wrap that and make it more elegant.
- Check out that for loop. Concise syntax that takes an iterator. There’s one catch: Traverser is a java.lang.Iterable, whereas the loop syntax requires a scala.collection.Iterable. Remember that import scala.collection.JavaConversions._ line? That imported an implicit conversion from Java to Scala iterables.
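For anyone trying the for-loop trick on a newer compiler: JavaConversions was deprecated in Scala 2.12 and removed in 2.13, so the sketch below assumes Scala 2.13 or later and uses the explicit .asScala converter from scala.jdk.CollectionConverters instead. It uses a plain Java list rather than a Neo4j Traverser, and the object and method names are made up for illustration.

```scala
import scala.jdk.CollectionConverters._

object IterableSketch {
  // Collects and upper-cases the elements of any java.lang.Iterable using
  // a Scala for-comprehension; .asScala wraps the Java iterable without
  // copying it.
  def shout(javaIterable: java.lang.Iterable[String]): List[String] =
    (for (name <- javaIterable.asScala) yield name.toUpperCase).toList

  def main(args: Array[String]): Unit = {
    val javaIterable: java.lang.Iterable[String] =
      java.util.Arrays.asList("first", "second")

    // The same loop shape as in the post, with the conversion spelled out.
    for (name <- javaIterable.asScala) {
      println(name)
    }
    println(shout(javaIterable)) // List(FIRST, SECOND)
  }
}
```

The explicit version is a little noisier than the implicit one, but it makes it obvious where a Java collection crosses into Scala.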
All in all, pretty straightforward stuff, I thought. Here’s what I got when I used sbt to run this project:
> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow
start
finished transaction 1
finished transaction 2
That’s not what I wanted! Not only did I not get any printed acknowledgement of the nodes that I had made in the database, but the program hangs and doesn’t finish.
What the hell?!
Asking Dwins about it, he tells me sagely about threads. Transactions need to be run in a single thread. The Application trait does a lot of bad stuff with threads. To be technically specific about it, it does…some really bad stuff with threads. I thought I had a handle on it when I started writing this blog post but instead I’m just going to copy/paste from the Application trait docs, which I should have read in the first place.
In practice the Application trait has a number of serious pitfalls:
* Threaded code that references the object will block until static initialization is complete. However, because the entire execution of an object extending Application takes place during static initialization, concurrent code will always deadlock if it must synchronize with the enclosing object.
Oh. Huh. That’s interesting.
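The blocking behaviour described in that bullet can be reproduced without Neo4j. The following is only a sketch of the general JVM mechanism, assuming a plain compiled Scala object (not a script or REPL session); InitDemo and its members are hypothetical names and not part of the Krow code. A thread started from inside the object's body touches the object, and the initializer waits for it with a timeout so that the demo itself cannot hang.

```scala
object InitDemo {
  val value = 42

  // The worker reads a member of InitDemo, so it has to wait until
  // InitDemo has finished initializing.
  private val worker = new Thread(new Runnable {
    def run(): Unit = println("worker saw " + InitDemo.value)
  })
  worker.start()

  // Still inside the initializer: wait briefly for the worker.
  // Waiting without a timeout here would never return.
  worker.join(500)

  // True if the worker could not finish while the initializer was running.
  val workerStillBlocked: Boolean = worker.isAlive
}

object InitDemoMain {
  def main(args: Array[String]): Unit = {
    println("worker blocked during initialization: " + InitDemo.workerStillBlocked)
  }
}
```

With the old Application trait the whole program lives inside such an initializer, so any library thread that needs the enclosing object would be stuck in the same way. Whether that is exactly what bit the Krow code is a guess.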
It is recommended to use the App trait instead.
Now you’re talking. Let me just change that line to object Krow extends App { and I’ll be cooking in no…
> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow
start
finished transaction 1
finished transaction 2
…time.
God dammit. There’s something else about App, which runs all the object code at initialization, which is causing a problem I guess. I asked Dwins what he thought.
Too much magic.
I guess I’m going to have to write a main method after all.
After some further messing around with the code, I have something that runs and prints the desired lines.
While the code would compile, I wound up having to explicitly name the RelationshipType type in the calls where I was trying to implicitly convert the strings; otherwise I got exceptions like this:
java.lang.IllegalArgumentException: Expected RelationshipType at var args pos 0, found isRelatedTo
Does that make it an explicit conversion?
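The "value : Type" syntax is called a type ascription. Here is a minimal sketch of why it can be needed, assuming the method being called accepts untyped varargs (the Scala analogue of a Java Object... parameter) and only checks its arguments at run time. RelType and firstArgKind are hypothetical stand-ins, not Neo4j APIs.

```scala
import scala.language.implicitConversions

object AscriptionSketch {
  // Hypothetical stand-in for Neo4j's RelationshipType.
  case class RelType(name: String)

  implicit def string2relType(x: String): RelType = RelType(x)

  // Hypothetical method that accepts anything, like a Java `Object...`
  // parameter, and only inspects the argument types at run time.
  def firstArgKind(args: Any*): String = args.head match {
    case _: RelType => "RelType"
    case other      => "not a RelType: " + other
  }

  def main(args: Array[String]): Unit = {
    // A String already conforms to Any, so no conversion is applied.
    println(firstArgKind("isRelatedTo"))          // not a RelType: isRelatedTo
    // The ascription states the expected type, which triggers the
    // implicit conversion before the call.
    println(firstArgKind("isRelatedTo": RelType)) // RelType
  }
}
```

So the conversion itself is still the implicit one; the ascription just tells the compiler where to apply it.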
Overall, hacking around with this makes me excited about both Scala and Neo4j despite the setbacks and wrangling.
Complete working code appended below.
import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._

object Krow {
  println("start")

  def main(args: Array[String]){
    var first : Node = null
    var second : Node = null
    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()

    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")
      second = neo.createNode()
      second.setProperty("name","second")
      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)
      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally
      println("finished transaction 1")
    }

    tx = neo.beginTx()
    try{
      val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
        StopEvaluator.END_OF_GRAPH,
        ReturnableEvaluator.ALL,
        "isRelatedTo" : RelationshipType,
        Direction.BOTH)
      for(node <- trav){
        println(node.getProperty("name"))
      }
      tx.success()
    } finally {
      tx.finish()
      println("finished transaction 2")
    }

    neo.shutdown()
    println("done")
  }
}
Hey Seb, cool you got your Neo code working.
java.lang.IllegalArgumentException: Expected RelationshipType at var args pos 0, found isRelatedTo
“var args” there is referring to a variadic method, I suppose you got it on the call to Node.traverse. I didn’t think the API docs I was looking at had mentioned a variadic method there – but I guess I missed the overload:
traverse(Traverser.Order traversalOrder, StopEvaluator stopEvaluator, ReturnableEvaluator returnableEvaluator, Object… relationshipTypesAndDirections)
“Object…”! Ouch. That means the method accepts any number of anything after the ReturnableEvaluator, even though there’s a note in the API that they need to be pairs of RelationshipTypes and Directions. As far as the Scala compiler can tell, passing a String doesn’t cause a problem (and thus doesn’t require an implicit conversion), hence the “type ascription” you used. I guess you’ll be wanting that wrapper method. I’d probably make two separate traverse methods for the different traversal orders (node.breadthFirst and node.depthFirst), use a little Scala magic to make them accept functions for the returnable/stop evaluators, and use Tuple2 (aka “Pair”) for the relationship types and directions.
If you want to get a bit fancier you could try the typesafe builder pattern: http://blog.rafaelferreira.net/2008/07/type-safe-builder-pattern-in-scala.html
Thanks for the explanation, David!
Great writeup.
You should definitely look into the new traversal API (see http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded-traversal.html#_new_traversal_framework) and probably the cypher query language (which is actually also written in scala) (see http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html).
There is also a set of very nice scala bindings for Neo4j (from this guy: https://github.com/FaKod)
Glad you had fun with that and happy about you not giving up.
Cheers
Michael
Thanks, and thanks for the links. That new traversal API looks really smooth.
Cool writeup and great you solved it!
/peter
Regarding “Application”, look to the bottom of the comment in the Scala docs: “It is recommended to use the App trait instead”.