Digifesto

Category: open source software

Neo4j and Scala hacking notes

This week is FOSS4G, and though this has nothing in particular to do with geospatial (…yet), I’ve started hacking around with the graph database Neo4j in Scala because I’m convinced both are the future. I’ve had almost no experience with either.

Dwins kindly held my hand through this process. He knows a hell of a lot about Scala and guided me through how some of the language features could help me work with the Neo4j API. In this post, I will try to describe the process and problems we ran into and parrot his explanations.

I wrote some graphical hello world code to test things out in a file called Krow.scala (don’t ask). I’ll walk you through it:

import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._

I wanted to code against an embedded database, rather than against the Neo4j server, because I have big dreams of combining Neo4j with some other web framework and don’t like having to start and stop databases. So I needed EmbeddedGraphDatabase, which implements the GraphDatabaseService interface and persists its data to a directory of files.
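(One wrinkle with the embedded approach, by the way: the store files only get closed cleanly if shutdown() is called. The Neo4j docs suggest a shutdown-hook idiom for this; here is my own untested Scala rendering of it, variable names mine:)

  // Open the embedded database, backed by a directory of store files.
  val graphDb: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")

  // Register a JVM shutdown hook so the database is closed cleanly even if
  // the program exits some way other than falling off the end of main.
  Runtime.getRuntime.addShutdownHook(new Thread() {
    override def run() { graphDb.shutdown() }
  })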

I’ll talk about the JavaConversions bit later.

object Krow extends Application {

I am a lazy programmer who only bothers to encapsulate things into software architecture at the last minute. I’m also spoiled by Python and JavaScript and intimidated by the idea of code compilation. So initially I wanted to write this as an interpreted script so I wouldn’t have to think about it. But I’ve heard great things about sbt (simple-build-tool) so I figured I’d try it out.

Using sbt was definitely worth it, if only because it is really well documented and starting up my project with it got me back into the mindset of Java development enough to get Dwins to explain Maven repositories to me again. Adding dependencies to an sbt project involves writing Scala itself, which is a nice way to ease into the language.

But running my project in sbt meant I needed a main method on my lame-o script. Ugh. That sounded like too much work for me, and args: Array[String] looks ugly and typing it ruins my day.

Dwins recommended I try using Scala’s Application trait. He explained that this would take code from an object’s body and do some magic to turn it into a main method. Rock on!
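To illustrate with a toy example of my own, these two are meant to be morally equivalent:

  // With the Application trait, the object body itself is the program:
  object Hello extends Application {
    println("hello")
  }

  // ...which saves you from writing out main by hand:
  object Hello2 {
    def main(args: Array[String]) {
      println("hello")
    }
  }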

Of course, I didn’t bother to check the documentation or anything. Otherwise, I would have seen this:

The Application trait can be used to quickly turn objects into executable programs, but is not recommended.

For a language that is so emphatically Correct in its design, I give a lot of credit to whoever it was that had the balls to include this language feature so that newbs could hang themselves on it. If they hadn’t, I wouldn’t have had to confront hard truths about threading. (That’s foreshadowing)

  println("start")

  val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")

Sweet, a database I don’t have to start and stop on the command line! This var/graphdb directory is made in the directory in which I run the program (for me, using sbt run).

Next:

  var tx: Transaction = neo.beginTx()

  var first : Node = null
  var second : Node = null

  try {
    first = neo.createNode()
    first.setProperty("name","first")
    
    second = neo.createNode()
    second.setProperty("name","second")

    first.createRelationshipTo(second, "isRelatedTo")

    tx.success()
  } finally {
    println("finished transaction 1")
  }

What I’m trying to do with this code is make two nodes and a relationship between them. Should be simple.

But it turns out that with Neo4j, all modifications to the database have to be done in a transaction context, and for that you have to do this business of creating a new Transaction:

A programmatically handled transaction. All modifying operations that work with the node space must be wrapped in a transaction. Transactions are thread confined. Transactions can either be handled programmatically, through this interface, or by a container through the Java Transaction API (JTA). The Transaction interface makes handling programmatic transactions easier than using JTA programmatically. Here’s the idiomatic use of programmatic transactions in Neo4j:

 Transaction tx = graphDb.beginTx();
 try
 {
        ... // any operation that works with the node space
     tx.success();
 }
 finally
 {
     tx.finish();
 }
 

No big deal.

This bit of code was a chance for Dwins to show me a Scala feature that makes the language shine. Check out this line:

first.createRelationshipTo(second, "isRelatedTo")

If you check the documentation for this method, you can see that I’m not using it as expected. The Java type signature is:

Relationship createRelationshipTo(Node otherNode, RelationshipType type)

where RelationshipType is a Neo4j concept that is pretty much what it sounds like. I suppose it is important to set relationship types apart from mere Properties for performance on traversals or something. RelationshipTypes can be created dynamically and seem to more or less exist in the ether, but you need to provide one when you create a relationship. All relationships are of a type.

In terms of their data content, though, RelationshipTypes are just wrappers around strings. Rather than doing this wrapping on the same line where I create the relationship, Scala lets me establish a conversion from strings to RelationshipTypes in an elegant way.

You see, I lied. The above code would not have compiled had I not also included this earlier in the object’s definition:

  implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

This code uses Scala’s implicit conversions to define a conversion between Strings and RelationshipTypes.

DynamicRelationshipType.withName(x) is one of Neo4j’s ways of making a new RelationshipType. Scala’s type inference means that the compiler knows that string2relationshipType returns a RelationshipType.

Since I used the implicit keyword, Scala knows that when a String is used in a method that expects a RelationshipType, it can use this function to convert it on the fly.
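In other words, as I understand it, the compiler takes the call I wrote and effectively rewrites it with the conversion spelled out:

  // What I wrote:
  first.createRelationshipTo(second, "isRelatedTo")

  // What the compiler effectively compiles, after implicit resolution:
  first.createRelationshipTo(second, string2relationshipType("isRelatedTo"))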

Check out all that majesty. Thanks, Scala!

Ok, so now I want to show that I was actually able to get something into the database. So here’s my node traversal and printing code.

  tx = neo.beginTx()

  try{
    
    val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
                                          StopEvaluator.END_OF_GRAPH,
                                          ReturnableEvaluator.ALL,
                                          "isRelatedTo",
                                          Direction.BOTH)

    for(node <- trav){
      println(node.getProperty("name"))
    }
    tx.success()
  } finally {
    tx.finish()
    println("finished transaction 2")
  }

  neo.shutdown()

  println("done")

}

Two observations:

  • traverse takes a lot of arguments, most of which seem to be these awkwardly specified static variables. I bet there’s a way to use Scala features to wrap that and make it more elegant (a rough sketch follows this list).
  • Check out that for loop: concise syntax that takes an iterator. There’s one catch: Traverser is a java.lang.Iterable, whereas the loop syntax requires a scala.collection.Iterable. Remember that import collection.JavaConversions._ line? That imported an implicit conversion from Java to Scala iterables.
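Here’s the kind of wrapper I have in mind for that first point: a helper using Scala’s default arguments so the call site only names what varies. This is an untested sketch of my own, not anything in the Neo4j API:

  def traverseFrom(start: Node,
                   relType: RelationshipType,
                   order: Traverser.Order = Traverser.Order.BREADTH_FIRST,
                   stop: StopEvaluator = StopEvaluator.END_OF_GRAPH,
                   returnable: ReturnableEvaluator = ReturnableEvaluator.ALL,
                   direction: Direction = Direction.BOTH): Traverser =
    start.traverse(order, stop, returnable, relType, direction)

  // With the implicit String conversion in scope, the call site shrinks to:
  // val trav = traverseFrom(first, "isRelatedTo")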

All in all, pretty straightforward stuff, I thought. Here’s what I got when I used sbt to run this project:

> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow 
start
finished transaction 1
finished transaction 2

That’s not what I wanted! Not only did I get no printed acknowledgement of the nodes I had made in the database, but the program hangs and doesn’t finish.

What the hell?!

When I asked Dwins about it, he told me sagely about threads. Transactions need to be run in a single thread. The Application trait does a lot of bad stuff with threads. To be technically specific about it, it does…some really bad stuff with threads. I thought I had a handle on it when I started writing this blog post, but instead I’m just going to copy/paste from the Application trait docs, which I should have read in the first place.

In practice the Application trait has a number of serious pitfalls:

* Threaded code that references the object will block until static initialization is complete. However, because the entire execution of an object extending Application takes place during static initialization, concurrent code will always deadlock if it must synchronize with the enclosing object.

Oh. Huh. That’s interesting.

It is recommended to use the App trait instead.

Now you’re talking. Let me just change that line to object Krow extends App { and I’ll be cooking in no…

> run
[info] Compiling 1 Scala source to /home/sb/dev/krow/target/scala-2.9.1.final/classes...
[warn] there were 1 deprecation warnings; re-run with -deprecation for details
[warn] one warning found
[info] Running Krow 
start
finished transaction 1
finished transaction 2

…time.

God dammit. There’s something else about App and the way it runs the object’s body code that is causing a problem, I guess. I asked Dwins what he thought.

Too much magic.

I guess I’m going to have to write a main method after all.


After some further messing around with the code, I have something that runs and prints the desired lines.

While the code would compile, I wound up having to explicitly ascribe the RelationshipType type in the calls where I was trying to implicitly convert the strings; otherwise I got runtime exceptions like this:

java.lang.IllegalArgumentException: Expected RelationshipType at var args pos 0, found isRelatedTo

Does that make it an explicit conversion?
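My best guess at what’s going on: traverse has an overload that takes Object varargs in that position, so a bare String already typechecks and the compiler never goes looking for a conversion–Neo4j then rejects the String at runtime. The ascription hands the compiler an expected type, so it applies string2relationshipType at compile time. A tiny demo of the mechanism (my own names, nothing Neo4j-specific about it):

  implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

  val t1 = "isRelatedTo" : RelationshipType        // ascription triggers the implicit
  val t2 = string2relationshipType("isRelatedTo")  // the same conversion, written by hand
  println(t1.name == t2.name)                      // prints true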

Overall, hacking around with this makes me excited about both Scala and Neo4j despite the setbacks and wrangling.

Complete working code appended below.


import org.neo4j.kernel.EmbeddedGraphDatabase
import org.neo4j.graphdb._
import collection.JavaConversions._

object Krow {

  println("start")

  def main(args: Array[String]){

    var first : Node = null
    var second : Node = null

    val neo: GraphDatabaseService = new EmbeddedGraphDatabase("var/graphdb")
    var tx: Transaction = neo.beginTx()
      
    implicit def string2relationshipType(x: String) = DynamicRelationshipType.withName(x)

    try {
      first = neo.createNode()
      first.setProperty("name","first")
    
      second = neo.createNode()
      second.setProperty("name","second")

      first.createRelationshipTo(second, "isRelatedTo" : RelationshipType)
      
      tx.success()
      println("added nodes")
    } catch {
      case e: Exception => println(e)
    } finally {
      tx.finish() // wrap in try, finally   
      
      println("finished transaction 1")
    }

    tx = neo.beginTx()

    try{
    
      val trav : Traverser = first.traverse(Traverser.Order.BREADTH_FIRST,
                                            StopEvaluator.END_OF_GRAPH,
                                            ReturnableEvaluator.ALL,
                                            "isRelatedTo" : RelationshipType,
                                            Direction.BOTH)

      for(node <- trav){
        println(node.getProperty("name"))
      }
      tx.success()
    } finally {
      tx.finish()
      println("finished transaction 2")
    }

    neo.shutdown()

    println("done")
  }
}

Open Source is Software’s Labor Movement

Software developers often have high starting salaries, but wind up not getting paid as much as managers and others who work in software. Why is this? I’d argue that it’s not because software development is any less essential or important to the software business. Rather, it’s because developers who work for a proprietary company give away the rights to their work to their employer.

No matter how much the proprietary developer is getting paid, they gradually experience employer lock-in. Why is this? They are the natural experts on the software they write. This expertise is valuable. So developers get more and more valuable to a company the longer they work for it.

Employers give developers raises, but not all is well. If the proprietary developer leaves their job, they will generally not be able to find as high-paying a job elsewhere, because they will not have the same relevant expertise. That means their current employer has no incentive to pay them for all they contribute. They only need to pay them enough to keep them from quitting.

That means that if you have been working for a long time for a proprietary company, you probably aren’t getting paid enough or getting the benefits you deserve.

Now consider open source developers. By now it is clear that there is a large market for open source companies and freelance consulting. If you are a developer, you should get into that market.

If you are an open source developer, then you get to have the same access to the fruits of your labor as your employer. You are not alienated from your labor, in Marx’s sense of the word. Your employer can never take away the public record of your contributions to an open source project or your standing in the community. And, importantly, the skills you have learned from your hard work are transferable to any other job using the software you have developed.

The result is that even more so than other software developers, open source developers can have their pick of jobs. That means employers have to compete for them even more.

While many companies will shower their developers with perks to keep them on board, savvy open source developers will demand time to do core community work that provides them with the intangible social relationships and technical skills that make them more valuable workers–for anyone using their software. And they will demand that new code be vetted through core community processes whenever possible, rather than remaining one-off applications.

Proprietary software companies should fear these trends. More and more developers are going to choose open source jobs, so proprietary companies are going to have to pay more for worse programming talent. (This is one of many reasons why proprietary software is doomed.)

Open source companies, on the other hand, should embrace this trend and do their best to satisfy their developers’ demands. Ultimately, these companies’ value will lie in the superiority of their technology and their developers. The developers want their software to be excellent, because it is the software and their involvement in it, not the company employing them, that guarantees them employment. In the open source economy, this loyalty is not misplaced. Rather, employing companies need to align themselves with it. In so doing, they will achieve technological superiority and community fame. They err from this path at their peril.

Oh but of course: comparison of Google and Open Source innovation models

What is below is mere speculation. I don’t know anything about Google.

I guess this is just an admission of naivete, but one of the most surprising things about Google+ to me right now is that it’s been a surprise. Wave was a huge technical investment that flopped. Buzz happened and was meek. For some reason, they became old news. But Google is super organized on the inside, and can actually afford to annoy people mildly or lose cool points for the sake of conducting an experiment. Does Google+ use Wave technology? Well, who knows: it’s locked down behind the NDA. Was it inspired by Buzz usage and feedback? Is it powered by Buzz data? It’s safe to assume so, right?

Part of Google’s power is that it isn’t just a startup operating in the market. It is a whole bunch of startups operating in a coordinated collective. A lot of its advantage is in its efficiency as an incubator, which means its ability to recycle old technology choices, talent, and expertise. Its other advantage is a business model that scales up with the amount of data collected, which means it can focus on user experience.

The only comparable thing I know of in terms of efficient recycling of technology and content is the “open source world.” People build technology for their application or to scratch their itch, and it sticks around for reuse. If a particular funder drops a project that has reached half of its potential, the same team can find a new funder to pick it up and take it the rest of the way. And hackers who learn something on one project are under no legal restrictions against taking that human capital with them on to the next.

But there isn’t a unified business model for open source development. It’s still struggling for funding, fighting the inherent free-rider problems, and figuring out how to get funders who are willing to take risks.

Interview on National Broadband Map architecture

Interview with Juan Marín Otero from Computech on development of the National Broadband Map, released by the National Telecommunications and Information Administration and the Federal Communications Commission.

The selection of open source software has been critical in my opinion. The entire project has been constructed with open source tools, from the operating system to the last JavaScript library used in the presentation layer. The main reasons are the flexibility and easy development that these solutions give us. When we evaluated the requirements of this project, some open source tools began with a clear advantage (functionality and easy to deploy) over most of the commercial tools.

Developing on open source has given us the flexibility needed in a project like this without having to worry about whether we had enough licenses per computing “core” or not, and choosing the right components for each requirement, rather than having to adopt monolithic suites with great functionality but poor performance and little chance of adaptation. One of the advantages that may seem trivial but in this work environment is very important is that I have the whole project on my laptop, and I can make changes from a Starbucks, if necessary (and I use this example because it happened to me once on the project).

Job opening at OpenGeo for GeoNode

It looks like OpenGeo is going to be getting a lot of business in the coming year for GeoNode–so much that we can’t handle it with the size of our current team.

So we’re hiring a new Software Engineer.

What we’re looking for is somebody who has both the engineering chops to handle our complex stack (which includes Django, GeoServer, GeoNetwork, and GeoExt) and the inclination and aptitude to provide on-site consulting, training, and support to clients. Because of our partnership with the World Bank, many of these deployments will be around the world. So this job will likely offer the opportunity to go interesting places and meet interesting people, on top of coding for a very well poised open source, open data project.

OpenGeo, I have to say, is a kickass place to work. I’ve been working for it and our parent organization OpenPlans for almost three years now, and I am challenged and inspired daily by the intelligence of the people around me and the dynamism of the organization. It’s also committed to openness with a sincerity that is extraordinarily compelling to the right kind of person.

This blog post, for example, divulges crucial information about our business strategy in an unprofessional medium. “What is going on?” you might ask. “Why is he doing this? Is it a corporate ploy?” No. Our business strategy isn’t a secret. On the contrary, we want as many people to know about it as possible so that more people will get involved in our open source projects and communities. And though our work and life itself demand humility, on rare occasions (such as when we are trying to attract applications from Software Engineers) we get to brag about how great we are.

Something else I like about us? We are scrappy. OpenPlans has subsisted primarily on donations from Mark Gorton for most of its existence. We are in a turbulent time, because the LimeWire lawsuits mean that we can no longer depend on Mark. But, seriously in the nick of time, we are warming up our business development machines.

The leads are coming in, and we are growing (did I mention that we are hiring?). It feels like the helicopter is lifting off just as the ground beneath us is melting into hot lava. And yet, we aren’t afraid. Yesterday I saw Vanessa Hamer give a budget report for OpenPlans. The gist of it was: even assuming that lots of things go wrong with our funding pipeline, we are well prepared to thrive in the future. I have never had more confidence in this organization than I do now.

This is a great thing. It is great because OpenPlans and OpenGeo are working on great things. OpenGeo in particular is in the early stages of transforming the geospatial software industry from one dominated by proprietary desktop software to free web software. It is providing tools that will help institutions transform their closed and cobwebby data silos to nodes in a vibrant, open geospatial web. For great justice.

“Are you for real?” Yes, I’m for real. OpenGeo is that good. I know we’re that good because we have attracted successful geospatial software industry entrepreneurs (1) (2) (3) as our upper level management in the past year. These folks aren’t just all geniuses. They are geniuses with domain expertise who are passionately dedicated to transforming an industry with an impact on millions of people.

We also have some great connections going on. I believe I’ve mentioned that we’ve been working with the World Bank’s CAPRA initiative on this GeoNode project (the one we are hiring for). The people behind CAPRA are rockstars within a superorganization with a mission to save Central America from natural disasters.

Oh, and we’re going to be working with the Global Earthquake Model soon. Those guys are a crack team of brilliant scientists with a mandate to unite all earthquake modeling under one banner using open source software. If it works, it will have an impact on everyone in an area that suffers earthquake risk, because earthquake modeling is used by everyone from humanitarian NGOs who do disaster relief to insurance companies who wind up taking on a lot of the corporate/economic risk of natural disasters. It’s a mandate that would be ludicrously ambitious if it weren’t happening right now, with web technology that has only recently been perfected and made available.

It is so fucking cool I can hardly contain myself. I have to suppress these thoughts normally so I don’t get so distracted by how cool it is that I don’t get any work done. It is so cool that it’s hard to talk about publicly because it all just seems so unbelievable.

That is what we are hiring for right now.

Economics of Open Source

Hat tip to Paul Ramsey for the link to this blog post by Stephen O’Grady, “The Economics of Open Source: Why the Billion Dollar Barrier is Irrelevant.”

O’Grady argues that despite protests from those who haven’t seen the billion-dollar-value FOSS company of their dreams, open source business is doing great.

The question, remember, isn’t whether businesses and developers are consuming and producing open source software. They are, in droves. Nor are there questions as to whether or not the software can be sold successfully on a commercial basis: it is, every day. The only remaining questions really regarding the economics of open source are whether they can duplicate the margins of their proprietary predecessors, and frankly I think most customers hope they don’t.

This brings up an understated point about the economics of open source: that the proprietary software production model is a monopolistic model and hence bad for technology consumers. The high margins of proprietary software companies are due in part to monopolistic rents. The non-competitiveness of the proprietary software market leads to bloated, inefficiently created, and poorly supported software.

Another, related economic issue is the challenge of open source business, which O’Grady sums up like so:

Part of the challenge for open source software vendors, of course, is the fundamental difference between open source software and proprietary alternatives, not to mention other tangible goods: the primary asset to be sold is (generally) freely available.

I think this is a poor characterization of the problem. The problem is not that the asset to be sold is freely available. That would assume that software is the asset to be sold. But if an asset is “anything … capable of being owned or controlled to produce value and that is held to have positive economic value,” then free software can’t be the asset.

So what has to be the primary asset for open source software vendors? The time of software experts who can do development or support.

At this point the analysis gets confused, because there are two issues at stake. Would-be open source billionaire entrepreneurs become disappointed that consulting and support around open source doesn’t scale as well as their proprietary software forebears. But this reaction is accentuated by another, independent problem: the free-rider problem around developer and support services.

Open source software vendors are in the business of shedding off public goods in the form of (improvements to) freely available software. This is why it makes sense for an open source vendor like OpenGeo to consider itself a social enterprise: it “does good” merely by operating.

But it also means that these services are going to be under-valued in the market because it is so difficult to capture the consumer demand for a software improvement as revenue.

How much does each user of a free software project value this new feature? Ok, sum that. That’s how much the open source vendor should be able to raise, in principle, for developing that feature–if there is only one potential supplier.
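In symbols, if user $i$ values the feature at $v_i$, then the most a sole supplier could hope to raise for developing it is

$R = \sum_i v_i$

(before accounting for free-riding, which is the other problem).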

Proprietary software companies are able to capture this demand for the software improvements through the mechanism of selling the software itself. Free software developers won’t be able to raise as much–because they compete with each other as suppliers (which is good for society!)–but there is still a much bigger market there than is currently tapped into.

The kind of advance that will fuel open source business moving forward is mechanisms that allow for the capturing of this latent consumer demand.

The most literal case of this is “crowdsourced microfunding,” a model that is greeted with mockery whenever I talk about it to people with industry experience, but which has recently had a preliminary success story: Diaspora’s skyrocketing funding via Kickstarter. Kickstarter, as opposed to other collaborative funding sites that have come and gone like Fundable.org, looks like it has some additional incentive structures built in that eliminate some of the Nash equilibria in the collaborative funding game in which not enough actors participate. (New York Times coverage and mass resentment towards Big Brother don’t hurt either.)

But there are other models for solving this problem as well. For GeoNode, the World Bank’s CAPRA initiative is seeking out partners to build a global community of funders to be in partnership with the developer community.

If this model works and is replicable, the potential impact on the software business world could be immense. Strategic cooperation between major funders would allow them to efficiently channel funding towards development while regulating against free-riders among themselves. The result would be a highly efficient market for software development–more efficient than either the proprietary software market or the free-rider-ridden free software market.

These speculations are consistent with O’Grady’s analysis: none of this leads to a billion-dollar corporation. But it does lead to a thriving economy of smaller scale consultancies, valued fairly according to their expertise, generating a torrent of free software available for all. Isn’t that the best outcome of all?

The Global South does IT better

A few weeks ago I visited the offices of Peru’s Comité Coordinador de la Infraestructura de Datos Espaciales del Perú, or IDEP, who are responsible for building that nation’s spatial data infrastructure system.

They have built a very impressive system with comparatively few resources using a largely open source stack of software–MapServer, MapBender, Mapfish, GeoNetwork, Joomla–and are actively looking for ways to innovate further.

In a meeting there, Max Taico, from the National Office of Electronic Government, explained why they had turned to open source software. It wasn’t just the fact that it was free–ESRI gives them free licenses of ArcGIS Server.

Open source software works for them because their government procurement practices are slow and hard to work with. But with free software (‘software libre’, as they call it), they are able to just install things on a server and get it to work. Indeed, while we were there they logged us onto the server and invited us to look around at the system and install new software if we thought it would be helpful.

Compared to the heavy bureaucracies we are used to working with, it would be an understatement to call this “refreshing.” Governments (including international governments) based in the U.S. maintain strict control over their software inventory and often stipulate what software is or is not allowed on their computers.

This is a crippling policy in a world full of great free software. It’s appalling to guess how much time (and, hence, money) is wasted by, say, the World Bank’s commitment to using outdated browser and office software.

Meanwhile, in Lima, a project with almost no permanent staff was able to develop a system that is truly cutting edge. Their government IT culture works contiguously with the global hacker culture, which is interested in getting things done with as few obstacles as possible.

An inspiring thought is that because this way of doing things is so much more effective, the Global North is learning that it should change its ways. In her keynote address to this week’s Understanding Risk conference, the World Bank’s CIO Shelley Leibowitz announced to an applauding audience that they were going to drop their mandated use of the universally loathed Lotus Notes.

Times are changing. It’s nice to know that part of that change is a long-due change in leadership.

Data Uploading on GeoNode

We’ve made great strides in GeoNode development in the past few weeks, due in part to the addition of Luke Tucker, on loan from Civic Works, to the team.

The main improvements lately have been ones slated for the 0.1.2 milestone, which focuses on basic data management through the web application interface.

The most important improvement is the data upload functionality. We provide a web interface for SHP and TIFF upload and then push the file to GeoServer using its REST configuration API and gsconfig.py. The layer is then exposed via OGC services like any normal GeoServer layer, and so is available for use by OpenLayers and GeoExt applications, for example.
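Under the hood, the REST interaction is simple: gsconfig.py is (roughly) wrapping an HTTP PUT of a zipped Shapefile against a datastore’s file resource. Here is a hand-rolled Scala sketch of approximately what goes over the wire–the workspace, datastore, file path, and credentials are all hypothetical placeholders:

  import java.io.{File, FileInputStream}
  import java.net.{HttpURLConnection, URL}

  object ShpUpload {
    def main(args: Array[String]) {
      // Hypothetical datastore location; GeoServer's REST config API accepts
      // a zipped shapefile PUT against a datastore's file.shp resource.
      val url = new URL("http://localhost:8080/geoserver/rest/workspaces/geonode/datastores/roads/file.shp")
      val conn = url.openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("PUT")
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/zip")

      // Placeholder credentials, base64-encoded for HTTP Basic auth:
      val auth = javax.xml.bind.DatatypeConverter.printBase64Binary("admin:geoserver".getBytes)
      conn.setRequestProperty("Authorization", "Basic " + auth)

      // Stream the zipped shapefile as the request body.
      val in = new FileInputStream(new File("roads.zip"))
      val out = conn.getOutputStream
      val buf = new Array[Byte](4096)
      var n = in.read(buf)
      while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
      out.close(); in.close()

      println("GeoServer responded: " + conn.getResponseCode)
    }
  }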

Hat tip to GeoCommons for inspiring the UI for uploading the multiple files needed for Shapefile data.

You can get a taste of some of the other improvements by checking out the Data and Map information pages. In GeoNode, both data sets and maps (compilations of data, with a layer order and styles specified) are first-class content. We want to let users transition seamlessly between working with data and working with maps in order to smooth out the web GIS experience.

The “information pages” (I think we are looking for a better name for these, internally) serve as dashboards to a particular data set or map, and expose the relationships internal to the content. By viewing which maps a data set is a part of, the user can find out about who trusts the data, who finds it authoritative. This is just the first of many ways we plan to let social use of data provide relevant metadata.

What’s this GeoNode thing?

I’m going to start blogging here about the GeoNode.

If there is anything I’ve learned at The Open Planning Project and OpenGeo, it’s that it is critical for the success of open source projects that they be “truly open”–not just with an open source license “slapped on,” but with a transparency built into its development and management process.

So I am chagrined that the GeoNode, the project I’m currently most involved with, has been developed in relative obscurity. At OpenGeo, we’ve been treating it like an internal project, and though its task tracker is open, documentation is sparse.

The reason for this is simple: we have been too busy building it to properly groom it. But hopefully that will change soon.

So what is the GeoNode? Here’s the backstory, as articulated by Eddie Pickle:

The GeoNode project is an open source initiative backed by the World Bank, OpenGeo, and others with pilot programs in El Salvador and Guatemala. It aims to give a modern spin to the idea of the spatial data infrastructure with an eye to the use and reuse of data, encouraging collaboration, and promoting open data exchange as a default. Initially, the development of the GeoNode has focused on serving and visualizing the data created by the Central America Probabilistic Risk Assessment (CAPRA) to assess and mitigate the risk due to adverse Natural Disasters.

At its core, the GeoNode is based on open source components GeoServer, GeoNetwork, Django, and GeoExt that provide a platform for sophisticated web browser spatial visualization and analysis. Atop this stack, the project has built a map composer and viewer, tools for analysis, and reporting tools. The GeoNode also supports facilities for styling data as well as collaborative features like ratings, comments, and tagging for data, maps, and styles. The GeoNode is built on four key tenets: Collaboration, Distribution, Cartography and Data collection.

This is the GeoNode vision–a vision which is currently under (rapid!) development. But at the time of this writing, the use of the present tense in the paragraphs above is sometimes ambitious. You can check out the demo that we update weekly from GeoNode trunk here.

In its current state, it requires some explanation. What you’re seeing is a Django site that is configured to work against a GeoServer instance that contains the spatial data hosted by the GeoNode. The map on the front page is a GeoExt map embedded as a widget. This Map-As-Embeddable-Widget is a feature of the GeoNode’s Map Composer, which is closely related to GeoExplorer, OpenGeo’s main GeoExt-based web GIS client.

The site’s theming is, as requested by our partner and client, made to look like the World Bank’s CAPRA website. Rolando Penate has been working on ideas for generic GeoNode themes that we hope to implement soon. His design wireframes show off some of the potential of these themes as well as the features that we are scheduled to implement in the coming year: user registration, commenting and rating, and groups.

The purpose of all these “Web 2.0” features (cringe!) is to provide a layer of social metadata over the geospatial data and ISO metadata that we will be supporting with an integrated GeoNetwork instance. If the internet has proved anything, it’s that the best catalogs are those created dynamically from real usage. So our plan is to integrate cutting-edge web GIS tools with the social features of the site so that as users participate in the GeoNode environment, we collect data about their activity and use it to refine search results. We joke about whether the GeoNode aims to be the MediaWiki/Twitter of maps, or the Facebook/YouTube of maps, or the Yelp/Wordpress of maps…but really these features are all so commonplace on the web now that it’s not a matter of imitating the market leaders. It’s a matter of taking solid modern principles of data management and bringing them intelligently to the Spatial Data Infrastructure domain.

We are several months away from these features, however. For now we are focusing on building a solid backbone of tools that will make the GeoNode attractive to users.

If you’ve been following GeoExt development lately, you may have noticed Andreas Hocevar’s contributions of JavaScript client code that works against the Mapfish Printing module. That’s all been core development work funded by our GeoNode partnership.

Another neat budding project worth noting is gsconfig.py, developed primarily by David Winslow. It’s a Python library for interacting with GeoServer’s catalog via its REST API. We are going to be using it to provide through-the-web data upload and metadata management through a Django-powered user interface.

These two open source contributions highlight what’s for me one of the coolest parts of our GeoNode development: that we are dedicated to bringing as much as possible of the work for the GeoNode back into widely recognized open source geospatial projects or splitting parts out into general purpose libraries. This is possible because of the NGO and government community partnership around the GeoNode that is being spearheaded by Stu Gill, of World Bank CAPRA. Stu is a visionary who sees very clearly the decisive role openness will play in the future of SDI, GIS, etc. And he understands that openness will play an important role all the way down to the open source core, and has been reaching out to like minds.

OpenGeo’s GeoNode team has been hard at work on the prototype GeoNode software and the open source software improvements that it depends on, but in the coming months we hope to shift our priorities more towards documentation and community. We’ll be announcing those advances through GeoNode.org, the official site of the GeoNode project. But we’ve unintentionally left the lid on this for too long. In the short term, I’ll be blogging informally about GeoNode progress week-by-week here, because it’s high time we got the GeoNode into the open.

U.S. Federal Government Embraces Open Source

The U.S. government is definitively over open source software fear.

First, though it’s no real surprise, they are switching whitehouse.gov to use Drupal. You probably heard about it on Boing Boing already. It’s not that big a surprise, because everyone knows Obama loves everything open, and Drupal is the content management system of choice for big organizations with big websites and little in-house expertise.

But more surprising, and more significant, is the Department of Defense’s memo clarifying its relationship to open source. Its preamble opens:

“To effectively achieve its mission, the Department of Defense must develop and update its software-based capabilities faster than ever, to anticipate new threats and respond to continuously changing requirements. The use of Open Source Software (OSS) can provide advantages in this regard. This memorandum provides clarifying guidance on the use of OSS and supersedes the previous DoD CIO memorandum dated May 28, 2003.”

So it’s now acknowledged that it is sometimes in the interest of national security to use open source, for precisely the reasons the open source movement has been talking about for so long: more eyeballs and fewer bugs, no vendor lock-in, adaptability by your own coders to scratch your own itches, lower costs, etc.

Chalk it up to open source having “won the argument.”