Wednesday, 15 February 2012

Neo4j: First Blood

That's one serious-sounding title.

This post is about my first foray into the graph database world.  I chose Neo4j as my first victim/offering.  I'll likely end up writing more than a few articles about this particular database, but I thought I'd start with the basics, including the following:
  • Download
  • Installation
  • Configuration
  • Poking around
(And yes, "poking around" is a sanctioned technical term.)

So, a bit about Neo4j to start!
  • It's been around since 2007.
  • NUMBER ONE SELLING POINT FOR ME: It's ACID compliant!  Not many of the NoSQL engines I've seen (yet) are, for various reasons that are well outside the scope of this post.
  • As you may have guessed, it's primarily meant for integration with Java.
  • It can also be integrated with Spring (a big plus, if you ask me) via Spring Data (POJO development FTW!).
  • Its API is REST-based, and so can be utilized by just about any platform (though you'll likely have to write your own wrapper, unless you can find one out there in the open source world).
  • There are some ready-made wrappers for some platforms available, such as Python and Ruby.
  • It's available for Windows, MacOS and Linux-based OSes.
  • It's available in both 32-bit and 64-bit for Windows and Linux.
  • It's available in 3 versions: the Community version, the Advanced version, and the Enterprise version.  As you'd expect, the Community version is open source and available under the GPL (the other versions are covered under the AGPL).
  • It comes ready-to-run with a version of the web/app server Jetty.
  • It comes with a built-in web admin console (hence the need for Jetty).
  • Following in the NoSQL tradition, it scales very well for Big Data.
  • Its name lends itself well to any number of The Matrix jokes.
I strongly suggest going to their website (www.neo4j.org) to do a little research of your own.
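To give a feel for what "write your own wrapper" around the REST API might involve, here's a minimal Python sketch.  It assumes a local server on the default HTTP port with the REST API rooted at /db/data, and that a POST to the node resource creates a node; the function names are my own invention, so check the REST documentation for your version before relying on any of it.

```python
import json
import urllib.request

# Assumption: local server, default HTTP port, REST API rooted at /db/data.
BASE_URL = "http://localhost:7474/db/data"


def build_create_node_request(properties, base_url=BASE_URL):
    """Build (but do not send) a POST request asking the server to create a node."""
    body = json.dumps(properties).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/node",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_create_node_request({"name": "Neo"})
    # Only prints what would be sent; call send(req) against a live server.
    print(req.get_method(), req.full_url)
```

Keeping "build the request" separate from "send the request" means you can inspect exactly what would go over the wire before there's a server to talk to.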

Download

Given that I'm just looking to get my feet wet with Neo4j, I downloaded the Neo4j v1.6 Community Edition 64-bit Linux package onto my ready-made VM (coincidentally named Morpheus) running CentOS (sorry Windows users).  Read: I can't be bothered downloading the source and compiling it.  Note that Java 1.6+ is required; a complete set of requirements can be found here.

The archive is only about 37MB (give or take) and so completed relatively quickly over my bonded DSL connection.

Installation

After un-tarring the package and moving it into an appropriate directory (I'm a sucker for /etc), I started the Neo4j server from the command line via bin/neo4j start (don't worry; there's a README.txt in the root of the installation directory that has all the quickstart instructions in it).  I was ready to rock!

(I should note that I did get a couple of warnings, shown below, but they don't seem to have affected anything just yet, likely given how small my current graph is.

WARNING: Detected a limit of 1024 for maximum open files, while a minimum value of 40000 is recommended.
WARNING: Problems with the operation of the server may occur. Please refer to the Neo4j manual regarding lifting this limitation.
)
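If you'd like to know whether you'll see the same warning, here's a rough Python sketch that compares the open-files limit against the 40000 quoted above.  It only reports the limit for the process running the script, which I'm assuming matches what the Neo4j server would get when started by the same user from the same shell; the function names are made up for illustration.

```python
import resource

# The minimum quoted in the startup warning above.
RECOMMENDED_OPEN_FILES = 40000


def meets_recommendation(soft_limit, recommended=RECOMMENDED_OPEN_FILES):
    """True if a soft open-files limit satisfies the recommended minimum."""
    if soft_limit == resource.RLIM_INFINITY:  # "unlimited" always qualifies
        return True
    return soft_limit >= recommended


def current_open_file_limit():
    """Soft limit on open files for this process (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    limit = current_open_file_limit()
    verdict = "ok" if meets_recommendation(limit) else "too low"
    print("open files limit:", limit, verdict)
```

Running ulimit -n in the shell gives you the same number; the Neo4j manual covers how to actually lift the limit.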

Or so I thought.

Configuration

If there's one bone of contention I have with Neo4j, it's that finding the appropriate (and up-to-date) documentation for the config files takes a bit of digging (it's not impossible by any stretch, though).

As I quickly found out, trying to access the web admin console that comes with Neo4j (very handy, I must say) outside of localhost is a non-starter out of the box.

Did I pack it in for the day and go back to flipping through Steam for cheap games?  No!  I did some "research".

Here's the solution: In order to get Neo4j's web admin to work from somewhere outside of localhost, edit the neo4j-server.properties file in the install directory's /conf directory (go figure).

Commented out towards the top of the file is the property "org.neo4j.server.webserver.address".  Uncomment it and change it to the IP you want to bind the server to (the comment above the property does note that there are security concerns to consider, so you may want to consult the Neo4j documentation before doing this).
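For the script-happy, here's a small Python sketch of that edit.  It works on the file's text rather than the file itself, and it assumes the property sits on its own line as key=value behind a leading "#"; the sample contents and the IP address are placeholders of mine, not a copy of the real config file.

```python
PROPERTY = "org.neo4j.server.webserver.address"


def set_bind_address(config_text, address):
    """Return config_text with the bind-address property uncommented and set.

    If the property is not present at all, it is appended at the end.
    """
    lines = []
    found = False
    for line in config_text.splitlines():
        # Drop any leading "#" and whitespace so commented-out entries match too.
        stripped = line.lstrip("# \t")
        key = stripped.split("=", 1)[0].strip()
        if "=" in stripped and key == PROPERTY:
            lines.append(f"{PROPERTY}={address}")
            found = True
        else:
            lines.append(line)
    if not found:
        lines.append(f"{PROPERTY}={address}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample contents, for illustration only.
    sample = (
        "# let the webserver listen on another address\n"
        "#org.neo4j.server.webserver.address=0.0.0.0\n"
        "org.neo4j.server.webserver.port=7474\n"
    )
    print(set_bind_address(sample, "192.168.1.50"))
```

To apply it for real, you'd read conf/neo4j-server.properties, pass its contents through set_bind_address, and write the result back (after making a backup, naturally).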

You can also change other settings in this file, e.g. getting it to work over HTTPS, changing the default ports for each, etc.

(Note: The web admin defaults to running over HTTP on port 7474 and over HTTPS on port 7473.)
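Once the server has been restarted with the new address, a quick way to check from another machine whether those ports are reachable (before blaming the browser) is a plain TCP connection test.  This Python sketch only checks that something accepts connections on the port, not that it's actually Neo4j answering; the host in the example is a placeholder.

```python
import socket

# Default web admin ports mentioned above.
DEFAULT_PORTS = {"http": 7474, "https": 7473}


def port_is_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    host = "192.168.1.50"  # placeholder: use the address you bound the server to
    for scheme, port in DEFAULT_PORTS.items():
        state = "open" if port_is_open(host, port) else "closed/unreachable"
        print(f"{scheme} port {port}: {state}")
```

If the port shows as closed even though the server is up, a firewall on the server is another likely suspect.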

So, after making the change to the appropriate IP and restarting the Neo4j server, I tried pointing my browser back at the Neo4j web admin.

Success!


Poking Around

Without going into too much detail (I'll likely do that in subsequent posts), the Neo4j web admin has 5 distinct sections to help manage your installation:
  1. Dashboard
  2. Data browser
  3. Console
  4. Server info
  5. Index manager
Each one is fairly self-explanatory.

The dashboard provides at-a-glance information about your server over a specified timeline, such as the total number of nodes, properties, relationships and relationship types.

The data browser allows you to perform basic CRUD operations via a GUI.  You can also perform look-ups (consult the Help icon immediately to the right of the search button for more details on exactly what you can search for).  In other words, you can create a graph right then and there.

You can also flip the view to a graphical representation of the current graph and manipulate it (via click-and-drag) directly.  This is perhaps the coolest part of the web admin console (hey, we all like cool features!).


That's some serious badassery right there.

Next we have the console.  This is a great way to get familiar with the languages used to query Neo4j, including HTTP (i.e. accessing the REST calls), Gremlin (a Groovy-based querying language becoming common across multiple graph databases; it seems to be mainly for those coming from a math/graph background), and Cypher (Neo4j's own querying language; it seems to be mainly for those coming more from an SQL background).  

A quick note: At the time of this writing, Cypher only allows for read-only queries, whereas Gremlin allows for both reading and writing.

Next up, server info.  This is just a way to view (read: read-only) the server's configuration information.  No biggie.

Finally, we have the index manager.  Now, this is something I'm sure I'll be getting into a lot more as time goes on.  It's worth noting that Neo4j uses the Lucene project for indexing, so this is very promising (especially for those familiar with Lucene and/or Solr).  This makes a great deal of sense given the concept of properties for each node (full text search is going to be very important).

Regardless, you can create and manage indices for both nodes and relationships here.

So there we have it: My first venture into graph databases.  I'll admit I picked Neo4j first based on my initial research into graph databases.  It does seem to be the most popular graph database at the moment, so I look forward to seeing what it can do.

In subsequent posts, I'll be monkeying around with the querying languages, creating and modifying graphs, messing around with indices, and all kinds of other good stuff.

I hope some of you out there found this somewhat useful/informative/cool.  Well, I know I found it cool, but then I always was kind of odd...

Until next time, when we'll go Graph to the Future!  (Sorry, couldn't go an entire post without making at least one movie-based pun.)
