Monday, 11 November 2013

Paths, Entities, and Types in Spring Data for Neo4j (also, I'm back)

I know it's been a while since my last post, but I assure you--one and all--that I have not left you kind folks for good.  I can also assure you that the cause of my literary absence has everything to do with being loaded down with work and is not a result of my gallivanting around the world whilst trying strange and wonderful new beers.

(Boy, do I ever wish that were the case.)


Strange AND wonderful!
[Courtesy: FOX]
No, I've managed to find some time in which to write a bit about something I've been looking into concerning SDN: Paths with multiple entity types.

So, without further ado, let's begin.


Yes, quite.
[Courtesy: samuelrunge.com and Monty Python]

Disclaimer

While the topics I'm about to go into could very well have solutions to them that I have yet to uncover, I'm presenting my findings and questions in the hopes that not only will someone find the discussion interesting/useful, but, that it might also help lead me to better solutions.

So, please do read on and try to keep the caveat above in mind.

Retrieving Heterogeneous Paths

If you'll recall from several posts ago, I had been attempting to write a web application based around the concept of a simple recommendation engine and a software retailer (a la Babbage's from the last century).  The implementation was being done using a Spring-based Java stack with Neo4j as the data store.

I had gotten to a point where I was able to load data into Neo4j via a method not unlike the one from Cineasts' (which can be found as part of the Spring Data Neo4j Reference).  A quick recap of the domain model follows below.

The domain model in question.

One important task of any recommendation engine is the ability to suggest entities that are relevant to a given starting point via some sequence of relationships.  Implementing such an engine, while quite doable, is something that I will eventually get to.

Another, simpler task is to be able to see how two entities are related, i.e. "how do I get from point A to point B?".  As an example, one such question might be, "How is game A related to game B, even though the difference in publication dates is large?"  And yes, this type of question is extremely similar to the "Six Degrees of Separation" question.


I bet he's a gamer.
[Courtesy: Wikipedia]

Some Basic Code

You might also recall that I had implemented a GameRepository class with the following signature:



 
public interface GameRepository extends GraphRepository<Game>, RelationshipOperationsRepository<Game>

With the above repository extending the GraphRepository and RelationshipOperationsRepository interfaces, we are provided with a host of cool (and handy) methods out of the box (tip: make sure you're comfortable with the "convention over configuration" paradigm, as there is a bit of magic that goes on with those OOTB methods).

As you'd expect, we can also add additional methods to the interface.  One example of a method you might want to add could be a custom Cypher query that returns all Game nodes with a particular property (the implementation of this is outside the scope of this post, but, it's actually pretty simple; if people want to see it, just shoot me a note!).

However, today we're looking to address the "Six Degrees of Separation" question (minus the limit on the degrees of separation), i.e. "how are node A and node B related"?

So let's give this a (very simple) shot:



 
@Query("START n=node(1), x=node(97) MATCH p = shortestPath( n-[*]-x ) RETURN p")
Iterable<EntityPath<Game, Game>> getPath();

Given that the nodes with IDs 1 and 97 are "Game" nodes, the Cypher query above is essentially determining how the two nodes are related.

(For the sake of this post, I'm ignoring the fact that there could be multiple "shortest paths" between the two nodes as it has little bearing on the goal of this post.)
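
For intuition about what the query is being asked to find: on an unweighted graph, a shortest path between two nodes can be found with a breadth-first search. The sketch below is only an illustration in plain Java and says nothing about how Neo4j actually implements shortestPath. The class name, the helper methods, and the node labels ("gameA", "dev1", and so on) are all made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class ShortestPathSketch {

    // Adds an undirected edge, mirroring the direction-less n-[*]-x pattern.
    static void link(Map<String, List<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        graph.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Breadth-first search; returns the nodes from start to end, or an empty list if unreachable.
    static List<String> shortestPath(Map<String, List<String>> graph, String start, String end) {
        Map<String, String> parent = new HashMap<>(); // visited node -> node we reached it from
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(end)) {
                // Walk the parent links backwards to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String n = end; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            List<String> neighbours = graph.get(current);
            if (neighbours == null) {
                continue;
            }
            for (String next : neighbours) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        link(graph, "gameA", "dev1");
        link(graph, "dev1", "gameB");
        link(graph, "gameA", "customer1");
        link(graph, "customer1", "gameC");
        link(graph, "gameC", "gameB");
        System.out.println(shortestPath(graph, "gameA", "gameB"));
    }
}
```

Note that even in this toy graph the path passes through a node that is not a game, which is exactly the situation the rest of this post wrestles with.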

Quickly going over the return type: SDN allows us to return an EntityPath parameterised by a starting node type and an ending node type, both of which are Game in this case.  An EntityPath is capable of storing the nodes and relationships that make up a Cypher path.  The Iterable portion of the return type is necessary unless you want to use EndResult instead of Iterable.

We can then access the individual path(s) via the Iterable return type.

(NOTE: There is currently a bug with SDN that throws an exception when calling a single path's .nodes() or .nodeEntities().  This bug has been around since SDN 2.1.RC4.)


Traversing the Returned Path

There's a reason my explanation of the code stops where it does.  Those of you with a keen eye and/or a familiarity with OOP/OOD will have spotted a potentially big stumbling block: How do you iterate through a path of nodes and/or relationships with potentially wholly disparate, unrelated types?  Given that SDN is geared towards mapping graph data onto Java POJOs, where every variable needs a declared type, the issue becomes apparent.

How do we solve this?

Solution 1

Make sure the nodes you are after all either implement a common interface or are derived from a common class.

Seem too good to be true?  You're right, because it is.  While there may be some scenarios in which this solution works, it is often the case in a graph database that the nodes/relationships are entirely unrelated concepts/types, e.g. Game and Customer, or Game and Developer.  This separation implies that there are likely methods and attributes specific to a given type that have no business being in a shared superclass or interface.
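
To make the trade-off concrete, here is a minimal sketch of the shared-interface approach. The names PathElement and displayName(), as well as the fields on Game and Developer, are hypothetical stand-ins rather than anything from my actual domain model or from SDN, and the SDN annotations are left out so the snippet runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CommonInterfaceSketch {

    // Hypothetical shared interface: only what every node on a path can sensibly offer.
    interface PathElement {
        String displayName();
    }

    // Stand-in for a Game entity (SDN annotations omitted).
    static class Game implements PathElement {
        final String title;
        final int releaseYear; // Game-specific; not reachable through PathElement

        Game(String title, int releaseYear) {
            this.title = title;
            this.releaseYear = releaseYear;
        }

        public String displayName() {
            return title;
        }
    }

    // Stand-in for a Developer entity.
    static class Developer implements PathElement {
        final String studio;

        Developer(String studio) {
            this.studio = studio;
        }

        public String displayName() {
            return studio;
        }
    }

    // Walks a mixed path using nothing but the shared interface.
    static List<String> names(List<? extends PathElement> path) {
        List<String> out = new ArrayList<>();
        for (PathElement element : path) {
            out.add(element.displayName());
        }
        return out;
    }

    public static void main(String[] args) {
        List<PathElement> path = Arrays.<PathElement>asList(
                new Game("Game A", 1989),
                new Developer("Some Studio"),
                new Game("Game B", 2013));
        System.out.println(names(path));
    }
}
```

The loop works nicely as long as displayName() is all you need. The moment you want releaseYear, you are back to checking types and casting, which is where the next solution picks up.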

Solution 2

Employ some form of reflection.

There it is: The "r" word.  Reflection is generally quite costly, which immediately handicaps its appeal.

A "poor man's" reflection might be to implement a series of "if/else" blocks to check types and perform some appropriate casting.  I think we can see that this could and would become very ugly and difficult to maintain.
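
A minimal sketch of what such a chain tends to look like follows. Game, Developer and Customer here are bare-bones hypothetical stand-ins for the real entities, and describe() is a made-up helper.

```java
class TypeCheckSketch {

    // Bare-bones stand-ins for the mapped entities (SDN annotations omitted).
    static class Game {
        final String title;
        Game(String title) { this.title = title; }
    }

    static class Developer {
        final String studio;
        Developer(String studio) { this.studio = studio; }
    }

    static class Customer {
        final String name;
        Customer(String name) { this.name = name; }
    }

    // One branch per entity type; every new type in the graph means another branch here.
    static String describe(Object entity) {
        if (entity instanceof Game) {
            Game game = (Game) entity;
            return "Game: " + game.title;
        } else if (entity instanceof Developer) {
            Developer developer = (Developer) entity;
            return "Developer: " + developer.studio;
        } else if (entity instanceof Customer) {
            Customer customer = (Customer) entity;
            return "Customer: " + customer.name;
        } else {
            return "Unknown: " + entity.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Object[] path = { new Game("Game A"), new Developer("Some Studio"), new Game("Game B") };
        for (Object entity : path) {
            System.out.println(describe(entity));
        }
    }
}
```

With three types this is tolerable. With a dozen node types plus relationship types, and several places in the code that need the same dispatch, it gets out of hand quickly.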

What about full-on automated reflection?  Well, we run into a bit of a snag with that, too: In a typical assignment operation, we have a left-hand side (LHS) and a right-hand side (RHS), e.g. TheClass theClassInstance = new TheClass();.  The RHS of the assignment is fairly straightforward with reflection.  Since SDN persists a __type__ attribute/property into a given node/relationship, we can fetch it and use it for casting (since it's typically a fully-qualified type name).  It might look something like this:


// n = the Node object in question
String theType = (String) n.getProperty("__type__", "");

// We could then make the RHS of the assignment something like:
//   (LHS) = template.convert(n, Class.forName(theType));


But what about the LHS?  Without an "if/else" block, how can we treat the returned string theType as a first-class citizen that would declare the type for the LHS?  As far as I'm aware, there is no way to do this (of course there might be ways I've just not seen, but I have a feeling they'd be just as expensive as the rest of the assignment).  Java is a statically-typed language, so the type of the LHS has to be known at compile time, and I'm sure most of us would expect this outcome.

So we see that this "solution" isn't really much of one.
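
Here is a small, self-contained sketch of that dead end. It does not use SDN at all: a plain Map stands in for the node's properties, and convert() is a hypothetical stand-in for whatever does the node-to-entity conversion. The only thing taken from the discussion above is the idea of a __type__ property holding a fully-qualified class name.

```java
import java.util.HashMap;
import java.util.Map;

class ReflectionSketch {

    // Stand-in entity with a single property.
    static class Game {
        String title;
    }

    // Hypothetical conversion: instantiate the requested class and copy the "title" property.
    static <T> T convert(Map<String, Object> props, Class<T> type) {
        try {
            T entity = type.getDeclaredConstructor().newInstance();
            type.getDeclaredField("title").set(entity, props.get("title"));
            return entity;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot convert to " + type.getName(), e);
        }
    }

    // Resolves the stored type name at runtime and converts the properties to that type.
    static Object load(Map<String, Object> props) {
        String theType = (String) props.get("__type__");
        try {
            Class<?> clazz = Class.forName(theType); // RHS: easy enough
            // LHS: the compiler only knows Class<?>, so Object is the best type we can declare.
            Object entity = convert(props, clazz);
            return entity;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Unknown type " + theType, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("__type__", Game.class.getName()); // simulates the stored type name
        props.put("title", "Game A");

        Object entity = load(props);
        // To touch anything Game-specific we are back to instanceof and a cast.
        if (entity instanceof Game) {
            System.out.println(((Game) entity).title);
        }
    }
}
```

The runtime object really is a Game, but the variable holding it can only be declared as Object, so the type check and cast from the "poor man's" approach come right back.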

Solution 3

Somehow modify the query to work with a @MapResult-annotated interface to deal with the results (note: At least as of SDN 2.3.1, @MapResult has been deprecated in favour of @QueryResult.  I haven't done too much with @QueryResult so your mileage may vary).  This obviously requires more knowledge ahead of time of the types of paths, nodes and relationships you're planning on returning which may limit the kinds of heterogeneous queries you can execute.

So What Else Can Be Done?

I recently attended GraphConnect 2013 in New York City where I had a chance to meet up with Michael Hunger (a name which should require little or no introduction in the graph database/Spring Data continuum).  We had a great conversation about the very subject of this post.

His overall insights into Spring Data and its purpose, merits and detractors were very helpful, especially from a conceptual standpoint.

The number one point to take away--and perhaps it's quite obvious but it's worth reiterating--is this: Spring Data is not a magic bullet.  Given the differences in concepts here (i.e. a statically-typed, object-oriented language and a graph-based, schema-free data store), there are bound to be limitations.

Spring Data's strong point is ease of integration.  A typical use case for SDN is likely to be one where relatively few nodes/relationships need to be returned.  SDN is not necessarily meant to "explore" graphs.

In order to truly resolve such differences, it would seem to me to make more sense to either layer SDN on top of another layer of abstraction, or even to go direct to the Neo4j API.

Perhaps an even better approach would be to use a dynamic JVM language such as Groovy or JRuby (or a Domain-Specific Language (DSL) built on top of one); something that is much more loosely-typed, flexible, and still able to be integrated into the Java stack.

(Shameless plug: Check out Pacer, a powerful, JRuby-based graph traversal engine.)

Summary/Conclusion

In this post, we have seen that exploring subgraphs and paths with SDN is not as straightforward as we'd perhaps like; however, it is clear that SDN was not built to support such features (at least not yet).

Spring Data for Neo4j's strong suit is ease of integration.  As should be evident from this post and others, it is easy and straightforward to get SDN into existing/legacy Java applications, and to quickly stand up Java-based applications that rely more on "end results" than "exploration" per se.

Ok, folks; as always, I definitely welcome comments, feedback, and questions.  If you can think of a better way to approach this kind of problem space or even if I have something wrong here, please do let me know and I'll be sure to make good use of such feedback.

Thanks for reading, and we'll see you on the next post!