The purpose of this post is to give a common footing for those reading to understand what I mean when I talk about a "graph". The field of graph theory is a very deep and well-explored one. The real trick here is trying to give an introduction to graphs and graph theory without getting lost in too much detail. I will endeavour to keep this post to the utter, simplified basics of graphs.
(My apologies to anyone who already knows all about graph theory; this is meant more for those who might not have been exposed to graphs before, as well as to establish some common terms and nomenclature I'll be using in this blog. If some of the definitions seem too simplistic, please bear in mind I'm trying to just get the basics out there so anyone can understand the concepts.)
(And really, Wikipedia is a great resource for those wanting a slightly more in-depth and well-structured discussion on graphs and graph theory.)
We also think of graphs like the following (which is one of my personal favourites from our good friend Jay-Z):
Alas, these are not the graphs I'm referring to in this blog. The kind I'm talking about is a little bit different; though, at its core, perhaps we can take a graph to mean (at a very oversimplified level), "a graphical representation of data".
The kind of graph I'm referring to in this blog has to do with the representation of objects and their interconnections. Because I'm terrible at drawing (and lazy), I'll show a simple graph that illustrates what I'm saying and then try to describe what's going on.
This, my graph-fiends (yes, I just made that horrible pun up), is one example of a graph. What does this graph consist of? Here's what we have:
- 6 vertices or nodes (the circles), and,
- 7 edges or relationships (the lines between the circles).
At a first glance, that's all there is to it. In mathematics, we commonly use the terms "vertices" and "edges" (the vertices being the circles/points, the edges being the lines between); however, it's likely a bit more intuitive to most to use the terms "nodes" and "relationships" in place of "vertices" and "edges", respectively. I'll try to stick to using "nodes" and "relationships", though I may occasionally use the other terms interchangeably.
Is a graph still a graph if there are no edges involved? Absolutely. This is one example of a "null graph". (For those interested, at the other end of the spectrum, a "complete graph" is one whose vertices are all connected/adjacent to each other, for a total of n(n-1)/2 edges, where n is the number of vertices.)
Edges/relationships can be "undirected" (like the example above), or "directed" (like the example below). Directed edges have a source and a destination (this is useful for showing relationships between two objects). Undirected edges can also be used to show a bi-directional relationship (i.e. one that exists in both directions); for example, Tom likes Dana, and Dana likes Tom.
Still with me?
So, we know that a graph consists of a series of vertices or nodes that can be connected via edges or relationships. These edges can be undirected or directed.
At its core, that is what a graph is.
By now, if you haven't already, you're seeing how this might be useful to represent highly connected data. Social networks are a great example (think of how you could represent your "friendships" on Facebook or MySpace with a graph; this is actually called a "Likes" graph). Try to think of other information you could represent in graph fashion.
What about a family tree? For sure. As another matter of fact, a tree is actually a type of graph that is directed and acyclical (meaning that there are no loops, i.e. if you start following the directions on a graph, you travel each node exactly once).
Weight is another concept we come across in graph theory and graph databases. Think of weight as being how important or strong a relationship is between two nodes. For example, a higher weight can imply a stronger relationship between two nodes representing, say, two friends.
This is an important concept as we can use it to determine a shortest path or least costly path between two nodes. (See an explanation of paths further below.)
One slightly more complicated idea is that of degree, which represents the number of edges attached to a node. So, a node with a high degree will be one that has a lot of relationships attached to it. This can be very useful for determining just how popular, for example, something or someone is. (Cue jokes for this blog.)
One last concept to get out there before wrapping this article up is the concept of a walk or traversal. A traversal is taken to be the resultant path found by starting at a node X and following to some other node on the graph Y and the edges and nodes visited during the trip from X to Y. This concept becomes important as we start to look in to graph-based databases as traversals are part of what we can use to extract useful information from a graph (e.g. "Who are all of my cousins?") and to even predict and recommend a product for someone based on their browsing history.
Ok, I think that's enough math-like stuff for one post. The definitions given in this post don't even scratch the surface of what's out there in the world of graph theory. While I'm no serious graph theorist academically, I'm sure there are some standard texts out there. That said, as mentioned earlier, Wikipedia is really a great resource for exploring the world of graph theory a bit more.
I'll introduce other graph-related concepts as they become needed (e.g. searching or path-finding); however, if there are requests for discussing specific graphing concepts, I'd be happy to address them (provided I can speak intelligently to them, of course).
Now go weird your friends out with jokes that their family tree isn't really a tree as it has loops!
No comments:
Post a Comment