In this post we saw how to create some graph tables with data. In this I will explore simple queries off of this data and how they compare with their relational counterparts.
The main goal behind a graph design is to help you answer queries – so what are the questions you’d ask of a movie database, if you had one? Mine would typically be like below.
1 Who are the actors in this movie?
2 Who is this movie directed by?
3 Who is the most prolific actor, according this dataset?
4 How many actors are also directors?
..and so on.
Lets answer these one by one, and see how they compare relationally. So if i were to answer the first question the typical relational way – my query would be as below:
SELECT c.actor_name from movies a, moviesactor b, actor c WHERE a.MovieId = b.movieid AND b.actorid = c.ActorID AND a.Movie_Title = 'Jurassic Park'
This is a very simple two table join – if we were to do the same with newly created graph tables – the query would look like below.
SELECT p.personname FROM dbo.personnode p, movienode m,moviesactorlink a
where MATCH(m-(a)->p) AND m.movietitle = ‘Jurassic Park’
This is a very simple two table join – if we were to do the same with newly created graph tables – the query would look like below.
SELECT p.personname FROM dbo.personnode p, movienode m,moviesactorlink a where MATCH(m-(a)->p) AND m.movietitle = 'Jurassic Park'
The queries for other questions, with their relational counterparts, are as below.
--Most prolific actor SELECT TOP 10 c.actor_name,COUNT(1) AS moviesactedin from movies a, moviesactor b, actor c WHERE a.MovieId = b.movieid AND b.actorid = c.ActorID GROUP BY c.actor_name ORDER BY moviesactedin desc
SELECT TOP 10 p.personname,count(1) AS moviesactedin FROM dbo.personnode p, movienode m,moviesactorlink a where MATCH(m-(a)->p) GROUP BY p.personname ORDER BY moviesactedin desc
--2 Actors who are directors SELECT c.actor_name,a.Movie_Title from movies a INNER JOIN moviesactor b ON a.MovieId = b.movieid INNER JOIN actor c ON b.actorid = c.ActorID INNER JOIN MoviesDirector d ON a.MovieId = d.movieid INNER JOIN director e ON d.directorid = e.directorid AND e.director_name = c.actor_name
SELECT p1.personname, m.movietitle FROM personnode p1, movienode m, moviesactorlink a,moviesdirectorlink d WHERE MATCH(m-(d)->p1 AND m-(a)->p1)
The advantages are
1 Fewer number of tables
2 Easy to write as opposed to a lot of joins.
The node table usually has a seek operator on it, but edge tables are scanned since it is not possible (currently) to create an index on edge id. I will explore the most useful part of this feature – shortest path, in the next post. Thanks for reading!