Simple Graph Queries

In this post we saw how to create some graph tables with data. In this I will explore simple queries off of this data and how they compare with their relational counterparts.

The main goal behind a graph design is to help you answer queries – so what are the questions you’d ask of a movie database, if you had one? Mine would typically be like below.

1 Who are the actors in this movie?

2 Who is this movie directed by?

3 Who is the most prolific actor, according this dataset?

4 How many actors are also directors?

..and so on.

Lets answer these one by one, and see how they compare relationally. So if i were to answer the first question the typical relational way – my query would be as below:

SELECT c.actor_name from movies a, moviesactor b, actor c
WHERE a.MovieId = b.movieid AND b.actorid = c.ActorID
AND a.Movie_Title = 'Jurassic Park'

This is a very simple two table join – if we were to do the same with newly created graph tables – the query would look like below.

SELECT p.personname FROM dbo.personnode p, movienode m,moviesactorlink a

where MATCH(m-(a)->p) AND m.movietitle = ‘Jurassic Park’

This is a very simple two table join – if we were to do the same with newly created graph tables – the query would look like below.

SELECT p.personname FROM dbo.personnode p, movienode m,moviesactorlink a
where MATCH(m-(a)->p) AND m.movietitle = 'Jurassic Park'

The queries for other questions, with their relational counterparts, are as below.

--Most prolific actor
SELECT TOP 10 c.actor_name,COUNT(1) AS moviesactedin from movies a, moviesactor b, actor c
WHERE a.MovieId = b.movieid AND b.actorid = c.ActorID GROUP BY c.actor_name ORDER BY moviesactedin desc

SELECT TOP 10 p.personname,count(1) AS moviesactedin FROM dbo.personnode p, movienode m,moviesactorlink a
where MATCH(m-(a)->p) GROUP BY p.personname ORDER BY moviesactedin desc

--2 Actors who are directors
SELECT c.actor_name,a.Movie_Title from movies a 
INNER JOIN moviesactor b
ON a.MovieId = b.movieid
INNER JOIN actor c 
ON b.actorid = c.ActorID
INNER JOIN MoviesDirector d
ON a.MovieId = d.movieid 
INNER JOIN director e ON
d.directorid = e.directorid
AND e.director_name = c.actor_name

SELECT p1.personname, m.movietitle FROM personnode p1, movienode m, moviesactorlink a,moviesdirectorlink d
WHERE MATCH(m-(d)->p1 AND m-(a)->p1)

The advantages are

1 Fewer number of tables

2 Easy to write as opposed to a lot of joins.

The node table usually has a seek operator on it, but edge tables are scanned since it is not possible (currently) to create an index on edge id. I will explore the most useful part of this feature – shortest path, in the next post. Thanks for reading!

Rate

Share

Share

Rate