Blog Post

Simple Graph Queries

,

In this post we saw how to create some graph tables with data. In this I will explore simple queries off of this data and how they compare with their relational counterparts.

The main goal behind a graph design is to help you answer queries – so what are the questions you’d ask of a movie database, if you had one? Mine would typically be like below.

1 Who are the actors in this movie?

2 Who is this movie directed by?

3 Who is the most prolific actor, according this dataset?

4 How many actors are also directors?

..and so on.

Lets answer these one by one, and see how they compare relationally. So if i were to answer the first question the typical relational way – my query would be as below:

SELECT c.actor_name from movies a, moviesactor b, actor c
WHERE a.MovieId = b.movieid AND b.actorid = c.ActorID
AND a.Movie_Title = 'Jurassic Park'

This is a very simple two table join – if we were to do the same with newly created graph tables – the query would look like below.

SELECT p.personname FROM dbo.personnode p, movienode m,moviesactorlink a

where MATCH(m-(a)->p) AND m.movietitle = ‘Jurassic Park’

This is a very simple two table join – if we were to do the same with newly created graph tables – the query would look like below.

SELECT p.personname FROM dbo.personnode p, movienode m,moviesactorlink a
where MATCH(m-(a)->p) AND m.movietitle = 'Jurassic Park'

The queries for other questions, with their relational counterparts, are as below.

--Most prolific actor
SELECT TOP 10 c.actor_name,COUNT(1) AS moviesactedin from movies a, moviesactor b, actor c
WHERE a.MovieId = b.movieid AND b.actorid = c.ActorID GROUP BY c.actor_name ORDER BY moviesactedin desc
SELECT TOP 10 p.personname,count(1) AS moviesactedin FROM dbo.personnode p, movienode m,moviesactorlink a
where MATCH(m-(a)->p) GROUP BY p.personname ORDER BY moviesactedin desc
--2 Actors who are directors
SELECT c.actor_name,a.Movie_Title from movies a 
INNER JOIN moviesactor b
ON a.MovieId = b.movieid
INNER JOIN actor c 
ON b.actorid = c.ActorID
INNER JOIN MoviesDirector d
ON a.MovieId = d.movieid 
INNER JOIN director e ON
d.directorid = e.directorid
AND e.director_name = c.actor_name
SELECT p1.personname, m.movietitle FROM personnode p1, movienode m, moviesactorlink a,moviesdirectorlink d
WHERE MATCH(m-(d)->p1 AND m-(a)->p1)

The advantages are

1 Fewer number of tables

2 Easy to write as opposed to a lot of joins.

The node table usually has a seek operator on it, but edge tables are scanned since it is not possible (currently) to create an index on edge id. I will explore the most useful part of this feature – shortest path, in the next post. Thanks for reading!

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating