Tell me if I'm crazy - Recursive CTE + View (Materialized, indexed?)

Question

Post reply

Tell me if I'm crazy - Recursive CTE + View (Materialized, indexed?)

jace.bennett

Mr or Mrs. 500

Points: 502
More actions
July 17, 2013 at 9:09 pm

#296490

First off, I'm a .NET developer, and not a SQL wonk. Please be gentle. And thanks for your help.
So I have a lattice (a multiparent hierarchy) of objects representing sets of people sliced along some axis. Currently these sets exist as documents in RavenDB. Example:
Set
{
Id: 'some-guid',
Name: 'RegionName',
Dimension: 'Location',
ImmediateParents: ['containing-set-id', 'containing-set-id'],
ImmediateChildren: ['contained-set-id', 'contained-set-id'],
ContainedBy: ['denormalized', 'list', 'of', 'all', 'sets', 'that', 'contain', 'me']
}
Person
{
...
ImmediateParents: ['sets', 'i', 'am', 'assigned', 'to', 'directly'],
ContainedBy: ['denormalized', 'list', 'of', 'all', 'sets', 'that', 'contain', 'me']
}
The bulk of our queries are either direct reads from one of those lists, or "give me all the people contained by this set" query. Obviously our current implementation makes those queries really fast, but keeping all this updated is a delicate dance.
It occurred to me that having a normalized representation in SQL, and then building a view on top of it to denormalize the "contained by" might get me off the dancefloor. And that if I could materialize (usage?) that view and index it, I could get roughly the same performance profile I have now, with the workload heavily weighted toward write time.
Then tonight I learned of Recursive CTEs and the whole thing gelled for me. But I'm betting on some pretty big assumptions about features of Views that I don't understand. Am I sniffing up a good tree? Thanks for any insight.

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

jace.bennett Mr or Mrs. 500 Points: 502 More actions · Answer 1

jace.bennett

Mr or Mrs. 500

Points: 502

July 18, 2013 at 12:16 pm

#1633750

bump. Any thoughts at all? Is my question clear?

Sean Lange SSC Guru Points: 286609 More actions · Answer 2

It sounds like you are thinking about having 3 columns that each have a delimited list of IDs or something like that? That is certainly not a normalized version. Or maybe I am misunderstanding what you are trying to do.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

jace.bennett Mr or Mrs. 500 Points: 502 More actions · Answer 3

I'm thinking these tables:

Set(

Id,

Name,

Dimension,

...)

Person (

Id,

Name,

...)

SetRelationships (

ParentSetId,

ChildSetId

)

PersonSets (

PersonId,

SetId

)

And then these tables give me a lot of my queries directly. The ones I want to denormalize would be

PersonContainedBy(

PersonId

SetId

)

SetContainedBy(

SetId,

AncestorId

)

Which would be a flattening of the hierarchy described in the tables.

I.E. if A contains B, and B contains C per the tables, Then C would have records for A and B in the ContainedBy view.

I guess my question is if I build a materialized (or indexed) view over a Recursive CTE, how performant will that be? If I change the underlying data, will it have to recalculate the whole view, or just the changed records?

Sean Lange SSC Guru Points: 286609 More actions · Answer 4

jace.bennett (7/18/2013)
I'm thinking these tables:
Set(
Id,
Name,
Dimension,
...)
Person (
Id,
Name,
...)
SetRelationships (
ParentSetId,
ChildSetId
)
PersonSets (
PersonId,
SetId
)
And then these tables give me a lot of my queries directly. The ones I want to denormalize would be
PersonContainedBy(
PersonId
SetId
)
SetContainedBy(
SetId,
AncestorId
)
Which would be a flattening of the hierarchy described in the tables.
I.E. if A contains B, and B contains C per the tables, Then C would have records for A and B in the ContainedBy view.
I guess my question is if I build a materialized (or indexed) view over a Recursive CTE, how performant will that be? If I change the underlying data, will it have to recalculate the whole view, or just the changed records?

I guess I am missing why you think you need a recursive cte for this. Your data is denormalized so it would seem to eliminate the need for recursion.

Your definition of the problem is so vague that I am having a hard time envisioning what you are doing here.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 5

Maybe this link will help.

http://www.sqlservercentral.com/articles/Hierarchy/94040/[/url]

jace.bennett Mr or Mrs. 500 Points: 502 More actions · Answer 6

First, thanks for your time. Sorry If I'm communicating poorly.

Imagine the sets describe locations: North America, USA, Georgia, Fulton County, Atlanta.

I might describe this relationship in a relationship table (I'm excluding the actual person and set tables and using names instead of Ids for clarity):

Parent Set | Child Set

============================

North America | USA

USA | Georgia

Georgia | Fulton County

Georgia | Dekalb County

Fulton County | Atlanta

Dekalb County | Decatur

Person | Set

================

Joe | Atlanta

Jane | Decatur

Now, I have an extremely frequent and latency-sensitive query that says "Give me all the people (or sets) under Georgia". My thought is to make this query really fast by projecting it into this shape:

Person | Contained By

==============

Joe | Atlanta

Joe | Fulton County

Joe | Georgia

Joe | USA

Joe | North America

Jane | Decatur

Jane | Dekalb County

Jane | Georgia

Jane | USA

Jane | North America

And then another one for SetsContainedBy.

Sadly, Using actual snowflake schema has two distinct disadvantages in my case, the first is that sets (and dimensions) are really dynamic, and that I can have sets be child to multiple other sets (for example, Atlanta actually lies in two counties, Fulton and Dekalb).

I hope that clarifies my goal, and why the recursion seems like an obvious win to a non-dba. Thanks again.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 7

Are you looking for something like this?

DECLARE @AdjacencyList TABLE

(

Parent VARCHAR(20)

,Child VARCHAR(20)

);

DECLARE @PersonsLocations TABLE

(

Person VARCHAR(20)

,Location VARCHAR(20)

);

INSERT INTO @AdjacencyList

SELECT 'North America','USA'

UNION ALL SELECT 'USA','Georgia'

UNION ALL SELECT 'Georgia','Fulton County'

UNION ALL SELECT 'Georgia','Dekalb County'

UNION ALL SELECT 'Fulton County','Atlanta'

UNION ALL SELECT 'Dekalb County','Decatur';

INSERT INTO @PersonsLocations

SELECT 'Joe','Atlanta'

UNION ALL SELECT 'Jane','Decatur';

WITH ExpandHierarchy AS (

SELECT n=1, Person, Location, Parent, Child

FROM @PersonsLocations a

INNER JOIN @AdjacencyList b ON b.Child = a.Location

UNION ALL

SELECT n+1, a.Person, a.Location, b.Parent, b.Child

FROM ExpandHierarchy a

INNER JOIN @AdjacencyList b ON a.Parent = b.Child

)

SELECT Person, Location

FROM (

SELECT n, Person, Location=Parent

FROM ExpandHierarchy

UNION ALL

SELECT 0, Person, Location

FROM @PersonsLocations) a

ORDER BY Person, n;

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

jace.bennett Mr or Mrs. 500 Points: 502 More actions · Answer 8

Yup that just about does it I think. Thanks so much for that.

Particulars of the projection aside, my original question was about how this style query plays with views. I want the results of that query indexed, and kept largely in sync with the adjacency list (learned a new term, thanks, guys). What is my best option? Will a View over that query be performant in the face of the occasional update? Or will it have to rerun the entirety of the query and recreate the index? Should I just run the projection on a schedule or try to work out something with triggers?

jace.bennett Mr or Mrs. 500 Points: 502 More actions · Answer 9

Steven Willis (7/18/2013)
Maybe this link will help.
http://www.sqlservercentral.com/articles/Hierarchy/94040/[/url]

Thanks! I found that link very helpful. The Nested Sets structure is not appropriate for me case, as I don't have a strict hierarchy, but a lattice. The solution presented in Part 2 seems promising, but I'm still trying to digest the primer materials.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 10

jace.bennett (7/18/2013)
Yup that just about does it I think. Thanks so much for that.
Particulars of the projection aside, my original question was about how this style query plays with views. I want the results of that query indexed, and kept largely in sync with the adjacency list (learned a new term, thanks, guys). What is my best option? Will a View over that query be performant in the face of the occasional update? Or will it have to rerun the entirety of the query and recreate the index? Should I just run the projection on a schedule or try to work out something with triggers?

The rCTE I provided against your adjacency list will start being less performant if you have many levels in the hierarchy. But if you say, flatten your hierarchy this way in an overnight job, putting those results into another table, that might not work out too bad for you.

I guess it depends on how often you need to access the results of the rCTE.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St