July 17, 2013 at 9:09 pm
First off, I'm a .NET developer, and not a SQL wonk. Please be gentle. And thanks for your help.
So I have a lattice (a multiparent hierarchy) of objects representing sets of people sliced along some axis. Currently these sets exist as documents in RavenDB. Example:
Set
{
Id: 'some-guid',
Name: 'RegionName',
Dimension: 'Location',
ImmediateParents: ['containing-set-id', 'containing-set-id'],
ImmediateChildren: ['contained-set-id', 'contained-set-id'],
ContainedBy: ['denormalized', 'list', 'of', 'all', 'sets', 'that', 'contain', 'me']
}
Person
{
...
ImmediateParents: ['sets', 'i', 'am', 'assigned', 'to', 'directly'],
ContainedBy: ['denormalized', 'list', 'of', 'all', 'sets', 'that', 'contain', 'me']
}
The bulk of our queries are either direct reads from one of those lists, or "give me all the people contained by this set" query. Obviously our current implementation makes those queries really fast, but keeping all this updated is a delicate dance.
It occurred to me that having a normalized representation in SQL, and then building a view on top of it to denormalize the "contained by" might get me off the dancefloor. And that if I could materialize (usage?) that view and index it, I could get roughly the same performance profile I have now, with the workload heavily weighted toward write time.
Then tonight I learned of Recursive CTEs and the whole thing gelled for me. But I'm betting on some pretty big assumptions about features of Views that I don't understand. Am I sniffing up a good tree? Thanks for any insight.
July 18, 2013 at 12:16 pm
bump. Any thoughts at all? Is my question clear?
July 18, 2013 at 12:52 pm
It sounds like you are thinking about having 3 columns that each have a delimited list of IDs or something like that? That is certainly not a normalized version. Or maybe I am misunderstanding what you are trying to do.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
July 18, 2013 at 1:10 pm
I'm thinking these tables:
Set(
Id,
Name,
Dimension,
...)
Person (
Id,
Name,
...)
SetRelationships (
ParentSetId,
ChildSetId
)
PersonSets (
PersonId,
SetId
)
And then these tables give me a lot of my queries directly. The ones I want to denormalize would be
PersonContainedBy(
PersonId
SetId
)
SetContainedBy(
SetId,
AncestorId
)
Which would be a flattening of the hierarchy described in the tables.
I.E. if A contains B, and B contains C per the tables, Then C would have records for A and B in the ContainedBy view.
I guess my question is if I build a materialized (or indexed) view over a Recursive CTE, how performant will that be? If I change the underlying data, will it have to recalculate the whole view, or just the changed records?
July 18, 2013 at 1:32 pm
jace.bennett (7/18/2013)
I'm thinking these tables:Set(
Id,
Name,
Dimension,
...)
Person (
Id,
Name,
...)
SetRelationships (
ParentSetId,
ChildSetId
)
PersonSets (
PersonId,
SetId
)
And then these tables give me a lot of my queries directly. The ones I want to denormalize would be
PersonContainedBy(
PersonId
SetId
)
SetContainedBy(
SetId,
AncestorId
)
Which would be a flattening of the hierarchy described in the tables.
I.E. if A contains B, and B contains C per the tables, Then C would have records for A and B in the ContainedBy view.
I guess my question is if I build a materialized (or indexed) view over a Recursive CTE, how performant will that be? If I change the underlying data, will it have to recalculate the whole view, or just the changed records?
I guess I am missing why you think you need a recursive cte for this. Your data is denormalized so it would seem to eliminate the need for recursion.
Your definition of the problem is so vague that I am having a hard time envisioning what you are doing here.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
July 18, 2013 at 1:48 pm
July 18, 2013 at 1:57 pm
First, thanks for your time. Sorry If I'm communicating poorly.
Imagine the sets describe locations: North America, USA, Georgia, Fulton County, Atlanta.
I might describe this relationship in a relationship table (I'm excluding the actual person and set tables and using names instead of Ids for clarity):
Parent Set | Child Set
============================
North America | USA
USA | Georgia
Georgia | Fulton County
Georgia | Dekalb County
Fulton County | Atlanta
Dekalb County | Decatur
Person | Set
================
Joe | Atlanta
Jane | Decatur
Now, I have an extremely frequent and latency-sensitive query that says "Give me all the people (or sets) under Georgia". My thought is to make this query really fast by projecting it into this shape:
Person | Contained By
==============
Joe | Atlanta
Joe | Fulton County
Joe | Georgia
Joe | USA
Joe | North America
Jane | Decatur
Jane | Dekalb County
Jane | Georgia
Jane | USA
Jane | North America
And then another one for SetsContainedBy.
Sadly, Using actual snowflake schema has two distinct disadvantages in my case, the first is that sets (and dimensions) are really dynamic, and that I can have sets be child to multiple other sets (for example, Atlanta actually lies in two counties, Fulton and Dekalb).
I hope that clarifies my goal, and why the recursion seems like an obvious win to a non-dba. Thanks again.
July 18, 2013 at 6:36 pm
Are you looking for something like this?
DECLARE @AdjacencyList TABLE
(
Parent VARCHAR(20)
,Child VARCHAR(20)
);
DECLARE @PersonsLocations TABLE
(
Person VARCHAR(20)
,Location VARCHAR(20)
);
INSERT INTO @AdjacencyList
SELECT 'North America','USA'
UNION ALL SELECT 'USA','Georgia'
UNION ALL SELECT 'Georgia','Fulton County'
UNION ALL SELECT 'Georgia','Dekalb County'
UNION ALL SELECT 'Fulton County','Atlanta'
UNION ALL SELECT 'Dekalb County','Decatur';
INSERT INTO @PersonsLocations
SELECT 'Joe','Atlanta'
UNION ALL SELECT 'Jane','Decatur';
WITH ExpandHierarchy AS (
SELECT n=1, Person, Location, Parent, Child
FROM @PersonsLocations a
INNER JOIN @AdjacencyList b ON b.Child = a.Location
UNION ALL
SELECT n+1, a.Person, a.Location, b.Parent, b.Child
FROM ExpandHierarchy a
INNER JOIN @AdjacencyList b ON a.Parent = b.Child
)
SELECT Person, Location
FROM (
SELECT n, Person, Location=Parent
FROM ExpandHierarchy
UNION ALL
SELECT 0, Person, Location
FROM @PersonsLocations) a
ORDER BY Person, n;
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 18, 2013 at 8:35 pm
Yup that just about does it I think. Thanks so much for that.
Particulars of the projection aside, my original question was about how this style query plays with views. I want the results of that query indexed, and kept largely in sync with the adjacency list (learned a new term, thanks, guys). What is my best option? Will a View over that query be performant in the face of the occasional update? Or will it have to rerun the entirety of the query and recreate the index? Should I just run the projection on a schedule or try to work out something with triggers?
July 18, 2013 at 8:38 pm
Steven Willis (7/18/2013)
Maybe this link will help.http://www.sqlservercentral.com/articles/Hierarchy/94040/[/url]
Thanks! I found that link very helpful. The Nested Sets structure is not appropriate for me case, as I don't have a strict hierarchy, but a lattice. The solution presented in Part 2 seems promising, but I'm still trying to digest the primer materials.
July 18, 2013 at 8:47 pm
jace.bennett (7/18/2013)
Yup that just about does it I think. Thanks so much for that.Particulars of the projection aside, my original question was about how this style query plays with views. I want the results of that query indexed, and kept largely in sync with the adjacency list (learned a new term, thanks, guys). What is my best option? Will a View over that query be performant in the face of the occasional update? Or will it have to rerun the entirety of the query and recreate the index? Should I just run the projection on a schedule or try to work out something with triggers?
The rCTE I provided against your adjacency list will start being less performant if you have many levels in the hierarchy. But if you say, flatten your hierarchy this way in an overnight job, putting those results into another table, that might not work out too bad for you.
I guess it depends on how often you need to access the results of the rCTE.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply