Lots of data + HierarchyId performanceproblems

Question

Lots of data + HierarchyId performanceproblems

oRBIT

SSC-Addicted

Points: 430
More actions
February 21, 2022 at 2:47 pm

#3986441
I've got a table with about 60,000 records that contain data that relates to eachother using a HierarchyId (builds as a treestructure). Each post has a "status"-field which tells if the row is active or nonactive.
I've got a query that wants to get all posts from this table, with the exception of posts that have status="inactive" parents somewhere in the tree above themselves.
The trouble is performance, it takes about 4 minutes or more to get the result. I've tried experimenting a little bit with indexes but no luck (some indexes only works good when I use smaller amounts of data, TOP 1000 for example, with the full table it become slower).
Hints anyone?
```
SELECT Id FROM MyTable bo
WHERE NOT EXISTS 
(SELECT Id FROM MyTable parentBO
WHERE Status = 'Inactive' 
AND bo.Hierarchy.IsDescendantOf(parentBO.Hierarchy) = 1)
```

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004701 More actions · Answer 1

Can you post the CREATE TABLE statement for the table, please? I might have a time-saving idea for you if the structure is right for it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

oRBIT SSC-Addicted Points: 430 More actions · Answer 2

Here it is. I've removed lots of other columns that's not relevant.

CREATE TABLE [dbo].[MyTable] (
    [Id]                   UNIQUEIDENTIFIER,
    [Status]               NVARCHAR (255)      NOT NULL,
    [Hierarchy]            [sys].[hierarchyid] NULL,
);

This reply was modified 3 years, 11 months ago by oRBIT.

Jeff Moden SSC Guru Points: 1004701 More actions · Answer 3

I was afraid of that. I was hoping that the remnants of the parent id for each row might be left in the table. I've never used HierarchyID except for once as an experiment that convinced me not to use it. According to the number of responses you've gotten on this thread from others (i.e. no one other than me, so far), apparently few others use it, as well.

I've done this type of thing before with "electronic data discovery" for legal purposes in finding and eliminating "privileged" email chains that were not admissible for evidence. I don't know how to do it with HierarchyID but I basically started at the root and would check one full level at a time. If there was anything in an given node that was marked as "inadmissible" or "privileged", that would disqualify ALL nodes in the chain in the "downline" of that node. In the end, I ended up with a list of nodes that weren't disqualified and that's what we worked with. That was, however, using a "Adjacency List" also known as a Parent/Child list.

I know such a thing is possible to do "from the root to the leafs" with HierarchyID but I don't know how to do it. It DOES look to me like you're starting at the leaf levels and working your way to the root, though, and that's going to be the "long road". My suggestion is to start at the root and work your way "downline" to the leaf levels. It you do that by level, you can quickly eliminate whole and sometimes large branches very quickly.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

oRBIT SSC-Addicted Points: 430 More actions · Answer 4

Ok, thanks for the help anyway.

Sounds like some kind of function is needed.. My experience is that functions should be use with care in terms of performance.

Anyway, to be contiunued..

oRBIT SSC-Addicted Points: 430 More actions · Answer 5

I wonder if there might be some kind of workaround here.. Could I temporary transform the data (quickly obviously) in some way that makes calculations faster?

I doubt some recursive function will do good here, my experience of functions is that they tends to be slow (I think they can't be used in parallell either)..

Jeff Moden SSC Guru Points: 1004701 More actions · Answer 6

oRBIT wrote:

I wonder if there might be some kind of workaround here.. Could I temporary transform the data (quickly obviously) in some way that makes calculations faster?
I doubt some recursive function will do good here, my experience of functions is that they tends to be slow (I think they can't be used in parallell either)..

There absolutely is a way to make hierarchical calculations MUCH faster. We just need to find a way to convert your HierarchyID based database back to an Adjacency List to be able to do it and, unfortunately I don't know enough about how to do it because I flat out don't use HierarchyID and I've not been able to find the old article on that subject that I ran into years ago.

Here's the faster calculation method that I'm talking about. It used to take 54 seconds on the hardware from about a decade ago. It now runs in 19 seconds and pre-calculates most of the common things you could might ask of a hierarchy and could be modified to pre-calculate even more.

https://www.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004701 More actions · Answer 7

oRBIT - I finally figured some stuff out and used and old example of a small hierarchy and converted it to HIERARCHYID. The "ID" column in your table is a UNIQUEIDENTIFIER. Did you use Random GUIDs there or NEWSEQUENTIALID()?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Collins SSCrazy Points: 2247 More actions · Answer 8

It's another case of no sample data. If the hierarchyid's are conventionally formed using '/' delimiters then they can be ordinally split. One way to derive the original adjacency could be to OUTER APPLY using the hierarchyid (sample data from my earlier forum post) to select rows where the split item number is equal to the hierarchy level

drop table if exists #adjacencies;
go
create table #adjacencies(
  id                int,
  parentid    int,
  area_name    varchar(30));
go

INSERT INTO #adjacencies VALUES
(1,Null, 'Corporate_HQ'),
(2,1, 'South_Region'),
(3,1, 'North_Region'),
(4,1, 'East_Region'),
(5,1, 'West_Region'),
(6,3, 'Chicago_District'),
(7,3, 'Milwaukee_District'),
(8,3, 'Minneapolis_District'),
(9,6, 'Gold_Coast_Dealer'),
(10,    6, 'Blue_Island_Dealer');

drop table if exists #hierarchies;
go
create table #hierarchies(
  id                int,
  parentid    int,
  area_name    varchar(30),
  [path]            hierarchyid);

with recur_cte(ID, ParentId, area_name, [path]) as (
   select ID, ParentId, area_name,
          cast(concat('/', ID, '/') as hierarchyid)
   from #adjacencies 
   where [ParentID] is null
   union all
   select child.ID, child.ParentId, child.area_name,
          cast(concat(parent.[path].ToString(), child.ID, '/') as hierarchyid)
   from #adjacencies as child 
        join recur_cte as parent on child.ParentID = parent.ID)
insert into #hierarchies
select *
from recur_cte;

/* derive parent_id from hierarchyid */
select h.*, h.[path].ToString() path_string,
       [path].GetLevel() as path_level,
       splt.parent_id
from #hierarchies h
     outer apply (select nullif(ds.Item,'') 
                  from dbo.DelimitedSplit8K_LEAD(h.[path].ToString(), '/') ds
                  where ds.ItemNumber=h.[path].GetLevel()) splt(parent_id);

This reply was modified 3 years, 11 months ago by Steve Collins.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

oRBIT SSC-Addicted Points: 430 More actions · Answer 9

oRBIT

SSC-Addicted

Points: 430

March 7, 2022 at 12:25 pm

#3994919

The Id-column is a NEWSEQUENTIALID