Lots of data + HierarchyId performanceproblems

  • I've got a table with about 60,000 records that contain data that relates to eachother using a HierarchyId (builds as a treestructure). Each post has a "status"-field which tells if the row is active or nonactive.

    I've got a query that wants to get all posts from this table, with the exception of posts that have status="inactive" parents somewhere in the tree above themselves.

    The trouble is performance, it takes about 4 minutes or more to get the result. I've tried experimenting a little bit with indexes but no luck (some indexes only works good when I use smaller amounts of data, TOP 1000 for example, with the full table it become slower).

    Hints anyone?

    SELECT Id FROM MyTable bo
    WHERE NOT EXISTS
    (SELECT Id FROM MyTable parentBO
    WHERE Status = 'Inactive'
    AND bo.Hierarchy.IsDescendantOf(parentBO.Hierarchy) = 1)

     

     

  • Can you post the CREATE TABLE statement for the table, please? I might have a time-saving idea for you if the structure is right for it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here it is. I've removed lots of other columns that's not relevant.

    CREATE TABLE [dbo].[MyTable] (
    [Id] UNIQUEIDENTIFIER,
    [Status] NVARCHAR (255) NOT NULL,
    [Hierarchy] [sys].[hierarchyid] NULL,
    );

     

    • This reply was modified 2 years, 9 months ago by  oRBIT.
  • I was afraid of that.  I was hoping that the remnants of the parent id for each row might be left in the table.  I've never used HierarchyID except for once as an experiment that convinced me not to use it.  According to the number of responses you've gotten on this thread from others (i.e. no one other than me, so far), apparently few others use it, as well.

    I've done this type of thing before with "electronic data discovery" for legal purposes in finding and eliminating "privileged" email chains that were not admissible for evidence.  I don't know how to do it with HierarchyID but I basically started at the root and would check one full level at a time.  If there was anything in an given node that was marked as "inadmissible" or "privileged", that would disqualify ALL nodes in the chain in the "downline" of that node.  In the end, I ended up with a list of nodes that weren't disqualified and that's what we worked with.  That was, however, using a "Adjacency List" also known as a Parent/Child list.

    I know such a thing is possible to do "from the root to the leafs" with HierarchyID but I don't know how to do it.  It DOES look to me like you're starting at the leaf levels and working your way to the root, though, and that's going to be the "long road".  My suggestion is to start at the root and work your way "downline" to the leaf levels.  It you do that by level, you can quickly eliminate whole and sometimes large branches very quickly.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok, thanks for the help anyway.

    Sounds like some kind of function is needed.. My experience is that functions should be use with care in terms of performance.

    Anyway, to be contiunued..

  • I wonder if there might be some kind of workaround here.. Could I temporary transform the data (quickly obviously) in some way that makes calculations faster?

    I doubt some recursive function will do good here, my experience of functions is that they tends to be slow (I think they can't be used in parallell either)..

     

  • oRBIT wrote:

    I wonder if there might be some kind of workaround here.. Could I temporary transform the data (quickly obviously) in some way that makes calculations faster?

    I doubt some recursive function will do good here, my experience of functions is that they tends to be slow (I think they can't be used in parallell either)..

    There absolutely is a way to make hierarchical calculations MUCH faster.  We just need to find a way to convert your HierarchyID based database back to an Adjacency List to be able to do it and, unfortunately I don't know enough about how to do it because I flat out don't use HierarchyID and I've not been able to find the old article on that subject that I ran into years ago.

    Here's the faster calculation method that I'm talking about.  It used to take 54 seconds on the hardware from about a decade ago.  It now runs in 19 seconds and pre-calculates most of the common things you could might ask of a hierarchy and could be modified to pre-calculate even more.

    https://www.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • oRBIT - I finally figured some stuff out and used and old example of a small hierarchy and converted it to HIERARCHYID.  The "ID" column in your table is a UNIQUEIDENTIFIER.  Did you use Random GUIDs there or NEWSEQUENTIALID()?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's another case of no sample data.  If the hierarchyid's are conventionally formed using '/' delimiters then they can be ordinally split.  One way to derive the original adjacency could be to OUTER APPLY using the hierarchyid (sample data from my earlier forum post) to select rows where the split item number is equal to the hierarchy level

    drop table if exists #adjacencies;
    go
    create table #adjacencies(
    id int,
    parentid int,
    area_name varchar(30));
    go

    INSERT INTO #adjacencies VALUES
    (1,Null, 'Corporate_HQ'),
    (2,1, 'South_Region'),
    (3,1, 'North_Region'),
    (4,1, 'East_Region'),
    (5,1, 'West_Region'),
    (6,3, 'Chicago_District'),
    (7,3, 'Milwaukee_District'),
    (8,3, 'Minneapolis_District'),
    (9,6, 'Gold_Coast_Dealer'),
    (10, 6, 'Blue_Island_Dealer');

    drop table if exists #hierarchies;
    go
    create table #hierarchies(
    id int,
    parentid int,
    area_name varchar(30),
    [path] hierarchyid);

    with recur_cte(ID, ParentId, area_name, [path]) as (
    select ID, ParentId, area_name,
    cast(concat('/', ID, '/') as hierarchyid)
    from #adjacencies
    where [ParentID] is null
    union all
    select child.ID, child.ParentId, child.area_name,
    cast(concat(parent.[path].ToString(), child.ID, '/') as hierarchyid)
    from #adjacencies as child
    join recur_cte as parent on child.ParentID = parent.ID)
    insert into #hierarchies
    select *
    from recur_cte;

    /* derive parent_id from hierarchyid */
    select h.*, h.[path].ToString() path_string,
    [path].GetLevel() as path_level,
    splt.parent_id
    from #hierarchies h
    outer apply (select nullif(ds.Item,'')
    from dbo.DelimitedSplit8K_LEAD(h.[path].ToString(), '/') ds
    where ds.ItemNumber=h.[path].GetLevel()) splt(parent_id);

     

    • This reply was modified 2 years, 8 months ago by  Steve Collins.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • This was removed by the editor as SPAM

  • The Id-column is a NEWSEQUENTIALID

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply