Help Converting Query For Hierarchical Data

  • I have a working query within MySQL based on (Managing Hierarchical Data in MySQL - http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/). The problem I have is that I need to implement the same functionality on another piece of software which uses MSSQL.

    The two issues I am facing is that it doesn't seem possible to group/order within subqueries in MSSQL, as it produces the error

    The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

    If the ORDER is removed as stated by the error I am left with the following issue

    Column 'core_questionnaire_entries.SurveyEntryID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    Is anyone able to help in trying to make this query compatible with MSSQL?

    The query I am trying to convert is

    SELECT
    node.*,
    (
    COUNT( parent.SurveyEntryID ) - ( sub_tree.depth + 1 )) AS depth
    FROM
    core_questionnaire_entries AS node,
    core_questionnaire_entries AS parent,
    core_questionnaire_entries AS sub_parent,
    (
    SELECT
    node.SurveyEntryID,
    ( COUNT( parent.SurveyEntryID ) - 1 ) AS depth
    FROM
    core_questionnaire_entries AS node,
    core_questionnaire_entries AS parent
    WHERE
    node.LeftBound BETWEEN parent.LeftBound
    AND parent.RightBound
    AND node.SurveyEntryID = @0
    AND node.IsDeleted = 0
    AND parent.IsDeleted = 0
    AND node.SurveyID = @1
    AND parent.SurveyID = @1
    GROUP BY
    node.SurveyEntryID
    ORDER BY
    node.LeftBound
    ) AS sub_tree
    WHERE
    node.IsDeleted = 0
    AND parent.IsDeleted = 0
    AND sub_parent.IsDeleted = 0
    AND node.LeftBound BETWEEN parent.LeftBound
    AND parent.RightBound
    AND node.LeftBound BETWEEN sub_parent.LeftBound
    AND sub_parent.RightBound
    AND sub_parent.SurveyEntryID = sub_tree.SurveyEntryID
    AND node.SurveyID = @1
    AND parent.SurveyID = @1
    GROUP BY
    node.SurveyEntryID
    HAVING
    depth = 1
    ORDER BY
    node.LeftBound;

    Any help is appreciated!

    Thanks

  • You can avoid that error by modifying your SELECT:

    SELECT TOP (100) PERCENT ...

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • duffdean wrote:

    I have a working query within MySQL based on (Managing Hierarchical Data in MySQL - http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/). The problem I have is that I need to implement the same functionality on another piece of software which uses MSSQL.

    The two issues I am facing is that it doesn't seem possible to group/order within subqueries in MSSQL, as it produces the error

    The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

    If the ORDER is removed as stated by the error I am left with the following issue

    Column 'core_questionnaire_entries.SurveyEntryID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    Is anyone able to help in trying to make this query compatible with MSSQL?

    The query I am trying to convert is

    Any help is appreciated!

    Thanks

    The link you provided leads to a really well written post.  Thanks for that.

    It doesn't, however, show you how to auto-magically convert an Adjacency List to Nested Sets.  That's important because if you make a mistake in deleting a node or adding a node or moving a node or whatever, it's not going to be an easy thing for a human to repair because every node is "aware of and related to a whole lot of other nodes".

    In an Adjacency List, you only need to worry about one node at a time and only what the manager is for that node.  In other words it's easy for a human to understand and repair if it comes to that.

    There's a trade off.

    Adjacency List

    1. Super easy for humans to understand and maintain.

    2. An entire "downline" of nodes can be easily moved just by changing the top node of the "downline" instead of updating sometimes thousands of nodes.

    3. Is terrible to query both programmatically and resource usage - wise.

    Nested Sets

    1. Nearly impossible for a human to understand and repair a "big one".

    2. Any "MACD" (Move, Add, Change, Delete) requires that all nodes to the logical right of the given node be updated and can lead to some pretty nasty errors.

    3. They're NASTY FAST for everything except for querying UPLINEs

    Both actually suck a bit for aggregates because they always have to be recalculated

    Converting an Adjacency List to Nested Sets used to take literally days using the old push-stack method for large hierarchies regardless of purpose.

    So, with that, I'll propose that the best way to manage hierarchies is to do MACD's using an Adjacency List and regenerate the Nested Sets any time there's a change using a "nasty fast" method.  Same goes for pre-aggregating dsta for reports or whatever.

    You can find such fast methods in the following two articles.  On current hardware, either of them will fully create a million node hierarchy in about 19 SECONDS. (the articles say 54 seconds but that was on an average, run of the mill laptop from more than a decade ago.

    Here are the links... the first one also does a bit of a deep dive on what is needed in Adjacency Lists, Hierarchical Path, and Nested Sets hierarchies.  The second one does a deep dive on how to pre-aggregate most of the answers to aggregation questions for all 1 million nodes, again, in about 19 seconds on modern hardware.

    https://www.sqlservercentral.com/articles/hierarchies-on-steroids-1-convert-an-adjacency-list-to-nested-sets

    https://www.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

    And it's all done using T-SQL in SQL Server. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply