Help Converting Query For Hierarchical Data

Question

Help Converting Query For Hierarchical Data

SSC Veteran

Points: 217

November 5, 2021 at 12:43 pm

I have a working query within MySQL based on (Managing Hierarchical Data in MySQL - http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/). The problem I have is that I need to implement the same functionality on another piece of software which uses MSSQL.

The two issues I am facing is that it doesn't seem possible to group/order within subqueries in MSSQL, as it produces the error

The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

If the ORDER is removed as stated by the error I am left with the following issue

Column 'core_questionnaire_entries.SurveyEntryID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Is anyone able to help in trying to make this query compatible with MSSQL?

The query I am trying to convert is

SELECT
    node.*,
    (
    COUNT( parent.SurveyEntryID ) - ( sub_tree.depth + 1 )) AS depth 
FROM
    core_questionnaire_entries AS node,
    core_questionnaire_entries AS parent,
    core_questionnaire_entries AS sub_parent,
    (
    SELECT
        node.SurveyEntryID,
        ( COUNT( parent.SurveyEntryID ) - 1 ) AS depth 
    FROM
        core_questionnaire_entries AS node,
        core_questionnaire_entries AS parent 
    WHERE
        node.LeftBound BETWEEN parent.LeftBound 
        AND parent.RightBound 
        AND node.SurveyEntryID = @0 
        AND node.IsDeleted = 0 
        AND parent.IsDeleted = 0 
        AND node.SurveyID = @1 
        AND parent.SurveyID = @1 
    GROUP BY
        node.SurveyEntryID 
    ORDER BY
        node.LeftBound 
    ) AS sub_tree 
WHERE
    node.IsDeleted = 0 
    AND parent.IsDeleted = 0 
    AND sub_parent.IsDeleted = 0 
    AND node.LeftBound BETWEEN parent.LeftBound 
    AND parent.RightBound 
    AND node.LeftBound BETWEEN sub_parent.LeftBound 
    AND sub_parent.RightBound 
    AND sub_parent.SurveyEntryID = sub_tree.SurveyEntryID 
    AND node.SurveyID = @1 
    AND parent.SurveyID = @1 
GROUP BY
    node.SurveyEntryID 
HAVING
    depth = 1 
ORDER BY
    node.LeftBound;

Any help is appreciated!

Thanks

This topic was modified 3 years ago by Steve Jones - SSC Editor. Reason: moved

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246960 More actions · Answer 1

You can avoid that error by modifying your SELECT:

SELECT TOP (100) PERCENT ...

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Jeff Moden SSC Guru Points: 1004324 More actions · Answer 2

duffdean wrote:

I have a working query within MySQL based on (Managing Hierarchical Data in MySQL - http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/). The problem I have is that I need to implement the same functionality on another piece of software which uses MSSQL.
The two issues I am facing is that it doesn't seem possible to group/order within subqueries in MSSQL, as it produces the error
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
If the ORDER is removed as stated by the error I am left with the following issue
Column 'core_questionnaire_entries.SurveyEntryID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Is anyone able to help in trying to make this query compatible with MSSQL?
The query I am trying to convert is
Any help is appreciated!
Thanks

The link you provided leads to a really well written post. Thanks for that.

It doesn't, however, show you how to auto-magically convert an Adjacency List to Nested Sets. That's important because if you make a mistake in deleting a node or adding a node or moving a node or whatever, it's not going to be an easy thing for a human to repair because every node is "aware of and related to a whole lot of other nodes".

In an Adjacency List, you only need to worry about one node at a time and only what the manager is for that node. In other words it's easy for a human to understand and repair if it comes to that.

There's a trade off.

Adjacency List

1. Super easy for humans to understand and maintain.

2. An entire "downline" of nodes can be easily moved just by changing the top node of the "downline" instead of updating sometimes thousands of nodes.

3. Is terrible to query both programmatically and resource usage - wise.

Nested Sets

1. Nearly impossible for a human to understand and repair a "big one".

2. Any "MACD" (Move, Add, Change, Delete) requires that all nodes to the logical right of the given node be updated and can lead to some pretty nasty errors.

3. They're NASTY FAST for everything except for querying UPLINEs

Both actually suck a bit for aggregates because they always have to be recalculated

Converting an Adjacency List to Nested Sets used to take literally days using the old push-stack method for large hierarchies regardless of purpose.

So, with that, I'll propose that the best way to manage hierarchies is to do MACD's using an Adjacency List and regenerate the Nested Sets any time there's a change using a "nasty fast" method. Same goes for pre-aggregating dsta for reports or whatever.

You can find such fast methods in the following two articles. On current hardware, either of them will fully create a million node hierarchy in about 19 SECONDS. (the articles say 54 seconds but that was on an average, run of the mill laptop from more than a decade ago.

Here are the links... the first one also does a bit of a deep dive on what is needed in Adjacency Lists, Hierarchical Path, and Nested Sets hierarchies. The second one does a deep dive on how to pre-aggregate most of the answers to aggregation questions for all 1 million nodes, again, in about 19 seconds on modern hardware.

https://www.sqlservercentral.com/articles/hierarchies-on-steroids-1-convert-an-adjacency-list-to-nested-sets

https://www.sqlservercentral.com/articles/hierarchies-on-steroids-2-a-replacement-for-nested-sets-calculations-1

And it's all done using T-SQL in SQL Server. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)