Help with hierarchical structure without hierarchical relation field...

  • Hi,

    Our software has 2 tables for accounting information. There's an hierarchy between the accounts but there's no hierarchical fields, the relation is determined by the account number itself (account 111 and 112 are children of 11).

    We have a query that shows all the accounts balance where the parent has the children balances also.

    Since there's no hierarchical field we are getting the results by using the LEFT operator to compare the accounts...

    CREATE TABLE accounts (accountId VARCHAR(10))

    CREATE TABLE accountMovs (accountId VARCHAR(10), debit INT, credit INT)

    GO

    INSERT INTO dbo.accounts VALUES ('11')

    INSERT INTO dbo.accounts VALUES ('111')

    INSERT INTO dbo.accounts VALUES ('112')

    INSERT INTO dbo.accounts VALUES ('12')

    INSERT INTO dbo.accounts VALUES ('121')

    INSERT INTO accountMovs VALUES ('11', 10, 0)

    INSERT INTO accountMovs VALUES ('111', 10, 20)

    INSERT INTO accountMovs VALUES ('112', 10, 10)

    INSERT INTO accountMovs VALUES ('1121', 0, 10)

    INSERT INTO accountMovs VALUES ('1121', 5, 5)

    INSERT INTO accountMovs VALUES ('12', 0, 10)

    INSERT INTO accountMovs VALUES ('121', 10, 10)

    GO

    SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)

    FROM accounts t0

    LEFT JOIN accountMovs t1 ON t0.accountId = SUBSTRING(t1.accountId, 1, LEN(t0.accountId))

    GROUP BY t0.accountId

    The results are the expected but with large tables the query takes too long to execute (over 2h or more with 3.000.000 account moves).

    We thought having just the = operator on the accountId and let the software do the SUM on the related account but still it's very slow, the only difference is that's no longer a database problem and is a software problem...).

    Is there a way of doing this with OVER or another command or use an iTVF to "simulate" the hierarchical field?

    Thanks,

    Pedro



    If you need to work better, try working less...

  • You can try

    SELECT a.accountId

    ,(SELECT SUM(m.debit) FROM accountMovs m WHERE m.accountId LIKE a.accountId + '%') Debit

    ,(SELECT SUM(m.credit) FROM accountMovs m WHERE m.accountId LIKE a.accountId + '%') Credit

    FROM accounts a

    or may be better this:

    SELECT a.accountId

    ,am.Debit

    ,am.Credit

    FROM accounts a

    OUTER APPLY (SELECT SUM(m.debit) Debit, SUM(m.credit) Credit

    FROM accountMovs m WHERE m.accountId LIKE a.accountId + '%') am

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Hi,

    The LIKE operator is also very slow with 3.000.000 records... It's even slower than the LEFT in the tests I made.

    Right now I'm trying to do an INNER JOIN between accounts with itself to simulate the hierarchical fields, since accounts has less records than accountMovs (1:1000 ratio or higher).

    SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)

    FROM accounts t0

    INNER JOIN accounts tr ON t0.accountId = LEFT(tr.accountId, LEN(t0.accountId))

    LEFT JOIN accountMovs t1 ON tr.accountId = t1.accountId

    GROUP BY t0.accountId

    It's much faster but not "perfect".... Since I can't change these tables structures due to the software I'm thinking of eventually adding a trigger on accounts to insert a record on a new table (accountRelations) that has the hierarchical relation and the use that new table instead of the accounts join with itself...

    Pedro



    If you need to work better, try working less...

  • My first choice was to change the SUBSTRING to the LIKE operator to make the query SARGable.

    SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)

    FROM accounts t0

    LEFT JOIN accountMovs t1 ON t1.accountId LIKE t0.accountId + '%'

    GROUP BY t0.accountId

    Since you say it won't work, you should probably share indexes definitions and execution plan for your query as described in this article: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • So, what's going to happen when you have an account number of, say, 112? BOOM!!!

    My recommendation would be to correctly split the parent/child relationships into parent/child columns. That would also allow for some incredible performance methods that you're not currently enjoying. At the very least, put a separator for each level into the account so that you can easily extract the hierarchical path.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/10/2014)


    So, what's going to happen when you have an account number of, say, 112? BOOM!!!

    My recommendation would be to correctly split the parent/child relationships into parent/child columns. That would also allow for some incredible performance methods that you're not currently enjoying. At the very least, put a separator for each level into the account so that you can easily extract the hierarchical path.

    No need to add a separator since the hierarchy is the number:

    11

    111

    1111

    1112

    112

    113

    12

    121

    .. and so on...

    The SQL code I used is just a "sample", no PK, FKs....

    Right now, since I can't change the current tables, I'm thinking of adding a trigger on the accounts so another table (accountsRelations) is filled with the hierarchical data and use it on the query...

    Thanks,

    Pedro



    If you need to work better, try working less...

  • What happens if a parent needs to have more than 10 children?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (3/10/2014)


    What happens if a parent needs to have more than 10 children?

    Or 99 roots?

    I'm with Jeff. If you can't actually properly normalize this data then I would add a separator. That would allow you to use the DelimitedSplit8K instead of painfully slow string manipulation.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/10/2014)


    Luis Cazares (3/10/2014)


    What happens if a parent needs to have more than 10 children?

    Or 99 roots?

    I'm with Jeff. If you can't actually properly normalize this data then I would add a separator. That would allow you to use the DelimitedSplit8K instead of painfully slow string manipulation.

    We have over 4000 root accounts each with more than 20 children in some cases...

    I've made some tests and with a AccountRelations (parentAccountId, childAccountId) it's over 20x faster than with the LIKE or LEFT...

    From 2h it now takes 2m (in the most extreme case...) compared with the LEFT...

    LIKE is 3x slower than the relations table... (I had the wrong numbers since I used the LEFT on a SSD disk and the LIKE on SATA disk...).

    I'm going to make some tests, since the triggers have an overhead on the INSERT/UPDATE/DELETE operations...

    Pedro



    If you need to work better, try working less...

  • PiMané (3/10/2014)


    Sean Lange (3/10/2014)


    Luis Cazares (3/10/2014)


    What happens if a parent needs to have more than 10 children?

    Or 99 roots?

    I'm with Jeff. If you can't actually properly normalize this data then I would add a separator. That would allow you to use the DelimitedSplit8K instead of painfully slow string manipulation.

    We have over 4000 root accounts each with more than 20 children in some cases...

    I've made some tests and with a AccountRelations (parentAccountId, childAccountId) it's over 20x faster than with the LIKE or LEFT...

    From 2h it now takes 2m (in the most extreme case...) compared with the LEFT...

    LIKE is 3x slower than the relations table... (I had the wrong numbers since I used the LEFT on a SSD disk and the LIKE on SATA disk...).

    I'm going to make some tests, since the triggers have an overhead on the INSERT/UPDATE/DELETE operations...

    Pedro

    I'm not talking about using "LIKE" or anything else like that (no pun intended). I'm talking about converting this bad boy to something that will blow the doors off of most other things while still giving you the same capabilities. It'll also make maintenance super easy and it'll make troubleshooting a hierarchy a whole lot easier, as well.

    AND, it can actually be done in parallel with what you already have. There's also no need for a trigger. The method I'm talking about will totally rebuild the hierarchy on a million rows in 54 seconds and give you all the advantages of an Adjacency List, Hierarchical Path (which is kind of what you have now), Nested Sets, and a 4th type of hierarchy that can be thought of as a data warehouse for you hierarchy with most of the things you want answered already done.

    Please see the following articles for what I'm talking about.

    http://www.sqlservercentral.com/articles/Hierarchy/94040/

    http://www.sqlservercentral.com/articles/T-SQL/94570/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • edit: nm

  • Our problem is that we can't change the software, it isn't ours...

    The only thing we can do is change some SPs it uses to improve it's performance.

    It's "impossible" to add a new field to any existing software table, like an hierarchical attribute, or change the way data is stored (add a separator).

    We can create new tables and add triggers to fill them, otherwise they won't be filled, and change the existing procedures.

    Pedro



    If you need to work better, try working less...

  • PiMané (3/11/2014)


    Our problem is that we can't change the software, it isn't ours...

    The only thing we can do is change some SPs it uses to improve it's performance.

    It's "impossible" to add a new field to any existing software table, like an hierarchical attribute, or change the way data is stored (add a separator).

    We can create new tables and add triggers to fill them, otherwise they won't be filled, and change the existing procedures.

    Pedro

    Then a computed column may be an excellent way to handle this. You can't insert into it so any CRUD operations would remain exactly the same. The only thing you would change is how you retrieve the data (from the existing procedures).

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks,

    I'll try it, don't know if the software has "proper" SQL CRUD (columns specified in the INSERT) or "lazy" (INSERT INTO .. VALUES (..) without the columns name...).

    If there's no problem with adding a new column I'll create the ParentAccountId.

    Just got word from the accounting department that accounts can be:

    11

    112

    112001

    112002

    ...

    children aren't necessarily just another number at the end of the parent...

    If so the computed column formula isn't "simple"...

    Pedro



    If you need to work better, try working less...

  • What is the maximum account number length, practically speaking. Its a varchar(10), do you get account numbers that long?

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply