Help with hierarchical structure without hierarchical relation field...

  • Nevyn (3/11/2014)


    What is the maximum account number length, practically speaking. Its a varchar(10), do you get account numbers that long?

    20 chars is the maximum we have in our company.

    But some companies have sub account number, internal numbers, that can be 30 or more.



    If you need to work better, try working less...

  • PiMané (3/11/2014)


    Thanks,

    I'll try it, don't know if the software has "proper" SQL CRUD (columns specified in the INSERT) or "lazy" (INSERT INTO .. VALUES (..) without the columns name...).

    If there's no problem with adding a new column I'll create the ParentAccountId.

    Just got word from the accounting department that accounts can be:

    11

    112

    112001

    112002

    ...

    children aren't necessarily just another number at the end of the parent...

    If so the computed column formula isn't "simple"...

    Pedro

    Yeah that would be a definite gotcha if they don't specify the columns. 😉

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Yeah that would be a definite gotcha if they don't specify the columns. 😉

    If that is a problem, you could do a indexed view on accounts with the calculation for parent, could you not? Some of the same tradeoffs as the trigger, but still ...

  • No, you can't have an indexed view that uses an JOIN with the same table twice.

    I'd need

    SELECT a1.accountId, a2.accountId parentAccountId FROM dbo.accounts a1

    INNER JOIN dbo.accounts a2 ON a1.accountId LIKE a2.accountId + '%' AND a1.accountId <> a2.accountId

    and you do this on indexed views... just like you can't have OUTTER JOINs and other limitations..

    Pedro



    If you need to work better, try working less...

  • That wasn't what I was suggesting.

    What were you going to put in the calculated Parent ID column for Sean's solution? Surely you weren't going to refer to other rows for that?

    You should be able to do the same in an indexed view. Isn't parent account id just a substring of account id minus the last character? Or do levels get skipped?

    EDIT: ok, just saw your note above

    So the problem is levels can be skipped, so you can tell if 2 accounts are related when looking at both records, but you can't tell a child's direct parent by substringing. Is that right?

  • Nevyn (3/12/2014)


    That wasn't what I was suggesting.

    What were you going to put in the calculated Parent ID column for Sean's solution? Surely you weren't going to refer to other rows for that?

    You should be able to do the same in an indexed view. Isn't parent account id just a substring of account id minus the last character? Or do levels get skipped?

    EDIT: ok, just saw your note above

    So the problem is levels can be skipped, so you can tell if 2 accounts are related when looking at both records, but you can't tell a child's direct parent by substringing. Is that right?

    Yep.. the way to determine the parent account is the query I wrote above, with the LIKE operator.

    It's much faster than the LEFT(..) used currently and I've already changed it...

    Right now the only thing I'm pondering is to create the trigger and the relation table to have the parents directly..

    With 100 accounts the LIKE if faster than using the table (all the accounts are in memory and there's no need to read from another table), but with 200.000 accounts the relations table is faster, at least twice as fast...

    Any way, with 200.000 accounts the LIKE is 60x faster than the LEFT..

    Pedro



    If you need to work better, try working less...

  • PiMané (3/12/2014)


    Nevyn (3/12/2014)


    That wasn't what I was suggesting.

    What were you going to put in the calculated Parent ID column for Sean's solution? Surely you weren't going to refer to other rows for that?

    You should be able to do the same in an indexed view. Isn't parent account id just a substring of account id minus the last character? Or do levels get skipped?

    EDIT: ok, just saw your note above

    So the problem is levels can be skipped, so you can tell if 2 accounts are related when looking at both records, but you can't tell a child's direct parent by substringing. Is that right?

    Yep.. the way to determine the parent account is the query I wrote above, with the LIKE operator.

    It's much faster than the LEFT(..) used currently and I've already changed it...

    Right now the only thing I'm pondering is to create the trigger and the relation table to have the parents directly..

    With 100 accounts the LIKE if faster than using the table (all the accounts are in memory and there's no need to read from another table), but with 200.000 accounts the relations table is faster, at least twice as fast...

    Any way, with 200.000 accounts the LIKE is 60x faster than the LEFT..

    Pedro

    But, you're still calculating something that might not change. If there's a way to determine the parent and child values for the account number and store them as persisted computed columns, THEN we could open up a whole new world of hierarchical ease and performance for you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • PiMané (3/10/2014)


    Hi,

    The LIKE operator is also very slow with 3.000.000 records... It's even slower than the LEFT in the tests I made.

    Right now I'm trying to do an INNER JOIN between accounts with itself to simulate the hierarchical fields, since accounts has less records than accountMovs (1:1000 ratio or higher).

    SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)

    FROM accounts t0

    INNER JOIN accounts tr ON t0.accountId = LEFT(tr.accountId, LEN(t0.accountId))

    LEFT JOIN accountMovs t1 ON tr.accountId = t1.accountId

    GROUP BY t0.accountId

    It's much faster but not "perfect".... Since I can't change these tables structures due to the software I'm thinking of eventually adding a trigger on accounts to insert a record on a new table (accountRelations) that has the hierarchical relation and the use that new table instead of the accounts join with itself...

    Pedro

    Can I assume from the above query that all accounts in accountMovs have a record in accounts? It was not that way in your sample data, but this query implies that's what its like.

    Here's what I've come up with. Its a bother made out of stubbornness and a tad silly, and the trigger solution will probably make a faster query. Just a matter of how badly you want to avoid a trigger. I'd be curious whether it beats the like for your whole rowset.

    --First, your tables

    CREATE TABLE accounts (accountId VARCHAR(10))

    CREATE TABLE accountMovs (accountId VARCHAR(10), debit INT, credit INT)

    GO

    --Next, insert your data ... with a small change

    INSERT INTO dbo.accounts VALUES ('11')

    INSERT INTO dbo.accounts VALUES ('111')

    INSERT INTO dbo.accounts VALUES ('112')

    INSERT INTO dbo.accounts VALUES ('12')

    INSERT INTO dbo.accounts VALUES ('121')

    INSERT INTO dbo.accounts VALUES ('1121')

    INSERT INTO accountMovs VALUES ('11', 10, 0)

    INSERT INTO accountMovs VALUES ('111', 10, 20)

    INSERT INTO accountMovs VALUES ('112', 10, 10)

    INSERT INTO accountMovs VALUES ('1121', 0, 10)

    INSERT INTO accountMovs VALUES ('1121', 5, 5)

    INSERT INTO accountMovs VALUES ('12', 0, 10)

    INSERT INTO accountMovs VALUES ('121', 10, 10)

    GO

    --Now, we need a 20ish row tally table. Can't be a CTE or derived table.

    CREATE TABLE [dbo].[PossibleLevels](

    [id] [int] NOT NULL

    ) ON [PRIMARY]

    GO

    --Load it

    INSERT INTO PossibleLevels (id)

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 3 UNION ALL

    SELECT 4 UNION ALL

    SELECT 5 UNION ALL

    SELECT 6 UNION ALL

    SELECT 7 UNION ALL

    SELECT 8 UNION ALL

    SELECT 9 UNION ALL

    SELECT 10 UNION ALL

    SELECT 11 UNION ALL

    SELECT 12 UNION ALL

    SELECT 13 UNION ALL

    SELECT 14 UNION ALL

    SELECT 15 UNION ALL

    SELECT 16 UNION ALL

    SELECT 17 UNION ALL

    SELECT 18 UNION ALL

    SELECT 19 UNION ALL

    SELECT 20

    -- Now, the view. It includes every possible parent account and itself. So, for 200k accounts, it could be up to 4million records.

    CREATE VIEW PossibleParents

    WITH SCHEMABINDING

    AS

    SELECT accountId,SUBSTRING(accountId,1,p.id) PossibleParentId

    FROM dbo.accounts a

    JOIN dbo.[PossibleLevels] p

    ON LEN(a.accountId) >= p.id

    -- and the index on the view

    CREATE UNIQUE CLUSTERED INDEX ParentIndex

    ON dbo.PossibleParents (PossibleParentId,accountId)

    -- and finally the query

    SELECT a.accountId, SUM(am.debit), SUM(am.credit)

    FROM accounts a

    JOIN PossibleParents p

    ON a.accountId = p.PossibleParentId

    LEFT JOIN accountMovs am

    ON am.accountId = p.accountId

    GROUP BY a.accountId

    Let me know what you think. Don't know if there is a way to modify it for Jeff's magic. You need the join back to the table to get the true hierarchy, and if doing Jeff's stuff you'd make the 'ON LEN(a.accountId) >= p.id' into 'ON LEN(a.accountId) > p.id' to avoid the self match.

  • This takes 34 seconds to run against 3.5M account moves.

    CREATE TABLE #accounts (accountId VARCHAR(10))

    CREATE TABLE #accountMovs (accountId VARCHAR(10), debit INT, credit INT)

    GO

    INSERT INTO #accounts VALUES ('11')

    INSERT INTO #accounts VALUES ('111')

    INSERT INTO #accounts VALUES ('112')

    INSERT INTO #accounts VALUES ('12')

    INSERT INTO #accounts VALUES ('121')

    INSERT INTO #accountMovs VALUES ('11', 10, 0)

    INSERT INTO #accountMovs VALUES ('111', 10, 20)

    INSERT INTO #accountMovs VALUES ('112', 10, 10)

    INSERT INTO #accountMovs VALUES ('1121', 0, 10)

    INSERT INTO #accountMovs VALUES ('1121', 5, 5)

    INSERT INTO #accountMovs VALUES ('12', 0, 10)

    INSERT INTO #accountMovs VALUES ('121', 10, 10)

    GO

    --SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)

    --FROM #accounts t0

    --LEFT JOIN #accountMovs t1 ON t0.accountId = SUBSTRING(t1.accountId, 1, LEN(t0.accountId))

    --GROUP BY t0.accountId

    -- Blow row set out to 3.5M rows

    WITH Tally(n) AS

    (

    SELECT TOP 500000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b

    )

    INSERT INTO #accountMovs

    SELECT accountID, debit, credit

    FROM #accountMovs

    CROSS JOIN Tally;

    SELECT accountId=acct, SUM(debit), SUM(credit)

    FROM #accountMovs a

    CROSS APPLY

    (

    VALUES(CASE WHEN LEN(AccountID) > 2 THEN LEFT(AccountID, 2) END),(AccountID)

    ) b (acct)

    JOIN #accounts c ON c.accountId = b.acct

    WHERE acct IS NOT NULL

    GROUP BY acct;

    GO

    DROP TABLE #accounts;

    DROP TABLE #accountMovs;

    Edit: That timing includes the Tally/data setup to 3.5M rows and it is on my laptop. I agree that an INDEXed, computed PERSISTED column is the way to go if you can change the table structure.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (3/13/2014)


    I agree that an INDEXed, computed PERSISTED column is the way to go if you can change the table structure.

    If you can't change the table structure, build a "Sister Table". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply