Help with hierarchical structure without hierarchical relation field...

Question

Help with hierarchical structure without hierarchical relation field...

Viewing 10 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply

PiMané SSChampion Points: 13965 More actions · Answer 1

Nevyn (3/11/2014)
What is the maximum account number length, practically speaking. Its a varchar(10), do you get account numbers that long?

20 chars is the maximum we have in our company.

But some companies have sub account number, internal numbers, that can be 30 or more.

If you need to work better, try working less...

Sean Lange SSC Guru Points: 286609 More actions · Answer 2

PiMané (3/11/2014)
Thanks,
I'll try it, don't know if the software has "proper" SQL CRUD (columns specified in the INSERT) or "lazy" (INSERT INTO .. VALUES (..) without the columns name...).
If there's no problem with adding a new column I'll create the ParentAccountId.
Just got word from the accounting department that accounts can be:
11
112
112001
112002
...
children aren't necessarily just another number at the end of the parent...
If so the computed column formula isn't "simple"...
Pedro

Yeah that would be a definite gotcha if they don't specify the columns. 😉

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Nevyn SSChampion Points: 14210 More actions · Answer 3

Yeah that would be a definite gotcha if they don't specify the columns. 😉

If that is a problem, you could do a indexed view on accounts with the calculation for parent, could you not? Some of the same tradeoffs as the trigger, but still ...

PiMané SSChampion Points: 13965 More actions · Answer 4

No, you can't have an indexed view that uses an JOIN with the same table twice.

I'd need

SELECT a1.accountId, a2.accountId parentAccountId FROM dbo.accounts a1

INNER JOIN dbo.accounts a2 ON a1.accountId LIKE a2.accountId + '%' AND a1.accountId <> a2.accountId

and you do this on indexed views... just like you can't have OUTTER JOINs and other limitations..

Pedro

If you need to work better, try working less...

Nevyn SSChampion Points: 14210 More actions · Answer 5

That wasn't what I was suggesting.

What were you going to put in the calculated Parent ID column for Sean's solution? Surely you weren't going to refer to other rows for that?

You should be able to do the same in an indexed view. Isn't parent account id just a substring of account id minus the last character? Or do levels get skipped?

EDIT: ok, just saw your note above

So the problem is levels can be skipped, so you can tell if 2 accounts are related when looking at both records, but you can't tell a child's direct parent by substringing. Is that right?

PiMané SSChampion Points: 13965 More actions · Answer 6

Nevyn (3/12/2014)
That wasn't what I was suggesting.
What were you going to put in the calculated Parent ID column for Sean's solution? Surely you weren't going to refer to other rows for that?
You should be able to do the same in an indexed view. Isn't parent account id just a substring of account id minus the last character? Or do levels get skipped?
EDIT: ok, just saw your note above
So the problem is levels can be skipped, so you can tell if 2 accounts are related when looking at both records, but you can't tell a child's direct parent by substringing. Is that right?

Yep.. the way to determine the parent account is the query I wrote above, with the LIKE operator.

It's much faster than the LEFT(..) used currently and I've already changed it...

Right now the only thing I'm pondering is to create the trigger and the relation table to have the parents directly..

With 100 accounts the LIKE if faster than using the table (all the accounts are in memory and there's no need to read from another table), but with 200.000 accounts the relations table is faster, at least twice as fast...

Any way, with 200.000 accounts the LIKE is 60x faster than the LEFT..

Pedro

If you need to work better, try working less...

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

PiMané (3/12/2014)
Nevyn (3/12/2014)
That wasn't what I was suggesting.
What were you going to put in the calculated Parent ID column for Sean's solution? Surely you weren't going to refer to other rows for that?
You should be able to do the same in an indexed view. Isn't parent account id just a substring of account id minus the last character? Or do levels get skipped?
EDIT: ok, just saw your note above
So the problem is levels can be skipped, so you can tell if 2 accounts are related when looking at both records, but you can't tell a child's direct parent by substringing. Is that right?
Yep.. the way to determine the parent account is the query I wrote above, with the LIKE operator.
It's much faster than the LEFT(..) used currently and I've already changed it...
Right now the only thing I'm pondering is to create the trigger and the relation table to have the parents directly..
With 100 accounts the LIKE if faster than using the table (all the accounts are in memory and there's no need to read from another table), but with 200.000 accounts the relations table is faster, at least twice as fast...
Any way, with 200.000 accounts the LIKE is 60x faster than the LEFT..
Pedro

But, you're still calculating something that might not change. If there's a way to determine the parent and child values for the account number and store them as persisted computed columns, THEN we could open up a whole new world of hierarchical ease and performance for you.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Nevyn SSChampion Points: 14210 More actions · Answer 8

PiMané (3/10/2014)
Hi,
The LIKE operator is also very slow with 3.000.000 records... It's even slower than the LEFT in the tests I made.
Right now I'm trying to do an INNER JOIN between accounts with itself to simulate the hierarchical fields, since accounts has less records than accountMovs (1:1000 ratio or higher).
SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)
FROM accounts t0
INNER JOIN accounts tr ON t0.accountId = LEFT(tr.accountId, LEN(t0.accountId))
LEFT JOIN accountMovs t1 ON tr.accountId = t1.accountId
GROUP BY t0.accountId
It's much faster but not "perfect".... Since I can't change these tables structures due to the software I'm thinking of eventually adding a trigger on accounts to insert a record on a new table (accountRelations) that has the hierarchical relation and the use that new table instead of the accounts join with itself...
Pedro

Can I assume from the above query that all accounts in accountMovs have a record in accounts? It was not that way in your sample data, but this query implies that's what its like.

Here's what I've come up with. Its a bother made out of stubbornness and a tad silly, and the trigger solution will probably make a faster query. Just a matter of how badly you want to avoid a trigger. I'd be curious whether it beats the like for your whole rowset.

--First, your tables

CREATE TABLE accounts (accountId VARCHAR(10))

CREATE TABLE accountMovs (accountId VARCHAR(10), debit INT, credit INT)

GO

--Next, insert your data ... with a small change

INSERT INTO dbo.accounts VALUES ('11')

INSERT INTO dbo.accounts VALUES ('111')

INSERT INTO dbo.accounts VALUES ('112')

INSERT INTO dbo.accounts VALUES ('12')

INSERT INTO dbo.accounts VALUES ('121')

INSERT INTO dbo.accounts VALUES ('1121')

INSERT INTO accountMovs VALUES ('11', 10, 0)

INSERT INTO accountMovs VALUES ('111', 10, 20)

INSERT INTO accountMovs VALUES ('112', 10, 10)

INSERT INTO accountMovs VALUES ('1121', 0, 10)

INSERT INTO accountMovs VALUES ('1121', 5, 5)

INSERT INTO accountMovs VALUES ('12', 0, 10)

INSERT INTO accountMovs VALUES ('121', 10, 10)

GO

--Now, we need a 20ish row tally table. Can't be a CTE or derived table.

CREATE TABLE [dbo].[PossibleLevels](

[id] [int] NOT NULL

) ON [PRIMARY]

GO

--Load it

INSERT INTO PossibleLevels (id)

SELECT 1 UNION ALL

SELECT 2 UNION ALL

SELECT 3 UNION ALL

SELECT 4 UNION ALL

SELECT 5 UNION ALL

SELECT 6 UNION ALL

SELECT 7 UNION ALL

SELECT 8 UNION ALL

SELECT 9 UNION ALL

SELECT 10 UNION ALL

SELECT 11 UNION ALL

SELECT 12 UNION ALL

SELECT 13 UNION ALL

SELECT 14 UNION ALL

SELECT 15 UNION ALL

SELECT 16 UNION ALL

SELECT 17 UNION ALL

SELECT 18 UNION ALL

SELECT 19 UNION ALL

SELECT 20

-- Now, the view. It includes every possible parent account and itself. So, for 200k accounts, it could be up to 4million records.

CREATE VIEW PossibleParents

WITH SCHEMABINDING

AS

SELECT accountId,SUBSTRING(accountId,1,p.id) PossibleParentId

FROM dbo.accounts a

JOIN dbo.[PossibleLevels] p

ON LEN(a.accountId) >= p.id

-- and the index on the view

CREATE UNIQUE CLUSTERED INDEX ParentIndex

ON dbo.PossibleParents (PossibleParentId,accountId)

-- and finally the query

SELECT a.accountId, SUM(am.debit), SUM(am.credit)

FROM accounts a

JOIN PossibleParents p

ON a.accountId = p.PossibleParentId

LEFT JOIN accountMovs am

ON am.accountId = p.accountId

GROUP BY a.accountId

Let me know what you think. Don't know if there is a way to modify it for Jeff's magic. You need the join back to the table to get the true hierarchy, and if doing Jeff's stuff you'd make the 'ON LEN(a.accountId) >= p.id' into 'ON LEN(a.accountId) > p.id' to avoid the self match.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 9

This takes 34 seconds to run against 3.5M account moves.

CREATE TABLE #accounts (accountId VARCHAR(10))

CREATE TABLE #accountMovs (accountId VARCHAR(10), debit INT, credit INT)

GO

INSERT INTO #accounts VALUES ('11')

INSERT INTO #accounts VALUES ('111')

INSERT INTO #accounts VALUES ('112')

INSERT INTO #accounts VALUES ('12')

INSERT INTO #accounts VALUES ('121')

INSERT INTO #accountMovs VALUES ('11', 10, 0)

INSERT INTO #accountMovs VALUES ('111', 10, 20)

INSERT INTO #accountMovs VALUES ('112', 10, 10)

INSERT INTO #accountMovs VALUES ('1121', 0, 10)

INSERT INTO #accountMovs VALUES ('1121', 5, 5)

INSERT INTO #accountMovs VALUES ('12', 0, 10)

INSERT INTO #accountMovs VALUES ('121', 10, 10)

GO

--SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)

--FROM #accounts t0

--LEFT JOIN #accountMovs t1 ON t0.accountId = SUBSTRING(t1.accountId, 1, LEN(t0.accountId))

--GROUP BY t0.accountId

-- Blow row set out to 3.5M rows

WITH Tally(n) AS

(

SELECT TOP 500000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b

)

INSERT INTO #accountMovs

SELECT accountID, debit, credit

FROM #accountMovs

CROSS JOIN Tally;

SELECT accountId=acct, SUM(debit), SUM(credit)

FROM #accountMovs a

CROSS APPLY

(

VALUES(CASE WHEN LEN(AccountID) > 2 THEN LEFT(AccountID, 2) END),(AccountID)

) b (acct)

JOIN #accounts c ON c.accountId = b.acct

WHERE acct IS NOT NULL

GROUP BY acct;

GO

DROP TABLE #accounts;

DROP TABLE #accountMovs;

Edit: That timing includes the Tally/data setup to 3.5M rows and it is on my laptop. I agree that an INDEXed, computed PERSISTED column is the way to go if you can change the table structure.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 10

dwain.c (3/13/2014)
I agree that an INDEXed, computed PERSISTED column is the way to go if you can change the table structure.

If you can't change the table structure, build a "Sister Table". 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)