March 10, 2014 at 9:34 am
Hi,
Our software has 2 tables for accounting information. There's an hierarchy between the accounts but there's no hierarchical fields, the relation is determined by the account number itself (account 111 and 112 are children of 11).
We have a query that shows all the accounts balance where the parent has the children balances also.
Since there's no hierarchical field we are getting the results by using the LEFT operator to compare the accounts...
CREATE TABLE accounts (accountId VARCHAR(10))
CREATE TABLE accountMovs (accountId VARCHAR(10), debit INT, credit INT)
GO
INSERT INTO dbo.accounts VALUES ('11')
INSERT INTO dbo.accounts VALUES ('111')
INSERT INTO dbo.accounts VALUES ('112')
INSERT INTO dbo.accounts VALUES ('12')
INSERT INTO dbo.accounts VALUES ('121')
INSERT INTO accountMovs VALUES ('11', 10, 0)
INSERT INTO accountMovs VALUES ('111', 10, 20)
INSERT INTO accountMovs VALUES ('112', 10, 10)
INSERT INTO accountMovs VALUES ('1121', 0, 10)
INSERT INTO accountMovs VALUES ('1121', 5, 5)
INSERT INTO accountMovs VALUES ('12', 0, 10)
INSERT INTO accountMovs VALUES ('121', 10, 10)
GO
SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)
FROM accounts t0
LEFT JOIN accountMovs t1 ON t0.accountId = SUBSTRING(t1.accountId, 1, LEN(t0.accountId))
GROUP BY t0.accountId
The results are the expected but with large tables the query takes too long to execute (over 2h or more with 3.000.000 account moves).
We thought having just the = operator on the accountId and let the software do the SUM on the related account but still it's very slow, the only difference is that's no longer a database problem and is a software problem...).
Is there a way of doing this with OVER or another command or use an iTVF to "simulate" the hierarchical field?
Thanks,
Pedro
March 10, 2014 at 10:37 am
You can try
SELECT a.accountId
,(SELECT SUM(m.debit) FROM accountMovs m WHERE m.accountId LIKE a.accountId + '%') Debit
,(SELECT SUM(m.credit) FROM accountMovs m WHERE m.accountId LIKE a.accountId + '%') Credit
FROM accounts a
or may be better this:
SELECT a.accountId
,am.Debit
,am.Credit
FROM accounts a
OUTER APPLY (SELECT SUM(m.debit) Debit, SUM(m.credit) Credit
FROM accountMovs m WHERE m.accountId LIKE a.accountId + '%') am
March 10, 2014 at 10:45 am
Hi,
The LIKE operator is also very slow with 3.000.000 records... It's even slower than the LEFT in the tests I made.
Right now I'm trying to do an INNER JOIN between accounts with itself to simulate the hierarchical fields, since accounts has less records than accountMovs (1:1000 ratio or higher).
SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)
FROM accounts t0
INNER JOIN accounts tr ON t0.accountId = LEFT(tr.accountId, LEN(t0.accountId))
LEFT JOIN accountMovs t1 ON tr.accountId = t1.accountId
GROUP BY t0.accountId
It's much faster but not "perfect".... Since I can't change these tables structures due to the software I'm thinking of eventually adding a trigger on accounts to insert a record on a new table (accountRelations) that has the hierarchical relation and the use that new table instead of the accounts join with itself...
Pedro
March 10, 2014 at 11:07 am
My first choice was to change the SUBSTRING to the LIKE operator to make the query SARGable.
SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)
FROM accounts t0
LEFT JOIN accountMovs t1 ON t1.accountId LIKE t0.accountId + '%'
GROUP BY t0.accountId
Since you say it won't work, you should probably share indexes definitions and execution plan for your query as described in this article: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
March 10, 2014 at 12:51 pm
So, what's going to happen when you have an account number of, say, 112? BOOM!!!
My recommendation would be to correctly split the parent/child relationships into parent/child columns. That would also allow for some incredible performance methods that you're not currently enjoying. At the very least, put a separator for each level into the account so that you can easily extract the hierarchical path.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 10, 2014 at 1:25 pm
Jeff Moden (3/10/2014)
So, what's going to happen when you have an account number of, say, 112? BOOM!!!My recommendation would be to correctly split the parent/child relationships into parent/child columns. That would also allow for some incredible performance methods that you're not currently enjoying. At the very least, put a separator for each level into the account so that you can easily extract the hierarchical path.
No need to add a separator since the hierarchy is the number:
11
111
1111
1112
112
113
12
121
.. and so on...
The SQL code I used is just a "sample", no PK, FKs....
Right now, since I can't change the current tables, I'm thinking of adding a trigger on the accounts so another table (accountsRelations) is filled with the hierarchical data and use it on the query...
Thanks,
Pedro
March 10, 2014 at 1:38 pm
What happens if a parent needs to have more than 10 children?
March 10, 2014 at 2:41 pm
Luis Cazares (3/10/2014)
What happens if a parent needs to have more than 10 children?
Or 99 roots?
I'm with Jeff. If you can't actually properly normalize this data then I would add a separator. That would allow you to use the DelimitedSplit8K instead of painfully slow string manipulation.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 10, 2014 at 4:52 pm
Sean Lange (3/10/2014)
Luis Cazares (3/10/2014)
What happens if a parent needs to have more than 10 children?Or 99 roots?
I'm with Jeff. If you can't actually properly normalize this data then I would add a separator. That would allow you to use the DelimitedSplit8K instead of painfully slow string manipulation.
We have over 4000 root accounts each with more than 20 children in some cases...
I've made some tests and with a AccountRelations (parentAccountId, childAccountId) it's over 20x faster than with the LIKE or LEFT...
From 2h it now takes 2m (in the most extreme case...) compared with the LEFT...
LIKE is 3x slower than the relations table... (I had the wrong numbers since I used the LEFT on a SSD disk and the LIKE on SATA disk...).
I'm going to make some tests, since the triggers have an overhead on the INSERT/UPDATE/DELETE operations...
Pedro
March 10, 2014 at 5:27 pm
PiMané (3/10/2014)
Sean Lange (3/10/2014)
Luis Cazares (3/10/2014)
What happens if a parent needs to have more than 10 children?Or 99 roots?
I'm with Jeff. If you can't actually properly normalize this data then I would add a separator. That would allow you to use the DelimitedSplit8K instead of painfully slow string manipulation.
We have over 4000 root accounts each with more than 20 children in some cases...
I've made some tests and with a AccountRelations (parentAccountId, childAccountId) it's over 20x faster than with the LIKE or LEFT...
From 2h it now takes 2m (in the most extreme case...) compared with the LEFT...
LIKE is 3x slower than the relations table... (I had the wrong numbers since I used the LEFT on a SSD disk and the LIKE on SATA disk...).
I'm going to make some tests, since the triggers have an overhead on the INSERT/UPDATE/DELETE operations...
Pedro
I'm not talking about using "LIKE" or anything else like that (no pun intended). I'm talking about converting this bad boy to something that will blow the doors off of most other things while still giving you the same capabilities. It'll also make maintenance super easy and it'll make troubleshooting a hierarchy a whole lot easier, as well.
AND, it can actually be done in parallel with what you already have. There's also no need for a trigger. The method I'm talking about will totally rebuild the hierarchy on a million rows in 54 seconds and give you all the advantages of an Adjacency List, Hierarchical Path (which is kind of what you have now), Nested Sets, and a 4th type of hierarchy that can be thought of as a data warehouse for you hierarchy with most of the things you want answered already done.
Please see the following articles for what I'm talking about.
http://www.sqlservercentral.com/articles/Hierarchy/94040/
http://www.sqlservercentral.com/articles/T-SQL/94570/
--Jeff Moden
Change is inevitable... Change for the better is not.
March 10, 2014 at 5:46 pm
edit: nm
March 11, 2014 at 10:01 am
Our problem is that we can't change the software, it isn't ours...
The only thing we can do is change some SPs it uses to improve it's performance.
It's "impossible" to add a new field to any existing software table, like an hierarchical attribute, or change the way data is stored (add a separator).
We can create new tables and add triggers to fill them, otherwise they won't be filled, and change the existing procedures.
Pedro
March 11, 2014 at 10:08 am
PiMané (3/11/2014)
Our problem is that we can't change the software, it isn't ours...The only thing we can do is change some SPs it uses to improve it's performance.
It's "impossible" to add a new field to any existing software table, like an hierarchical attribute, or change the way data is stored (add a separator).
We can create new tables and add triggers to fill them, otherwise they won't be filled, and change the existing procedures.
Pedro
Then a computed column may be an excellent way to handle this. You can't insert into it so any CRUD operations would remain exactly the same. The only thing you would change is how you retrieve the data (from the existing procedures).
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 11, 2014 at 10:33 am
Thanks,
I'll try it, don't know if the software has "proper" SQL CRUD (columns specified in the INSERT) or "lazy" (INSERT INTO .. VALUES (..) without the columns name...).
If there's no problem with adding a new column I'll create the ParentAccountId.
Just got word from the accounting department that accounts can be:
11
112
112001
112002
...
children aren't necessarily just another number at the end of the parent...
If so the computed column formula isn't "simple"...
Pedro
March 11, 2014 at 11:02 am
What is the maximum account number length, practically speaking. Its a varchar(10), do you get account numbers that long?
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply