March 11, 2014 at 11:50 am
Nevyn (3/11/2014)
What is the maximum account number length, practically speaking. Its a varchar(10), do you get account numbers that long?
20 chars is the maximum we have in our company.
But some companies have sub account number, internal numbers, that can be 30 or more.
March 11, 2014 at 12:07 pm
PiMané (3/11/2014)
Thanks,I'll try it, don't know if the software has "proper" SQL CRUD (columns specified in the INSERT) or "lazy" (INSERT INTO .. VALUES (..) without the columns name...).
If there's no problem with adding a new column I'll create the ParentAccountId.
Just got word from the accounting department that accounts can be:
11
112
112001
112002
...
children aren't necessarily just another number at the end of the parent...
If so the computed column formula isn't "simple"...
Pedro
Yeah that would be a definite gotcha if they don't specify the columns. 😉
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 11, 2014 at 8:02 pm
Yeah that would be a definite gotcha if they don't specify the columns. 😉
If that is a problem, you could do a indexed view on accounts with the calculation for parent, could you not? Some of the same tradeoffs as the trigger, but still ...
March 12, 2014 at 4:02 am
No, you can't have an indexed view that uses an JOIN with the same table twice.
I'd need
SELECT a1.accountId, a2.accountId parentAccountId FROM dbo.accounts a1
INNER JOIN dbo.accounts a2 ON a1.accountId LIKE a2.accountId + '%' AND a1.accountId <> a2.accountId
and you do this on indexed views... just like you can't have OUTTER JOINs and other limitations..
Pedro
March 12, 2014 at 7:16 am
That wasn't what I was suggesting.
What were you going to put in the calculated Parent ID column for Sean's solution? Surely you weren't going to refer to other rows for that?
You should be able to do the same in an indexed view. Isn't parent account id just a substring of account id minus the last character? Or do levels get skipped?
EDIT: ok, just saw your note above
So the problem is levels can be skipped, so you can tell if 2 accounts are related when looking at both records, but you can't tell a child's direct parent by substringing. Is that right?
March 12, 2014 at 8:22 am
Nevyn (3/12/2014)
That wasn't what I was suggesting.What were you going to put in the calculated Parent ID column for Sean's solution? Surely you weren't going to refer to other rows for that?
You should be able to do the same in an indexed view. Isn't parent account id just a substring of account id minus the last character? Or do levels get skipped?
EDIT: ok, just saw your note above
So the problem is levels can be skipped, so you can tell if 2 accounts are related when looking at both records, but you can't tell a child's direct parent by substringing. Is that right?
Yep.. the way to determine the parent account is the query I wrote above, with the LIKE operator.
It's much faster than the LEFT(..) used currently and I've already changed it...
Right now the only thing I'm pondering is to create the trigger and the relation table to have the parents directly..
With 100 accounts the LIKE if faster than using the table (all the accounts are in memory and there's no need to read from another table), but with 200.000 accounts the relations table is faster, at least twice as fast...
Any way, with 200.000 accounts the LIKE is 60x faster than the LEFT..
Pedro
March 12, 2014 at 7:12 pm
PiMané (3/12/2014)
Nevyn (3/12/2014)
That wasn't what I was suggesting.What were you going to put in the calculated Parent ID column for Sean's solution? Surely you weren't going to refer to other rows for that?
You should be able to do the same in an indexed view. Isn't parent account id just a substring of account id minus the last character? Or do levels get skipped?
EDIT: ok, just saw your note above
So the problem is levels can be skipped, so you can tell if 2 accounts are related when looking at both records, but you can't tell a child's direct parent by substringing. Is that right?
Yep.. the way to determine the parent account is the query I wrote above, with the LIKE operator.
It's much faster than the LEFT(..) used currently and I've already changed it...
Right now the only thing I'm pondering is to create the trigger and the relation table to have the parents directly..
With 100 accounts the LIKE if faster than using the table (all the accounts are in memory and there's no need to read from another table), but with 200.000 accounts the relations table is faster, at least twice as fast...
Any way, with 200.000 accounts the LIKE is 60x faster than the LEFT..
Pedro
But, you're still calculating something that might not change. If there's a way to determine the parent and child values for the account number and store them as persisted computed columns, THEN we could open up a whole new world of hierarchical ease and performance for you.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 12, 2014 at 9:26 pm
PiMané (3/10/2014)
Hi,The LIKE operator is also very slow with 3.000.000 records... It's even slower than the LEFT in the tests I made.
Right now I'm trying to do an INNER JOIN between accounts with itself to simulate the hierarchical fields, since accounts has less records than accountMovs (1:1000 ratio or higher).
SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)
FROM accounts t0
INNER JOIN accounts tr ON t0.accountId = LEFT(tr.accountId, LEN(t0.accountId))
LEFT JOIN accountMovs t1 ON tr.accountId = t1.accountId
GROUP BY t0.accountId
It's much faster but not "perfect".... Since I can't change these tables structures due to the software I'm thinking of eventually adding a trigger on accounts to insert a record on a new table (accountRelations) that has the hierarchical relation and the use that new table instead of the accounts join with itself...
Pedro
Can I assume from the above query that all accounts in accountMovs have a record in accounts? It was not that way in your sample data, but this query implies that's what its like.
Here's what I've come up with. Its a bother made out of stubbornness and a tad silly, and the trigger solution will probably make a faster query. Just a matter of how badly you want to avoid a trigger. I'd be curious whether it beats the like for your whole rowset.
--First, your tables
CREATE TABLE accounts (accountId VARCHAR(10))
CREATE TABLE accountMovs (accountId VARCHAR(10), debit INT, credit INT)
GO
--Next, insert your data ... with a small change
INSERT INTO dbo.accounts VALUES ('11')
INSERT INTO dbo.accounts VALUES ('111')
INSERT INTO dbo.accounts VALUES ('112')
INSERT INTO dbo.accounts VALUES ('12')
INSERT INTO dbo.accounts VALUES ('121')
INSERT INTO dbo.accounts VALUES ('1121')
INSERT INTO accountMovs VALUES ('11', 10, 0)
INSERT INTO accountMovs VALUES ('111', 10, 20)
INSERT INTO accountMovs VALUES ('112', 10, 10)
INSERT INTO accountMovs VALUES ('1121', 0, 10)
INSERT INTO accountMovs VALUES ('1121', 5, 5)
INSERT INTO accountMovs VALUES ('12', 0, 10)
INSERT INTO accountMovs VALUES ('121', 10, 10)
GO
--Now, we need a 20ish row tally table. Can't be a CTE or derived table.
CREATE TABLE [dbo].[PossibleLevels](
[id] [int] NOT NULL
) ON [PRIMARY]
GO
--Load it
INSERT INTO PossibleLevels (id)
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9 UNION ALL
SELECT 10 UNION ALL
SELECT 11 UNION ALL
SELECT 12 UNION ALL
SELECT 13 UNION ALL
SELECT 14 UNION ALL
SELECT 15 UNION ALL
SELECT 16 UNION ALL
SELECT 17 UNION ALL
SELECT 18 UNION ALL
SELECT 19 UNION ALL
SELECT 20
-- Now, the view. It includes every possible parent account and itself. So, for 200k accounts, it could be up to 4million records.
CREATE VIEW PossibleParents
WITH SCHEMABINDING
AS
SELECT accountId,SUBSTRING(accountId,1,p.id) PossibleParentId
FROM dbo.accounts a
JOIN dbo.[PossibleLevels] p
ON LEN(a.accountId) >= p.id
-- and the index on the view
CREATE UNIQUE CLUSTERED INDEX ParentIndex
ON dbo.PossibleParents (PossibleParentId,accountId)
-- and finally the query
SELECT a.accountId, SUM(am.debit), SUM(am.credit)
FROM accounts a
JOIN PossibleParents p
ON a.accountId = p.PossibleParentId
LEFT JOIN accountMovs am
ON am.accountId = p.accountId
GROUP BY a.accountId
Let me know what you think. Don't know if there is a way to modify it for Jeff's magic. You need the join back to the table to get the true hierarchy, and if doing Jeff's stuff you'd make the 'ON LEN(a.accountId) >= p.id' into 'ON LEN(a.accountId) > p.id' to avoid the self match.
March 13, 2014 at 7:17 pm
This takes 34 seconds to run against 3.5M account moves.
CREATE TABLE #accounts (accountId VARCHAR(10))
CREATE TABLE #accountMovs (accountId VARCHAR(10), debit INT, credit INT)
GO
INSERT INTO #accounts VALUES ('11')
INSERT INTO #accounts VALUES ('111')
INSERT INTO #accounts VALUES ('112')
INSERT INTO #accounts VALUES ('12')
INSERT INTO #accounts VALUES ('121')
INSERT INTO #accountMovs VALUES ('11', 10, 0)
INSERT INTO #accountMovs VALUES ('111', 10, 20)
INSERT INTO #accountMovs VALUES ('112', 10, 10)
INSERT INTO #accountMovs VALUES ('1121', 0, 10)
INSERT INTO #accountMovs VALUES ('1121', 5, 5)
INSERT INTO #accountMovs VALUES ('12', 0, 10)
INSERT INTO #accountMovs VALUES ('121', 10, 10)
GO
--SELECT t0.accountId, SUM(t1.debit), SUM(t1.credit)
--FROM #accounts t0
--LEFT JOIN #accountMovs t1 ON t0.accountId = SUBSTRING(t1.accountId, 1, LEN(t0.accountId))
--GROUP BY t0.accountId
-- Blow row set out to 3.5M rows
WITH Tally(n) AS
(
SELECT TOP 500000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #accountMovs
SELECT accountID, debit, credit
FROM #accountMovs
CROSS JOIN Tally;
SELECT accountId=acct, SUM(debit), SUM(credit)
FROM #accountMovs a
CROSS APPLY
(
VALUES(CASE WHEN LEN(AccountID) > 2 THEN LEFT(AccountID, 2) END),(AccountID)
) b (acct)
JOIN #accounts c ON c.accountId = b.acct
WHERE acct IS NOT NULL
GROUP BY acct;
GO
DROP TABLE #accounts;
DROP TABLE #accountMovs;
Edit: That timing includes the Tally/data setup to 3.5M rows and it is on my laptop. I agree that an INDEXed, computed PERSISTED column is the way to go if you can change the table structure.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
March 14, 2014 at 5:48 pm
dwain.c (3/13/2014)
I agree that an INDEXed, computed PERSISTED column is the way to go if you can change the table structure.
If you can't change the table structure, build a "Sister Table". 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 10 posts - 16 through 24 (of 24 total)
You must be logged in to reply to this topic. Login to reply