July 22, 2008 at 5:52 am
Christopher Stobbs (7/22/2008)
HI Jeff,I think he doesn't mind which is returned if they are all different.
I added an extra ordering to my rank query so that the first is returned but that is up to him really.
Thanks
Chris
Thanks for that, Chris, but we need the OP to tell us so we can be sure. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2008 at 6:29 am
I think this works but gosh it's pretty horrible...
DROP TABLE #Customers
CREATE TABLE #Customers (AccountNum int, DM_Source_Id int, Address varchar(60))
INSERT INTO #Customers (AccountNum, DM_Source_Id, Address)
SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 2, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL
SELECT 46786, 5, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 2, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL
SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL
SELECT 46787, 1, 'McMahn Road/N/n/nDarlington/nDurham' UNION ALL
SELECT 46787, 2, 'McDonald Road/N/46787 /n/nDarlington/nDurham' UNION ALL
SELECT 46787, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46787, 4, 'St. Ives Road/N /n/nDarlington/nDurham'
SELECT d1.AccountNum, MIN(d1.Address) AS Address, d1.Copies AS CopyCount
FROM (SELECT AccountNum, Address, COUNT(*) AS Copies
FROM #Customers
GROUP BY AccountNum, Address
) d1
INNER JOIN (SELECT AccountNum, MAX(Copies) AS MaxCopies
FROM (SELECT AccountNum, Address, COUNT(*) AS Copies
FROM #Customers
GROUP BY AccountNum, Address
) d3 GROUP BY AccountNum
) d2 ON d2.AccountNum = d1.AccountNum AND d2.MaxCopies = d1.Copies
GROUP BY d1.AccountNum, d1.Copies
It would take another aggregated SELECT to bring in MAX or MIN DM_Source_Id.
Of course this is a SQL2k5 forum section...
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 22, 2008 at 7:17 am
Chris Morris (7/22/2008)
I think this works but gosh it's pretty horrible...It would take another aggregated SELECT to bring in MAX or MIN DM_Source_Id.
Of course this is a SQL2k5 forum section...
Thank you for setting up the test data, Chris... here's my take on it and it does bring out DM_Source_ID. I also broke out the two separate examples that Neal originally had by assigning them to a different AccountNum. Heh... ya just gotta love ROW_NUMBER with a descending sort in the OVER clause...
--===== Create a test table with some test data
-- This is NOT part of the solution.
CREATE TABLE #Customers (AccountNum int, DM_Source_Id int, Address varchar(60))
INSERT INTO #Customers (AccountNum, DM_Source_Id, Address)
SELECT 46785, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46785, 2, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46785, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46785, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL
SELECT 46785, 5, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 2, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL
SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL
SELECT 46787, 1, 'McMahn Road/N/n/nDarlington/nDurham' UNION ALL
SELECT 46787, 2, 'McDonald Road/N/46787 /n/nDarlington/nDurham' UNION ALL
SELECT 46787, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46787, 4, 'St. Ives Road/N /n/nDarlington/nDurham'
;WITH
cteRankedAddresses AS
(--==== Note the descending order in the OVER clause...
SELECT ROW_NUMBER() OVER (PARTITION BY q.AccountNum ORDER BY q.Qty DESC) AS MyRank,
AccountNum, Address, Qty
FROM (--==== Get a count for each address by account number
SELECT AccountNum, Address, COUNT(*) AS Qty
FROM #Customers
GROUP BY AccountNum, Address)q
)
SELECT c.AccountNum, MAX(c.DM_Source_ID) AS MaxDM_Source_ID, c.Address
FROM #Customers c
INNER JOIN cteRankedAddresses ra
ON c.AccountNum = ra.AccountNum
AND c.Address = ra.Address
AND ra.MyRank = 1
GROUP BY c.AccountNum,c.Address
Neal... as a bit of a sidebar... the code above shows one of the better ways to post data. See the link in my signature line below for how and why. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2008 at 7:20 am
Top work, Jeff! It looks so much nicer in SQL2k5.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 22, 2008 at 7:30 am
Chris Morris (7/22/2008)
Top work, Jeff! It looks so much nicer in SQL2k5.
Top work yourself, Chris... it was a real bugger to pull off using 2k code, but you did it! 🙂 Nicely done!
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2008 at 7:32 am
HI Jeff,
I think I might have found another solumtion
[Code]
;WITH MyCTE AS
(
SELECT
RANK() OVER (PARTITION BY [AccountNum] ORDER BY COUNT([Address]) DESC)
+ ROW_NUMBER() OVER (PARTITION BY [AccountNum] ORDER BY COUNT([Address]) DESC) as [Rank]
,[AccountNum]
,[Address]
FROM #customers
GROUP BY [AccountNum],[Address] )
SELECT DISTINCT * FROM MyCTE WHERE [Rank] = 2
[/Code]
I add the Row number to the Rank in case everything is different.
I've run yours and mine aagainst a 1mil rows and it seems
Results
Mine:
SQL Server Execution Times:
CPU time = 4219 ms, elapsed time = 9937 ms.
Yours:
SQL Server Execution Times:
CPU time = 7156 ms, elapsed time = 15779 ms.
However I'm not sure what affect using the RANK and ROW_Number will have on a server?
Any idea?
thanks
Chris
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]
July 22, 2008 at 7:42 am
Jeff Moden (7/21/2008)
Jack Corbett (7/21/2008)
I think this may do what you want (I created some dummy data):Jack... after you get done building it, change all the Addresses to 0 for where the DM_Source_ID = 1... see what happens... 🙂 I think the data you built is great but I also think it leads to an incorrect result.
Jeff,
I could make excuse, but stupidity is no excuse (I obviously needed to read up on Order By):D. Here is what I came up with to fix it.
[font="Courier New"]DECLARE @customers TABLE(AccountNum INT, DM_Source_Id INT, Address VARCHAR(10))
DECLARE @i INT,
@j INT
SET @i = 1
SET @j = 1
WHILE @i <= 10
BEGIN
WHILE @j <= 10
BEGIN
IF @i = 1
BEGIN
INSERT INTO @customers
SELECT
@i,
@j,
@j
END
ELSE
BEGIN
IF @j % 3 = 0
BEGIN
INSERT INTO @customers
SELECT
@i,
@j,
@j
END
ELSE
BEGIN
INSERT INTO @customers
SELECT
@i,
@j,
@i
END
END
SET @j = @j+ 1
END
SET @i = @i + 1
SET @j = 1
END
UPDATE @customers
SET Address = 0
WHERE
DM_Source_ID = 1
;WITH cteTopAddresses AS
(
SELECT
Row_Number() OVER(Partition BY AccountNum ORDER BY COUNT(*) DESC) AS row_Id,
AccountNum,
Address,
COUNT(*) AS AddressCount
FROM
@customers
GROUP BY
AccountNum,
Address
)
SELECT * FROM cteTopAddresses WHERE row_id = 1[/font]
Of course while I was working on this there were other solutions offered that are similar.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 22, 2008 at 7:50 am
Here's another one to add to the pile...
; WITH cteRankedAddresses AS (
SELECT AccountNum, MAX(DM_Source_ID) AS MaxDM_Source_ID, Address,
ROW_NUMBER() OVER (PARTITION BY AccountNum ORDER BY count(*) DESC, MAX(DM_Source_ID) DESC) AS MyRank
FROM #Customers GROUP BY AccountNum, Address)
SELECT AccountNum, MaxDM_Source_ID, Address FROM cteRankedAddresses WHERE MyRank = 1 ORDER BY AccountNum
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
July 22, 2008 at 8:02 am
Christopher Stobbs (7/22/2008)
I've run yours and mine aagainst a 1mil rows and it seemsResults
Mine:
SQL Server Execution Times:
CPU time = 4219 ms, elapsed time = 9937 ms.
Yours:
SQL Server Execution Times:
CPU time = 7156 ms, elapsed time = 15779 ms.
However I'm not sure what affect using the RANK and ROW_Number will have on a server?
Any idea?
The difference between your code and mine is that I need the extra code to return the DM_Source_ID. Without returning that, your extra ROW_NUMBER causes two extra steps in the execution plan... the execution plan says that they don't amount to a hill of beans, though, so no problems there.
Here's my code modified to only return what you've returned...
;WITH
cteRankedAddresses AS
(--==== Note the descending order in the OVER clause...
SELECT ROW_NUMBER() OVER (PARTITION BY q.AccountNum ORDER BY q.Qty DESC) AS MyRank,
AccountNum, Address, Qty
FROM (--==== Get a count for each address by account number
SELECT AccountNum, Address, COUNT(*) AS Qty
FROM Customers
GROUP BY AccountNum, Address)q
)
SELECT * FROM cteRankedAddresses WHERE MyRank = 1
Lemme know how that does for performance against your test code.
A couple of favors please, Chris. Would you edit your post to break the long line at some mathematical operators so the line isn't so long? Those types of long lines make it so you have to pan right just to read a paragraph.
Also, would you mind posting your million row code generator? I'd like to test against the same data you are.
By the way, very nice shortcut in the CTE... I've gotta study that one. Looks like a good trick to know.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2008 at 8:06 am
HI All,
Wow this has been fun:
I have tested all the solutions with 1Mil rows of Jack's Data and here is what happened:
[Code]
DECLARE @customers TABLE(AccountNum INT, DM_Source_Id INT, Address VARCHAR(10))
DECLARE @i INT,
@j-2 INT
SET @i = 1
SET @j-2 = 1
WHILE @i <= 1000
BEGIN
WHILE @j-2 <= 1000
BEGIN
IF @i = 1
BEGIN
INSERT INTO @customers
SELECT
@i,
@j-2,
END
ELSE
BEGIN
IF @j-2 % 3 = 0
BEGIN
INSERT INTO @customers
SELECT
@i,
@j-2,
END
ELSE
BEGIN
INSERT INTO @customers
SELECT
@i,
@j-2,
@i
END
END
END
SET @i = @i + 1
SET @j-2 = 1
END
UPDATE @customers
SET Address = 0
WHERE
DM_Source_ID = 1
SELECT COUNT(*)
FROM @customers
SET STATISTICS TIME ON
PRINT '----------Chris-----------'
;WITH MyCTE AS
(
SELECT
RANK() OVER (PARTITION BY [AccountNum] ORDER BY COUNT([Address]) DESC)
+ ROW_NUMBER() OVER (PARTITION BY [AccountNum] ORDER BY COUNT([Address])DESC ,MAX(DM_Source_ID) ASC) as [Rank]
,[AccountNum]
,[Address]
FROM @customers
GROUP BY [AccountNum],[Address] )
SELECT DISTINCT * FROM MyCTE WHERE [Rank] = 2
PRINT '--------------------------'
PRINT '---------Jeff-------------'
;WITH
cteRankedAddresses AS
(--==== Note the descending order in the OVER clause...
SELECT ROW_NUMBER() OVER (PARTITION BY q.AccountNum ORDER BY q.Qty DESC) AS MyRank,
AccountNum, Address, Qty
FROM (--==== Get a count for each address by account number
SELECT AccountNum, Address, COUNT(*) AS Qty
FROM @customers
GROUP BY AccountNum, Address)q
)
SELECT c.AccountNum, MAX(c.DM_Source_ID) AS MaxDM_Source_ID, c.Address
FROM @customers c
INNER JOIN cteRankedAddresses ra
ON c.AccountNum = ra.AccountNum
AND c.Address = ra.Address
AND ra.MyRank = 1
GROUP BY c.AccountNum,c.Address
ORDER BY AccountNum
PRINT '--------------------------'
PRINT '------------Ryan----------'
; WITH cteRankedAddresses2 AS (
SELECT AccountNum, MAX(DM_Source_ID) AS MaxDM_Source_ID, Address,
ROW_NUMBER() OVER (PARTITION BY AccountNum ORDER BY count(*) DESC, MAX(DM_Source_ID) DESC) AS MyRank
FROM @customers GROUP BY AccountNum, Address)
SELECT AccountNum, MaxDM_Source_ID, Address FROM cteRankedAddresses2 WHERE MyRank = 1 ORDER BY AccountNum
PRINT '--------------------------'
PRINT '-----_-----Jack-----------'
;WITH cteTopAddresses AS
(
SELECT
Row_Number() OVER(Partition BY AccountNum ORDER BY COUNT(*) DESC) AS row_Id,
AccountNum,
Address,
COUNT(*) AS AddressCount
FROM
@customers
GROUP BY
AccountNum,
Address
)
SELECT * FROM cteTopAddresses WHERE row_id = 1
PRINT '--------------------------'
SET STATISTICS TIME OFF
[/Code]
WELLDONE Jack:
Your is the fastest, here are all the results:
----------Chris-----------
SQL Server Execution Times:
CPU time = 6938 ms, elapsed time = 14458 ms.
--------------------------
---------Jeff-------------
SQL Server Execution Times:
CPU time = 8953 ms, elapsed time = 17680 ms.
--------------------------
------------Ryan----------
SQL Server Execution Times:
CPU time = 5672 ms, elapsed time = 11342 ms.
--------------------------
-----------Jack-----------
SQL Server Execution Times:
CPU time = 5484 ms, elapsed time = 9636 ms.
--------------------------
Just a last note, that Ryan your AccountNum one does not bring back the First DM_Source_ID address.
Otherwise thanks all for sharing I know I'm not the OP on this one but have learn't alot 🙂
Chris
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]
July 22, 2008 at 8:06 am
Jack Corbett (7/22/2008)
I could make excuse, but stupidity is no excuse (I obviously needed to read up on Order By):D. Here is what I came up with to fix it.
Heh... No, not stupid, Jack... You create some data, write some code, get the answer you want,and think you're done. I don't know anyone who hasn't done a similar thing at one time or another. I sure do appreciate the feedback!
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2008 at 9:09 am
Christopher Stobbs (7/22/2008)
HI All,Wow this has been fun:
I have tested all the solutions with 1Mil rows of Jack's Data and here is what happened:
Not trying to take anything away form one, but if you really want to have some fun, make it so you're testing just apples... right now, you're testing apples and oranges... some of the code returns DM_Source_ID (like mine does and Ryan's do) and some doesn't (like Chris' and Jack's).
You'll also find that some of the returns are not correct...
--Jeff Moden
Change is inevitable... Change for the better is not.
July 22, 2008 at 9:14 am
Hey Chris, if you still have your test environment handy, would you mind comparing the SQL2k version against the 2k5 versions?
Cheers
ChrisM
DROP TABLE #Customers
CREATE TABLE #Customers (AccountNum int, DM_Source_Id int, Address varchar(60))
INSERT INTO #Customers (AccountNum, DM_Source_Id, Address)
SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 2, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL
SELECT 46786, 5, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 1, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 2, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL
SELECT 46786, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46786, 4, 'Varleys/NMcmullen Road/n/nDarlington/nDurham' UNION ALL
SELECT 46787, 1, 'McMahn Road/N/n/nDarlington/nDurham' UNION ALL
SELECT 46787, 2, 'McDonald Road/N/46787 /n/nDarlington/nDurham' UNION ALL
SELECT 46787, 3, 'Mcmullen Road/N /n/nDarlington/nDurham' UNION ALL
SELECT 46787, 4, 'St. Ives Road/N /n/nDarlington/nDurham'
SELECT c.AccountNum, MIN(c.DM_Source_Id) AS DM_Source_Id, c.Address
FROM #Customers c
INNER JOIN (SELECT d1.AccountNum, MIN(d1.Address) AS Address, d1.Copies AS CopyCount
FROM (SELECT AccountNum, Address, COUNT(*) AS Copies
FROM #Customers
GROUP BY AccountNum, Address
) d1
INNER JOIN (SELECT AccountNum, MAX(Copies) AS MaxCopies
FROM (SELECT AccountNum, Address, COUNT(*) AS Copies
FROM #Customers
GROUP BY AccountNum, Address
) d3 GROUP BY AccountNum
) d2 ON d2.AccountNum = d1.AccountNum AND d2.MaxCopies = d1.Copies
GROUP BY d1.AccountNum, d1.Copies
) d4 ON d4.AccountNum = c.AccountNum AND d4.Address = c.Address
GROUP BY c.AccountNum, c.Address
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 22, 2008 at 9:34 am
HI Chris,
I have run 2k version on 10000 rows and it takes 3000ms where the 2k5 versions run in sub 50ms
I tried to run it on a 1Mil but it took for ever.
Thanks
Chris
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices[/url]
Numbers / Tally Tables[/url]
July 22, 2008 at 9:37 am
Cheers Chris - it's pretty much what you'd expect really.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 16 through 30 (of 33 total)
You must be logged in to reply to this topic. Login to reply