February 15, 2008 at 11:19 am
Can someone please give me some assistance?
I have a table containing a document number and a line number. I require the combination to be unique; however, the legacy system left duplicate lines, i.e. rows with the same document number and line number.
For example
Document Number Line No
ZC12345 1
ZC12345 2
ZC12345 2
ZC12346 1
ZC12347 1
ZC12348 1
ZC12348 1
I need to recode the LineNo field so that the line numbers are unique within each document, like this:
Document Number Line No
ZC12345 1
ZC12345 2
ZC12345 3
ZC12346 1
ZC12347 1
ZC12348 1
ZC12348 2
Any help with ways I can do this would be appreciated - I am going round in circles.
February 15, 2008 at 12:24 pm
--===== Create a test table with data
-- THIS IS NOT PART OF THE SOLUTION
DECLARE @YourTable TABLE (DocNo VARCHAR(20),LiNum INT)
INSERT INTO @YourTable (DocNo,LiNum)
SELECT 'ZC12345',1 UNION ALL
SELECT 'ZC12345',2 UNION ALL
SELECT 'ZC12345',2 UNION ALL
SELECT 'ZC12346',1 UNION ALL
SELECT 'ZC12347',1 UNION ALL
SELECT 'ZC12348',1 UNION ALL
SELECT 'ZC12348',1
--===== Solve the "Grouped Count" or "Ordinal Ranking" Problem
SELECT y.DocNo, t.Number
FROM (SELECT DocNo, COUNT(*) AS N FROM @YourTable GROUP BY DocNo) y,
Master.dbo.spt_Values t
WHERE t.Number BETWEEN 1 AND y.N
AND t.Type = 'P'
If you have line numbers greater than 255, you will need to build a Tally table instead of using Master.dbo.spt_Values.
Also, the above code uses a limited cross join known as a Triangular Join. While it looks perfect for this type of application and can be quite fast in this particular instance, there is actually an even faster method... see "Grouped Running Count" in the following article...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
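If you're on 2005 and want to apply the renumbering in place, one way is an updatable CTE with ROW_NUMBER (just a sketch... it assumes the real table is named dbo.YourTable and that you don't care which duplicate ends up with which new number):
--===== Renumber the duplicates in place (2k5 only... table name is an assumption)
;WITH Renumbered AS
(
 SELECT LiNum,
        ROW_NUMBER() OVER (PARTITION BY DocNo ORDER BY LiNum) AS NewLiNum
 FROM dbo.YourTable
)
UPDATE Renumbered
SET LiNum = NewLiNum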
--Jeff Moden
Change is inevitable... Change for the better is not.
February 17, 2008 at 1:48 am
Thanks a lot - this did the job.
February 17, 2008 at 8:32 am
Hassie (2/17/2008)
Thanks a lot - this did the job.
You bet... Thanks for the feedback.
Just remember... if the count of rows in each group starts to grow, performance will suffer greatly, very quickly.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 17, 2008 at 2:40 pm
This is off-topic, but does anyone have time to test the use of CROSS APPLY on SQL Server 2005?
-- 2000
SELECT y.DocNo,
       t.Number
FROM (
        SELECT DocNo,
               COUNT(*) AS N
        FROM @YourTable
        GROUP BY DocNo
     ) AS y
INNER JOIN Master.dbo.spt_Values AS t ON t.Type = 'P'
WHERE t.Number BETWEEN 1 AND y.N
-- 2005
SELECT y.DocNo,
       t.Number
FROM (
        SELECT DocNo,
               COUNT(*) AS N
        FROM @YourTable
        GROUP BY DocNo
     ) AS y
CROSS APPLY (
        SELECT Number
        FROM Master.dbo.spt_Values
        WHERE Type = 'P'
          AND Number BETWEEN 1 AND y.N
     ) AS t
N 56°04'39.16"
E 12°55'05.25"
February 17, 2008 at 6:20 pm
I had to make a bigger table to get some timing stats... that also means I had to convert both scripts to use a Tally table, and to make a Tally table large enough to handle the load. I also threw in the ROW_NUMBER solution... a RANK solution didn't work 'cause there's not enough info to partition by... here's the code.
--===== Nice safe place to test
USE TempDB
--===== Presets
SET NOCOUNT ON
--===== Create and populate the Tally table on the fly (2k5 version)
SELECT TOP 200000
N = IDENTITY(INT,1,1)
INTO dbo.Tally
FROM Master.sys.All_Columns ac1
CROSS JOIN Master.sys.All_Columns ac2
--===== Add the necessary Clustered PK for blinding speed
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
--===== Allow everyone to use the table
GRANT SELECT ON dbo.Tally TO PUBLIC
--===== Create a test table with data
-- THIS IS NOT PART OF THE SOLUTION
DECLARE @YourTable TABLE (DocNo VARCHAR(20),LiNum INT)
INSERT INTO @YourTable (DocNo,LiNum)
SELECT 'ZC12345',1 UNION ALL
SELECT 'ZC12345',2 UNION ALL
SELECT 'ZC12345',2 UNION ALL
SELECT 'ZC12346',1 UNION ALL
SELECT 'ZC12347',1 UNION ALL
SELECT 'ZC12348',1 UNION ALL
SELECT 'ZC12348',1
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
INSERT INTO @YourTable SELECT * FROM @YourTable
SET NOCOUNT OFF
-- 2000
PRINT REPLICATE('=',78)
PRINT 'Join Solution'
PRINT REPLICATE('=',78)
SET STATISTICS TIME ON
SELECT y.DocNo, t.N
FROM (SELECT DocNo, COUNT(*) AS N
FROM @YourTable
GROUP BY DocNo
) AS y
INNER JOIN dbo.Tally t
ON t.N BETWEEN 1 AND y.N
SET STATISTICS TIME OFF
-- 2005
PRINT REPLICATE('=',78)
PRINT 'Cross Apply Solution'
PRINT REPLICATE('=',78)
SET STATISTICS TIME ON
SELECT y.DocNo, t.N
FROM (SELECT DocNo, COUNT(*) AS N
FROM @YourTable
GROUP BY DocNo
) AS y
CROSS APPLY
(SELECT N
 FROM dbo.Tally
 WHERE N BETWEEN 1 AND y.N
) AS t
SET STATISTICS TIME OFF
--===== Rownumber
PRINT REPLICATE('=',78)
PRINT 'Join Solution'
PRINT REPLICATE('=',78)
SET STATISTICS TIME ON
SELECT y.DocNo,
Row_Number() OVER (PARTITION BY y.DocNo ORDER BY y.DocNo)
FROM @YourTable y
SET STATISTICS TIME OFF
DROP TABLE dbo.Tally
... and here's the results...
==============================================================================
Join Solution
==============================================================================
(229376 row(s) affected)
SQL Server Execution Times:
CPU time = 484 ms, elapsed time = 6142 ms.
==============================================================================
Cross Apply Solution
==============================================================================
(229376 row(s) affected)
SQL Server Execution Times:
CPU time = 593 ms, elapsed time = 7502 ms.
==============================================================================
Join Solution
==============================================================================
(229376 row(s) affected)
SQL Server Execution Times:
CPU time = 1907 ms, elapsed time = 7714 ms.
Observation... the Tally table solution rules either way, although the Join method did beat out the Cross Apply method by ~20%. The Row_Number solution took more than 3 times longer in CPU time.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2008 at 3:02 pm
Jeff Moden (2/15/2008)
If you have line numbers greater than 255, you will need to build a Tally table instead of using Master.dbo.spt_Values. Also, the above code uses a limited cross join known as a Triangular Join. While it looks perfect for this type of application and can be quite fast in this particular instance, there is actually an even faster method... see "Grouped Running Count" in the following article...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
Jeff,
Not sure if you know this or not, but in learning what the spt_values table was, I discovered that in SS2005 the maximum number is now 2047. I thought I would mention that in case someone wanted to do this in 2005.
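(A quick way to check on your own instance, if anyone wants to verify - Type = 'P' is the set of sequential numbers:)
SELECT MIN(Number) AS MinNum, MAX(Number) AS MaxNum
FROM Master.dbo.spt_Values
WHERE Type = 'P'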
In this case, is there a certain number where a tally table would be the better way to go?
Thanks,
Ian.
"If you are going through hell, keep going."
-- Winston Churchill
February 27, 2008 at 3:27 pm
Yep... I'm aware of that, but thanks for making sure, Ian.
As far as using spt_Values as a source of numbers in production goes, I NEVER do it... I always use a Tally table, even if I have to build one on the fly. The reason is that I almost always need at least 8k numbers for a split or something similar. And a super-narrow, single-column Tally table with a clustered index will be faster, even if only by a smidge, especially since I use a fill factor of 100 on the Tally table. It's a very compact table compared to most others. A Tally table of 9,000 rows, 14,000 rows (more than 30 years' worth of days), or 19,000 rows (more than 50 years' worth of days) has 0 unused bytes reported by sp_spaceused. That's a tight little table.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2008 at 3:31 pm
Sorry... just as a courtesy, I should post the code for making a Tally table, huh?
--===== Create and populate the Tally table on the fly
SELECT TOP 19000 --equates to more than 50 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
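And here's a quick example of the kind of 8k split I keep mentioning (just a sketch... it assumes a comma-delimited VARCHAR(8000), and @Str is only for demonstration):
--===== Split a comma-delimited string using the Tally table (sketch only)
DECLARE @Str VARCHAR(8000)
SET @Str = 'a,b,c'
SELECT SUBSTRING(@Str, N, CHARINDEX(',', @Str + ',', N) - N) AS Item
FROM dbo.Tally
WHERE N <= LEN(@Str)
AND SUBSTRING(',' + @Str, N, 1) = ',' --keeps only positions right after a delimiter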
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2008 at 4:17 pm
What the heck is it that you're showing in the last test? A repeating pattern with 2.29M rows affected? It's a low-caffeine part of the day - I think I'm missing the purpose...
For better or worse, the ROW_NUMBER() solution gets hurt by the lack of an index to support the ORDER BY. WITH a nonclustered index, it's the fastest solution (realizing you'd need to use a temp table instead of a table variable for that). Something like the sketch below.
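(A rough sketch of what I mean - the temp table and index names are just placeholders, and you'd populate it the same way as @YourTable in the test script above:)
--===== Sketch: temp table + NC index so ROW_NUMBER has an index to work with
CREATE TABLE #YourTable (DocNo VARCHAR(20), LiNum INT)
-- ... populate #YourTable just like @YourTable above ...
CREATE NONCLUSTERED INDEX IX_YourTable_DocNo ON #YourTable (DocNo)

SELECT DocNo,
       ROW_NUMBER() OVER (PARTITION BY DocNo ORDER BY DocNo) AS N
FROM #YourTable

DROP TABLE #YourTable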
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 27, 2008 at 5:40 pm
Heh... nah... it's only 0.229M rows... same as the other two tests...
I did mislabel it, though... probably due to low caffeine with a NyQuil kicker as well... the last test should have been labeled as the "Row_Number OVER" test instead of another join test.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2008 at 8:52 am
Jeff Moden (2/27/2008)
Yep... I'm aware of that, but thanks for making sure, Ian.
As far as using spt_Values as a source of numbers in production goes, I NEVER do it... I always use a Tally table, even if I have to build one on the fly. The reason is that I almost always need at least 8k numbers for a split or something similar. And a super-narrow, single-column Tally table with a clustered index will be faster, even if only by a smidge, especially since I use a fill factor of 100 on the Tally table. It's a very compact table compared to most others. A Tally table of 9,000 rows, 14,000 rows (more than 30 years' worth of days), or 19,000 rows (more than 50 years' worth of days) has 0 unused bytes reported by sp_spaceused. That's a tight little table.
Thanks for the info. I have seen you use or refer to the Tally table in many of your posts, and now that I read this, I am a little surprised you used spt_values in your post. I take it you would only use it for a one-time fix (and even that might be rare, depending on the number of rows).
Ian.
"If you are going through hell, keep going."
-- Winston Churchill
February 28, 2008 at 9:01 am
Ian Crandell (2/28/2008)
Thanks for the info. I have seen you use or refer to the Tally table in many of your posts, and now that I read this, I am a little surprised you used spt_values in your post. I take it you would only use it for a one-time fix (and even that might be rare, depending on the number of rows).
Actually, I've gotten lazy... spt_Values is available on every machine and a Tally table might not be... instead of going through the usual spiel of "First, ya gotta create a Tally table..." and the normal explanation, I just post code...
... I should get out of that lazy habit and make a "boilerplate" for the Tally table code.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 28, 2008 at 9:02 am
Jeff Moden (2/28/2008)
Ian Crandell (2/28/2008)
Thanks for the info. I have seen you use or refer to the Tally table in many of your posts, and now that I read this, I am a little surprised you used spt_values in your post. I take it you would only use it for a one-time fix (and even that might be rare, depending on the number of rows).
Actually, I've gotten lazy... spt_Values is available on every machine and a Tally table might not be... instead of going through the usual spiel of "First, ya gotta create a Tally table..." and the normal explanation, I just post code...
... I should get out of that lazy habit and make a "boiler plate" for the Tally table code.
Maybe make it part of your signature? Crude but effective.....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 28, 2008 at 9:12 am
Matt Miller (2/28/2008)
Maybe make it part of your signature? Crude but effective.....
I am surprised he hasn't included the link to his article on Forum Etiquette in his signature block. ;)
Ian.
"If you are going through hell, keep going."
-- Winston Churchill