Sequential numbering issue

  • Please can someone please give me some assistance.

    I have a table with a document number and line number in. I require these to be unique however from the legacy system there are duplicate lines - ie the same document number and line number.

    For example

    Document Number Line No

    ZC12345 1

    ZC12345 2

    ZC12345 2

    ZC12346 1

    ZC12347 1

    ZC12348 1

    ZC12348 1

    I need to recode the lineno field so it is unique

    Document Number Line No

    ZC12345 1

    ZC12345 2

    ZC12345 3

    ZC12346 1

    ZC12347 1

    ZC12348 1

    ZC12348 2

    Any help in ways I can do this - I am going round in circles.

  • --===== Create a test table with data

    -- THIS IS NOT PART OF THE SOLUTION

    DECLARE @YourTable TABLE (DocNo VARCHAR(20),LiNum INT)

    INSERT INTO @YourTable (DocNo,LiNum)

    SELECT 'ZC12345',1 UNION ALL

    SELECT 'ZC12345',2 UNION ALL

    SELECT 'ZC12345',2 UNION ALL

    SELECT 'ZC12346',1 UNION ALL

    SELECT 'ZC12347',1 UNION ALL

    SELECT 'ZC12348',1 UNION ALL

    SELECT 'ZC12348',1

    --===== Solve the "Grouped Count" or "Ordinal Ranking" Problem

    SELECT y.DocNo,t.Number

    FROM (SELECT DocNo, COUNT(*) AS N FROM @YourTable GROUP BY DocNo) y,

    Master.dbo.spt_Values t

    WHERE t.Number BETWEEN 1 AND y.N

    AND t.Type = 'P'

    If you have line numbers greater than 255, you will need to build a Tally table instead of using Master.dbo.spt_Values.

    Also, the above code uses a limited cross join known as a Triangular Join. While it looks perfect for this type of application and can be quite fast in this particular instance, there is actually an even faster method... see "Grouped Running Count" in the following article...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks alot - this did the job

  • Hassie (2/17/2008)


    Thanks alot - this did the job

    You bet... Thanks for the feedback.

    Just remember...if the count of rows for each group starts to grow, the performance will suffer greatly very quickly.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is off-topic but does anyone have time to test the use of CROSS APPLY for SQL Server 2005?

    -- 2000

    SELECTy.DocNo,

    t.Number

    FROM(

    SELECTDocNo,

    COUNT(*) AS N

    FROM@YourTable

    GROUP BYDocNo

    ) AS y

    INNER JOINMaster.dbo.spt_Values AS t ON t.Type = 'P'

    WHEREt.Number BETWEEN 1 AND y.N

    -- 2005

    SELECTy.DocNo,

    t.Number

    FROM(

    SELECTDocNo,

    COUNT(*) AS N

    FROM@YourTable

    GROUP BYDocNo

    ) AS y

    CROSS APPLY(

    SELECTNumber

    FROMMaster.dbo.spt_Values

    WHEREType = 'p'

    AND Number BETWEEN 1 AND y.N

    ) AS t


    N 56°04'39.16"
    E 12°55'05.25"

  • I had to make a bigger table to get some timing stats.... also means I had to convert both scripts to use a Tally table and make a large enough Tally table to handle the load. I also threw in the rownumber solution.... ranking solution didn't work 'cause not enough info to partition by... here's the code.

    --===== Nice safe place to test

    USE TempDB

    --===== Presets

    SET NOCOUNT ON

    --===== Create and populate the Tally table on the fly (2k5 version)

    SELECT TOP 200000

    N = IDENTITY(INT,1,1)

    INTO dbo.Tally

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    --===== Add the necessary Clustered PK for blinding speed

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

    --===== Allow everyone to use the table

    GRANT SELECT ON dbo.Tally TO PUBLIC

    --===== Create a test table with data

    -- THIS IS NOT PART OF THE SOLUTION

    DECLARE @YourTable TABLE (DocNo VARCHAR(20),LiNum INT)

    INSERT INTO @YourTable (DocNo,LiNum)

    SELECT 'ZC12345',1 UNION ALL

    SELECT 'ZC12345',2 UNION ALL

    SELECT 'ZC12345',2 UNION ALL

    SELECT 'ZC12346',1 UNION ALL

    SELECT 'ZC12347',1 UNION ALL

    SELECT 'ZC12348',1 UNION ALL

    SELECT 'ZC12348',1

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    INSERT INTO @YourTable SELECT * FROM @YourTable

    SET NOCOUNT OFF

    -- 2000

    PRINT REPLICATE('=',78)

    PRINT 'Join Solution'

    PRINT REPLICATE('=',78)

    SET STATISTICS TIME ON

    SELECT y.DocNo, t.N

    FROM (SELECT DocNo, COUNT(*) AS N

    FROM @YourTable

    GROUP BY DocNo

    ) AS y

    INNER JOIN dbo.Tally t

    ON t.N BETWEEN 1 AND y.N

    SET STATISTICS TIME OFF

    -- 2005

    PRINT REPLICATE('=',78)

    PRINT 'Cross Apply Solution'

    PRINT REPLICATE('=',78)

    SET STATISTICS TIME ON

    SELECT y.DocNo, t.N

    FROM (SELECT DocNo, COUNT(*) AS N

    FROM @YourTable

    GROUP BY DocNo

    ) AS y

    CROSS APPLY

    (SELECTN

    FROM Tally

    WHERE N BETWEEN 1 AND y.N

    ) AS t

    SET STATISTICS TIME OFF

    --===== Rownumber

    PRINT REPLICATE('=',78)

    PRINT 'Join Solution'

    PRINT REPLICATE('=',78)

    SET STATISTICS TIME ON

    SELECT y.DocNo,

    Row_Number() OVER (PARTITION BY y.DocNo ORDER BY y.DocNo)

    FROM @YourTable y

    SET STATISTICS TIME OFF

    DROP TABLE dbo.Tally

    ... and here's the results...

    ==============================================================================

    Join Solution

    ==============================================================================

    (229376 row(s) affected)

    SQL Server Execution Times:

    CPU time = 484 ms, elapsed time = 6142 ms.

    ==============================================================================

    Cross Apply Solution

    ==============================================================================

    (229376 row(s) affected)

    SQL Server Execution Times:

    CPU time = 593 ms, elapsed time = 7502 ms.

    ==============================================================================

    Join Solution

    ==============================================================================

    (229376 row(s) affected)

    SQL Server Execution Times:

    CPU time = 1907 ms, elapsed time = 7714 ms.

    Observation... Tally table solution rules either way and although the Join method did beat out the Cross Apply method by ~20%. Row_Number solution took more than 3 times longer for CPU time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/15/2008)


    If you have line numbers greater than 255, you will need to build a Tally table instead of using Master.dbo.spt_Values.

    Also, the above code uses a limited cross join known as a Triangular Join. While it looks perfect for this type of application and can be quite fast in this particular instance, there is actually an even faster method... see "Grouped Running Count" in the following article...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    Jeff,

    Not sure if you know this or not, but in learning what the spt_values table was, I discovered that in SS2005 the maximum number is now 2047. I thought I would mention that in case someone wanted to do this in 2005.

    In this case, is there a certain number where a tally table would be the better way to go?

    Thanks,

    Ian.

    "If you are going through hell, keep going."
    -- Winston Churchill

  • Yep... I'm aware of that, but thanks for making sure, Ian.

    So far as using spt_Values as a source of numbers in production, I NEVER do it... I always use a Tally table even if I have to build one on the fly. Reason being is that I almost always need at least 8k numbers for a split or something similar. And, a super narrow, single column Tally table with a clustered index will be faster even if only by a smidge especially since I use a fill factor of 100 on the Tally table. It's a very compacted table compared to most others. Tally table of 9,000 or 14,000 (more than 30 years worth of days), or 19,000 (more than 50 years worth of days) has 0 unused bytes reported by sp_spaceused. That's a tight little table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry... just as a courtesy, I should post the code for making a Tally table, huh?

    --===== Create and populate the Tally table on the fly

    SELECT TOP 19000 --equates to more than 50 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Allow the general public to use it

    GRANT SELECT ON dbo.Tally TO PUBLIC

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What the heck is it that you're showing in the last test? a repeating pattern with 2.29M rows affected? It's a low caffeine part of the day - I think I'm missing the purpose...

    For better or worse - the ROW_NUMBER() gets hurt due to the lack of an index due to the ORDER BY. WITH a non-clustered index - it's the fastest solution. (Realizing you'd need to use a temp table instead of a tablevar for that).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Heh... nah... it's only 0.229 M rows... same as the other two tests...

    I did mislabel it though... probably due to low Caffeine with a Niquil kicker, as well... the last test should have been labeled as the "Row_number OVER" test instead of another join test.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/27/2008)


    Yep... I'm aware of that, but thanks for making sure, Ian.

    So far as using spt_Values as a source of numbers in production, I NEVER do it... I always use a Tally table even if I have to build one on the fly. Reason being is that I almost always need at least 8k numbers for a split or something similar. And, a super narrow, single column Tally table with a clustered index will be faster even if only by a smidge especially since I use a fill factor of 100 on the Tally table. It's a very compacted table compared to most others. Tally table of 9,000 or 14,000 (more than 30 years worth of days), or 19,000 (more than 50 years worth of days) has 0 unused bytes reported by sp_spaceused. That's a tight little table.

    Thanks for the info. I have seen you use or refer to the tally table in many of your posts and now that I read this I am a little surprised you used spt_values in your post. I think I understand that you would use it for a one-time fix (and that might be rare depending on the number of rows).

    Ian.

    "If you are going through hell, keep going."
    -- Winston Churchill

  • Ian Crandell (2/28/2008)


    Thanks for the info. I have seen you use or refer to the tally table in many of your posts and now that I read this I am a little surprised you used spt_values in your post. I think I understand that you would use it for a one-time fix (and that might be rare depending on the number of rows).

    Actually, I've gotten lazy... spt_Values is available on every machine and a Tally table might not... instead of going through the usual speal of "First, ya gotta create a Tally table..." and the normal explanation I go through, I just post code...

    ... I should get out of that lazy habit and make a "boiler plate" for the Tally table code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/28/2008)


    Ian Crandell (2/28/2008)


    Thanks for the info. I have seen you use or refer to the tally table in many of your posts and now that I read this I am a little surprised you used spt_values in your post. I think I understand that you would use it for a one-time fix (and that might be rare depending on the number of rows).

    Actually, I've gotten lazy... spt_Values is available on every machine and a Tally table might not... instead of going through the usual speal of "First, ya gotta create a Tally table..." and the normal explanation I go through, I just post code...

    ... I should get out of that lazy habit and make a "boiler plate" for the Tally table code.

    Maybe make it part of your signature? Crude but effective.....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (2/28/2008)


    Maybe make it part of your signature? Crude but effective.....

    I am surprised he hasn't included the link to his article on Forum Etiquette in his signature block.;)

    Ian.

    "If you are going through hell, keep going."
    -- Winston Churchill

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply