Inner and Outer Loop to add count to existing table data

  • It seems the more I get into SQL the more it appears I am also getting into Programming! Being "old school", I tend to be more "linear" in my thinking, doing things "one step at a time", which is why I was never any good at more abstract languages such as C++ or Java!

    We recently discovered a major bug in an accounting system due to improper indexes/keys set on a table, thankfully way before my time here. Basically, the table can have an customer invoice with multiple Items and charges (or credits) per invoice. There are cases when a ItemNumber, charge and charge code can be duplicated when it is posted with additional items, i.e,

    Invoice1 ItemNo1 $21 ChargeType: MainItem

    Invoice1 ItemNo1 $1.5 ChargeType: Tax

    Invoice1 ItemNo1 $3.0 ChargeType: Delivery

    ... (further in the file with weekly invoices)...

    Invoice1 ItemNo1 $21 ChargeType: MainItem

    Invoice1 ItemNo1 $1.5 ChargeType: Tax

    Invoice1 ItemNo1 $4.0 ChargeType: Fuel

    Invoice1 ItemNo1 -$3.0 ChargeType: Delivery

    ...

    Unfortunately, since an outside system creates the invoice file we have no control over the content coming to us, just the ability

    My task is to try and put a unique numbered sequence for all of the above items in the existing table. So, for the example above, I will need 1 through 7 for Invoice1 ItemNo1 entries in the table. There are no other fields that will make each record "unique".

    My thought was to loop by doing the following:

    1) Create a new table, dbo.Inv_NewCustCharge, with the exact same structure but with a field INT IDENTITY(1,1) called "primary_key". I could then take all of the existing records and number each field 1 through the end of the data.

    2) Get a list of all of the Invoice(s) and ItemNo, along with a count of the number of existing item(s) in the database (#temptable).

    3) Have an outer loop that gets the Invoice, ItemNo and count.

    4) Have an inner loop that will number each record based on the "count" of ItemNo(s), then insert that into the table.

    5) Delete the "primary_Key" field after all of the items in (3) and (4) are numbered.

    New enteries into the table would get their numbering from the application that enters the data into the database, so I just need to deal with the existing data. Here was my solution:

    SET NOCOUNT ON

    DECLARE @Statement NVARCHAR(4000),

    @MyItemNumber NVARCHAR(128),

    @MyInvoice NVARCHAR(128),

    @MyCount NVARCHAR(128),

    @DataCntr INT,

    @LoopCntr INT,

    @RowCounter INT,

    @MinRow BIGINT,

    @MaxRow BIGINT;

    IF OBJECT_ID(N'tempdb..#temptable', N'U') IS NOT NULL

    DROP TABLE #temptable;

    CREATE TABLE #temptable (primary_key INT IDENTITY(1,1) NOT NULL,

    CntRecords int NOT NULL, ItemNo varchar(20) NOT NULL, InvoiceNo char(9) NOT NULL);

    INSERT INTO #temptable

    select count(*) as Count, ItemNo, InvoiceNo

    from dbo.Inv_CustCharge

    GROUP BY ItemNo, InvoiceNo

    -- Sent Counters for looping purposes

    SET @DataCntr = ISNULL((Select count(*) from #temptable),0)

    SET @LoopCntr = 1

    WHILE @DataCntr > 0 AND @LoopCntr <= @DataCntr

    BEGIN -- WHILE LOOP

    SELECT @MyCount = CntRecords, @MyItemNumber = TrackingNo, @MyInvoice = InvoiceNo

    FROM #temptable

    WHERE primary_key = @LoopCntr

    SELECT @MinRow = MIN(primary_key), @MaxRow = MAX(primary_key)

    FROM dbo.Inv_NewCustCharge--#Temp_CustCharge

    WHERE ItemNo = @MyItemNumber AND InvoiceNo = @MyInvoice;

    SET @RowCounter = 1

    WHILE @RowCounter <= @MyCount AND @MinRow <= @MaxRow

    BEGIN

    UPDATE dbo.Inv_NewCustCharge

    SET RowNumber = @RowCounter

    FROM dbo.Inv_NewCustCharge --#Temp_CustCharge

    WHERE ItemNo = @MyItemNumber AND InvoiceNo = @MyInvoice

    AND Primary_Key = @MinRow;

    SET @RowCounter = @RowCounter + 1;

    SET @MinRow = @MinRow + 1

    END -- Inner While Loop

    SET @LoopCntr = @LoopCntr + 1;

    END -- WHILE LOOP

    DROP TABLE #temptable;

    The biggest issue is the time it takes to perform the work. For 116K records, it took 2 hours 34 minutes to "re-number" the table. Some databases can have over 1 million records in the table for a specific customer!

    I was hoping someone might have some good "out of the box" ideas to try and perform that above in a much faster fashion. My boss seems to think I can get the above process down to a 5-10 minute process... personally, I think he is off his rocker!

    Thanks!

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • Thinking about it some more, we came up with a solution that might be a bit faster... instead of doing an update for ever record (processing row by row may be necessary but is NEVER good! See the "gots" list courtesy of Jeff Moden!), we created a new table using the primary key and row counter,

    CREATE TABLE #Temp_NewChargeCount (Primary_key BIGINT NOT NULL, NewCount INT NOT NULL);

    We then modified the code to do an insert,

    INSERT INTO #Temp_NewChargeCount

    VALUES(@MinRow, @RowCounter);

    SET @RowCounter = @RowCounter + 1;

    SET @MinRow = @MinRow + 1

    We join the #Temp_NewChargeCount with the table having the Primary_key field and insert the row numer and original records into a new table.

    It now takes 35 minutes 34 seconds instead of 2 hours 34 minutes.... a definite improvement!

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • You might want to take a look at the Row_Number, Rank and Dense_Rank functions in Books Online. I'm not sure, but they seem to fit what you're looking for here.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Applying the Rownumber process to your data for example:

    ;with numbered as(SELECT rowno=row_number()

    over (partition by Invoice, Item, Type order by Invoice),

    Invoice,Item, Charge, type from Sales)

    select * from numbered

    Will give you these results:

    rownoInvoiceItemChargetype

    1113.00Delivery

    211-3.00Delivery

    1114.00Fuel

    11121.00MainItem

    21121.00MainItem

    1111.50Tax

    2111.50Tax

    Remvoing the "Item" from the above code will result in

    ;with numbered as(SELECT rowno=row_number()

    over (partition by Invoice, Item order by Invoice),

    Invoice,Item, Charge, type from Sales)

    select * from numbered

    Producing:

    rownoInvoiceItemChargetype

    11121.00MainItem

    2111.50Tax

    3113.00Delivery

    41121.00MainItem

    5111.50Tax

    6114.00Fuel

    711-3.00Delivery

    81121.00MainItem

    9111.50Tax

    10113.00Delivery

    111121.00MainItem

    12111.50Tax

    13114.00Fuel

    1411-3.00Delivery

    Hopefully this will be of some use to you

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Oh... oh.... now why did I not think about that! I use the Row_Number() function in other SQL statements... never thought about it in this case. Another case of posting here yeilding excellent results! 🙂 🙂

    I used SELECT ROW_NUMBER() OVER(PARTITION BY ItemNo ORDER BY InvoiceNo) and compared the results with my programmatic solution.. identical counts.

    The speed? 1 second. WOOO HOOO!

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • The cool thing about this is that your boss was right, 😉

    Here's a solution I was building, but I'm a bit late,

    USE tempdb

    GO

    IF EXISTS (SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'[dbo].Invoice')

    AND TYPE IN (N'U'))

    DROP TABLE [dbo].Invoice

    GO

    CREATE TABLE Invoice (

    ID INT IDENTITY ( 1 , 1 ),

    SequenceNum INT,

    InvoiceNum NVARCHAR(20),

    ItemNum NVARCHAR(20),

    Price MONEY,

    ChargeType NVARCHAR(20))

    INSERT INTO Invoice

    (InvoiceNum,

    ItemNum,

    Price,

    ChargeType)

    SELECT 'Invoice1',

    'ItemNo1',

    21,

    'MainItem'

    UNION ALL

    SELECT 'Invoice1',

    'ItemNo1',

    1.5,

    'Tax'

    UNION ALL

    SELECT 'Invoice1',

    'ItemNo1',

    3.0,

    'Delivery'

    UNION ALL

    SELECT 'Invoice1',

    'ItemNo1',

    21,

    'MainItem'

    UNION ALL

    SELECT 'Invoice1',

    'ItemNo1',

    1.5,

    'Tax'

    UNION ALL

    SELECT 'Invoice2',

    'ItemNo1',

    1.5,

    'Tax'

    UNION ALL

    SELECT 'Invoice2',

    'ItemNo1',

    3.0,

    'Delivery'

    UNION ALL

    SELECT 'Invoice2',

    'ItemNo1',

    21,

    'MainItem'

    UNION ALL

    SELECT 'Invoice2',

    'ItemNo1',

    1.5,

    'Tax'

    UNION ALL

    SELECT 'Invoice2',

    'ItemNo1',

    4.0,

    'Fuel'

    UNION ALL

    SELECT 'Invoice1',

    'ItemNo1',

    4.0,

    'Fuel'

    UNION ALL

    SELECT 'Invoice1',

    'ItemNo1',

    -3.0,

    'Delivery'

    UNION ALL

    SELECT 'Invoice2',

    'ItemNo1',

    21,

    'MainItem'

    UNION ALL

    SELECT 'Invoice2',

    'ItemNo1',

    -3.0,

    'Delivery'

    CREATE CLUSTERED INDEX Clus_Invoice_ID ON Invoice (

    ID ASC)

    CREATE NONCLUSTERED INDEX IX_Invoice_InvoiceNum ON Invoice (

    InvoiceNum ASC)

    UPDATE Invoice

    SET SequenceNum = NewSequence.NewSequence

    FROM (SELECT row_number()

    OVER(PARTITION BY InvoiceNum ORDER BY InvoiceNum ASC) AS NewSequence,

    ID,

    SequenceNum,

    InvoiceNum,

    ItemNum,

    Price,

    ChargeType

    FROM Invoice) AS NewSequence

    WHERE Invoice.ID = NewSequence.ID

    SELECT *

    FROM Invoice

    ORDER BY invoiceNum,

    SequenceNum

    Hope it helps,

    Cheers,

    J-F

  • DB_Newbie2007 (1/26/2009)


    It seems the more I get into SQL the more it appears I am also getting into Programming! Being "old school", I tend to be more "linear" in my thinking, doing things "one step at a time", which is why I was never any good at more abstract languages such as C++ or Java!

    We recently discovered a major bug in an accounting system due to improper indexes/keys set on a table, thankfully way before my time here. Basically, the table can have an customer invoice with multiple Items and charges (or credits) per invoice. There are cases when a ItemNumber, charge and charge code can be duplicated when it is posted with additional items, i.e,

    Invoice1 ItemNo1 $21 ChargeType: MainItem

    Invoice1 ItemNo1 $1.5 ChargeType: Tax

    Invoice1 ItemNo1 $3.0 ChargeType: Delivery

    ... (further in the file with weekly invoices)...

    Invoice1 ItemNo1 $21 ChargeType: MainItem

    Invoice1 ItemNo1 $1.5 ChargeType: Tax

    Invoice1 ItemNo1 $4.0 ChargeType: Fuel

    Invoice1 ItemNo1 -$3.0 ChargeType: Delivery

    ...

    Unfortunately, since an outside system creates the invoice file we have no control over the content coming to us, just the ability

    My task is to try and put a unique numbered sequence for all of the above items in the existing table. So, for the example above, I will need 1 through 7 for Invoice1 ItemNo1 entries in the table. There are no other fields that will make each record "unique".

    Ok... I've got to ask... once you have those unique numbers on the rows, what are you going to do? Let the duplicates continue to exist or ??? Obviously, the purpose of adding such numbers is to do something with them... what is that purpose?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Good question. First, the problem was the Invoice Total (a seperate record in the file we receive) did not always match the sum of the charge(s) (sum was sometimes lower)... which is when we discovered the "duplicates".

    Now that we can bring "all" of the data in, we can analyze it and figure out if some of these companies are being incorrectly charged twice for the same items. I wrote another script to compare all of the invoice total(s) versus the sum of the charges and went back to 2007.... less than 1% of the invoices have a problem, but it is still something that needs to be analyzed.

    Being a former Engineer/Scientist, I was alway taught to never throw out data, always bring it in and analyze it to try and figure out what is going on (much like your comment about RBAR versus set based queries). But bringing in all of these records causes a new problem... there are 10 reports that rely on each record in the "charges" table being unique. Any time it was not, it was considered a duplicate billing. Now things have changed.... another challenge, another day! 🙂

    Now part of the challenge is to ensure all records are imported but that a User cannot mistakely re-process the same invoice file (which can contain multiple invoices). We do that by assigning each record in the invoice file a unique number. If we see the same invoice with the same row number come in, we know someone is trying to re-process. If we see an previous invoice come in but is in a different location (row number) in a new invoice file, we know that is a true duplicate billing....

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • Thanks for the info... just trying to understand so we can do this in as high a performance method as possible...

    In your original post on this thread, you have (obviously) some dupes... would you mark which ones are dupes just to be absolutely clear? I have an idea that may automate your "analysis". Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply