January 26, 2009 at 6:22 am
It seems the more I get into SQL, the more I find I am also getting into programming! Being "old school", I tend to be "linear" in my thinking, doing things one step at a time, which is why I was never any good at more abstract languages such as C++ or Java!
We recently discovered a major bug in an accounting system due to improper indexes/keys set on a table, thankfully from well before my time here. Basically, the table can hold a customer invoice with multiple items and charges (or credits) per invoice. There are cases where an ItemNumber, charge, and charge code can be duplicated when it is posted with additional items, e.g.,
Invoice1 ItemNo1 $21 ChargeType: MainItem
Invoice1 ItemNo1 $1.5 ChargeType: Tax
Invoice1 ItemNo1 $3.0 ChargeType: Delivery
... (further in the file with weekly invoices)...
Invoice1 ItemNo1 $21 ChargeType: MainItem
Invoice1 ItemNo1 $1.5 ChargeType: Tax
Invoice1 ItemNo1 $4.0 ChargeType: Fuel
Invoice1 ItemNo1 -$3.0 ChargeType: Delivery
...
Unfortunately, since an outside system creates the invoice file, we have no control over the content coming to us, just the ability to import and process what arrives.
My task is to add a unique numbered sequence to all of the above items in the existing table. So, for the example above, I will need 1 through 7 for the Invoice1 ItemNo1 entries in the table. There are no other fields that will make each record "unique".
My thought was to loop by doing the following:
1) Create a new table, dbo.Inv_NewCustCharge, with the exact same structure but with an INT IDENTITY(1,1) field called "primary_key". I could then take all of the existing records and number each one from 1 through the end of the data (see the sketch after this list).
2) Get a list of all of the Invoice(s) and ItemNo, along with a count of the number of existing item(s) in the database (#temptable).
3) Have an outer loop that gets the Invoice, ItemNo and count.
4) Have an inner loop that will number each record based on the "count" of ItemNo(s), then insert that into the table.
5) Delete the "primary_Key" field after all of the items in (3) and (4) are numbered.
New entries into the table would get their numbering from the application that enters the data into the database, so I just need to deal with the existing data. Here was my solution:
SET NOCOUNT ON
DECLARE @Statement NVARCHAR(4000),
@MyItemNumber NVARCHAR(128),
@MyInvoice NVARCHAR(128),
@MyCount NVARCHAR(128),
@DataCntr INT,
@LoopCntr INT,
@RowCounter INT,
@MinRow BIGINT,
@MaxRow BIGINT;
IF OBJECT_ID(N'tempdb..#temptable', N'U') IS NOT NULL
DROP TABLE #temptable;
CREATE TABLE #temptable (primary_key INT IDENTITY(1,1) NOT NULL,
CntRecords int NOT NULL, ItemNo varchar(20) NOT NULL, InvoiceNo char(9) NOT NULL);
INSERT INTO #temptable (CntRecords, ItemNo, InvoiceNo)
SELECT COUNT(*) AS CntRecords, ItemNo, InvoiceNo
FROM dbo.Inv_CustCharge
GROUP BY ItemNo, InvoiceNo;
-- Set counters for looping purposes
SET @DataCntr = ISNULL((Select count(*) from #temptable),0)
SET @LoopCntr = 1
WHILE @DataCntr > 0 AND @LoopCntr <= @DataCntr
BEGIN -- WHILE LOOP
SELECT @MyCount = CntRecords, @MyItemNumber = ItemNo, @MyInvoice = InvoiceNo
FROM #temptable
WHERE primary_key = @LoopCntr
SELECT @MinRow = MIN(primary_key), @MaxRow = MAX(primary_key)
FROM dbo.Inv_NewCustCharge
WHERE ItemNo = @MyItemNumber AND InvoiceNo = @MyInvoice;
SET @RowCounter = 1
WHILE @RowCounter <= @MyCount AND @MinRow <= @MaxRow
BEGIN
UPDATE dbo.Inv_NewCustCharge
SET RowNumber = @RowCounter
WHERE ItemNo = @MyItemNumber AND InvoiceNo = @MyInvoice
AND Primary_Key = @MinRow;
SET @RowCounter = @RowCounter + 1;
SET @MinRow = @MinRow + 1
END -- Inner While Loop
SET @LoopCntr = @LoopCntr + 1;
END -- WHILE LOOP
DROP TABLE #temptable;
The biggest issue is the time it takes to perform the work. For 116K records, it took 2 hours 34 minutes to "re-number" the table. Some databases can have over 1 million records in the table for a specific customer!
I was hoping someone might have some good "out of the box" ideas to perform the above in a much faster fashion. My boss seems to think I can get this process down to 5-10 minutes... personally, I think he is off his rocker!
Thanks!
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
January 26, 2009 at 7:47 am
Thinking about it some more, we came up with a solution that might be a bit faster... instead of doing an update for every record (processing row by row may sometimes be necessary, but it is NEVER good! See the "gots" list courtesy of Jeff Moden!), we created a new table using the primary key and a row counter:
CREATE TABLE #Temp_NewChargeCount (Primary_key BIGINT NOT NULL, NewCount INT NOT NULL);
We then modified the code to do an insert,
INSERT INTO #Temp_NewChargeCount
VALUES(@MinRow, @RowCounter);
SET @RowCounter = @RowCounter + 1;
SET @MinRow = @MinRow + 1
We then join #Temp_NewChargeCount to the table that has the Primary_key field and insert the row number plus the original records into a new table.
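In rough form, that last join/insert looks like this; the target table name and the Charge/ChargeType columns are placeholders, since only the key columns appear above:

INSERT INTO dbo.Inv_CustCharge_Renumbered (InvoiceNo, ItemNo, Charge, ChargeType, RowNumber) -- target name is a placeholder
SELECT c.InvoiceNo, c.ItemNo, c.Charge, c.ChargeType, n.NewCount
FROM dbo.Inv_NewCustCharge AS c
JOIN #Temp_NewChargeCount AS n
  ON n.Primary_key = c.primary_key;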
It now takes 35 minutes 34 seconds instead of 2 hours 34 minutes.... a definite improvement!
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
January 26, 2009 at 8:22 am
You might want to take a look at the Row_Number, Rank and Dense_Rank functions in Books Online. I'm not sure, but they seem to fit what you're looking for here.
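A quick way to see the difference between the three, run against the charges table (the Charge and ChargeType column names are guesses on my part), is something like:

SELECT InvoiceNo, ItemNo, Charge, ChargeType,
       ROW_NUMBER() OVER (PARTITION BY InvoiceNo, ItemNo ORDER BY ChargeType) AS row_no,
       RANK()       OVER (PARTITION BY InvoiceNo, ItemNo ORDER BY ChargeType) AS rnk,
       DENSE_RANK() OVER (PARTITION BY InvoiceNo, ItemNo ORDER BY ChargeType) AS dense_rnk
FROM dbo.Inv_CustCharge;

Row_Number always hands out a unique sequence within the partition, while Rank and Dense_Rank repeat the same value for ties, so Row_Number is the one that makes each row unique.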
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
January 26, 2009 at 8:51 am
Applying the Row_Number approach to your data, for example:
;WITH numbered AS (
    SELECT rowno = ROW_NUMBER() OVER (PARTITION BY Invoice, Item, Type ORDER BY Invoice),
           Invoice, Item, Charge, Type
    FROM Sales
)
SELECT * FROM numbered;
Will give you these results:
rowno  Invoice  Item  Charge  type
1      1        1      3.00   Delivery
2      1        1     -3.00   Delivery
1      1        1      4.00   Fuel
1      1        1     21.00   MainItem
2      1        1     21.00   MainItem
1      1        1      1.50   Tax
2      1        1      1.50   Tax
Remvoing the "Item" from the above code will result in
;WITH numbered AS (
    SELECT rowno = ROW_NUMBER() OVER (PARTITION BY Invoice, Item ORDER BY Invoice),
           Invoice, Item, Charge, Type
    FROM Sales
)
SELECT * FROM numbered;
Producing:
rowno  Invoice  Item  Charge  type
1      1        1     21.00   MainItem
2      1        1      1.50   Tax
3      1        1      3.00   Delivery
4      1        1     21.00   MainItem
5      1        1      1.50   Tax
6      1        1      4.00   Fuel
7      1        1     -3.00   Delivery
8      1        1     21.00   MainItem
9      1        1      1.50   Tax
10     1        1      3.00   Delivery
11     1        1     21.00   MainItem
12     1        1      1.50   Tax
13     1        1      4.00   Fuel
14     1        1     -3.00   Delivery
Hopefully this will be of some use to you
January 26, 2009 at 9:12 am
Oh... oh... now why did I not think of that! I use the ROW_NUMBER() function in other SQL statements... I just never thought of it in this case. Another case of posting here yielding excellent results! 🙂 🙂
I used SELECT ROW_NUMBER() OVER(PARTITION BY ItemNo ORDER BY InvoiceNo) and compared the results with my programmatic solution... identical counts.
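Wrapped into an update against the work table, it comes out roughly like this (RowNumber is the column being populated):

;WITH numbered AS (
    SELECT RowNumber,
           ROW_NUMBER() OVER (PARTITION BY ItemNo ORDER BY InvoiceNo) AS new_seq
    FROM dbo.Inv_NewCustCharge
)
UPDATE numbered
SET RowNumber = new_seq;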
The speed? 1 second. WOOO HOOO!
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
January 26, 2009 at 9:38 am
The cool thing about this is that your boss was right. 😉
Here's a solution I was building, but I'm a bit late:
USE tempdb
GO
IF EXISTS (SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].Invoice')
AND TYPE IN (N'U'))
DROP TABLE [dbo].Invoice
GO
CREATE TABLE Invoice (
ID INT IDENTITY ( 1 , 1 ),
SequenceNum INT,
InvoiceNum NVARCHAR(20),
ItemNum NVARCHAR(20),
Price MONEY,
ChargeType NVARCHAR(20))
INSERT INTO Invoice (InvoiceNum, ItemNum, Price, ChargeType)
SELECT 'Invoice1', 'ItemNo1', 21,   'MainItem' UNION ALL
SELECT 'Invoice1', 'ItemNo1', 1.5,  'Tax'      UNION ALL
SELECT 'Invoice1', 'ItemNo1', 3.0,  'Delivery' UNION ALL
SELECT 'Invoice1', 'ItemNo1', 21,   'MainItem' UNION ALL
SELECT 'Invoice1', 'ItemNo1', 1.5,  'Tax'      UNION ALL
SELECT 'Invoice2', 'ItemNo1', 1.5,  'Tax'      UNION ALL
SELECT 'Invoice2', 'ItemNo1', 3.0,  'Delivery' UNION ALL
SELECT 'Invoice2', 'ItemNo1', 21,   'MainItem' UNION ALL
SELECT 'Invoice2', 'ItemNo1', 1.5,  'Tax'      UNION ALL
SELECT 'Invoice2', 'ItemNo1', 4.0,  'Fuel'     UNION ALL
SELECT 'Invoice1', 'ItemNo1', 4.0,  'Fuel'     UNION ALL
SELECT 'Invoice1', 'ItemNo1', -3.0, 'Delivery' UNION ALL
SELECT 'Invoice2', 'ItemNo1', 21,   'MainItem' UNION ALL
SELECT 'Invoice2', 'ItemNo1', -3.0, 'Delivery'
CREATE CLUSTERED INDEX Clus_Invoice_ID ON Invoice (
ID ASC)
CREATE NONCLUSTERED INDEX IX_Invoice_InvoiceNum ON Invoice (
InvoiceNum ASC)
UPDATE Invoice
SET SequenceNum = NewSequence.NewSequence
FROM (SELECT row_number()
OVER(PARTITION BY InvoiceNum ORDER BY InvoiceNum ASC) AS NewSequence,
ID,
SequenceNum,
InvoiceNum,
ItemNum,
Price,
ChargeType
FROM Invoice) AS NewSequence
WHERE Invoice.ID = NewSequence.ID
SELECT *
FROM Invoice
ORDER BY invoiceNum,
SequenceNum
Hope it helps,
Cheers,
J-F
January 26, 2009 at 7:16 pm
DB_Newbie2007 (1/26/2009)
It seems the more I get into SQL, the more I find I am also getting into programming! Being "old school", I tend to be "linear" in my thinking, doing things one step at a time, which is why I was never any good at more abstract languages such as C++ or Java!
We recently discovered a major bug in an accounting system due to improper indexes/keys set on a table, thankfully from well before my time here. Basically, the table can hold a customer invoice with multiple items and charges (or credits) per invoice. There are cases where an ItemNumber, charge, and charge code can be duplicated when it is posted with additional items, e.g.,
Invoice1 ItemNo1 $21 ChargeType: MainItem
Invoice1 ItemNo1 $1.5 ChargeType: Tax
Invoice1 ItemNo1 $3.0 ChargeType: Delivery
... (further in the file with weekly invoices)...
Invoice1 ItemNo1 $21 ChargeType: MainItem
Invoice1 ItemNo1 $1.5 ChargeType: Tax
Invoice1 ItemNo1 $4.0 ChargeType: Fuel
Invoice1 ItemNo1 -$3.0 ChargeType: Delivery
...
Unfortunately, since an outside system creates the invoice file, we have no control over the content coming to us, just the ability to import and process what arrives.
My task is to add a unique numbered sequence to all of the above items in the existing table. So, for the example above, I will need 1 through 7 for the Invoice1 ItemNo1 entries in the table. There are no other fields that will make each record "unique".
Ok... I've got to ask... once you have those unique numbers on the rows, what are you going to do? Let the duplicates continue to exist or ??? Obviously, the purpose of adding such numbers is to do something with them... what is that purpose?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 27, 2009 at 5:41 am
Jeff,
Good question. First, the problem was that the Invoice Total (a separate record in the file we receive) did not always match the sum of the charges (the sum was sometimes lower)... which is when we discovered the "duplicates".
Now that we can bring "all" of the data in, we can analyze it and figure out whether some of these companies are being incorrectly charged twice for the same items. I wrote another script to compare all of the invoice totals against the sum of the charges, going back to 2007... less than 1% of the invoices have a problem, but it is still something that needs to be analyzed.
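Roughly, that comparison looks like this; the invoice-total table and its column names are placeholders, since that record layout isn't shown here:

SELECT t.InvoiceNo, t.InvoiceTotal, SUM(c.Charge) AS SumOfCharges
FROM dbo.Inv_InvoiceTotal AS t                        -- placeholder name for the invoice-total records
JOIN dbo.Inv_CustCharge AS c ON c.InvoiceNo = t.InvoiceNo
GROUP BY t.InvoiceNo, t.InvoiceTotal
HAVING t.InvoiceTotal <> SUM(c.Charge);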
Being a former Engineer/Scientist, I was always taught never to throw out data; always bring it in and analyze it to try to figure out what is going on (much like your comment about RBAR versus set-based queries). But bringing in all of these records causes a new problem... there are 10 reports that rely on each record in the "charges" table being unique. Any time it was not, it was considered a duplicate billing. Now things have changed... another challenge, another day! 🙂
Now part of the challenge is to ensure all records are imported but that a user cannot mistakenly re-process the same invoice file (which can contain multiple invoices). We do that by assigning each record in the invoice file a unique number. If we see the same invoice with the same row number come in, we know someone is trying to re-process. If we see a previous invoice come in but in a different location (row number) in a new invoice file, we know that is a true duplicate billing...
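Roughly, that check looks like this; the staging table and the charge columns are placeholders:

SELECT s.InvoiceNo, s.ItemNo, s.Charge, s.ChargeType, s.RowNumber,
       CASE WHEN e.RowNumber = s.RowNumber
            THEN 'Attempted re-processing of the same file'
            ELSE 'Possible true duplicate billing'
       END AS Flag
FROM #IncomingCharges AS s                 -- placeholder staging table for the incoming file
JOIN dbo.Inv_NewCustCharge AS e
  ON  e.InvoiceNo  = s.InvoiceNo
  AND e.ItemNo     = s.ItemNo
  AND e.Charge     = s.Charge
  AND e.ChargeType = s.ChargeType;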
Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)
January 27, 2009 at 6:33 am
Thanks for the info... just trying to understand so we can do this in as high-performance a way as possible...
In your original post on this thread, you have (obviously) some dupes... would you mark which ones are dupes just to be absolutely clear? I have an idea that may automate your "analysis". Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.