August 21, 2006 at 11:26 pm
Hey guys,
I have a difficult data transformation issue that is in serious need of an overhaul. The table structure, which I have no control over, is:
Table: IM_BarCodes
Item_no char(15),
Barcode char(20),
Color_Index char(2),
Size_Index char(2)
Table: IM_Quantities [quantities for each item / color / size]
Item_No char(15),
Color_Index char(2),
Size_Index char(2),
QuantityOnHand int
Table: IM_Grid [sizes available for each store / item / color / size]
Item_no char(15),
Store_No char(15),
color_1 char(9),
color_2 char(9),
color_3 char(9),
color_4 char(9), (there are actually 40 of these columns)
Size_1 char(5),
Size_2 char(5),
Size_3 char(5),
Size_4 char(5) (there are actually 40 of these columns)
The above tables feed into the one below:
Table: INV_Barcodes
Store_no char(15),
item_no char(15),
Barcode char(20),
Color_Index char(9),
Size_Index char(5),
QuantityOnHand int
First off, the first two tables (IM_Barcodes and IM_Quantities) makes total sense and is easy to join together. However, due to memory and security constraints, I can only send the relevant data to each store... So, if a particular store doesn't carry the specific color / size combination then they don't get the prices. Which leads to the tie in on the IM_Grid table and my problem.
I have a stored procedure executing nightly that uses a combination of a cursor and dynamic sql (yes, I know, both are bad) to pull out which stores get the pricing / barcode data for a particular item; however, the process takes about 8 hours to run.
Does anyone see a better, or even just a different, way?
Thanks,
Chris
August 22, 2006 at 1:27 am
OK, after re-reading your post I decided to post this.. I'm not sure whether it helps, but let's hope so.
You didn't mention number of rows in table IM_Grid, but since it is taking such a long time, it will be a huge table probably. Did you ever try to write a procedure that normalizes data from this table, and inserts the result into a "normal" table"? You could run this procedure every night before the one that prepares pricing, and then you could rewrite your pricing procedure to use JOIN with the new table. I am not sure whether this will improve the time considerably or not, but it could be worth trying.
Of course the best way would be to replace the horrible Grid table with something more useful... maybe there could be a way to put pressure on those who CAN change the tables? I know it is hard to achieve such things, but you might just get lucky :-).
August 22, 2006 at 1:42 am
By the way, I'm not sure I understand everything.. in the table IM_Grid, color is CHAR(9) and size is CHAR(5). In the first two tables, you have color_index and size_index, both with CHAR(2)... and in your output, the columns have again 5 and 9 respectively. Are there "hidden" tables for Color and Size, which translate the respective index into a value?
August 23, 2006 at 12:17 am
Step 1, I think, (normally not so adamant) would be to make sure that whoever designed this mess needs to be identified, captured, disemboweled in public, and have his/her pulsating remains shot butt first out of a cannon into a stone wall.
There… I feel much better now…
This is a classic problem of taking someone else’s denormalized garbage and normalizing it. Obviously, I’ve not tested it, but the following code should be very close to what you need and it should be light years faster than the 8 hour cursor method that you identified because of the "divide and conquer" methods used. You are correct… Cursors pretty much stink up the house especially if someone tried to join these three tables to create the fourth using, what I suspect is, nested cursors or at least a cursor nested in a WHILE loop. Dynamic SQL isn’t nearly as bad if it is properly formed and contains few, if any, joins.
Here’s the code… lemme know how it works… read the comments… this pit-wuffie drops a table...
--===== If the build table exists, drop it IF OBJECT_ID('dbo.INV_Barcodes_Work') IS NOT NULL DROP TABLE dbo.INV_Barcodes_Work
--===== Create the build table schema CREATE TABLE dbo.INV_Barcodes_Work Store_No CHAR(15), Item_No CHAR(15), Barcode CHAR(20), Color_Index CHAR(9), Size_Index CHAR(5), QuantityOnHand INT
--===== Declare the local variables DECLARE @SQL VARCHAR(8000) DECLARE @Counter = INT DECLARE @strCounter VARCHAR(2)
--===== Build the basic data by normalizing the data in the IM_Grid table SET @Counter = 1 WHILE @Counter <= 40 BEGIN SET @strCounter = CAST(@Counter AS VARCHAR(2)) SET @SQL = ' INSERT INTO dbo.INV_Barcodes_Work (Store_No,Item_No,Color_Index,Size_Index)' + CHAR(13) SET @SQL = @SQL + 'SELECT Store_No,Item_No,Color_'+@strCounter+',Size_'+@strCounter+' FROM dbo.IM_Grid WITH (NOLOCK)'+CHAR(13) SET @SQL = @SQL + 'WHERE Color_'+@strCounter+' IS NOT NULL AND Size_'+@strCounter+' IS NOT NULL' EXEC @SQL SET @Counter = @Counter + 1 END
--===== Update the Barcode column in the build table UPDATE w SET Barcode = bc.Barcode FROM dbo.INV_Barcodes_Work w, --Cannot NOLOCK object of update dbo.IM_BarCodes bc WITH (NOLOCK) WHERE w.Item_No = bc.Item_No AND w.Color_Index = bc.Color_Index, AND w.Size_Index = bc.SizeIndex
--===== Update the QuantityOnHand column UPDATE w SET QuantityOnHand = q.QuantityOnHand FROM dbo.INV_Barcodes_Work w, --Cannot NOLOCK object of update dbo.IM_Quantities q WITH (NOLOCK) WHERE w.Item_No = q.Item_No AND w.Color_Index = q.Color_Index, AND w.Size_Index = q.SizeIndex
--====== Cascade rename the tables.... (takes about 65 milliseconds) TRUNCATE TABLE dbo.INV_Barcodes_Work --Saves log space because is not logged DROP TABLE dbo.INV_Barcodes_Work EXEC dbo.sp_Rename 'INV_Barcodes_Work','INV_Barcodes'
--====== Add any indexes you want here but at least add a primary key...
{Edit} By the way, the result table is actually denormalized because it contains data from the first 2 tables. But, like someone said, "Normalize 'till it hurts... denormalized 'till it works". Since the table is rebuilt every night and you only use 1 column from each of the tables, it's "useful" denormalization that should give you some pretty good performance for whatever the 4th table is used for.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2006 at 9:10 pm
Hey Jeff,
I think you're method is pretty close to what I need. The only difference I see off the top of my head is that the WHILE loop really needs to have another WHILE loop in it to capture (color_1, size_1), (color_1,size_2), etc. And it looks like you drop the work table prior to renaming it, I'm guessing that was just a typo.
Believe me, I tried step 1. However, the IM_ tables are copied from a popular Point of Sale system. It's backend database is actually Pervasive SQL 7, which is really just a front for BTrieve... I tried to convince my client that they were probably better off switching POS vendors, but that didn't work.
August 23, 2006 at 10:01 pm
Yep... copy and paste error... the cascade rename should look like...
--====== Cascade rename the tables.... (takes about 65 milliseconds) TRUNCATE TABLE dbo.INV_Barcodes --Saves log space because is not logged DROP TABLE dbo.INV_Barcodes EXEC dbo.sp_Rename 'INV_Barcodes_Work','INV_Barcodes'
It didn't make sense to me to do a Nested WHILE loop because you said there were 40 colors and 40 sizes... I surmized that Color_1 went with Size_1... it just didn't make sense to have 1600 color and size combinations per item and THAT might be part of why it takes your cursor so long... are you sure you need to cross-join colors and sizes like that?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2006 at 9:09 am
Just out of curiousity, how many rows are you dealing with?
Sven
August 24, 2006 at 9:11 am
Better yet......
1) How many rows are you dealing with
2) What indexes, if any, exist on the source data
3) Are you able to recommend the creation of any indexes on the source data
4) Are you able to dictate the format and design of the destination table(s)
Sounds like an interesting problem to resolve.
Sven
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply