August 6, 2008 at 7:41 am
OK guys, I have a question, it's something I should definitely know the answer too but I can't think of the best way to do it... probably due to being brain dead after a day's coding... the solution will probably come to me as I explain the problem to you.
Anyway here goes.
I have a simple xls file, containing two tabs with different data in each.
My boss has asked me to write a tool to "upload" the data into a SQL Server database. The data should be normalised to optimise performance in various SPs, which will be accessed by a web app.
First stage, I have created an SSIS package which splits the data (ie does two "uploads", one for each tab.
The problem is, it takes the "denormalised" data from the Excel file and adds it directly to a table.
So the table has eg Country, Item Type, Description, Quantity.
What is the best way to subsequently "normalise" the data?
I have a list of countries anyway so that's fine, I can just map the country to the list for each record.
The problem is for eg Description and Item Type. I need to dynamically create these tables, then map them back to the recordset in the main "upload" table.
So, for example, I have in my upload table:
Country ItemType Description Quantity
England ItemA Box 1
England ItemA Envelope 1
England ItemB Suitcase 1
England ItemC Box 1
France ItemA Box 1
I then want to split this into tables:
tblCountry
ID Name
1 England
2 France
tblItemType
ID Description
1 ItemA
2 ItemB
3 ItemC
tblDescription
ID Description
1 Box
2 Envelope
3 Suitcase
.....then I want to remap my upload table to look something like:
Country ItemType Description Quantity
1 1 1 1
1 1 2 1
1 2 3 1
1 3 1 1
2 1 1 1
Solution
So, what's the best way to do it?
I guess I could create a SP that would:
- Select distinct ItemTypes from [upload table].[ItemType] and create a table tblItemType from this
- repeat this for tblDescription
- run a query along the lines of:
SELECT tblCountry.Id, tblItemType.Id, tblDescription .Id, [upload table].Quantity
INTO #ActualDataTable
FROM [upload table] inner join on tblItemType on [upload table].[Item Type] = tblItemType.Description inner join tblDescription on [upload table].[Item Type] = tblDescription.Description inner join tblCountry on tblCountry.Name = [upload table].Country
(tblCountry already exists)
Actually yeah, I think that would do it, but it's pretty stinky, especially joining tables on description etc, but I can't really think of anything else... first time I've had to "reverse normalise" a database... so any better solutions would be appreciated 🙂
August 6, 2008 at 8:35 am
Build tables for the columns you want to normalize. Don't bother populating them, just build them and add the necessary PKs, constraints and indexes.
Then import your data into a staging table, use that to populate the tables you built for the specific columns, update back into the staging table with the PKs from the sub-tables, then move the result into a final table with FKs to the sub-table PKs.
If you can review the columns in the Excel file, and plan a bit based on what you want to normalize out of there and the procs you want to run on the data, this should be pretty easy.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 6, 2008 at 8:47 am
Perfect, thanks GSquared. That's actually what I'd started doing - only issue I'm having is trying to update back into the staging table - so was going to create second staging table at the end of the process, based on a query joining the first staging table and the newly populated item tables.
I'll try to get the code I was using to attempt to update and post that, might show where I was going wrong.
August 6, 2008 at 8:51 am
OK, this was what I was trying to do (don't laugh!) 😀
UPDATE #tblUpload1
Set Section = (SELECT tblSection.ID
FROM #tblUpload1
inner join tblSection
ON #tblUpload1.Section=tblSection.Description)
It's trying to update on the results of the subquery, which it doesn't like. Not very efficient either. Any ideas?
August 6, 2008 at 9:51 am
UPDATE #tblUpload1
Set Section = tblSection.ID
FROM #tblUpload1
inner join tblSection
ON #tblUpload1.Section=tblSection.Description
Should do it.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply