General data -> database question

  • OK guys, I have a question. It's something I should definitely know the answer to, but I can't think of the best way to do it... probably due to being brain dead after a day's coding. The solution will probably come to me as I explain the problem to you.

    Anyway here goes.

    I have a simple xls file, containing two tabs with different data in each.

    My boss has asked me to write a tool to "upload" the data into a SQL Server database. The data should be normalised to optimise performance in various SPs, which will be accessed by a web app.

    First stage: I have created an SSIS package which splits the data (i.e. it does two "uploads", one for each tab).

    The problem is, it takes the "denormalised" data from the Excel file and adds it directly to a table.

    So the table has eg Country, Item Type, Description, Quantity.

    What is the best way to subsequently "normalise" the data?

    I have a list of countries anyway so that's fine, I can just map the country to the list for each record.

    The problem is for eg Description and Item Type. I need to dynamically create these tables, then map them back to the recordset in the main "upload" table.

    So, for example, I have in my upload table:

    Country  ItemType  Description  Quantity
    -------  --------  -----------  --------
    England  ItemA     Box          1
    England  ItemA     Envelope     1
    England  ItemB     Suitcase     1
    England  ItemC     Box          1
    France   ItemA     Box          1

    I then want to split this into tables:

    tblCountry
    ID  Name
    1   England
    2   France

    tblItemType
    ID  Description
    1   ItemA
    2   ItemB
    3   ItemC

    tblDescription
    ID  Description
    1   Box
    2   Envelope
    3   Suitcase

    .....then I want to remap my upload table to look something like:

    Country  ItemType  Description  Quantity
    1        1         1            1
    1        1         2            1
    1        2         3            1
    1        3         1            1
    2        1         1            1

    Solution

    So, what's the best way to do it?

    I guess I could create a SP that would:

    - Select distinct ItemTypes from [upload table].[ItemType] and create a table tblItemType from this

    - repeat this for tblDescription

    - run a query along the lines of:

    SELECT tblCountry.ID, tblItemType.ID, tblDescription.ID, [upload table].Quantity
    INTO #ActualDataTable
    FROM [upload table]
    INNER JOIN tblItemType ON [upload table].[Item Type] = tblItemType.Description
    INNER JOIN tblDescription ON [upload table].[Description] = tblDescription.Description
    INNER JOIN tblCountry ON tblCountry.Name = [upload table].Country

    (tblCountry already exists)

    Actually yeah, I think that would do it, but it's pretty stinky, especially joining tables on description etc. I can't really think of anything else, though... it's the first time I've had to "reverse normalise" a database, so any better solutions would be appreciated 🙂
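    For reference, the steps above sketched in T-SQL (table and column names are taken from the examples earlier in this post; treat this as an untested outline, not working production code):

```sql
-- Build lookup tables from the distinct values already in the upload table
SELECT DISTINCT [Item Type] AS Description
INTO tblItemType
FROM [upload table];

-- Add a surrogate key to each lookup table
ALTER TABLE tblItemType ADD ID INT IDENTITY(1,1) PRIMARY KEY;

SELECT DISTINCT [Description]
INTO tblDescription
FROM [upload table];

ALTER TABLE tblDescription ADD ID INT IDENTITY(1,1) PRIMARY KEY;

-- Remap the upload data onto the surrogate keys
SELECT c.ID   AS Country,
       it.ID  AS ItemType,
       d.ID   AS Description,
       u.Quantity
INTO   #ActualDataTable
FROM   [upload table] u
       INNER JOIN tblItemType    it ON u.[Item Type]   = it.Description
       INNER JOIN tblDescription d  ON u.[Description] = d.Description
       INNER JOIN tblCountry     c  ON u.Country       = c.Name;
```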

  • Build tables for the columns you want to normalize. Don't bother populating them, just build them and add the necessary PKs, constraints and indexes.

    Then import your data into a staging table, use that to populate the tables you built for the specific columns, update back into the staging table with the PKs from the sub-tables, then move the result into a final table with FKs to the sub-table PKs.

    If you can review the columns in the Excel file, and plan a bit based on what you want to normalize out of there and the procs you want to run on the data, this should be pretty easy.
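    A minimal sketch of that workflow for one column, in T-SQL (the `Staging` table and `ItemTypeID` column names here are illustrative, not from the original post):

```sql
-- 1. Pre-built lookup table: PK plus a unique constraint on the natural key
CREATE TABLE tblItemType (
    ID          INT IDENTITY(1,1) PRIMARY KEY,
    Description VARCHAR(100) NOT NULL UNIQUE
);

-- 2. Populate it from the staging table, skipping values already present
INSERT INTO tblItemType (Description)
SELECT DISTINCT s.ItemType
FROM   Staging s
WHERE  NOT EXISTS (SELECT 1
                   FROM   tblItemType t
                   WHERE  t.Description = s.ItemType);

-- 3. Write the surrogate key back into the staging table
UPDATE s
SET    s.ItemTypeID = t.ID
FROM   Staging s
       INNER JOIN tblItemType t ON t.Description = s.ItemType;
```

    Repeat steps 2 and 3 for each column you want to normalise out, then move the result into the final table with FKs to the lookup tables.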

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Perfect, thanks GSquared. That's actually what I'd started doing - the only issue I'm having is updating back into the staging table - so I was going to create a second staging table at the end of the process, based on a query joining the first staging table and the newly populated item tables.

    I'll try to get the code I was using to attempt to update and post that, might show where I was going wrong.

  • OK, this was what I was trying to do (don't laugh!) 😀

    UPDATE #tblUpload1
    SET Section = (SELECT tblSection.ID
                   FROM #tblUpload1
                   INNER JOIN tblSection
                       ON #tblUpload1.Section = tblSection.Description)

    It's trying to update on the results of the subquery, which it doesn't like. Not very efficient either. Any ideas?

  • UPDATE #tblUpload1
    SET Section = tblSection.ID
    FROM #tblUpload1
    INNER JOIN tblSection
        ON #tblUpload1.Section = tblSection.Description

    Should do it.
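    As a side note, this `UPDATE ... FROM` syntax is T-SQL-specific, and aliasing the updated table makes the pattern easier to repeat for each lookup column. A sketch, assuming the upload table has matching `ItemType` and `Description` columns as in the earlier posts:

```sql
UPDATE u
SET    u.ItemType = it.ID
FROM   #tblUpload1 u
       INNER JOIN tblItemType it ON u.ItemType = it.Description;

UPDATE u
SET    u.Description = d.ID
FROM   #tblUpload1 u
       INNER JOIN tblDescription d ON u.Description = d.Description;
```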

    - Gus "GSquared"
