Normals not my friend anymore....

  • Sorry if this is a newbie question. I'm trying...

    I was given 10GB of data in flat files, which I imported into SQL Server and normalized.

    Everything works fine, but I'm now getting more non-normalized data (in the original flat-file form) and I don't know how to get it into the new normalized form without extensive processing.

    At the moment I'm using SSIS to load the new data, and the process goes something like:

    Does an entry for this instrument already exist? Yes: get its ID. No: add a new entry and get its ID.

    Does this instrument have a parameter called "Operator"? Yes: get its ID. No: add a new entry and get its ID. And so on for the 40 or so parameters that may appear in each line entry.

    I do this line by line. I could use a staging table, but the amount of data I receive varies, so there's no obvious performance gain from staging.
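
    In rough terms, the per-line logic looks something like the sketch below. Table and column names are just illustrative, and SQLite (via Python) stands in for SQL Server purely to show the shape of the pattern:

```python
import sqlite3

# A sketch of the per-row lookup-or-insert pattern described above.
# Table and column names are illustrative; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Instrument (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE Parameter  (ID INTEGER PRIMARY KEY, InstrumentID INTEGER,
                             Name TEXT, UNIQUE (InstrumentID, Name));
""")

def get_or_insert_instrument(name):
    # One SELECT (and possibly one INSERT) per incoming line;
    # these per-row round trips are what make the load slow at scale.
    row = conn.execute("SELECT ID FROM Instrument WHERE Name = ?",
                       (name,)).fetchone()
    if row:
        return row[0]
    return conn.execute("INSERT INTO Instrument (Name) VALUES (?)",
                        (name,)).lastrowid

# Each flat-file line carries an instrument plus one of its parameters.
flat_rows = [("GC-01", "Operator"), ("GC-01", "Pressure"), ("GC-02", "Operator")]
for instrument, param in flat_rows:
    inst_id = get_or_insert_instrument(instrument)
    conn.execute("INSERT OR IGNORE INTO Parameter (InstrumentID, Name) "
                 "VALUES (?, ?)", (inst_id, param))
```

    Repeated for 40-odd parameters on every line, the lookup cost multiplies quickly.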

    Is my method of adding new data correct? The processing time is very lonnnnnng....

    Thanks,

    Mark.

  • medwar19 (4/10/2011)


    Sorry if this is a newbie question. I'm trying...

    I was given 10GB of data in flat files, which I imported into SQL Server and normalized.

    Everything works fine, but I'm now getting more non-normalized data (in the original flat-file form) and I don't know how to get it into the new normalized form without extensive processing.

    At the moment I'm using SSIS to load the new data, and the process goes something like:

    Does an entry for this instrument already exist? Yes: get its ID. No: add a new entry and get its ID.

    Does this instrument have a parameter called "Operator"? Yes: get its ID. No: add a new entry and get its ID. And so on for the 40 or so parameters that may appear in each line entry.

    I do this line by line. I could use a staging table, but the amount of data I receive varies, so there's no obvious performance gain from staging.

    Is my method of adding new data correct? The processing time is very lonnnnnng....

    Thanks,

    Mark.

    My question pertains to the following from above...

    Does this instrument have a parameter called "Operator"? Yes: get its ID. No: add a new entry and get its ID. And so on for the 40 or so parameters that may appear in each line entry.

    Is the ID unique, just with different parameters attached? And why do you need to find whether "this instrument has a parameter called 'Operator'"? Why do you need to return the ID if it does?

    My recommendation would be to provide a smaller test setup and your performance problem code. Please see the article at the first link in my signature below for how to do that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey,

    thanks for your reply

    To answer your question, I need the IDs to enter new data that has previous entries. Maybe this is where I'm going wrong? If I use joins, I can find the plain text and enter the IDs directly into the normalized DB. For all lines where the plain text does not exist, I can do a create. I think I might get it now: stage the incoming data, which will split into two tables based on whether a parameter already exists, then do a create-or-update?

    Thanks,

    Mark.

If it's the row-by-row nature of your current process that's creating the performance issues, then staging tables may help you make the process set-based. It sounds like you have a lot of foreign keys that need to be looked up for each row of new data; doing this one row at a time probably introduces a lot of redundant data fetches and handcuffs the query optimizer.

    Something like this might be more efficient:

    1) Bring external data into staging table(s)

    2) Using a SELECT with a NOT EXISTS clause, insert new rows for any lookup value (e.g. instrument/operator) that does not exist in production

    3) Run an update to look up and store foreign keys from production for existing values (if step 2 was done correctly, every value should now have a key)

    4) Move staged data into production

    You may need to repeat steps 2 & 3 for all your lookups, but you only have to do it once per related table, not per row. There's no need to cursor through the data for row-by-row processing.
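
    The four steps above can be sketched concretely as follows. Table and column names are hypothetical, and SQLite syntax (via Python) is used for illustration; in T-SQL, step 3 would typically be an UPDATE with a FROM/JOIN rather than a correlated subquery:

```python
import sqlite3

# Set-based sketch of the four steps above. Names are illustrative;
# SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Instrument (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE Reading    (InstrumentID INTEGER, Value REAL);
    -- 1) external data lands here first
    CREATE TABLE Staging    (InstrumentName TEXT, Value REAL, InstrumentID INTEGER);
""")
conn.execute("INSERT INTO Instrument (Name) VALUES ('GC-01')")  # pre-existing lookup row
conn.executemany("INSERT INTO Staging (InstrumentName, Value) VALUES (?, ?)",
                 [("GC-01", 1.5), ("GC-01", 2.5), ("GC-02", 3.0)])

# 2) insert any lookup values that don't exist yet, in one statement, not per row
conn.execute("""
    INSERT INTO Instrument (Name)
    SELECT DISTINCT s.InstrumentName
    FROM Staging s
    WHERE NOT EXISTS (SELECT 1 FROM Instrument i WHERE i.Name = s.InstrumentName)
""")
# 3) resolve every foreign key in one UPDATE
conn.execute("""
    UPDATE Staging
    SET InstrumentID = (SELECT ID FROM Instrument
                        WHERE Name = Staging.InstrumentName)
""")
# 4) move staged data into production
conn.execute("INSERT INTO Reading SELECT InstrumentID, Value FROM Staging")
```

    Each lookup table gets one NOT EXISTS insert and one key-resolving update, regardless of how many staged rows there are.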

  • medwar19 (4/11/2011)


    Hey,

    thanks for your reply

    To answer your question, I need the IDs to enter new data that has previous entries. Maybe this is where I'm going wrong? If I use joins, I can find the plain text and enter the IDs directly into the normalized DB. For all lines where the plain text does not exist, I can do a create. I think I might get it now: stage the incoming data, which will split into two tables based on whether a parameter already exists, then do a create-or-update?

    Thanks,

    Mark.

    Hmmmm... a lot of people think they need to do such checks column by column, so let me ask... do the new records contain data in all the columns, and if so, is that data the same in both the old and the new records? If so, just delete the old record and replace it with the new one. It'll be a whole lot faster.
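
    A minimal sketch of that delete-and-replace approach, assuming a single natural key column (all names hypothetical, SQLite syntax for illustration):

```python
import sqlite3

# Delete-and-replace: instead of comparing 40-odd columns one by one,
# remove any production row whose key matches an incoming row, then
# insert the incoming rows wholesale. Names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Production (EntryKey TEXT PRIMARY KEY, Payload TEXT);
    CREATE TABLE Incoming   (EntryKey TEXT PRIMARY KEY, Payload TEXT);
""")
conn.executemany("INSERT INTO Production VALUES (?, ?)",
                 [("A", "old"), ("B", "old")])
conn.executemany("INSERT INTO Incoming VALUES (?, ?)",
                 [("B", "new"), ("C", "new")])

# Drop the old copies of any overlapping keys, then bulk-insert the new rows.
conn.execute("DELETE FROM Production "
             "WHERE EntryKey IN (SELECT EntryKey FROM Incoming)")
conn.execute("INSERT INTO Production SELECT * FROM Incoming")
```

    Two set-based statements replace per-row, per-column comparisons entirely.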

    --Jeff Moden



  • Hey, thanks for your thought. Sadly, not all records contain all parameters. The data is actually event logs, so no records are ever updated; endless new ones are just created.

    Thanks again.

    Mark.
