Best method for moving constant flow of data into SQL

  • Lowell (5/17/2011)


    Don't Panic: MERGE is just a fancy way to do what you've already done a zillion times, but as separate steps... INSERT new rows, UPDATE matching rows, DELETE rows that were pruned away.

    It's just a nice new way to handle those operations together; a shiny new tool for us geeks, as it were.

    The real question is, if you compared yesterday's data to today's data, how many rows are new versus how many changed?

    Can you post a table structure example that shows us the primary key?

    With the PK, you should (in theory, anyway) be able to compare old vs. new to identify the changes.

    Once that is done, you can evaluate just how many rows are REALLY changing vs. the total load.
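    For illustration, a minimal sketch of the pattern Lowell describes, using a hypothetical dbo.Customer target and a dbo.Customer_Staging table holding today's load (the table and column names are assumptions, not the actual schema; MERGE requires SQL Server 2008 or later):

        -- Assumed tables: dbo.Customer (target) and dbo.Customer_Staging (today's extract),
        -- both keyed on CustomerID.
        MERGE dbo.Customer AS tgt
        USING dbo.Customer_Staging AS src
            ON tgt.CustomerID = src.CustomerID
        WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.Email <> src.Email) THEN
            UPDATE SET tgt.Name = src.Name, tgt.Email = src.Email    -- changed rows
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (CustomerID, Name, Email)
            VALUES (src.CustomerID, src.Name, src.Email)              -- new rows
        WHEN NOT MATCHED BY SOURCE THEN
            DELETE;                                                   -- pruned rows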

    I'll have to explore the MERGE statement a bit further then.

    I could show some table structures, however there are 140 tables. Are you mainly talking about the UD screen tables?

    Also, I'm not sure if we're talking about the ETL process or the real-time(ish) process when discussing changes made.

    The ETL process is always everything; however, after reading up on the MERGE statement I did notice that it can be combined with a BULK INSERT to only load the changes... interesting! Unfortunately it looks like it relies on the primary key of the source being defined AND being in order, and I don't have any control over the ordering of the extraction.
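    For what it's worth, one way to combine the two is a MERGE whose source is the extract file read via OPENROWSET(BULK ...) with the format file. A rough sketch only (file paths, format file and column names are placeholders, and this ignores the ordering question entirely):

        -- Sketch only: use the extracted text file directly as the MERGE source.
        MERGE dbo.Customer AS tgt
        USING OPENROWSET(
                  BULK 'C:\extract\Customer.txt',
                  FORMATFILE = 'C:\extract\Customer.fmt') AS src
            ON tgt.CustomerID = src.CustomerID
        WHEN MATCHED THEN
            UPDATE SET tgt.Name = src.Name
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (CustomerID, Name) VALUES (src.CustomerID, src.Name);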

    The real-time process is based on audit data so I know exactly what has changed.

    Unfortunately, if I analysed updates versus inserts, the ratio would vary from client to client depending on the type of applications they have developed with our software.

    I was thinking the best way might be to delete the data from the staging area after the ETL completes, then use my Windows Service to load the real-time(ish) data into the staging area and process that data from time to time. I could batch changes up by destination table, and since I'll also know whether they are deletes/updates/inserts, I'll know how to handle the data.

    Ideally I'd like to use asynchronous triggers, however I need to support 2005 and would lose the performance benefits of batching changes up. This led me to think about SQL Service Broker, but now I'm thinking a SQL Agent job every few seconds...
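    If the Agent route wins out, a schedule down at the seconds level is easy to script. A rough sketch, assuming a hypothetical dbo.usp_ProcessStagingChanges proc does the batch processing (the job name and the 10-second interval are placeholders):

        -- Sketch: a SQL Agent job that runs the staging processor every 10 seconds.
        USE msdb;
        EXEC dbo.sp_add_job         @job_name = N'ProcessStagingChanges';
        EXEC dbo.sp_add_jobstep     @job_name = N'ProcessStagingChanges',
                                    @step_name = N'Process batch',
                                    @subsystem = N'TSQL',
                                    @command   = N'EXEC dbo.usp_ProcessStagingChanges;';
        EXEC dbo.sp_add_jobschedule @job_name = N'ProcessStagingChanges',
                                    @name = N'Every 10 seconds',
                                    @freq_type = 4,              -- daily
                                    @freq_interval = 1,
                                    @freq_subday_type = 2,       -- sub-day unit: seconds
                                    @freq_subday_interval = 10;  -- every 10 seconds
        EXEC dbo.sp_add_jobserver   @job_name = N'ProcessStagingChanges';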

  • Drammy,

    From this:

    1, Extraction of everything from source database into text files.

    2, Analysis of text files, reading metadata -> creation of a .sql file containing T-SQL to drop and recreate the staging area as well as a .fmt format file for each source table. This ensures any source db structure changes are coped with by the process.

    3, Execution of the .sql to recreate the staging area.

    4, Bulk load with TABLOCK to load data from each text file into the staging area, using the format files created in step 2 (see the sketch after this list).

    5, Indexing of the staging area prior to reporting schema generation.

    6, Generate the reporting schema (Stored procs). If a table is to be recreated it is first dropped.

    7, Index the reporting schema.
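    As a concrete illustration of step 4 above, a minimal sketch of one such load (the schema name, table name and file paths are placeholders, not the actual names):

        -- Step 4 sketch: bulk load one extracted text file into its staging table,
        -- using the format file generated in step 2.
        BULK INSERT staging.Customer
        FROM 'C:\extract\Customer.txt'
        WITH (FORMATFILE = 'C:\extract\Customer.fmt', TABLOCK);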

    I wouldn't recommend changing this process, personally. What I might do, however, is mess with schemas. Bear with me a second, it'll make sense in a moment. 🙂

    First, you'll have to make sure all of the calls from your reporting system are schema-qualified to call dbo.

    Then, at step 2, build everything in schema 'new'. Procs, tables, the works. Index away.

    Add a step 8. Once everything is done, create a loop process that moves all the older items into an 'old' schema and swaps your 'new' schema objects into dbo. This should be a very fast process (a few seconds), and if you feel the need, use an application-type lock to give the front end some kind of warning that this is going to happen.

    This way you can do all the background work you need for however long it takes. Because you've got schema changes and the like to process, you can't assume your primary tables will stay the same anyway, but with the swap you won't need to try to do schema alterations against the live tables mid-day.
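    A hedged sketch of what that step 8 swap could look like for a single table, assuming the 'old' and 'new' schemas already exist (the object name is illustrative; a real version would loop over the objects in the 'new' schema):

        -- Sketch of the schema swap for one object, inside a short transaction.
        BEGIN TRAN;

        -- Optional: an application lock gives the front end a way to detect the swap window.
        EXEC sp_getapplock @Resource = 'ReportingSchemaSwap', @LockMode = 'Exclusive';

        ALTER SCHEMA old TRANSFER dbo.Customer;   -- park the current object in 'old'
        ALTER SCHEMA dbo TRANSFER new.Customer;   -- promote the freshly built object into dbo

        COMMIT;   -- the applock (owner defaults to Transaction) is released here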


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Craig Farrell (5/17/2011)


    Add a step 8. Once everything is done, create a loop process that moves all the older items into an 'old' schema and swaps your 'new' schema objects into dbo. This should be a very fast process (a few seconds), and if you feel the need, use an application-type lock to give the front end some kind of warning that this is going to happen.

    Just a thought... repointing Synonyms would take a little more tracking work but would be incredibly fast.
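    For example, a minimal sketch of the synonym approach (names are hypothetical): the reporting code references the synonym rather than the table directly, and the swap is just dropping and recreating the synonym against the newly built object.

        -- Hypothetical: reporting code reads dbo.Customer_Current, which is a synonym.
        -- Repointing it to the freshly built table is a near-instant metadata change.
        IF OBJECT_ID('dbo.Customer_Current', 'SN') IS NOT NULL
            DROP SYNONYM dbo.Customer_Current;

        CREATE SYNONYM dbo.Customer_Current FOR new.Customer;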

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

