Selectively Updating a Warehouse ?

  • We have a pretty big User database in our warehouse. The tables I am looking at are anywhere from 9gb-91gb, 21 million to 1.03 billion rows.

    So one of my tables with 21 million rows is a table that I want to update from the source server. However, one thing they do on the source server is allow the user to delete records. When the data comes over, we want to ignore the deletes. That way we have a record of that user in the warehouse, even if they are gone from the production server.

    Currently what they have set up is a comparison of the two databases to find out which records are missing. That is taking hours to process, so I am looking for a better way. The DBAs are talking about log shipping, but I do not see a way to have the log shipping ignore the deletion of records. I've also been told to do a Differential Update, but I'm not sure how that would work.

    So summed up, I want to add the new records from production to the warehouse and ignore the deletes from production. Any suggestions?

    Thanks

  • What you need is a proper ETL for your data. Build your logic of ETL around MERGE.

    MERGE

    [ TOP ( expression ) [ PERCENT ] ]

    [ INTO ] <target_table> [ WITH ( <merge_hint> ) ] [ [ AS ] table_alias ]

    USING <table_source>

    ON <merge_search_condition>

    [ WHEN MATCHED [ AND <clause_search_condition> ]

    THEN <merge_matched> ] [ ...n ]

    [ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ]

    THEN <merge_not_matched> ]

    -- [ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ]

    -- THEN <merge_matched> ] [ ...n ]

    [ <output_clause> ]

    [ OPTION ( <query_hint> [ ,...n ] ) ]

    ;

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply