Using ETL for data feeds, not data warehouse

  • Hello experts,

    Does anyone have any suggestions or references for how I can use ETL concepts for a data feed, as opposed to a data warehouse? A lot of the material I find online seems tailored toward taking data from an operational system and transforming it for a data warehouse. Interesting and useful stuff, but I am not looking to do that.

    I just want to pull data based on certain criteria and use that data for a developer to upload to a system. Part of the data feed includes filtering for certain data that has changed since the previous data feed. So I have come up with a version that uses the MERGE statement but it has led me to make a setup with (I feel) too much complexity. I'm looking for a more basic approach that might use just non-MERGE techniques of checking for changes, logging history of changes, and delivering the data to a final staging table for upload rather transformed into dimensions for a data warehouse.

    I know there is a lot more detail to work out but for starters I just would be grateful for any advice about use of ETL concepts for non-data warehouse needs if anyone know of any.

    Thanks for any help!

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • webrunner (12/4/2015)


    Hello experts,

    Does anyone have any suggestions or references for how I can use ETL concepts for a data feed, as opposed to a data warehouse? A lot of the material I find online seems tailored toward taking data from an operational system and transforming it for a data warehouse. Interesting and useful stuff, but I am not looking to do that.

    I just want to pull data based on certain criteria and use that data for a developer to upload to a system. Part of the data feed includes filtering for certain data that has changed since the previous data feed. So I have come up with a version that uses the MERGE statement but it has led me to make a setup with (I feel) too much complexity. I'm looking for a more basic approach that might use just non-MERGE techniques of checking for changes, logging history of changes, and delivering the data to a final staging table for upload rather transformed into dimensions for a data warehouse.

    I know there is a lot more detail to work out but for starters I just would be grateful for any advice about use of ETL concepts for non-data warehouse needs if anyone know of any.

    Thanks for any help!

    - webrunner

    Very common need.

    By far the easiest way to accomplish this "update what changed, insert what is new (and perhaps delete deleted records)" is a simple trigger on the source table that captures the PK of the table and the action performed on the record (simple single-character or tinyint or even 3 bit fields will suffice and keep the driver-table narrow and efficient). You may wish to include a datetime stamp value too. Now when you get records to migrate it is simply using that table to get the distinct (for UPDATEs anyway, as they could have multiple records) set of 3 records you need: first do INSERTS then DELETEs and finally UPDATES. This is the most efficient way to do it that doesn't lead to bad data. Then clean out driver table records.

    You have concurrency concerns here, so will need to deal with that or you could again wind up with bad data.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • ETL is ETL. It doesn't have to be to a data warehouse.

    Merge can work but comes with overhead as do most things.

    Source to staging. Perform your upsert commands. Done.

    If merge isn't your cup of [beverage of choice] then do it in separate steps.

    I'm not sure why you think the same concepts can't be applied to datamarts.

  • Thanks! That's pointed me in the right direction.

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • JustMarie (12/4/2015)


    ETL is ETL. It doesn't have to be to a data warehouse.

    Merge can work but comes with overhead as do most things.

    Source to staging. Perform your upsert commands. Done.

    If merge isn't your cup of [beverage of choice] then do it in separate steps.

    I'm not sure why you think the same concepts can't be applied to datamarts.

    Thanks for your response. It wasn't so much that I thought the same concepts can't be applied, just that all the material I found involved deciding which data had to be treated as a slowly changing dimensions (SCD) and included a lot of discussion about how SCDs are to be used for data warehouse-type destinations.

    In other words, I feel it is becoming overkill for what I need to do, because the requirement is to pull an initial set of data from the operational database for an API upload - after which records are chosen and uploaded (updated) only if certain column values have changed since the previous feed.

    I've gotten pretty far with MERGE but now am seeing what I think is some unnecessary (for this feed) complexity in the staging table regarding which record is active, debugging why new records aren't added or updated, etc. I just feel I am now in the weeds because I have overhead for the SCD behavior that I am not sure I even need.

    Thanks again.

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply