Continuous hours of missing data

  • Hi,

    In my ETL feed i need to identify whether the data is missing for continuous 5 hours. What could be the efficient way of doing this in SSIS. Data will be in millions of records per customer and it has to be processed for many customers.

    It can be one day data or many days data. I have to find out in a day if there is any 5 continuous hours of missing data.

    Thanks,

    Ami

  • Anamika (6/15/2014)


    Hi,

    In my ETL feed i need to identify whether the data is missing for continuous 5 hours. What could be the efficient way of doing this in SSIS. Data will be in millions of records per customer and it has to be processed for many customers.

    It can be one day data or many days data. I have to find out in a day if there is any 5 continuous hours of missing data.

    Thanks,

    Ami

    Unless your source data feed is pre-sorted in Customer/Date order (don't sort in SSIS), there is no obvious way of doing this in SSIS that will give you decent performance.

    You are much more likely to achieve something acceptable by loading the data and then querying it using T-SQL.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (6/15/2014)


    Anamika (6/15/2014)


    Hi,

    In my ETL feed i need to identify whether the data is missing for continuous 5 hours. What could be the efficient way of doing this in SSIS. Data will be in millions of records per customer and it has to be processed for many customers.

    It can be one day data or many days data. I have to find out in a day if there is any 5 continuous hours of missing data.

    Thanks,

    Ami

    Unless your source data feed is pre-sorted in Customer/Date order (don't sort in SSIS), there is no obvious way of doing this in SSIS that will give you decent performance.

    You are much more likely to achieve something acceptable by loading the data and then querying it using T-SQL.

    +1

    Do this in TSQL.

    This is a typical "gaps and islands"[/url] problem.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • This sounds like a reasonably good use case for STreamInsight - came out in 2008R2, still there in 2012 and 2014, can't speak to it's true longevity in the product though. Assuming you can get some data flowing through it, you simply keep your tickers ticking over and write some monitors to alarm if they haven't ticked in 5 continuous hours.... or so the theory goes......

    Link to the TechNet page..

    Completely off topic - I have waited a LONG time to see/hear a 'real world' use case for this, would love to know if you can make it work for you.

    Steve.

  • stevefromOZ (6/16/2014)


    This sounds like a reasonably good use case for STreamInsight - came out in 2008R2, still there in 2012 and 2014, can't speak to it's true longevity in the product though. Assuming you can get some data flowing through it, you simply keep your tickers ticking over and write some monitors to alarm if they haven't ticked in 5 continuous hours.... or so the theory goes......

    Link to the TechNet page..

    Completely off topic - I have waited a LONG time to see/hear a 'real world' use case for this, would love to know if you can make it work for you.

    I believe TSQL will be easier, because it is a well documented problem. (and according to the question it seems the data has already been loaded, instead of being a continuous stream)

    StreamInsight however ... :-S

    It doesn't even have a GUI.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply