June 15, 2014 at 11:23 pm
Hi,
In my ETL feed i need to identify whether the data is missing for continuous 5 hours. What could be the efficient way of doing this in SSIS. Data will be in millions of records per customer and it has to be processed for many customers.
It can be one day data or many days data. I have to find out in a day if there is any 5 continuous hours of missing data.
Thanks,
Ami
June 15, 2014 at 11:44 pm
Anamika (6/15/2014)
Hi,In my ETL feed i need to identify whether the data is missing for continuous 5 hours. What could be the efficient way of doing this in SSIS. Data will be in millions of records per customer and it has to be processed for many customers.
It can be one day data or many days data. I have to find out in a day if there is any 5 continuous hours of missing data.
Thanks,
Ami
Unless your source data feed is pre-sorted in Customer/Date order (don't sort in SSIS), there is no obvious way of doing this in SSIS that will give you decent performance.
You are much more likely to achieve something acceptable by loading the data and then querying it using T-SQL.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
June 15, 2014 at 11:56 pm
Phil Parkin (6/15/2014)
Anamika (6/15/2014)
Hi,In my ETL feed i need to identify whether the data is missing for continuous 5 hours. What could be the efficient way of doing this in SSIS. Data will be in millions of records per customer and it has to be processed for many customers.
It can be one day data or many days data. I have to find out in a day if there is any 5 continuous hours of missing data.
Thanks,
Ami
Unless your source data feed is pre-sorted in Customer/Date order (don't sort in SSIS), there is no obvious way of doing this in SSIS that will give you decent performance.
You are much more likely to achieve something acceptable by loading the data and then querying it using T-SQL.
+1
Do this in TSQL.
This is a typical "gaps and islands"[/url] problem.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 16, 2014 at 8:09 am
This sounds like a reasonably good use case for STreamInsight - came out in 2008R2, still there in 2012 and 2014, can't speak to it's true longevity in the product though. Assuming you can get some data flowing through it, you simply keep your tickers ticking over and write some monitors to alarm if they haven't ticked in 5 continuous hours.... or so the theory goes......
Completely off topic - I have waited a LONG time to see/hear a 'real world' use case for this, would love to know if you can make it work for you.
Steve.
June 16, 2014 at 10:36 am
stevefromOZ (6/16/2014)
This sounds like a reasonably good use case for STreamInsight - came out in 2008R2, still there in 2012 and 2014, can't speak to it's true longevity in the product though. Assuming you can get some data flowing through it, you simply keep your tickers ticking over and write some monitors to alarm if they haven't ticked in 5 continuous hours.... or so the theory goes......Completely off topic - I have waited a LONG time to see/hear a 'real world' use case for this, would love to know if you can make it work for you.
I believe TSQL will be easier, because it is a well documented problem. (and according to the question it seems the data has already been loaded, instead of being a continuous stream)
StreamInsight however ... :-S
It doesn't even have a GUI.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply