May 21, 2012 at 6:12 pm
I haven’t worked with data islands and gaps much in the past, and have been given a task to determine how many times a device interface was at > 95% utilization for at least 10 minutes continuously.
The data I'm querying is performance data from network devices such as routers. Every minute thousands of device interfaces are polled and we get statistics back like bandwidth utilization %.
Utilization can spike up and down pretty significantly so it’s only continuous high utilization longer than 10 minutes I need to count. A device could have high utilization for hours and hours possibly, and that would count as one high utilization event. So I guess the “end” of a high utilization event would be when the next record after a > 95% record is the first <= 95%.
I’m assuming I’d have to use something like row_number over() but I haven’t been able to work it out. o_O
Here's a sample set of records to help visualize the data:
TheDateTimeInterfaceNameUtilizationPercent
12/1/2010 6:00WAN Interface99
12/1/2010 6:01WAN Interface99
12/1/2010 6:02WAN Interface97
12/1/2010 6:03WAN Interface96
12/1/2010 6:04WAN Interface99
12/1/2010 6:05WAN Interface93
12/1/2010 6:06WAN Interface93
12/1/2010 6:07WAN Interface87
12/1/2010 6:08WAN Interface85
12/1/2010 6:09WAN Interface85
12/1/2010 6:10WAN Interface86
12/1/2010 6:11WAN Interface87
12/1/2010 6:12WAN Interface87
12/1/2010 6:13WAN Interface87
12/1/2010 6:00LAN Interface99
12/1/2010 6:01LAN Interface99
12/1/2010 6:02LAN Interface97
12/1/2010 6:03LAN Interface96
12/1/2010 6:04LAN Interface99
12/1/2010 6:05LAN Interface98
12/1/2010 6:06LAN Interface98
12/1/2010 6:07LAN Interface99
12/1/2010 6:08LAN Interface97
12/1/2010 6:09LAN Interface96
12/1/2010 6:10LAN Interface98
12/1/2010 6:11LAN Interface99
12/1/2010 6:12LAN Interface99
12/1/2010 6:13LAN Interface93
So I need to take the above data (could be millions of records with thousands of different interfaces) and come up with a result set that looks like this:
Interface Name# of Times device was continously at > 95% for > 10 minutes
WAN Interface0
LAN Interface1
While "WAN Interface" was at greater than 95% for a few polls, it didn't last for 10 minutes. The "LAN interface" was at > 95% for longer than ten minutes until a 93% poll happened.
Hope that makes sense. Thanks!
S
May 21, 2012 at 6:48 pm
I'm not sure exactly if this fits the definition of islands and gaps because you seem to have complete data for each minute of time interval.
This approach should work if you indeed have no such gaps. Data setup:
DECLARE @t TABLE (TheDateTime DATETIME, InterfaceName VARCHAR(30), UtilizationPercent INT)
INSERT INTO @t
SELECT '12/1/2010 6:00','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:01','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:02','WAN Interface', 97
UNION ALL SELECT '12/1/2010 6:03','WAN Interface', 96
UNION ALL SELECT '12/1/2010 6:04','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:05','WAN Interface', 93
UNION ALL SELECT '12/1/2010 6:06','WAN Interface', 93
UNION ALL SELECT '12/1/2010 6:07','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:08','WAN Interface', 85
UNION ALL SELECT '12/1/2010 6:09','WAN Interface', 85
UNION ALL SELECT '12/1/2010 6:10','WAN Interface', 86
UNION ALL SELECT '12/1/2010 6:11','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:12','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:13','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:00','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:01','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:02','LAN Interface', 97
UNION ALL SELECT '12/1/2010 6:03','LAN Interface', 96
UNION ALL SELECT '12/1/2010 6:04','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:05','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:06','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:07','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:08','LAN Interface', 97
UNION ALL SELECT '12/1/2010 6:09','LAN Interface', 96
UNION ALL SELECT '12/1/2010 6:10','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:11','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:12','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:13','LAN Interface', 93
Solution:
;WITH CTE AS (
SELECT InterfaceName, TheDateTime, UtilizationPercent
,ROW_NUMBER() OVER (
PARTITION BY InterfaceName, CASE WHEN UtilizationPercent > 95 THEN 1 ELSE 0 END
ORDER BY TheDateTime) AS rn
FROM @t
)
SELECT InterfaceName, COUNT(CASE WHEN rn = 10 THEN 1 ELSE NULL END)
FROM CTE
GROUP BY InterfaceName
If you have some data anomalies that cause it to not work, please post some readily consumable INSERTs and I'll have another look.
Note that since your table has "millions of rows," I recommend that you look for the best performing solution you can find. My experience tells me that after 2-3 people look at this problem, you may have a similar number of solutions that you can test for best performance in your case.
Look for a single table scan (or INDEX scan if you've got a proper primary key) in the execution plan and those should perform best. There's more than one solution out there that could do 2 table scans (mine does 1).
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
May 21, 2012 at 11:54 pm
dwain.c (5/21/2012)
I'm not sure exactly if this fits the definition of islands and gaps because you seem to have complete data for each minute of time interval.This approach should work if you indeed have no such gaps. Data setup:
DECLARE @t TABLE (TheDateTime DATETIME, InterfaceName VARCHAR(30), UtilizationPercent INT)
INSERT INTO @t
SELECT '12/1/2010 6:00','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:01','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:02','WAN Interface', 97
UNION ALL SELECT '12/1/2010 6:03','WAN Interface', 96
UNION ALL SELECT '12/1/2010 6:04','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:05','WAN Interface', 93
UNION ALL SELECT '12/1/2010 6:06','WAN Interface', 93
UNION ALL SELECT '12/1/2010 6:07','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:08','WAN Interface', 85
UNION ALL SELECT '12/1/2010 6:09','WAN Interface', 85
UNION ALL SELECT '12/1/2010 6:10','WAN Interface', 86
UNION ALL SELECT '12/1/2010 6:11','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:12','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:13','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:00','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:01','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:02','LAN Interface', 97
UNION ALL SELECT '12/1/2010 6:03','LAN Interface', 96
UNION ALL SELECT '12/1/2010 6:04','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:05','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:06','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:07','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:08','LAN Interface', 97
UNION ALL SELECT '12/1/2010 6:09','LAN Interface', 96
UNION ALL SELECT '12/1/2010 6:10','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:11','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:12','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:13','LAN Interface', 93
Solution:
;WITH CTE AS (
SELECT InterfaceName, TheDateTime, UtilizationPercent
,ROW_NUMBER() OVER (
PARTITION BY InterfaceName, CASE WHEN UtilizationPercent > 95 THEN 1 ELSE 0 END
ORDER BY TheDateTime) AS rn
FROM @t
)
SELECT InterfaceName, COUNT(CASE WHEN rn = 10 THEN 1 ELSE NULL END)
FROM CTE
GROUP BY InterfaceName
If you have some data anomalies that cause it to not work, please post some readily consumable INSERTs and I'll have another look.
Note that since your table has "millions of rows," I recommend that you look for the best performing solution you can find. My experience tells me that after 2-3 people look at this problem, you may have a similar number of solutions that you can test for best performance in your case.
Look for a single table scan (or INDEX scan if you've got a proper primary key) in the execution plan and those should perform best. There's more than one solution out there that could do 2 table scans (mine does 1).
Nice!!
Very nice work with the Ordering Dwain. 🙂
May 21, 2012 at 11:57 pm
vinu512 (5/21/2012)
Nice!!Very nice work with the Ordering Dwain. 🙂
Thank you sir! I did enjoy that one.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
May 22, 2012 at 12:21 am
dwain.c (5/21/2012)
vinu512 (5/21/2012)
Nice!!Very nice work with the Ordering Dwain. 🙂
Thank you sir! I did enjoy that one.
I couldn't think of it the way you thought.....its a good trick and will keep it in mind...btw...I enjoyed it too. 🙂
May 22, 2012 at 12:34 am
As it happens once in a while, I posted at the same time someone else did and I quoted the wrong post... so removed from here and put right post below.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 22, 2012 at 12:36 am
dwain.c (5/21/2012)
I'm not sure exactly if this fits the definition of islands and gaps because you seem to have complete data for each minute of time interval.This approach should work if you indeed have no such gaps. Data setup:
DECLARE @t TABLE (TheDateTime DATETIME, InterfaceName VARCHAR(30), UtilizationPercent INT)
INSERT INTO @t
SELECT '12/1/2010 6:00','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:01','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:02','WAN Interface', 97
UNION ALL SELECT '12/1/2010 6:03','WAN Interface', 96
UNION ALL SELECT '12/1/2010 6:04','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:05','WAN Interface', 93
UNION ALL SELECT '12/1/2010 6:06','WAN Interface', 93
UNION ALL SELECT '12/1/2010 6:07','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:08','WAN Interface', 85
UNION ALL SELECT '12/1/2010 6:09','WAN Interface', 85
UNION ALL SELECT '12/1/2010 6:10','WAN Interface', 86
UNION ALL SELECT '12/1/2010 6:11','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:12','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:13','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:00','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:01','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:02','LAN Interface', 97
UNION ALL SELECT '12/1/2010 6:03','LAN Interface', 96
UNION ALL SELECT '12/1/2010 6:04','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:05','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:06','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:07','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:08','LAN Interface', 97
UNION ALL SELECT '12/1/2010 6:09','LAN Interface', 96
UNION ALL SELECT '12/1/2010 6:10','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:11','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:12','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:13','LAN Interface', 93
Solution:
;WITH CTE AS (
SELECT InterfaceName, TheDateTime, UtilizationPercent
,ROW_NUMBER() OVER (
PARTITION BY InterfaceName, CASE WHEN UtilizationPercent > 95 THEN 1 ELSE 0 END
ORDER BY TheDateTime) AS rn
FROM @t
)
SELECT InterfaceName, COUNT(CASE WHEN rn = 10 THEN 1 ELSE NULL END)
FROM CTE
GROUP BY InterfaceName
If you have some data anomalies that cause it to not work, please post some readily consumable INSERTs and I'll have another look.
Note that since your table has "millions of rows," I recommend that you look for the best performing solution you can find. My experience tells me that after 2-3 people look at this problem, you may have a similar number of solutions that you can test for best performance in your case.
Look for a single table scan (or INDEX scan if you've got a proper primary key) in the execution plan and those should perform best. There's more than one solution out there that could do 2 table scans (mine does 1).
Check again... 😉
DECLARE @t TABLE (TheDateTime DATETIME, InterfaceName VARCHAR(30), UtilizationPercent INT)
INSERT INTO @t
SELECT '12/1/2010 6:00','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:01','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:02','WAN Interface', 97
UNION ALL SELECT '12/1/2010 6:03','WAN Interface', 96
UNION ALL SELECT '12/1/2010 6:04','WAN Interface', 99
UNION ALL SELECT '12/1/2010 6:05','WAN Interface', 93
UNION ALL SELECT '12/1/2010 6:06','WAN Interface', 93
UNION ALL SELECT '12/1/2010 6:07','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:08','WAN Interface', 85
UNION ALL SELECT '12/1/2010 6:09','WAN Interface', 85
UNION ALL SELECT '12/1/2010 6:10','WAN Interface', 86
UNION ALL SELECT '12/1/2010 6:11','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:12','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:13','WAN Interface', 87
UNION ALL SELECT '12/1/2010 6:00','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:01','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:02','LAN Interface', 97
UNION ALL SELECT '12/1/2010 6:03','LAN Interface', 96
UNION ALL SELECT '12/1/2010 6:04','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:05','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:06','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:07','LAN Interface', 10 -- CHANGED DATA
UNION ALL SELECT '12/1/2010 6:08','LAN Interface', 97
UNION ALL SELECT '12/1/2010 6:09','LAN Interface', 96
UNION ALL SELECT '12/1/2010 6:10','LAN Interface', 98
UNION ALL SELECT '12/1/2010 6:11','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:12','LAN Interface', 99
UNION ALL SELECT '12/1/2010 6:13','LAN Interface', 93
--Jeff Moden
Change is inevitable... Change for the better is not.
May 22, 2012 at 12:47 am
Doesn't it still work Mr. Jeff Moden??....The CTE give the following result set:
InterfaceName TheDateTime UP rn
LAN Interface2010-12-01 06:07:00.000 10 1
LAN Interface2010-12-01 06:13:00.000 93 2
LAN Interface2010-12-01 06:00:00.000 99 1
LAN Interface2010-12-01 06:01:00.000 99 2
LAN Interface2010-12-01 06:02:00.000 97 3
LAN Interface2010-12-01 06:03:00.000 96 4
LAN Interface2010-12-01 06:04:00.000 99 5
LAN Interface2010-12-01 06:05:00.000 98 6
LAN Interface2010-12-01 06:06:00.000 98 7
LAN Interface2010-12-01 06:08:00.000 97 8
LAN Interface2010-12-01 06:09:00.000 96 9
LAN Interface2010-12-01 06:10:00.000 98 10
LAN Interface2010-12-01 06:11:00.000 99 11
LAN Interface2010-12-01 06:12:00.000 99 12
WAN Interface2010-12-01 06:05:00.000 93 1
WAN Interface2010-12-01 06:06:00.000 93 2
WAN Interface2010-12-01 06:07:00.000 87 3
WAN Interface2010-12-01 06:08:00.000 85 4
WAN Interface2010-12-01 06:09:00.000 85 5
WAN Interface2010-12-01 06:10:00.000 86 6
WAN Interface2010-12-01 06:11:00.000 87 7
WAN Interface2010-12-01 06:12:00.000 87 8
WAN Interface2010-12-01 06:13:00.000 87 9
WAN Interface2010-12-01 06:00:00.000 99 1
WAN Interface2010-12-01 06:01:00.000 99 2
WAN Interface2010-12-01 06:02:00.000 97 3
WAN Interface2010-12-01 06:03:00.000 96 4
WAN Interface2010-12-01 06:04:00.000 99 5
And the final result set is:
InterfaceName Count
LAN Interface1
WAN Interface0
May 22, 2012 at 12:55 am
vinu512 (5/22/2012)
Doesn't it still work Mr. Jeff Moden??....The CTE give the following result set:
InterfaceName TheDateTime UP rn
LAN Interface2010-12-01 06:07:00.000 10 1
LAN Interface2010-12-01 06:13:00.000 93 2
LAN Interface2010-12-01 06:00:00.000 99 1
LAN Interface2010-12-01 06:01:00.000 99 2
LAN Interface2010-12-01 06:02:00.000 97 3
LAN Interface2010-12-01 06:03:00.000 96 4
LAN Interface2010-12-01 06:04:00.000 99 5
LAN Interface2010-12-01 06:05:00.000 98 6
LAN Interface2010-12-01 06:06:00.000 98 7
LAN Interface2010-12-01 06:08:00.000 97 8
LAN Interface2010-12-01 06:09:00.000 96 9
LAN Interface2010-12-01 06:10:00.000 98 10
LAN Interface2010-12-01 06:11:00.000 99 11
LAN Interface2010-12-01 06:12:00.000 99 12
WAN Interface2010-12-01 06:05:00.000 93 1
WAN Interface2010-12-01 06:06:00.000 93 2
WAN Interface2010-12-01 06:07:00.000 87 3
WAN Interface2010-12-01 06:08:00.000 85 4
WAN Interface2010-12-01 06:09:00.000 85 5
WAN Interface2010-12-01 06:10:00.000 86 6
WAN Interface2010-12-01 06:11:00.000 87 7
WAN Interface2010-12-01 06:12:00.000 87 8
WAN Interface2010-12-01 06:13:00.000 87 9
WAN Interface2010-12-01 06:00:00.000 99 1
WAN Interface2010-12-01 06:01:00.000 99 2
WAN Interface2010-12-01 06:02:00.000 97 3
WAN Interface2010-12-01 06:03:00.000 96 4
WAN Interface2010-12-01 06:04:00.000 99 5
And the final result set is:
InterfaceName Count
LAN Interface1
WAN Interface0
Look at the very data you've presented. Where is 6:07 and has more than 10 minutes gone by since 6:00 with the levels indicated? I think not. The final answer here should be zero's for both.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 22, 2012 at 12:58 am
Dwaine absolutely had the correct idea... we just need to take it one step further to get it to work. Here's the code. It's mostly Dwaine's with some tweaks to make it work on short runs...
;WITH
cteGroup AS
(
SELECT InterfaceName, TheDateTime, UtilizationPercent,
ContiguousGroup =
ROW_NUMBER() OVER
(PARTITION BY InterfaceName, CASE WHEN UtilizationPercent > 95 THEN 1 ELSE 0 END
ORDER BY TheDateTime)
- ROW_NUMBER() OVER
(ORDER BY InterFaceName,TheDateTime)
FROM #t
),
cteSpan AS
(
SELECT InterfaceName,
Span =
ROW_NUMBER() OVER
(PARTITION BY InterfaceName, ContiguousGroup
ORDER BY TheDateTime)
FROM cteGroup
)
SELECT InterfaceName, SUM(CASE WHEN Span = 10 THEN 1 ELSE 0 END)
FROM cteSpan
GROUP BY InterfaceName
;
--Jeff Moden
Change is inevitable... Change for the better is not.
May 22, 2012 at 1:01 am
Look at the very data you've presented. Where is 6:07 and has more than 10 minutes gone by since 6:00 with the levels indicated? I think not. The final answer here should be zero's for both.
Aaah!! yes!!... I forgot about the Time Factor.
May 22, 2012 at 4:55 am
While I was on my way home I kicked myself for not realizing the case it wouldn't work on.
Glad to see someone rose to the occasion and corrected me before the OP used it and didn't notice.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
May 22, 2012 at 9:34 am
OK, after getting it in place and testing some data something came to light that makes this more challenging (I think:)).
The records are not always one minute apart, they can actually be time stamped fairly irregularly because the polling servers that record the data can have delays before writing to the database. So now it's getting tricky.
Rather than just being able to partition and count if there is more than ten in a row, I have to know if there is more than a ten minute time difference from the first timestamp and the last timestamp of a consecutive set of records that had greater than 95% utilization (for a given interface of course).
I think my brain might explode.
May 22, 2012 at 2:34 pm
I'm a dork, I completely missed the last chunk of code that was posted, I was looking at an earlier version. That did the trick! Thank you all again!
Shaunna
May 22, 2012 at 4:10 pm
Nope... not a dork. And that last piece of code doesn't take irregular times in to consideration, either. We'll need to fix it.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply