September 3, 2011 at 11:49 am
clubbavich (9/3/2011)
I haven't even attempted it. To do so would be to execute 3600 select statements connected by UNION ALL. I cannot imagine that that would do anything good for performance.
Reading back through the thread, I am still not clear what your real problem is... please accept that some things take a long time to sink in to the little grey matter I have left 😛
Do you want code that will optimise the return of server/device/max(date)... or is there another question?
Sorry if I have misunderstood.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 3, 2011 at 2:28 pm
I'm simply trying to figure out why the RBAR solution to this problem is orders of magnitude faster than the obvious set-based solution. I'd like to figure out what exactly is going wrong here.
September 3, 2011 at 6:33 pm
<pokingheadoutofthetrench>
I disagree that there is an RBAR method here.
The UNION gets a result out of an 18M+ row table with only 4 reads - you can't get less RBAR than that, can you?
The MAX method on the other hand has to read every row to find the result - hence it is slower - or am I missing something here?
</pokingheadoutofthetrench>
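For anyone skimming, here is a minimal sketch of the two shapes being compared - reconstructed from the thread rather than quoted, using the table and column names that appear later in the discussion:
-- Hard-coded shape: one seek per server against an index leading on
-- (Server_Id, Data_Timestamp), so only a handful of reads in total.
SELECT MAX(Data_Timestamp) AS Max_Data_Timestamp FROM Hourly_Stats WHERE Server_Id = 1
UNION ALL
SELECT MAX(Data_Timestamp) FROM Hourly_Stats WHERE Server_Id = 2;
-- ...and so on for each server.
-- The "obvious" set-based shape, which here ends up reading every row:
SELECT Server_Id, MAX(Data_Timestamp) AS Max_Data_Timestamp
FROM Hourly_Stats
GROUP BY Server_Id;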
MM
September 3, 2011 at 8:22 pm
clubbavich (9/3/2011)
@Jeff Here is how the devices are distributed.
Server_Id, Device_Id
1 has 1540 Device_Id's
2 has 742 Device_Id's
3 has 357 Device_Id's
4 has 26 Device_Id's
5 has 917 Device_Id's
Thanks. I'll put something together tomorrow. Like I said, I think the problem is with the way the indexes are laid out and I'll build a wad of test data to see if I'm right. The information above helps a lot.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 3, 2011 at 9:10 pm
@mister.magoo
In this case, each select statement is returning a single row, instead of a set. While the UNION method works, in the long term it will simply become another method that needs maintenance each time a server is added or removed, or it will require dynamic SQL, which I avoid whenever possible. My initial hope was that there was some language construct I hadn't thought of that would get me these results intelligently in a single statement. At this point it's looking unlikely.
Also, depending on future needs, I may end up needing a method to get the latest data timestamp per device per server, which the UNION ALL method is wholly unsuited for.
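For clarity, the naive set-based shape of that per-device requirement would be something like this (same table and column names as before; shown only to illustrate why hand-expanding a UNION ALL per server/device pair is a non-starter):
SELECT Server_Id, Device_Id, MAX(Data_Timestamp) AS Max_Data_Timestamp
FROM Hourly_Stats
GROUP BY Server_Id, Device_Id;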
September 3, 2011 at 11:17 pm
(3630 row(s) affected)
Table 'HOURLY_STATS'. Scan count 3, logical reads 27408, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5413 ms, elapsed time = 2950 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Server\Device\MaxDate... based on 10 million rows... is this acceptable?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 4, 2011 at 3:38 pm
How about ~1 millisecond the first time, every time, along with shorter times to rebuild the clustered index and maybe no need to ever rebuild it? 😉
The code below is in 3 sections...
1. Build 6 months of test data (more than 15 million rows... takes about 2 minutes to build)
2. Build the indexes I recommend you have including a change to the PK (takes about 4 minutes to build)
3. The code to find the MAX timestamp for each Server_ID (takes 1 millisecond to run after the correct indexes have been built).
The details are in the comments in the code. Of course, everything is done as Temp Tables so as to not accidentally whack any of your real tables. 😉
--=====================================================================================================================
-- Build six months worth of test data (more than 15 million rows)
-- NOTHING in this section is a part of the solution. We're just building test data here.
-- Takes 00:02:09 to build the data (15,474,240 rows or six months of samples)
--=====================================================================================================================
--===== Conditionally drop all the temp tables to make reruns easier in SSMS
IF OBJECT_ID('tempdb..#Tally','U') IS NOT NULL DROP TABLE #Tally;
IF OBJECT_ID('tempdb..#SD','U') IS NOT NULL DROP TABLE #SD;
IF OBJECT_ID('tempdb..#SingleSample','U') IS NOT NULL DROP TABLE #SingleSample;
IF OBJECT_ID('tempdb..#SampleTimes','U') IS NOT NULL DROP TABLE #SampleTimes;
IF OBJECT_ID('tempdb..#Hourly_Stats','U') IS NOT NULL DROP TABLE #Hourly_Stats;
GO
--===== Build a temporary Tally Table to help us easily build the test data.
-- As a sidebar, you should have a permanent Tally Table.
--===== Create and populate the Tally table on the fly.
SELECT TOP 11001
IDENTITY(INT,0,1) AS N
INTO #Tally
FROM sys.ALL_Columns ac1,
sys.ALL_Columns ac2
;
--===== Add a CLUSTERED Primary Key to maximize performance
ALTER TABLE #Tally
ADD CONSTRAINT PK_#Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== This only builds the given requirements for the data and stores it in
-- a working table so we can expand it according to the device count for
-- each Server_ID
SELECT *
INTO #SD
FROM
( --=== Servers and Device count per server
SELECT 1, 1540 UNION ALL
SELECT 2, 742 UNION ALL
SELECT 3, 357 UNION ALL
SELECT 4, 26 UNION ALL
SELECT 5, 917
) d (Server_ID, DeviceCount)
;
--===== This creates a table consisting of the Server_ID and the given number
-- of Device_ID's for each Server_ID
SELECT *
INTO #SingleSample
FROM
( --=== Server/Device ID's per hourly sample
SELECT sd.Server_ID, t.N
FROM #Tally t
CROSS JOIN #SD sd
WHERE t.N BETWEEN 1 AND sd.DeviceCount
) d (Server_ID, Device_ID)
;
--===== This builds 6 months of hourly sample date/times
SELECT *
INTO #SampleTimes
FROM
( --=== Hourly sample date/times
SELECT TOP (180*24) --180 days times 24 hours
Data_Timestamp = CAST(DATEADD(hh,t.N,'2011') AS SMALLDATETIME)
FROM #Tally t
) d (Data_Timestamp)
;
--===== This cross join builds out the full 6 months of data (over 15 million rows)
SELECT Server_ID = ISNULL(ss.Server_ID,0),
Device_ID = ISNULL(ss.Device_ID,0),
Data_Timestamp = ISNULL(st.Data_Timestamp,0),
Datapoint_A = CAST(RAND(CHECKSUM(NEWID()))* 100 AS REAL),
Datapoint_B = CAST(RAND(CHECKSUM(NEWID()))*1000 AS REAL)
INTO #Hourly_Stats
FROM #SingleSample ss
CROSS JOIN #SampleTimes st
;
GO
--=====================================================================================================================
-- I was correct about your indexes... they make a bit of a mess. The PK suffers many, many page splits over
-- time, and the non-clustered index really doesn't do much for speed when it comes to the particular query
-- you're trying to run quickly.
--
-- With that in mind, here's what I recommend for indexes on your table to make index rebuilding shorter, prevent
-- fragmentation on the PK during most days, and execute the query I wrote to solve your problem in
-- 1 millisecond.
--=====================================================================================================================
--===== Add the corrected PK which cuts page fragmentation to near zero
-- so that defragging it doesn't take so long.
-- Takes 00:02:29 to run
ALTER TABLE #Hourly_Stats
ADD CONSTRAINT [PK_Hourly_Stats] PRIMARY KEY CLUSTERED
(
[Data_Timestamp] ASC,
[Server_Id] ASC,
[Device_Id] ASC --Device_Id is only unique on a per-server basis.
)
;
--===== Add the correct covering index to make the correlated query run super fast.
-- This index will fragment a lot less than you would think even though
-- the time stamp is in descending order.
-- Takes 00:01:36 to run
CREATE INDEX IX_Hourly_Stats_Cover01
ON #Hourly_Stats (Server_ID ASC, Data_Timestamp DESC)
;
GO
--=====================================================================================================================
-- Then we come to your actual problem. While RBAR is bad, RBR isn't necessarily bad because the optimizer will
-- make it all part of the query as if it were written in a single, well-formed view. The following query uses
-- such RBR (not RBAR) and could easily be used as a CROSS APPLY, instead.
--
-- The real key to this bad boy is that 99.99% of the work is already done by derived table "s". If the number
-- of servers doesn't change much, then hardcoding may be ok. If they change more often, it may be worth
-- maintaining the Server_ID's in a separate table which could be updated once a day by a scheduled job.
-- (Similar to "data warehousing" tricks.)
--=====================================================================================================================
--===== Then comes the magic of a correlated subquery.
-- This takes only 1 millisecond to run.
SELECT s.Server_ID,
Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID)
FROM
( --=== These are the known Server_ID's
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) s (Server_ID)
;
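As a sketch only of the "separate table" idea mentioned in the comments above (the #Server_List name and its refresh step are illustrative, not part of the solution proper):
IF OBJECT_ID('tempdb..#Server_List','U') IS NOT NULL DROP TABLE #Server_List;
CREATE TABLE #Server_List (Server_ID INT NOT NULL PRIMARY KEY CLUSTERED);
--===== Refresh step; could be run once a day by a scheduled job
INSERT INTO #Server_List (Server_ID)
SELECT DISTINCT Server_ID FROM #Hourly_Stats;
--===== The correlated query then reads the lookup table instead of a hard-coded list
SELECT sl.Server_ID,
Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = sl.Server_ID)
FROM #Server_List sl;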
--Jeff Moden
Change is inevitable... Change for the better is not.
September 4, 2011 at 5:04 pm
Hi clubbavich
Please find my response below. It is not intended to be confrontational, but I am concerned that people reading this thread will think that a set-based query such as yours (albeit hard-coded for certain server ids), which is so good at achieving the intended result, should be discarded for some other method just because of confusion over the terminology.
I cannot agree that there is anything wrong with a query that performs 5 reads on an 18M+ row table, and I certainly wouldn't consider it to be RBAR.
In that context, here is my reply:
clubbavich (9/3/2011)
@mister.magoo In this case, each select statement is returning a single row, instead of a set.
I have to dispute the language and the premise - a single row result does not mean the code is not set based.
RBAR is commonly used to describe a situation where your code is systematically reading multiple rows unnecessarily.
In your case, you are using the existing indexes perfectly - you couldn't ask for better than 1 read per server id.
While the UNION method works, in the long-term it will simply become another method that needs maintenance each time a server is added or removed, or will require dynamic SQL, which I avoid whenever possible. My initial hope was that there was some language construct that I hadn't thought of that would get me these results in an intelligent manner in a single statement. At this point it's looking unlikely.
Fair point, but your question (repeated later by yourself for clarity) was about why the UNION method was quicker, was it not? That is the question I am addressing.
As for the future: yes, the current method requires ongoing maintenance and will not suit a per-server/per-device requirement without modification, but for the moment it is quicker because it is surgically accurate within the current design.
I am sure Jeff or one of the other experts on the site will help to find a way to speed up the non-hard-coded methods with you.
Best wishes
MM
September 4, 2011 at 5:33 pm
Jeff,
I tried your test data/sample query on my desktop with different results:
(Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64) Jun 17 2011 00:54:03 Copyright (c) Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) )
The final query took 4 seconds of cpu (and 1.2 seconds elapsed with parallelism allowed) and performed 29076 logical reads.
I played with it a bit and came up with this, which more closely matches the original UNION query but performed in 0 ms with just 18 logical reads... on my machine:
SELECT s.Server_ID,
Max_Data_Timestamp
FROM
( --=== These are the known Server_ID's
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) s (Server_ID)
OUTER APPLY (SELECT Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID) ) AS x
;
I don't understand why your query performs so differently on my machine, but I would be interested in your thoughts on the OUTER APPLY method.
MM
September 4, 2011 at 5:50 pm
Adding this index to Jeff's table:
CREATE INDEX IX_Hourly_Stats_By_Device_Cover01
ON #Hourly_Stats (Device_ID ASC, Server_ID ASC, Data_Timestamp DESC);
Allows this device-specific query to return data for every server/device in just over 3 secs CPU / 1 sec elapsed on my desktop.
SELECT s.Server_ID, s.Device_ID,
Max_Data_Timestamp
FROM
(
SELECT Device_ID, Server_ID
FROM #Hourly_Stats
GROUP BY Device_ID, Server_ID
) s (Device_ID,Server_ID)
OUTER APPLY (SELECT Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID AND hs.Device_ID = s.Device_ID) ) AS x
;
IO Stats:
Table '#Hourly_Stats___...____000000000023'. Scan count 3587, logical reads 40620, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Of those 40620 reads, 29096 are from the "s" subquery which is gathering the device/server ids - if they were already in a table, this would be even quicker. (The elapsed time for the subquery is 99% of the whole query)
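A sketch of that last point, assuming the server/device pairs were materialized ahead of time into a hypothetical #Device_List table:
IF OBJECT_ID('tempdb..#Device_List','U') IS NOT NULL DROP TABLE #Device_List;
SELECT Server_ID, Device_ID
INTO #Device_List
FROM #Hourly_Stats
GROUP BY Server_ID, Device_ID; -- one-off cost, paid outside the query
-- With the pairs already stored, the expensive "s" subquery disappears:
SELECT s.Server_ID, s.Device_ID, x.Max_Data_Timestamp
FROM #Device_List s
OUTER APPLY (SELECT Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID AND hs.Device_ID = s.Device_ID) ) AS x;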
MM
September 4, 2011 at 6:09 pm
mister.magoo (9/4/2011)
Jeff, I tried your test data/sample query on my desktop with different results:
(Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64) Jun 17 2011 00:54:03 Copyright (c) Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) )
The final query took 4 seconds of cpu (and 1.2 seconds elapsed with parallelism allowed) and performed 29076 logical reads.
I played with it a bit and came up with this, which more closely matches the original UNION query but performed in 0 ms with just 18 logical reads... on my machine:
I'm not sure what the difference is... here're the stats from my machine for the "Server_ID" based code I wrote...
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(5 row(s) affected)
Table '#Hourly_Stats_______________________________________________________________________________________________________00000000007B'.
Scan count 5, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
The OUTER APPLY does, in fact, seem a bit easier on IO (according to the execution plan) than the CROSS APPLY.
Maybe 64-bit isn't all that it's cracked up to be 😀
Microsoft SQL Server 2005 - 9.00.4035.00 (Intel X86)
Nov 24 2008 13:01:59
Copyright (c) 1988-2005 Microsoft Corporation
Developer Edition on Windows NT 5.1 (Build 2600: Service Pack 3)
--Jeff Moden
Change is inevitable... Change for the better is not.
September 4, 2011 at 6:40 pm
Jeff,
Attached query execution plan for the two queries, your version first, then mine from my machine.
From your stats, I would say that the optimizer has decided on a different plan for your query on my machine...how does it compare to yours?
MM
September 5, 2011 at 10:55 am
I have run Jeff's code on two different machines, both with 2008 R2 (both Dev Ed),
and I cannot get the results he did.
My results are almost identical to MM's, and the execution plans are the same as MM's.
...however, just for fun (and curiosity) I also ran the code on a 2000 Std Ed box.
Guess what... same performance as Jeff described... almost instantaneous.
Have attached a jpg of the sqlplan (not sure how to save it in 2000).
Edit>>> have run on a 2005 Std Ed... nearly as quick as 2000 🙂
plan attached
(5 row(s) affected)
Table '#Hourly_Stats_______________000000000008'. Scan count 5, logical reads 15, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 28 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
...thoughts anyone??
regards
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 5, 2011 at 2:31 pm
Have been playing around a little more and was keen to see if any improvement could be made on SQL 2008 R2... especially getting the server AND device/max date.
Ended up with an idea on indexed views... so had to alter Jeff's code to create the tables in tempdb.
the indexes are
ALTER TABLE Hourly_Stats
ADD CONSTRAINT [PK_Hourly_Stats] PRIMARY KEY CLUSTERED
(
[Data_Timestamp] ASC,
[Server_Id] ASC,
[Device_Id] ASC --Device_Id is only unique on a per-server basis.
)
;
CREATE NONCLUSTERED INDEX [IX_HS1] ON [dbo].[Hourly_Stats]
(
[Server_ID] ASC,
[Device_ID] ASC
)
;
GO
---+++++++ Added following view and index on view
CREATE VIEW DBO.VW_GAH
WITH SCHEMABINDING
AS SELECT DBO.HOURLY_STATS.SERVER_ID AS COL_1 ,
DBO.HOURLY_STATS.DEVICE_ID AS COL_2 ,
COUNT_BIG( * )AS COL_3
FROM DBO.HOURLY_STATS
GROUP BY DBO.HOURLY_STATS.SERVER_ID ,
DBO.HOURLY_STATS.DEVICE_ID;
GO
CREATE UNIQUE CLUSTERED INDEX [IX_vwgah] ON [dbo].[vw_gah]
(
[col_1] ASC,
[col_2] ASC
)
GO
Using MM's version:
SELECT S.SERVER_ID ,
S.DEVICE_ID ,
MAX_DATA_TIMESTAMP
FROM
(
SELECT DEVICE_ID ,
SERVER_ID
FROM HOURLY_STATS
GROUP BY DEVICE_ID ,
SERVER_ID )AS S( DEVICE_ID , SERVER_ID )
OUTER APPLY(
SELECT MAX_DATA_TIMESTAMP = (
SELECT MAX( DATA_TIMESTAMP )
FROM HOURLY_STATS AS HS
WHERE HS.SERVER_ID = S.SERVER_ID
AND HS.DEVICE_ID = S.DEVICE_ID ))AS X;
I am getting these results:
(3582 row(s) affected)
Table 'Hourly_Stats'. Scan count 3582, logical reads 11442, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'VW_GAH'. Scan count 1, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 114 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
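One caveat, offered as an untested sketch: Developer Edition matches indexed views automatically the way Enterprise does, but on Standard Edition the view would have to be referenced directly WITH (NOEXPAND) for its clustered index to be used, along these lines:
SELECT S.COL_1 AS SERVER_ID ,
S.COL_2 AS DEVICE_ID ,
MAX_DATA_TIMESTAMP
FROM DBO.VW_GAH AS S WITH (NOEXPAND)
OUTER APPLY(
SELECT MAX_DATA_TIMESTAMP = (
SELECT MAX( DATA_TIMESTAMP )
FROM HOURLY_STATS AS HS
WHERE HS.SERVER_ID = S.COL_1
AND HS.DEVICE_ID = S.COL_2 ))AS X;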
complete set up code:
USE tempdb
GO
---===== Creates permanent-tables version
--=====================================================================================================================
-- Build six months worth of test data (more than 15 million rows)
-- NOTHING in this section is a part of the solution. We're just building test data here.
-- Takes 35 secs to build the data (15,474,240 rows or six months of samples)
--=====================================================================================================================
--===== Conditionally drop all the temp tables to make reruns easier in SSMS
IF OBJECT_ID('tempdb..Tally','U') IS NOT NULL DROP TABLE Tally;
IF OBJECT_ID('tempdb..SD','U') IS NOT NULL DROP TABLE SD;
IF OBJECT_ID('tempdb..SingleSample','U') IS NOT NULL DROP TABLE SingleSample;
IF OBJECT_ID('tempdb..SampleTimes','U') IS NOT NULL DROP TABLE SampleTimes;
IF OBJECT_ID('tempdb..Hourly_Stats','U') IS NOT NULL DROP TABLE Hourly_Stats;
GO
--===== Build a temporary Tally Table to help us easily build the test data.
-- As a sidebar, you should have a permanent Tally Table.
--===== Create and populate the Tally table on the fly.
SELECT TOP 11001
IDENTITY(INT,0,1) AS N
INTO Tally
FROM sys.ALL_Columns ac1,
sys.ALL_Columns ac2
;
--===== Add a CLUSTERED Primary Key to maximize performance
ALTER TABLE Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== This only builds the given requirements for the data and stores it in
-- a working table so we can expand it according to the device count for
-- each Server_ID
SELECT *
INTO SD
FROM
( --=== Servers and Device count per server
SELECT 1, 1540 UNION ALL
SELECT 2, 742 UNION ALL
SELECT 3, 357 UNION ALL
SELECT 4, 26 UNION ALL
SELECT 5, 917
) d (Server_ID, DeviceCount)
;
--===== This creates a table consisting of the Server_ID and the given number
-- of Device_ID's for each Server_ID
SELECT *
INTO SingleSample
FROM
( --=== Server/Device ID's per hourly sample
SELECT sd.Server_ID, t.N
FROM Tally t
CROSS JOIN SD sd
WHERE t.N BETWEEN 1 AND sd.DeviceCount
) d (Server_ID, Device_ID)
;
--===== This builds 6 months of hourly sample date/times
SELECT *
INTO SampleTimes
FROM
( --=== Hourly sample date/times
SELECT TOP (180*24) --180 days times 24 hours
Data_Timestamp = CAST(DATEADD(hh,t.N,'2011') AS SMALLDATETIME)
FROM Tally t
) d (Data_Timestamp)
;
--===== This cross join builds out the full 6 months of data (over 15 million rows)
SELECT Server_ID = ISNULL(ss.Server_ID,0),
Device_ID = ISNULL(ss.Device_ID,0),
Data_Timestamp = ISNULL(st.Data_Timestamp,0),
Datapoint_A = CAST(RAND(CHECKSUM(NEWID()))* 100 AS REAL),
Datapoint_B = CAST(RAND(CHECKSUM(NEWID()))*1000 AS REAL)
INTO Hourly_Stats
FROM SingleSample ss
CROSS JOIN SampleTimes st
;
GO
ALTER TABLE Hourly_Stats
ADD CONSTRAINT [PK_Hourly_Stats] PRIMARY KEY CLUSTERED
(
[Data_Timestamp] ASC,
[Server_Id] ASC,
[Device_Id] ASC --Device_Id is only unique on a per-server basis.
)
;
CREATE NONCLUSTERED INDEX [IX_HS1] ON [dbo].[Hourly_Stats]
(
[Server_ID] ASC,
[Device_ID] ASC
)
;
GO
---+++++++ Added following view and index on view
CREATE VIEW DBO.VW_GAH
WITH SCHEMABINDING
AS SELECT DBO.HOURLY_STATS.SERVER_ID AS COL_1 ,
DBO.HOURLY_STATS.DEVICE_ID AS COL_2 ,
COUNT_BIG( * )AS COL_3
FROM DBO.HOURLY_STATS
GROUP BY DBO.HOURLY_STATS.SERVER_ID ,
DBO.HOURLY_STATS.DEVICE_ID;
GO
CREATE UNIQUE CLUSTERED INDEX [IX_vwgah] ON [dbo].[vw_gah]
(
[col_1] ASC,
[col_2] ASC
)
GO
---- MODEN server
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT s.Server_ID,
Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID)
FROM
(
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) s (Server_ID)
;
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
--- MAGOO server
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT s.Server_ID,
Max_Data_Timestamp
FROM
(
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) s (Server_ID)
OUTER APPLY (SELECT Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID) ) AS x
;
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
---- MAGOO server / device
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT S.SERVER_ID ,
S.DEVICE_ID ,
MAX_DATA_TIMESTAMP
FROM
(
SELECT DEVICE_ID ,
SERVER_ID
FROM HOURLY_STATS
GROUP BY DEVICE_ID ,
SERVER_ID )AS S( DEVICE_ID , SERVER_ID )
OUTER APPLY(
SELECT MAX_DATA_TIMESTAMP = (
SELECT MAX( DATA_TIMESTAMP )
FROM HOURLY_STATS AS HS
WHERE HS.SERVER_ID = S.SERVER_ID
AND HS.DEVICE_ID = S.DEVICE_ID ))AS X;
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 6, 2011 at 11:28 am
Thanks for all your replies and help everyone!
@mister.magoo
Ah, I see your point now. This is not an RBAR problem. I'm grateful for your input. The reason that I'm so worried about code maintenance is that I'm basically the only person employed here who has sufficient knowledge of SQL to do it. If I find myself no longer employed here at some point, I don't want to leave the company any surprises that they aren't equipped to handle. If this weren't the case, then I would be more than willing to simply use the hard-coded method.
As it turns out, there IS a "Servers" table in my database, and there is a "Devices" table as well. I don't have any Foreign Key constraints on the hourly stats table because getting the data loaded is the most important consideration for this data, and a few orphaned rows are less of an issue than having the synchronization job fail due to foreign key constraint violations, especially when those violations are likely to be resolved the next time synchronization occurs.
So, I modified Jeff's and mister.magoo's solution to use this table as the source for the Server_Id's.
As for Jeff's solution and mister.magoo's modified solution: they're even faster than the UNION ALL method I've been using!
mister.magoo's OUTER APPLY method does take a bit longer to compile the execution plan initially (~6ms), but after that, both queries run in ~6ms total. This is a query that works well for what I need it to do.
As I am on SQL Server 2005, Jeff's query would work just fine; however, I think I prefer mister.magoo's method simply because its performance seems to be more consistent across newer versions of SQL Server. Also, when modified to get the most recent Data_Timestamp per device, mister.magoo's code worked much faster (when having the original index configuration AND Jeff's clustered index as a non-clustered index).
Here is what I've come up with:
SELECT s.Server_Id, s.Device_Id,
Max_Data_Timestamp
FROM
(
SELECT Server_Id, Device_Id FROM Devices
) s (Server_Id, Device_Id)
OUTER APPLY
(SELECT Max_Data_Timestamp =
(SELECT MAX(Data_Timestamp)
FROM #Hourly_Stats hs
WHERE hs.Server_Id = s.Server_Id AND hs.Device_Id = s.Device_Id)
) AS x
/*Performance Stats:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 4 ms.
(3582 row(s) affected)
Table '#Hourly_Stats'. Scan count 3582, logical reads 11442, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Devices'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 27 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
*/
Note that the code above takes a LONG time (>20 minutes) without an index on Server_Id, Device_Id, and Data_Timestamp.
@Jeff Regarding the updated indexes: the majority of the time this table is used, the query is retrieving data for a particular device over a period of time.
As an example:
DECLARE @Param_A tinyint, @Param_B Int, @Param_C smalldatetime, @Param_D smalldatetime
SET @Param_A = 2
SET @Param_B = 750
SET @Param_C = '2011-06-01'
SET @Param_D = '2011-07-01'
SELECT Server_Id, Device_Id, Data_Timestamp, Datapoint_A, Datapoint_B
FROM Hourly_Stats
WHERE Server_Id = @Param_A AND Device_Id = @Param_B
AND Data_Timestamp >= @Param_C AND Data_Timestamp < @Param_D
Implementing your suggested indexes would severely impact the performance of these types of queries. What would you recommend to address them? Should I keep the clustered index as is and add your recommended index as a non-clustered index, or vice versa? I'm afraid I don't know much about tuning indexes, but I can recognize that I absolutely need an index on Server_Id, Device_Id, Data_Timestamp.
Currently the data is synchronized every 6 hours, and all servers are synchronized at once (the data is inserted into a temporary table for each server in turn, then the data in the temp table is inserted into the Hourly_Stats table).
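To illustrate the pattern roughly (#Server_Staging and the column list are placeholders rather than my actual job code):
-- Staging pattern, repeated once per server:
CREATE TABLE #Server_Staging
(Server_Id tinyint NOT NULL, Device_Id int NOT NULL,
Data_Timestamp smalldatetime NOT NULL, Datapoint_A real NULL, Datapoint_B real NULL);
-- ...per-server pull into #Server_Staging happens here (environment-specific)...
INSERT INTO Hourly_Stats (Server_Id, Device_Id, Data_Timestamp, Datapoint_A, Datapoint_B)
SELECT Server_Id, Device_Id, Data_Timestamp, Datapoint_A, Datapoint_B
FROM #Server_Staging;
DROP TABLE #Server_Staging;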
The code provided for getting the most recent Data_Timestamp per server does need that index to perform quickly; I just don't know whether I should make it the clustered index or leave the clustered index as it is.
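To make the question concrete, the arrangement I'm describing would be this (the shape of my existing clustered PK is inferred from earlier in the thread, and the index names are placeholders):
-- Keep the existing clustered PK, which serves the per-device range queries:
ALTER TABLE Hourly_Stats
ADD CONSTRAINT PK_Hourly_Stats PRIMARY KEY CLUSTERED
(Server_Id ASC, Device_Id ASC, Data_Timestamp ASC);
-- ...and add Jeff's covering key as a non-clustered index for the MAX lookups:
CREATE NONCLUSTERED INDEX IX_Hourly_Stats_Server_Timestamp
ON Hourly_Stats (Server_Id ASC, Data_Timestamp DESC);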
@J Livingston
Thanks for your hard work and effort. However, mister.magoo's solution works well for the per-server, per-device situation as well, as long as the proper indexes are there, although I don't know whether that performance is reflected in SQL 2008 R2. I only have an instance of SQL 2005 to test on, but I also don't want to code myself into a dead end in the event of an upgrade.