September 3, 2011 at 11:49 am
clubbavich (9/3/2011)
I haven't even attempted it. To do so would be to execute 3600 select statements connected by UNION ALL. I cannot imagine that that would do anything good for performance.
Reading back through the thread, I am still not clear what your real problem is... please accept that some things take a long time to sink in to the little grey matter I have left 😛
Do you want code that will optimise the return of server/device/max(date)... or is there another question?
Sorry if I have misunderstood.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 3, 2011 at 2:28 pm
I'm simply trying to figure out why the RBAR solution to this problem is orders of magnitude faster than the obvious set-based solution. I'd like to figure out what exactly is going wrong here.
September 3, 2011 at 6:33 pm
<pokingheadoutofthetrench>
I disagree that there is an RBAR method here.
The UNION gets a result out of an 18M+ row table with only 4 reads - you can't get less RBAR than that, can you?
The MAX method on the other hand has to read every row to find the result - hence it is slower - or am I missing something here?
</pokingheadoutofthetrench>
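For anyone skimming, here is a minimal sketch of the two shapes being compared - reconstructed from the thread rather than quoted, using the table and column names that appear later in the discussion:
-- Hard-coded shape: one seek per server against an index leading on
-- (Server_Id, Data_Timestamp), so only a handful of reads in total.
SELECT MAX(Data_Timestamp) AS Max_Data_Timestamp FROM Hourly_Stats WHERE Server_Id = 1
UNION ALL
SELECT MAX(Data_Timestamp) FROM Hourly_Stats WHERE Server_Id = 2;
-- ...and so on for each server.
-- The "obvious" set-based shape, which here ends up reading every row:
SELECT Server_Id, MAX(Data_Timestamp) AS Max_Data_Timestamp
FROM Hourly_Stats
GROUP BY Server_Id;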
MM
September 3, 2011 at 8:22 pm
clubbavich (9/3/2011)
@Jeff Here is how the devices are distributed.
Server_Id, Device_Id
1 has 1540 Device_Id's
2 has 742 Device_Id's
3 has 357 Device_Id's
4 has 26 Device_Id's
5 has 917 Device_Id's
Thanks. I'll put something together tomorrow. Like I said, I think the problem is with the way the indexes are laid out and I'll build a wad of test data to see if I'm right. The information above helps a lot.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 3, 2011 at 9:10 pm
@mister.magoo
In this case, each select statement is returning a single row, instead of a set. While the UNION method works, in the long term it will simply become another method that needs maintenance each time a server is added or removed, or it will require dynamic SQL, which I avoid whenever possible. My initial hope was that there was some language construct I hadn't thought of that would get me these results intelligently in a single statement. At this point it's looking unlikely.
Also, depending on future needs, I may end up needing a method to get the latest data timestamp per device per server, which the UNION ALL method is wholly unsuited for.
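For clarity, the naive set-based shape of that per-device requirement would be something like this (same table and column names as before; shown only to illustrate why hand-expanding a UNION ALL per server/device pair is a non-starter):
SELECT Server_Id, Device_Id, MAX(Data_Timestamp) AS Max_Data_Timestamp
FROM Hourly_Stats
GROUP BY Server_Id, Device_Id;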
September 3, 2011 at 11:17 pm
(3630 row(s) affected)
Table 'HOURLY_STATS'. Scan count 3, logical reads 27408, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5413 ms, elapsed time = 2950 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Server\Device\MaxDate... based on 10 million rows... is this acceptable?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 4, 2011 at 3:38 pm
How about ~1 millisecond the first time, every time, along with shorter times to rebuild the clustered index and maybe no need to ever rebuild it? 😉
The code below is in 3 sections...
1. Build 6 months of test data (more than 15 million rows... takes about 2 minutes to build)
2. Build the indexes I recommend you have including a change to the PK (takes about 4 minutes to build)
3. The code to find the MAX timestamp for each Server_ID (takes 1 millisecond to run after the correct indexes have been built).
The details are in the comments in the code. Of course, everything is done as Temp Tables so as to not accidentally whack any of your real tables. 😉
--=====================================================================================================================
-- Build six months worth of test data (more than 15 million rows)
-- NOTHING in this section is a part of the solution. We're just building test data here.
-- Takes 00:02:09 to build the data (15,474,240 rows or six months of samples)
--=====================================================================================================================
--===== Conditionally drop all the temp tables to make reruns easier in SSMS
IF OBJECT_ID('tempdb..#Tally','U') IS NOT NULL DROP TABLE #Tally;
IF OBJECT_ID('tempdb..#SD','U') IS NOT NULL DROP TABLE #SD;
IF OBJECT_ID('tempdb..#SingleSample','U') IS NOT NULL DROP TABLE #SingleSample;
IF OBJECT_ID('tempdb..#SampleTimes','U') IS NOT NULL DROP TABLE #SampleTimes;
IF OBJECT_ID('tempdb..#Hourly_Stats','U') IS NOT NULL DROP TABLE #Hourly_Stats;
GO
--===== Build a temporary Tally Table to help us easily build the test data.
-- As a sidebar, you should have a permanent Tally Table.
--===== Create and populate the Tally table on the fly.
SELECT TOP 11001
IDENTITY(INT,0,1) AS N
INTO #Tally
FROM sys.ALL_Columns ac1,
sys.ALL_Columns ac2
;
--===== Add a CLUSTERED Primary Key to maximize performance
ALTER TABLE #Tally
ADD CONSTRAINT PK_#Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== This only builds the given requirements for the data and stores it in
-- a working table so we can expand it according to the device count for
-- each Server_ID
SELECT *
INTO #SD
FROM
( --=== Servers and Device count per server
SELECT 1, 1540 UNION ALL
SELECT 2, 742 UNION ALL
SELECT 3, 357 UNION ALL
SELECT 4, 26 UNION ALL
SELECT 5, 917
) d (Server_ID, DeviceCount)
;
--===== This creates a table consisting of the Server_ID and the given number
-- of Device_ID's for each Server_ID
SELECT *
INTO #SingleSample
FROM
( --=== Server/Device ID's per hourly sample
SELECT sd.Server_ID, t.N
FROM #Tally t
CROSS JOIN #SD sd
WHERE t.N BETWEEN 1 AND sd.DeviceCount
) d (Server_ID, Device_ID)
;
--===== This builds 6 months of hourly sample date/times
SELECT *
INTO #SampleTimes
FROM
( --=== Hourly sample date/times
SELECT TOP (180*24) --180 days times 24 hours
Data_Timestamp = CAST(DATEADD(hh,t.N,'2011') AS SMALLDATETIME)
FROM #Tally t
) d (Data_Timestamp)
;
--===== This cross join builds out the full 6 months of data (over 15 million rows)
SELECT Server_ID = ISNULL(ss.Server_ID,0),
Device_ID = ISNULL(ss.Device_ID,0),
Data_Timestamp = ISNULL(st.Data_Timestamp,0),
Datapoint_A = CAST(RAND(CHECKSUM(NEWID()))* 100 AS REAL),
Datapoint_B = CAST(RAND(CHECKSUM(NEWID()))*1000 AS REAL)
INTO #Hourly_Stats
FROM #SingleSample ss
CROSS JOIN #SampleTimes st
;
GO
--=====================================================================================================================
-- I was correct about your indexes... they make a bit of a mess. The PK suffers many, many page splits over
-- time, and the non-clustered index really doesn't do much for speed when it comes to the particular query
-- you're trying to run quickly.
--
-- With that in mind, here's what I recommend for indexes on your table to make index rebuilding shorter, prevent
-- fragmentation on the PK during most days, and execute the query I wrote to solve your problem in
-- 1 millisecond.
--=====================================================================================================================
--===== Add the corrected PK which cuts page fragmentation to near zero
-- so that defragging it doesn't take so long.
-- Takes 00:02:29 to run
ALTER TABLE #Hourly_Stats
ADD CONSTRAINT [PK_Hourly_Stats] PRIMARY KEY CLUSTERED
(
[Data_Timestamp] ASC,
[Server_Id] ASC,
[Device_Id] ASC --Device_Id is only unique on a per-server basis.
)
;
--===== Add the correct covering index to make the correlated query run super fast.
-- This index will fragment a lot less than you would think even though
-- the time stamp is in descending order.
-- Takes 00:01:36 to run
CREATE INDEX IX_Hourly_Stats_Cover01
ON #Hourly_Stats (Server_ID ASC, Data_Timestamp DESC)
;
GO
--=====================================================================================================================
-- Then we come to your actual problem. While RBAR is bad, RBR isn't necessarily bad because the optimizer will
-- make it all part of the query as if it were written in a single, well-formed view. The following query uses
-- such RBR (not RBAR) and could easily be used as a CROSS APPLY, instead.
--
-- The real key to this bad boy is that 99.99% of the work is already done by derived table "s". If the number
-- of servers doesn't change much, then hardcoding may be ok. If they change more often, it may be worth
-- maintaining the Server_ID's in a separate table which could be updated once a day by a scheduled job.
-- (Similar to "data warehousing" tricks.)
--=====================================================================================================================
--===== Then comes the magic of a correlated subquery.
-- This takes only 1 millisecond to run.
SELECT s.Server_ID,
Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID)
FROM
( --=== These are the known Server_ID's
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) s (Server_ID)
;
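As a sketch only of the "separate table" idea mentioned in the comments above (the #Server_List name and its refresh step are illustrative, not part of the solution proper):
IF OBJECT_ID('tempdb..#Server_List','U') IS NOT NULL DROP TABLE #Server_List;
CREATE TABLE #Server_List (Server_ID INT NOT NULL PRIMARY KEY CLUSTERED);
--===== Refresh step; could be run once a day by a scheduled job
INSERT INTO #Server_List (Server_ID)
SELECT DISTINCT Server_ID FROM #Hourly_Stats;
--===== The correlated query then reads the lookup table instead of a hard-coded list
SELECT sl.Server_ID,
Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = sl.Server_ID)
FROM #Server_List sl;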
--Jeff Moden
Change is inevitable... Change for the better is not.
September 4, 2011 at 5:04 pm
Hi clubbavich
Please find my response below. It is not intended to be confrontational, but I am concerned that people reading this thread will think that a set-based query such as yours (albeit hard-coded for certain server ids), which is so good at achieving the intended result, should be discarded for some other method just because of confusion over the terminology.
I cannot agree that there is anything wrong with a query that performs 5 reads on an 18M+ row table, and I certainly wouldn't consider it to be RBAR.
In that context, here is my reply:
clubbavich (9/3/2011)
@mister.magoo In this case, each select statement is returning a single row, instead of a set.
I have to dispute the language and the premise - a single row result does not mean the code is not set based.
RBAR is commonly used to describe a situation where your code is systematically reading multiple rows unnecessarily.
In your case, you are using the existing indexes perfectly - you couldn't ask for better than 1 read per server id.
While the UNION method works, in the long-term it will simply become another method that needs maintenance each time a server is added or removed, or will require dynamic SQL, which I avoid whenever possible. My initial hope was that there was some language construct that I hadn't thought of that would get me these results in an intelligent manner in a single statement. At this point it's looking unlikely.
Fair point, but your question (repeated later by yourself for clarity) was about why the UNION method was quicker, was it not? That is the question I am addressing.
As for the future: yes, the current method requires ongoing maintenance and will not suit a per-server/per-device requirement without modification, but for the moment it is quicker because it is surgically accurate within the current design.
I am sure Jeff or one of the other experts on the site will help to find a way to speed up the non-hard-coded methods with you.
Best wishes
MM
September 4, 2011 at 5:33 pm
Jeff,
I tried your test data/sample query on my desktop with different results:
(Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64) Jun 17 2011 00:54:03 Copyright (c) Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) )
The final query took 4 seconds of cpu (and 1.2 seconds elapsed with parallelism allowed) and performed 29076 logical reads.
I played with it a bit and came up with this, which more closely matches the original UNION query but performed in 0 ms with just 18 logical reads... on my machine:
SELECT s.Server_ID,
Max_Data_Timestamp
FROM
( --=== These are the known Server_ID's
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) s (Server_ID)
OUTER APPLY (SELECT Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID) ) AS x
;
I don't understand why your query performs so differently on my machine, but I would be interested in your thoughts on the OUTER APPLY method.
MM
September 4, 2011 at 5:50 pm
Adding this index to Jeff's table:
CREATE INDEX IX_Hourly_Stats_By_Device_Cover01
ON #Hourly_Stats (Device_ID ASC, Server_ID ASC, Data_Timestamp DESC);
Allows this device-specific query to return data for every server/device in just over 3 secs CPU / 1 sec elapsed on my desktop.
SELECT s.Server_ID, s.Device_ID,
Max_Data_Timestamp
FROM
(
SELECT Device_ID, Server_ID
FROM #Hourly_Stats
GROUP BY Device_ID, Server_ID
) s (Device_ID,Server_ID)
OUTER APPLY (SELECT Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID AND hs.Device_ID = s.Device_ID) ) AS x
;
IO Stats:
Table '#Hourly_Stats___...____000000000023'. Scan count 3587, logical reads 40620, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Of those 40620 reads, 29096 are from the "s" subquery which is gathering the device/server ids - if they were already in a table, this would be even quicker. (The elapsed time for the subquery is 99% of the whole query)
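A sketch of that last point, assuming the server/device pairs were materialized ahead of time into a hypothetical #Device_List table:
IF OBJECT_ID('tempdb..#Device_List','U') IS NOT NULL DROP TABLE #Device_List;
SELECT Server_ID, Device_ID
INTO #Device_List
FROM #Hourly_Stats
GROUP BY Server_ID, Device_ID; -- one-off cost, paid outside the query
-- With the pairs already stored, the expensive "s" subquery disappears:
SELECT s.Server_ID, s.Device_ID, x.Max_Data_Timestamp
FROM #Device_List s
OUTER APPLY (SELECT Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM #Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID AND hs.Device_ID = s.Device_ID) ) AS x;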
MM
September 4, 2011 at 6:09 pm
mister.magoo (9/4/2011)
Jeff, I tried your test data/sample query on my desktop with different results:
(Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64) Jun 17 2011 00:54:03 Copyright (c) Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) )
The final query took 4 seconds of cpu (and 1.2 seconds elapsed with parallelism allowed) and performed 29076 logical reads.
I played with it a bit and came up with this, which more closely matches the original UNION query but performed in 0 ms with just 18 logical reads... on my machine:
I'm not sure what the difference is... here're the stats from my machine for the "Server_ID" based code I wrote...
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(5 row(s) affected)
Table '#Hourly_Stats_______________________________________________________________________________________________________00000000007B'.
Scan count 5, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
The OUTER APPLY does, in fact, seem a bit easier on IO (according to the execution plan) than the CROSS APPLY.
Maybe 64-bit isn't all that it's cracked up to be 😀
Microsoft SQL Server 2005 - 9.00.4035.00 (Intel X86)
Nov 24 2008 13:01:59
Copyright (c) 1988-2005 Microsoft Corporation
Developer Edition on Windows NT 5.1 (Build 2600: Service Pack 3)
--Jeff Moden
Change is inevitable... Change for the better is not.
September 4, 2011 at 6:40 pm
Jeff,
Attached query execution plan for the two queries, your version first, then mine from my machine.
From your stats, I would say that the optimizer has decided on a different plan for your query on my machine...how does it compare to yours?
MM
September 5, 2011 at 10:55 am
I have run Jeff's code on two different machines, both with 2008 R2 (both Dev Ed),
and I cannot get the results he did.
My results are almost identical to MM's, and the execution plans are the same as MM's.
...however, just for fun (and curiosity) I also ran the code on a 2000 Std Ed box.
Guess what... same performance as Jeff described... almost instantaneous.
Have attached a jpg of the sqlplan (not sure how to save it in 2000).
Edit>>> have run on a 2005 Std Ed... nearly as quick as 2000 🙂
plan attached
(5 row(s) affected)
Table '#Hourly_Stats_______________000000000008'. Scan count 5, logical reads 15, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 28 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
...thoughts anyone??
regards
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 5, 2011 at 2:31 pm
Have been playing around a little more and was keen to see if any improvement could be made on SQL 2008 R2... especially getting the server AND device/max date.
Ended up with an idea on indexed views... so had to alter Jeff's code to create the tables in tempdb.
the indexes are
ALTER TABLE Hourly_Stats
ADD CONSTRAINT [PK_Hourly_Stats] PRIMARY KEY CLUSTERED
(
[Data_Timestamp] ASC,
[Server_Id] ASC,
[Device_Id] ASC --Device_Id is only unique on a per-server basis.
)
;
CREATE NONCLUSTERED INDEX [IX_HS1] ON [dbo].[Hourly_Stats]
(
[Server_ID] ASC,
[Device_ID] ASC
)
;
GO
---+++++++ Added following view and index on view
CREATE VIEW DBO.VW_GAH
WITH SCHEMABINDING
AS SELECT DBO.HOURLY_STATS.SERVER_ID AS COL_1 ,
DBO.HOURLY_STATS.DEVICE_ID AS COL_2 ,
COUNT_BIG( * )AS COL_3
FROM DBO.HOURLY_STATS
GROUP BY DBO.HOURLY_STATS.SERVER_ID ,
DBO.HOURLY_STATS.DEVICE_ID;
GO
CREATE UNIQUE CLUSTERED INDEX [IX_vwgah] ON [dbo].[vw_gah]
(
[col_1] ASC,
[col_2] ASC
)
GO
Using MM's version:
SELECT S.SERVER_ID ,
S.DEVICE_ID ,
MAX_DATA_TIMESTAMP
FROM
(
SELECT DEVICE_ID ,
SERVER_ID
FROM HOURLY_STATS
GROUP BY DEVICE_ID ,
SERVER_ID )AS S( DEVICE_ID , SERVER_ID )
OUTER APPLY(
SELECT MAX_DATA_TIMESTAMP = (
SELECT MAX( DATA_TIMESTAMP )
FROM HOURLY_STATS AS HS
WHERE HS.SERVER_ID = S.SERVER_ID
AND HS.DEVICE_ID = S.DEVICE_ID ))AS X;
I am getting these results:
(3582 row(s) affected)
Table 'Hourly_Stats'. Scan count 3582, logical reads 11442, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'VW_GAH'. Scan count 1, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 114 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
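One caveat, offered as an untested sketch: Developer Edition matches indexed views automatically the way Enterprise does, but on Standard Edition the view would have to be referenced directly WITH (NOEXPAND) for its clustered index to be used, along these lines:
SELECT S.COL_1 AS SERVER_ID ,
S.COL_2 AS DEVICE_ID ,
MAX_DATA_TIMESTAMP
FROM DBO.VW_GAH AS S WITH (NOEXPAND)
OUTER APPLY(
SELECT MAX_DATA_TIMESTAMP = (
SELECT MAX( DATA_TIMESTAMP )
FROM HOURLY_STATS AS HS
WHERE HS.SERVER_ID = S.COL_1
AND HS.DEVICE_ID = S.COL_2 ))AS X;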
complete set up code:
USE tempdb
GO
---===== Creates permanent-tables version
--=====================================================================================================================
-- Build six months worth of test data (more than 15 million rows)
-- NOTHING in this section is a part of the solution. We're just building test data here.
-- Takes 35 secs to build the data (15,474,240 rows or six months of samples)
--=====================================================================================================================
--===== Conditionally drop all the temp tables to make reruns easier in SSMS
IF OBJECT_ID('tempdb..Tally','U') IS NOT NULL DROP TABLE Tally;
IF OBJECT_ID('tempdb..SD','U') IS NOT NULL DROP TABLE SD;
IF OBJECT_ID('tempdb..SingleSample','U') IS NOT NULL DROP TABLE SingleSample;
IF OBJECT_ID('tempdb..SampleTimes','U') IS NOT NULL DROP TABLE SampleTimes;
IF OBJECT_ID('tempdb..Hourly_Stats','U') IS NOT NULL DROP TABLE Hourly_Stats;
GO
--===== Build a temporary Tally Table to help us easily build the test data.
-- As a sidebar, you should have a permanent Tally Table.
--===== Create and populate the Tally table on the fly.
SELECT TOP 11001
IDENTITY(INT,0,1) AS N
INTO Tally
FROM sys.ALL_Columns ac1,
sys.ALL_Columns ac2
;
--===== Add a CLUSTERED Primary Key to maximize performance
ALTER TABLE Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== This only builds the given requirements for the data and stores it in
-- a working table so we can expand it according to the device count for
-- each Server_ID
SELECT *
INTO SD
FROM
( --=== Servers and Device count per server
SELECT 1, 1540 UNION ALL
SELECT 2, 742 UNION ALL
SELECT 3, 357 UNION ALL
SELECT 4, 26 UNION ALL
SELECT 5, 917
) d (Server_ID, DeviceCount)
;
--===== This creates a table consisting of the Server_ID and the given number
-- of Device_ID's for each Server_ID
SELECT *
INTO SingleSample
FROM
( --=== Server/Device ID's per hourly sample
SELECT sd.Server_ID, t.N
FROM Tally t
CROSS JOIN SD sd
WHERE t.N BETWEEN 1 AND sd.DeviceCount
) d (Server_ID, Device_ID)
;
--===== This builds 6 months of hourly sample date/times
SELECT *
INTO SampleTimes
FROM
( --=== Hourly sample date/times
SELECT TOP (180*24) --180 days times 24 hours
Data_Timestamp = CAST(DATEADD(hh,t.N,'2011') AS SMALLDATETIME)
FROM Tally t
) d (Data_Timestamp)
;
--===== This cross join builds out the full 6 months of data (over 15 million rows)
SELECT Server_ID = ISNULL(ss.Server_ID,0),
Device_ID = ISNULL(ss.Device_ID,0),
Data_Timestamp = ISNULL(st.Data_Timestamp,0),
Datapoint_A = CAST(RAND(CHECKSUM(NEWID()))* 100 AS REAL),
Datapoint_B = CAST(RAND(CHECKSUM(NEWID()))*1000 AS REAL)
INTO Hourly_Stats
FROM SingleSample ss
CROSS JOIN SampleTimes st
;
GO
ALTER TABLE Hourly_Stats
ADD CONSTRAINT [PK_Hourly_Stats] PRIMARY KEY CLUSTERED
(
[Data_Timestamp] ASC,
[Server_Id] ASC,
[Device_Id] ASC --Device_Id is only unique on a per-server basis.
)
;
CREATE NONCLUSTERED INDEX [IX_HS1] ON [dbo].[Hourly_Stats]
(
[Server_ID] ASC,
[Device_ID] ASC
)
;
GO
---+++++++ Added following view and index on view
CREATE VIEW DBO.VW_GAH
WITH SCHEMABINDING
AS SELECT DBO.HOURLY_STATS.SERVER_ID AS COL_1 ,
DBO.HOURLY_STATS.DEVICE_ID AS COL_2 ,
COUNT_BIG( * )AS COL_3
FROM DBO.HOURLY_STATS
GROUP BY DBO.HOURLY_STATS.SERVER_ID ,
DBO.HOURLY_STATS.DEVICE_ID;
GO
CREATE UNIQUE CLUSTERED INDEX [IX_vwgah] ON [dbo].[vw_gah]
(
[col_1] ASC,
[col_2] ASC
)
GO
---- MODEN server
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT s.Server_ID,
Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID)
FROM
(
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) s (Server_ID)
;
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
--- MAGOO server
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT s.Server_ID,
Max_Data_Timestamp
FROM
(
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) s (Server_ID)
OUTER APPLY (SELECT Max_Data_Timestamp = (SELECT MAX(Data_Timestamp) FROM Hourly_Stats hs WHERE hs.Server_ID = s.Server_ID) ) AS x
;
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
---- MAGOO server / device
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT S.SERVER_ID ,
S.DEVICE_ID ,
MAX_DATA_TIMESTAMP
FROM
(
SELECT DEVICE_ID ,
SERVER_ID
FROM HOURLY_STATS
GROUP BY DEVICE_ID ,
SERVER_ID )AS S( DEVICE_ID , SERVER_ID )
OUTER APPLY(
SELECT MAX_DATA_TIMESTAMP = (
SELECT MAX( DATA_TIMESTAMP )
FROM HOURLY_STATS AS HS
WHERE HS.SERVER_ID = S.SERVER_ID
AND HS.DEVICE_ID = S.DEVICE_ID ))AS X;
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 6, 2011 at 11:28 am
Thanks for all your replies and help everyone!
@mister.magoo
Ah, I see your point now. This is not an RBAR problem. I'm grateful for your input. The reason that I'm so worried about code maintenance is that I'm basically the only person employed here who has sufficient knowledge of SQL to do it. If I find myself no longer employed here at some point, I don't want to leave the company any surprises that they aren't equipped to handle. If this weren't the case, then I would be more than willing to simply use the hard-coded method.
As it turns out, there IS a "Servers" table in my database, and there is a "Devices" table as well. I don't have any Foreign Key constraints on the hourly stats table because getting the data loaded is the most important consideration for this data, and a few orphaned rows are less of an issue than having the synchronization job fail due to foreign key constraint violations, especially when those violations are likely to be resolved the next time synchronization occurs.
So, I modified Jeff's and mister.magoo's solution to use this table as the source for the Server_Id's.
As for Jeff's solution and mister.magoo's modified solution: they're even faster than the UNION ALL method I've been using!
mister.magoo's OUTER APPLY method does take a bit longer to compile the execution plan initially (~6ms), but after that, both queries run in ~6ms total. This is a query that works well for what I need it to do.
As I am on SQL Server 2005, Jeff's query would work just fine; however, I think I prefer mister.magoo's method simply because its performance seems to be more consistent across newer versions of SQL Server. Also, when modified to get the most recent Data_Timestamp per device, mister.magoo's code worked much faster (when having the original index configuration AND Jeff's clustered index as a non-clustered index).
Here is what I've come up with:
SELECT s.Server_Id, s.Device_Id,
Max_Data_Timestamp
FROM
(
SELECT Server_Id, Device_Id FROM Devices
) s (Server_Id, Device_Id)
OUTER APPLY
(SELECT Max_Data_Timestamp =
(SELECT MAX(Data_Timestamp)
FROM #Hourly_Stats hs
WHERE hs.Server_Id = s.Server_Id AND hs.Device_Id = s.Device_Id)
) AS x
/*Performance Stats:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 4 ms.
(3582 row(s) affected)
Table '#Hourly_Stats'. Scan count 3582, logical reads 11442, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Devices'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 27 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
*/
Note that the code above takes a LONG time (>20 minutes) without an index on Server_Id, Device_Id, and Data_Timestamp.
@Jeff Regarding the updated indexes: the majority of the time this table is used, the query is retrieving data for a particular device over a period of time.
As an example:
DECLARE @Param_A tinyint, @Param_B Int, @Param_C smalldatetime, @Param_D smalldatetime
SET @Param_A = 2
SET @Param_B = 750
SET @Param_C = '2011-06-01'
SET @Param_D = '2011-07-01'
SELECT Server_Id, Device_Id, Data_Timestamp, Datapoint_A, Datapoint_B
FROM Hourly_Stats
WHERE Server_Id = @Param_A AND Device_Id = @Param_B
AND Data_Timestamp >= @Param_C AND Data_Timestamp < @Param_D
Implementing your suggested indexes would severely impact the performance of these types of queries. What would you recommend to address them? Should I keep the clustered index as is and add your recommended index as a non-clustered index, or vice versa? I'm afraid I don't know much about tuning indexes, but I can recognize that I absolutely need an index on Server_Id, Device_Id, Data_Timestamp.
Currently the data is synchronized every 6 hours, and all servers are synchronized at once (the data is inserted into a temporary table for each server in turn, then the data in the temp table is inserted into the Hourly_Stats table).
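To illustrate the pattern roughly (#Server_Staging and the column list are placeholders rather than my actual job code):
-- Staging pattern, repeated once per server:
CREATE TABLE #Server_Staging
(Server_Id tinyint NOT NULL, Device_Id int NOT NULL,
Data_Timestamp smalldatetime NOT NULL, Datapoint_A real NULL, Datapoint_B real NULL);
-- ...per-server pull into #Server_Staging happens here (environment-specific)...
INSERT INTO Hourly_Stats (Server_Id, Device_Id, Data_Timestamp, Datapoint_A, Datapoint_B)
SELECT Server_Id, Device_Id, Data_Timestamp, Datapoint_A, Datapoint_B
FROM #Server_Staging;
DROP TABLE #Server_Staging;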
The code provided for getting the most recent Data_Timestamp per server does need that index to perform quickly; I just don't know whether I should make it the clustered index or leave the clustered index as it is.
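To make the question concrete, the arrangement I'm describing would be this (the shape of my existing clustered PK is inferred from earlier in the thread, and the index names are placeholders):
-- Keep the existing clustered PK, which serves the per-device range queries:
ALTER TABLE Hourly_Stats
ADD CONSTRAINT PK_Hourly_Stats PRIMARY KEY CLUSTERED
(Server_Id ASC, Device_Id ASC, Data_Timestamp ASC);
-- ...and add Jeff's covering key as a non-clustered index for the MAX lookups:
CREATE NONCLUSTERED INDEX IX_Hourly_Stats_Server_Timestamp
ON Hourly_Stats (Server_Id ASC, Data_Timestamp DESC);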
@J Livingston
Thanks for your hard work and effort. However, mister.magoo's solution works well for the per-server, per-device situation as well, as long as the proper indexes are there, although I don't know whether that performance is reflected in SQL 2008 R2. I only have an instance of SQL 2005 to test on, but I also don't want to code myself into a dead end in the event of an upgrade.