October 23, 2011 at 8:00 am
This is my main (and simplified) table:
if object_id('tempdb.dbo.#Scan') is not null drop table #scan
create table #scan(
Code bigint NOT NULL,
Date datetime NOT NULL
)
There is a clustered index on Code and nonclustered index on Date column.
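The create table script above doesn't actually create those indexes; for a self-contained repro, something like this should match the description (index names are illustrative):

```sql
-- Assumed index definitions matching the description above (names are made up)
create clustered index cix_scan_Code on #scan (Code)
create nonclustered index ix_scan_Date on #scan (Date)
```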
Data in it:
insert into #scan(Code, Date)
select 21000748772, '2011.09.21 00:02:58' union all
select 21000821972, '2011.09.21 02:49:18' union all
select 21000823072, '2011.09.21 02:51:11' union all
select 21000828322, '2011.09.21 02:59:12' union all
select 21000829312, '2011.09.21 03:00:56' union all
select 21000832672, '2011.09.21 03:16:32' union all
select 21000849762, '2011.09.21 03:45:23' union all
select 21000855982, '2011.09.21 03:56:44' union all
select 21000868412, '2011.09.21 04:17:43' union all
select 12000694142, '2011.09.21 06:12:04' union all
select 12000697682, '2011.09.21 06:27:22' union all
select 21000941502, '2011.09.21 12:10:26' union all
select 21000955142, '2011.09.21 12:54:48' union all
select 21000993462, '2011.09.21 14:15:37' union all
select 21000994272, '2011.09.21 14:17:29' union all
select 21000999002, '2011.09.21 14:28:15' union all
select 21001012382, '2011.09.21 15:02:21' union all
select 21001014742, '2011.09.21 15:05:42' union all
select 21001025092, '2011.09.21 15:25:22' union all
select 21001041462, '2011.09.21 15:50:01' union all
select 21001046412, '2011.09.21 15:56:40' union all
select 21001083412, '2011.09.21 17:38:55' union all
select 21001098942, '2011.09.21 18:03:37' union all
select 21001103892, '2011.09.21 18:10:48' union all
select 21001121452, '2011.09.21 19:01:29' union all
select 21001129022, '2011.09.21 19:27:47' union all
select 21001135042, '2011.09.21 19:41:28' union all
select 21001156702, '2011.09.21 21:11:23' union all
select 21001181832, '2011.09.21 22:34:54' union all
select 21001183922, '2011.09.21 22:40:42'
I need to get the number of codes that start with either 11, 12, 14, 21, 22 or 23, for each of the 8-hour intervals in a three-shift system. Like this:
create table #ShiftData (
DateFrom datetime,
DateTo datetime,
Count11 int,
Count12 int,
Count14 int,
Count21 int,
Count22 int,
Count23 int
)
insert into #ShiftData
select '2011-09-20 22:00:00.000', '2011-09-21 06:00:00.000', 0, 0, 0, 9, 0, 0 union all
select '2011-09-21 06:00:00.000', '2011-09-21 14:00:00.000', 0, 2, 0, 2, 0, 0 union all
select '2011-09-21 14:00:00.000', '2011-09-21 22:00:00.000', 0, 0, 0, 15, 0, 0 union all
select '2011-09-21 22:00:00.000', '2011-09-22 06:00:00.000', 0, 0, 0, 2, 0, 0
So first I have to get all the possible shifts - in this example there are four, each lasting 8 hours (three-shift system). Then I need to count the codes for each shift, broken down by code type (that is, codes starting with either 11, 12, 14, 21, 22 or 23).
What is the most efficient way to do this?
October 23, 2011 at 9:54 am
Can you post some easily usable sample data please (see http://www.sqlservercentral.com/articles/Best+Practices/61537/). I'm sure this could be massively optimised, not the least by getting rid of that UDF.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 23, 2011 at 1:43 pm
OK, I rewrote my question. Hope it is clearer now.
October 23, 2011 at 2:14 pm
Awesome. Makes things so much easier when I can test a solution.
First thing you need is a calendar table. I've created it as a # table, but I recommend you create a permanent one. This one starts at the defined start date and ends sometime in 2020. If you want it to start earlier, change the variable; if you want more rows, increase the TOP.
CREATE TABLE #CalendarTable (
ShiftStartDate DATETIME,
ShiftEndDate DATETIME
)
CREATE UNIQUE CLUSTERED INDEX idx_CalendarTable_StartDate ON #CalendarTable (ShiftStartDate)
DECLARE @StartDate DATETIME
SET @StartDate = '2011/09/01 06:00'
INSERT INTO #CalendarTable ( ShiftStartDate, ShiftEndDate)
SELECT DATEADD(hh,8*(Incr-1), @StartDate), DATEADD(hh,8*(Incr), @StartDate) FROM (
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT 1)) Incr
FROM sys.columns AS c CROSS JOIN sys.columns AS c2
) sub
Once we have that, which defines when the shifts start and finish, it's a trivial piece of code.
SELECT ShiftStartDate, [11], [12], [14], [21], [22], [23] FROM (
SELECT LEFT(s.Code, 2) AS Prefix, ShiftStartDate, code
FROM #scan AS s INNER JOIN #CalendarTable AS ct ON s.Date >= ShiftStartDate AND s.Date < ShiftEndDate
) s
PIVOT (
COUNT(code)
FOR Prefix IN ([11],[12],[14],[21],[22],[23])
) pivoted
Results:
ShiftStartDate 11 12 14 21 22 23
----------------------- ----------- ----------- ----------- ----------- ----------- -----------
2011-09-20 22:00:00.000 0 0 0 9 0 0
2011-09-21 06:00:00.000 0 2 0 2 0 0
2011-09-21 14:00:00.000 0 0 0 15 0 0
2011-09-21 22:00:00.000 0 0 0 2 0 0
(4 row(s) affected)
I won't claim this is the most efficient, PIVOT can get a little nasty sometimes. It should be more than adequate. If it doesn't meet performance requirements, post back and we'll see about a manual pivot that's more efficient. It should certainly be better than your while loop and UDF.
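For reference, such a manual pivot is just a cross-tab with CASE expressions over the same calendar join - an untested sketch along these lines:

```sql
-- Manual pivot: one pass over the join, counting each prefix with a CASE per column
SELECT ct.ShiftStartDate,
       SUM(CASE WHEN LEFT(s.Code, 2) = '11' THEN 1 ELSE 0 END) AS [11],
       SUM(CASE WHEN LEFT(s.Code, 2) = '12' THEN 1 ELSE 0 END) AS [12],
       SUM(CASE WHEN LEFT(s.Code, 2) = '14' THEN 1 ELSE 0 END) AS [14],
       SUM(CASE WHEN LEFT(s.Code, 2) = '21' THEN 1 ELSE 0 END) AS [21],
       SUM(CASE WHEN LEFT(s.Code, 2) = '22' THEN 1 ELSE 0 END) AS [22],
       SUM(CASE WHEN LEFT(s.Code, 2) = '23' THEN 1 ELSE 0 END) AS [23]
FROM #scan AS s
INNER JOIN #CalendarTable AS ct
    ON s.Date >= ct.ShiftStartDate AND s.Date < ct.ShiftEndDate
GROUP BY ct.ShiftStartDate
ORDER BY ct.ShiftStartDate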
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 24, 2011 at 12:50 am
'trivial piece of code', yes, if you speak pivot 😛
I modified your code a bit (removed what I thought was an expensive join with the calendar table) and got this:
select ShiftStartDate, dateadd(hour, 8, ShiftStartDate) ShiftEndDate, [11], [12], [14], [21], [22], [23]
from (
select left(Code, 2) Prefix,
case
when datepart(hour, Date) < 6 then dateadd(day, -1, dateadd(hour, 22, datediff(day, 0, Date)))
when datepart(hour, Date) between 6 and 13 then dateadd(day, 0, dateadd(hour, 6, datediff(day, 0, Date)))
when datepart(hour, Date) between 14 and 21 then dateadd(day, 0, dateadd(hour, 14, datediff(day, 0, Date)))
else dateadd(day, 0, dateadd(hour, 22, datediff(day, 0, Date)))
end ShiftStartDate, Code
from #scan (nolock)
) s
pivot (
count(Code)
for Prefix in ([11], [12], [14], [21], [22], [23])
) pivoted
order by ShiftStartDate
Looks OK and fast too, thanks 🙂
October 24, 2011 at 3:11 am
Did you test that the join was expensive? Yes, it's an inequality join, but that doesn't automatically mean expensive.
I tested your and my code on a million rows in #scan.
Calendar table and inequality join:
Table '#CalendarTable'. Scan count 480, logical reads 960, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#scan'. Scan count 1, logical reads 3047, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2184 ms, elapsed time = 2180 ms.
Case statement:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#scan'. Scan count 1, logical reads 3047, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2840 ms, elapsed time = 2839 ms.
I ran those multiple times. The version with the case statements was consistently about 700 ms slower. Now, as the date range grows, the reads on #CalendarTable will grow, but they should grow fairly slowly.
Please, please, please, drop the Nolocks.
See - http://sqlblog.com/blogs/andrew_kelly/archive/2009/04/10/how-dirty-are-your-reads.aspx
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 24, 2011 at 4:31 am
Interesting, I got completely opposite times. Will test it again though, maybe I forgot clustered index on ShiftsTable...
I am using nolock because the inserts must not be blocked in any way (your query locked the table for 2180 ms, which is unacceptable in my scenario, where the row count is nearing 5 million). The inserts are performed by a computer hooked up to a conveyor-belt-like machine, and if an insert is slow, the conveyor belt has to stop, wait for the computer to process the insert, and then start again.
And the report (the data from the query in question) is generated once per day and used by me and a couple of my co-workers. If it shows one or two numbers off by one, nobody will ever notice, because it is only used as an overview report.
Of course when we do actual analysis of the data for our client, nolock is NOT used.
October 24, 2011 at 4:41 am
This shouldn't lock the table unless you're processing the entire table. If you're processing only a portion of the scan table and have predicates on indexed columns (the date should be indexed, possibly clustered), then you should just get page locks. It depends on what ranges of the table you're reading.
If you're querying that processing table often, consider snapshot isolation. That way readers never block writers and you can report accurately off live data without impacting the inserts.
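Enabling it is a one-time database setting plus a session-level option in the reporting connection - a sketch, where YourDatabase is a placeholder for the real database name:

```sql
-- One-time setup (note: versioning uses the tempdb version store)
ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In the reporting session, before running the shift report:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```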
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability