August 27, 2008 at 4:42 pm
noeld (8/27/2008)
I felt that I had to post my version 😀
SELECT a.ShiftID
, a.TableID
, a.StationID
, a.EmployeeID
, a.CounterDateTimeStamp
, a.MusselCount
, a.CounterTimeSpan
,(SELECT TOP 1 b.MusselCount
FROM dbo.ShiftCountersRawData b
WHERE b.ShiftID = a.ShiftID
AND b.TableID = a.TableID
AND b.StationID = a.StationID
AND b.EmployeeID = a.EmployeeID
AND b.CounterDateTimeStamp > a.CounterDateTimeStamp
ORDER BY b.CounterDateTimeStamp )
- a.MusselCount as MusselCountDiff
FROM dbo.ShiftCountersRawData a
ORDER BY a.ShiftID, a.TableID, a.StationID, a.EmployeeID, a.CounterDateTimeStamp
Yeah, I've been playing around with a version like that one, Noel. Except that I think that you have the A's & B's reversed: you are reporting the count difference on the record before the count is take. I am pretty sure that it is supposed to be on the record after.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 27, 2008 at 5:00 pm
To really make a test, I loaded up the table. After loading in the first 25 rows of the supplied data (I left out the rows that didn't change amounts), I made the employee work every day for the past year and then added 99 other employees, all working every day. This gave me a table with just shy of 1 million rows.
if Object_ID( 'dbo.ShiftCountersRawData', 'U' ) is not null begin
truncate table dbo.ShiftCountersRawData;
end;
else begin
create table ShiftCountersRawData (
ShiftID int NOT NULL,
TableID int NOT NULL,
StationID int NOT NULL,
EmployeeID varchar(12) NOT NULL,
CounterDateTimeStamp datetime NOT NULL,
MusselCount int NOT NULL,
CounterTimeSpan int NULL,
constraint PK_ShiftCountersRawData primary key
(ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp)
);
end;
Insert dbo.ShiftCountersRawData
(ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp, MusselCount, CounterTimeSpan)
select dbo.GetRowCount( 'ShiftCountersRawData' );
select 7, 1, 1, 6652, '2006-12-07 13:13:22.223', 10261, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:16:22.330', 10261, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:19:22.470', 10302, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:22:22.470', 10388, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:25:22.487', 10476, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:28:22.487', 10565, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:31:22.487', 10648, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:34:22.487', 10734, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:37:22.487', 10825, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:40:22.487', 10917, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:43:22.503', 10998, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:46:22.503', 11084, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:49:22.503', 11179, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:52:22.503', 11268, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:55:22.503', 11355, 180 union all
select 7, 1, 1, 6652, '2006-12-07 13:58:22.503', 11436, 180 union all
select 7, 1, 1, 6652, '2006-12-07 14:01:22.673', 11521, 180 union all
select 7, 1, 1, 6652, '2006-12-07 14:04:22.767', 11600, 180 union all
select 7, 1, 1, 6652, '2006-12-07 14:07:22.770', 11684, 180 union all
select 7, 1, 1, 6652, '2006-12-07 14:10:22.770', 11763, 180 union all
select 7, 1, 1, 6652, '2006-12-07 14:13:22.770', 11839, 180 union all
select 7, 1, 1, 6652, '2006-12-07 14:16:22.770', 11894, 180 union all
select 7, 1, 1, 6652, '2006-12-07 14:19:22.783', 11972, 180 union all
select 7, 1, 1, 6652, '2006-12-07 14:22:22.783', 12005, 180 union all
select 7, 1, 1, 6652, '2006-12-07 14:25:22.783', 12025, 180;
Declare @I int;
set @I = 1;
-- Make the poor sucker work every day for the past year.
while @I < 365 begin
Insert dbo.ShiftCountersRawData
(ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp, MusselCount, CounterTimeSpan)
select ShiftID, TableID, StationID, EmployeeID,
DateAdd( dd, -@I, CounterDateTimeStamp ), MusselCount, CounterTimeSpan
from dbo.ShiftCountersRawData
where CounterDateTimeStamp > '2006-12-07';
set @I = @I + 1;
end;
set @I = 1;
-- Add 99 other employees, all working the entire year.
while @I < 100 begin
Insert dbo.ShiftCountersRawData
(ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp, MusselCount, CounterTimeSpan)
select ShiftID, TableID, StationID,
EmployeeID + @I, CounterDateTimeStamp, MusselCount + @I, CounterTimeSpan
from dbo.ShiftCountersRawData
where EmployeeID = 6652;
set @I = @I + 1;
end;Then, with multiple employees working multiple days, I had to add a WHERE clause.declare @EmpNo int, @StartDate datetime, @EndDate datetime;
set @EmpNo = 6700;
set @StartDate = '2006-01-01';
set @EndDate = DateAdd( dd, 1, @StartDate );
select s1.ShiftID, s1.TableID, s1.StationID, s1.EmployeeID,
s1.CounterDateTimeStamp as CounterStart, s2.CounterDateTimeStamp as CounterEnd,
s1.MusselCount as StartingMusselCount, s2.MusselCount as EndingMusselCount,
s2.MusselCount - s1.MusselCount as MusselCount
from ShiftCountersRawData s1
join ShiftCountersRawData s2
on s1.ShiftID = s2.ShiftID
and s1.TableID = s2.TableID
and s1.StationID = s2.StationID
and s1.EmployeeID = s2.EmployeeID
and s2.CounterDateTimeStamp = (
select Min( s3.CounterDateTimeStamp )
from ShiftCountersRawData s3
where s3.CounterDateTimeStamp > s1.CounterDateTimeStamp
and s3.ShiftID = s1.ShiftID
and s3.TableID = s1.TableID
and s3.StationID= s1.StationID
and s3.EmployeeID= s1.EmployeeID
group by s3.ShiftID, s3.TableID, s3.StationID, s3.EmployeeID
)
where s1.EmployeeID = @EmpNo
and s1.CounterDateTimeStamp between @StartDate and @EndDate
and s2.CounterDateTimeStamp between @StartDate and @EndDate;
At first I only added the s1 date check but I was getting back an extra row. Turns out the last entry for the day was joining with the first entry of the next day. The s2 date check fixed that. I ran the query multiple times with different employee numbers and different dates. The query averaged 200ms on my laptop.
Then I ran noeld's query -- after also adding a WHERE clause, or in this case, one new WHERE clause and an addition to an existing one: SELECT a.ShiftID
, a.TableID
, a.StationID
, a.EmployeeID
, a.CounterDateTimeStamp
, a.MusselCount
, a.CounterTimeSpan
,(SELECT TOP 1 b.MusselCount
FROM dbo.ShiftCountersRawData b
WHERE b.ShiftID = a.ShiftID
AND b.TableID = a.TableID
AND b.StationID = a.StationID
AND b.EmployeeID = a.EmployeeID
AND b.CounterDateTimeStamp > a.CounterDateTimeStamp
and b.CounterDateTimeStamp between @StartDate and @EndDate
ORDER BY b.CounterDateTimeStamp )
- a.MusselCount as MusselCountDiff
FROM dbo.ShiftCountersRawData a
where EmployeeID = @EmpNo
and CounterDateTimeStamp between @StartDate and @EndDate';
Executing this with the same inputs resulted in a 180ms average response -- a 10% improvement. Not too bad, but there is still one problem. It always returns 26 rows, with the MusselCountDiff value being NULL on the last row.
I haven't thought of an easy way around this. Any fix will surely increase the response time. It could be allowed as is, but then subsequent code will have to work around that invalid row, which will complicate the overall application efficiency.
"It it's not...one...thing...it's anotherrrrrrrr!" (Sung to the tune of...well, any tune will do.)
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 27, 2008 at 5:54 pm
rbarryyoung (8/27/2008)
Yeah, I've been playing around with a version like that one, Noel. Except that I think that you have the A's & B's reversed: you are reporting the count difference on the record before the count is take. I am pretty sure that it is supposed to be on the record after.
I'm afraid you have the A's and B's reversed. Noel's query returns the correct result (except for that last extra row for tables with data for more than one day). In fact, your code is failing to limit the time of the subquery (B) to only the times occurring after the A record. So the subquery is always returning the same value, the MusselCount value of the first entry for that day. Your result is a running total for the day rather than a row-by-row difference because you are using the subquery result as the subtrahend rather than the minuend like Noel.
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 27, 2008 at 6:27 pm
rbarryyoung (8/26/2008)
Jeff, dude:Jeff Moden (8/26/2008)
SELECT ROW_NUMBER() OVER (PARTITION BY ShiftID, TableID, StationID, EmployeeID, DATEADD(dd,DATEDIFF(dd,0,CounterDateTimeStamp),0)
ORDER BY ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp) AS RowNum,
*
... Uhm, ... what the blankety-blank is this !?!?!
Heh...THAT is unfinished business... I didn't complete the rest of the query (CPR'd the wrong copy)... I'll be back... I'll probably have a million row table in tow kinda like Tom did... I want to test all the methods include Noels correlated sub-query method.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 27, 2008 at 6:28 pm
Tomm Carr (8/27/2008)
rbarryyoung (8/27/2008)
Yeah, I've been playing around with a version like that one, Noel. Except that I think that you have the A's & B's reversed: you are reporting the count difference on the record before the count is take. I am pretty sure that it is supposed to be on the record after.I'm afraid you have the A's and B's reversed. Noel's query returns the correct result (except for that last extra row for tables with data for more than one day). In fact, your code is failing to limit the time of the subquery (B) to only the times occurring after the A record. So the subquery is always returning the same value, the MusselCount value of the first entry for that day. Your result is a running total for the day rather than a row-by-row difference because you are using the subquery result as the subtrahend rather than the minuend like Noel.
You are right, I forgot to bound the inner query correctly when I changed my original query. Thanks for the catch.
However, it is not because I have A and B reversed, and my original point is still correct: Noel's query (and I think that yours is the same) is assign the count difference to the wrong time/record. It is supposed to be assinging the count difference to the same record that has the seconds difference that it corresponds to, but it is assigning it to the one before it. All of the other queries have this correct (well, my first one does, my second one does need to be fixed, as you pointed out).
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 27, 2008 at 7:46 pm
Jeff Moden (8/27/2008)
rbarryyoung (8/26/2008)
Jeff, dude:Jeff Moden (8/26/2008)
SELECT ROW_NUMBER() OVER (PARTITION BY ShiftID, TableID, StationID, EmployeeID, DATEADD(dd,DATEDIFF(dd,0,CounterDateTimeStamp),0)
ORDER BY ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp) AS RowNum,
*
... Uhm, ... what the blankety-blank is this !?!?!
Heh...THAT is unfinished business... I didn't complete the rest of the query (CPR'd the wrong copy)... I'll be back... I'll probably have a million row table in tow kinda like Tom did... I want to test all the methods include Noels correlated sub-query method.
Heh. Sorry, I thought it was another one of your mysterious voodoo performance tricks.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 27, 2008 at 7:50 pm
rbarryyoung (8/27/2008)
...Noel's query (and I think that yours is the same) is assign the count difference to the wrong time/record. It is supposed to be assinging the count difference to the same record that has the seconds difference that it corresponds to, but it is assigning it to the one before it.
I think I see where you are coming from here. Look at the raw data and follow me for a minute.
In the first time period, the one that starts 13 minutes after 1PM, the initial mussel count is 10261. At the end of that 3 minute period (the beginning of the segment starting at 16 minutes after 1PM) the count is still at 10261. So during the time period of 13 minutes to 16 minutes after the hour, no mussels were shucked. The answer is 0.
Now we look at the time period from 16 minutes to 19 minutes after. It starts with a mussel count of 10261 (we already know that) and ends at 10302. So in the period from 16 minutes to 19 minutes after, 41 mussels were shucked.
The only difference is that we are associating the number of mussels shucked during each period with the row that begins the period and you want to associate the value with the row that ends the period. Personally, I think the former makes more sense (now why do you suppose that would be?).
Think about it like this. Suppose each time period was a complete day, from midnight to midnight. Wouldn't it make more sense to associate the work done during a day with the start of the day? "On May 18th, this much work was performed." The other way would associate May 18th's work with the row labeled "May 19".
However, that is a personal preference. Which one of us is "correct" depends on what the specs require. 😛
Btw, I forgot to mention why I like my query better (in general) than the more simpler correlated subquery. In many places where I use this pattern, I need access to more than one value from the "next" row. For example, my result set shows the timestamp at the beginning of the period, CounterStart, and the end, CounterEnd as well as the beginning and ending MusselCount values. This isn't possible with just a single subquery. I can get a join of the complete "current" row with the complete "next" row for just a 10% performance hit. I don't think that's too shabby.
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 27, 2008 at 7:53 pm
Jeff Moden (8/27/2008)
I'll probably have a million row table in tow kinda like Tom did...
I'll make you a deal. You don't call me Tom and I won't call you Jef. 😉
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 27, 2008 at 7:56 pm
Heh... sorry Tomm... bet you've had "mo-den" enough of that 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
August 27, 2008 at 7:58 pm
Tomm Carr (8/27/2008)The only difference is that we are associating the number of mussels shucked during each period with the row that begins the period and you want to associate the value with the row that ends the period. Personally, I think the former makes more sense (now why do you suppose that would be?).
But it does not make sense, nor is it what was spec'ed. The only association that makes sense is to have the Mussel Count difference associated with the row that records the information for the same time period that the Mussel count difference is calculated against, and that is the Row at the end of that time period. We know that because the column that specifically documents the time span: the CounterTimeSpan column is stored in the row at the end of the time span and not at the beginning.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 27, 2008 at 9:06 pm
rbarryyoung (8/27/2008)
But it does not make sense, nor is it what was spec'ed.
Sigh! You can be so stubborn! You should be more like me and just have the courage of your convictions.
Very well, have it your own way. To do it that way, you have to process the rows backwards. Iow, you will be working with the "current" row and the count from the "previous" row. To do that, you have to sort the subquery descending. You still have the problem with one row with a NULL for the value, but I've thrown in a quick and dirty fix for that.
Does this give you what you want (as terribly wrong as it may be)?
declare @EmpNo int, @StartDate datetime, @EndDate datetime;
set @EmpNo = 6700;
set @StartDate = '2006-01-01';
set @EndDate = DateAdd( dd, 1, @StartDate );
Select [ShiftID]
,[TableID]
,[StationID]
,[EmployeeID]
,[CounterDateTimeStamp]
,[MusselCount]
,[CounterTimeSpan]
,MusselCount - IsNull( (Select TOP 1 B.MusselCount
From [ShiftCountersRawData] B
Where B.ShiftID = A.ShiftID
And B.TableID = A.TableID
And B.StationID = A.StationID
And B.EmployeeID = A.EmployeeID
And B.CounterDateTimeStamp < a.CounterDateTimeStamp
And B.CounterDateTimeStamp
Between @StartDate and @EndDate
order by B.CounterDateTimeStamp desc), MusselCount )
as [MusselCountDiff]
FROM [dbo].[ShiftCountersRawData] a
where a.EmployeeID = @EmpNo
and a.CounterDateTimeStamp between @StartDate and @EndDate
Edit-- I meant to add a little jab about how I have not been privy to the spec, but I decided not to be so petty. 😀
I will make one last point. The 25 rows that make up a day's worth of shucking define 24 time periods. If you'll look at the way I've formatted the output (in my original query, not the one above), the result set has 24 rows, each row represents a time period with a start and end timestamp, a start and end mussel count and a total count shucked during the period. Using a subquery means a result set of 25 rows. So the first or the last row (depending on if you want end or start time to be significant) will always be extraneous.
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
Viewing 11 posts - 16 through 25 (of 25 total)
You must be logged in to reply to this topic. Login to reply