February 16, 2006 at 1:42 pm
I'm trying to write a set based update that sets a bit flag to 1 but ensuring that only one row can be flagged per distinct barcode per a 6 hour time span. A simplified version of the table in question is defined as [mytable] below.
create table [mytable] {
[barcode] varchar(10),
[datetimestamp] datetime,
[myflag] bit
)
The actual table has about 20 million rows and 640 thousand distinct barcodes. Oh, and it's running on SQL Server 2005.
The table is essentially a log of membership card swipes on entry to our facilities and we want to count how many people were admitted during various time spans but we do not want to count multiple swipes of the same card within a 6 hour period. So if someone was to swipe in and then leave the building to pick up a forgotten item in the car and then come in again we want to count the first swipe only. Or if someone does a double swipe on a single entry - that counts as one.
My idea is to create a bit column ([myflag] above) that we could periodically (probably nightly) update to identify which swipes would actually count in that type of query while leaving the raw data intact. The problem is that everything I come up with turns into a cursor process and I'd rather avoid that. I'm new to SQL Server 2005 so there might be a new easy way to accomplish something like this that I'm unaware of. If not, there's probably still a way to avoid cursors.
Thanks in advance,
John
February 16, 2006 at 2:06 pm
Actually you don't need any flag.
Select count(T1.barcode) as NumberOfMembers, dateadd(dd, T2.DayNumber, 0) as DateRecorded
FROM MyTable T1
INNER JOIN (
select barcode, datediff(dd, 0, datetimestamp) DayNumber -- if your time span is 1 day
from MyTable
group by barcode, datediff(dd, 0, datetimestamp)
) T2 on T1.barcode = T2.Barcode and datediff(dd, 0, T1.datetimestamp) = T2.DayNumber
It's good to have computed column datediff(dd, 0, datetimestamp) and set an index on it. This will make respond from your query really fast.
You may set as many additional columns as types of periods you are using - days, weeks, decades, months, etc. Becuse your field datetimestamp not to be updated it will not affect performance.
_____________
Code for TallyGenerator
February 16, 2006 at 2:26 pm
Sergiy query looks more plausible.
*edit just a test-query
This code should be tested first
DECLARE @begin datetime
DECLARE @end datetime
select ALLSWEEPS.barcode,ALLSWEEPS.ALL_COUNTS-ISNULL(SWEEPDOUBLES.DOUBLES,0)
FROM
(
select barcode,count_BIG(*) AS ALL_COUNTS all /*all counts*/
from mytable
where datetimestamp between @begin and @end /*within period*/
group by barcode
) AS ALLSWEEPS
LEFT join
(
select barcode, count_BIG(*) AS DOUBLES
from mytable t1
inner join mytable t2
on t1.barcode=t2.barcode
/*everything that should be discarded*/
and t1.datetimestamp >= t2.datetimestamp and
t2.datetimestamp<dateadd(h,t1.datetimestamp,6)
where t1.datetimestamp BETWEEN @begin AND @end /*within period*/
group by t1.barcode
having count_BIG(*)>1 /*doubles*/
) AS SWEEPDOUBLES
ON ALLSWEEPS.barcode=SWEEPDOUBLES.barcode
February 16, 2006 at 2:53 pm
Thanks Sergiy. I tried your query and get "Column 'T2.DayNumber' is invalid in the select list because it is not contained in an aggregate function and there is no GROUP BY clause."
Btw, in your approach how would you implement a 6 hour time span instead of 1 day?
John
February 16, 2006 at 3:01 pm
Add
GROUP BY T2.DayNumber
at the end of my script.
For 6 hours you can use
datediff(hh, 0, datetimestamp) / 6
Because all values are int it will return number of 6 hours periods passed.
For dateadd you need to add * 6.
It's better to have it in UDF with 3 parameters:
@datetimestamp datetime, @PeriodType varchar, @PeriodDuration int
_____________
Code for TallyGenerator
February 16, 2006 at 8:16 pm
UPDATE myTable
SET myFlag = 1
FROM myTable mt1
WHERE NOT EXISTS
(
SELECT *
FROM myTable mt2
WHERE mt1.marcode
AND DATEPART(yy,mt1.dateTimeStamp) = DATEPART(yy,mt2.dateTimeStamp)
AND DATEPART(dy,mt1.dateTimeStamp) = DATEPART(dy,mt2.dateTimeStamp)
AND DATEPART(hh,mt1.dateTimeStamp)/4 = DATEPART(hh,mt2.dateTimeStamp)/4
AND mt1.dateTimeStamp < mt2.dateTimeStamp
)
February 16, 2006 at 9:36 pm
OK...just thinking about this a little more. If the above isn't what you had in mind, the next course of action gets trickier if you want to go rhetro-active.
CREATE TRIGGER tr_MyFlag ON myTable FOR INSERT, UPDATE
UPDATE myTable
SET myFlag = 1
FROM INSERTED mt1
WHERE myFlag = 0
AND NOT EXISTS
(
SELECT *
FROM myTable mt2
WHERE mt2.mFlag = 1
AND mt1.barcode = mt2.barcode
AND DATEDIFF(hh, mt2.dateTimeStamp, m1.dateTimeStamp) BETWEEN 0 AND 5
)
GO
--This will run through each record, start-finnish populating the correct value (using the trigger)....
--Run this code just one time for rhetro-active
DECLARE curMT CURSOR FOR
SELECT barcode, dateTimeStamp
FROM myTable
ORDER BY barcode, dateTimeStamp
OPEN curMT
DECLARE @bc varchar(10)
DECLARE @dts smalldatetime
FETCH NEXT FROM curMT INTO @bc, @dts
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE myTable
SET myFlag = 0 --default
WHERE barcode = @bc
AND dateTimeStamp = @dts
FETCH NEXT FROM curMT INTO @bc, @dts
END
CLOSE curMT
DEALLOCATE curMT
GO
--
February 17, 2006 at 2:43 am
Can you make an estimation how long it will take to run a cursor over 20 million rows firing trigger doing update with derived table on the same table with 20 mil rows?
I afraid they need for such task more computing power than Peter Jackson.
_____________
Code for TallyGenerator
February 17, 2006 at 7:15 am
I thought about it for a while and you either do it like the first post I made for rhetro or you do it with a cursor. The cursor is more acurate. Post back if you need an explantation on either method.
February 17, 2006 at 7:58 am
You could also try the script below.
I think what Sergiy suggests is that you make timeinterval (see below) a calculated and indexed column on mytable and then run the last query below. That's probably the best solution.
create table [mytable] (
[barcode] varchar(10),
[datetimestamp] datetime,
timeinterval int
)
go
declare @InitDate datetime
select @InitDate = '20051201'
insert mytable select 1, '20060101 4:00', 0
insert mytable select 1, '20060101 5:00', 0
insert mytable select 1, '20060101 8:00', 0
insert mytable select 2, '20060101 9:00', 0
insert mytable select 1, '20060101 11:00', 0
update mytable set timeinterval = datediff(hh, @InitDate, datetimestamp)/6
select
count(distinct barcode) as distinctEntries,
dateadd(hh, timeinterval*6, @InitDate) as FromDate,
dateadd(hh, (timeinterval + 1)*6, @InitDate) as ToDate
from mytable
group by timeinterval
drop table mytable
go
February 17, 2006 at 8:11 am
this still splits the day up into time sections. Here's the scenerio...
swipe at 11:58 -> myFlag = 1
12:00 -> 6 hour period elapses.
swipe at 12:01
My original post flags both with less overhead.
The cursor solution will flag the first swipe move to the second and realize the "original" (it knows, because there is a flag on the first) is less than 6 hrs. A batch update command simply cannot acheive this, because there is no moving frame of reference. You only need to run the cursor one time, then you rely on the Trigger.
February 20, 2006 at 2:44 am
I understand what you are saying. I cannot read from John's post whether he wants your example to count as a swipe during each period or whether he wants to ignore one of them (could be the first one if there is a swipe at 5:59). If this is the case, you will have to use a loop, that's right. But looping through 20 million rows, that's going to take a lot of time... One might accept that your user is included in both time periods if data can be found in a (relatively) fast and straight-forward way.
February 20, 2006 at 4:32 am
If you insist on solving this, it is probably better to loop over time than over each row in the table
declare @time datetime
select @time = '20050101' -- Your starting point
while @time < getdate()
begin
update t set t.myflag = 1
from mytable t inner join
(
select t1.barcode, min(t1.datetimestamp) as mintime
from mytable t1 left join mytable t2
on t1.barcode = t2.barcode and t2.myflag = 1 and t2.datetimestamp >= dateadd(hh, -6, t1.datetimestamp)
where
t1.datetimestamp >= @time
and t2.datetimestamp is null
group by t1.barcode
)
dt
on t.barcode = dt.barcode and t.datetimestamp = dt.mintime
select @time = dateadd(hh, 6, @time)
end
February 20, 2006 at 7:23 am
"on..and t2.myflag = 1"
at what point does myFlag = 1?
February 20, 2006 at 7:26 am
Previous iteration of the loop.
Viewing 15 posts - 1 through 15 (of 28 total)
You must be logged in to reply to this topic. Login to reply