December 23, 2010 at 3:52 am
Ok so I am giving myself a huge pat on the shoulder because i've had no formal training on SQL nor do I even work in a IT-ish role but work as a data analyst. We had some data that I wanted to transform and the only thing I could think of was to write the script below. It works (!!) but it's very slow. It took 58 minutes to process 23,000 rows using the system below.
The idea is that it takes data from a table, looks at a start and end datetime, checks to see if there is an interval which fits between that range, and if so updates another table.
Can someone take my donkey script and help me make it more efficient? from now on, the script will be processing a much smaller number of rows, say 1500 per day, but that still quite a long time while my data warehouse loads.
Anyway, your thoughts would be appreciated.
-- Declare Procedural Variable
declare @exception varchar(50)
declare @Interval datetime
declare @start datetime
declare @stop datetime
declare @DateRaw varchar(50)
declare @AgentName varchar(50)
declare @RowCnt int
declare @MaxRows int
declare @Flag int
declare @subloop int
-- reset global variables
select @flag = 1
select @RowCnt = 1
--define total rowcount
select @MaxRows=count(*) from dbo.wfm_actualactivity
-- Load table into a temporary table for sorting, adding a row identifier for the rowcount
declare @SourceTable table (rownum int IDENTITY (1,1),DateRaw datetime null, AgentName varchar(50)null, exception varchar(50)null, e_start datetime null, e_end datetime null)
insert into @SourceTable (DateRaw, AgentName, exception, e_start, e_end) SELECT DateRaw, agentName, exception, cast(start as datetime), cast([stop] as datetime) FROM dbo.wfm_actualactivity
-- Start outer procedure :: Loop while there are still rows to process
While @RowCnt <= @MaxRows
begin
-- Prep for the inner loop to check each row against each interval
select @subloop = 0
select @start = (SELECT e_start FROM @SourceTable WHERE rownum = @RowCnt)
select @stop = (SELECT e_end FROM @SourceTable WHERE rownum = @RowCnt)
select @Interval = '01/01/1900 07:45:00'
--Start inner procedure :: Loop for each minute of the day
While @interval < '01/01/1900 20:15:00' -- Enter Statement to make sure row data is within interval
begin
select @subloop = 0 -- reset subloop counter. Subloop = 2 means it is within that time range
IF @Interval >= @Start select @subloop = @subloop + 1 -- Checks to see if the interval is greater than start time
IF @Interval <= @Stop select @subloop = @subloop + 1 -- checks to see if the interval is less than the end time.
IF @subloop =2 select @DateRaw = (Select DateRaw FROM @SourceTable WHERE rownum = @RowCnt) -- loads variable if appropriate
IF @subloop =2 select @AgentName = (Select AgentName FROM @SourceTable WHERE rownum = @RowCnt) -- loads variable if appropriate
IF @subloop =2 select @Exception = (Select Exception FROM @SourceTable WHERE rownum = @RowCnt) -- loads variable if appropriate
IF @subloop =2 INSERT INTO dbo.ActualInterval (DateRaw, Interval, AgentName, Exception, Flag) Values (@DateRaw, @Interval, @AgentName, @Exception, @flag) -- loads variable if appropriate
select @interval = @interval + '01/01/1900 00:05:00' -- run process to see if exception fits within the next 1 minute interval
END
select @RowCnt = @RowCnt + 1
END
December 23, 2010 at 4:10 am
To restate your problem :
You need a list of 5 minute intervals betweeen '07:45:00' and '20:15:00' that are between the start and stop time on wfm_actualactivity.
Is that correct ?
December 23, 2010 at 4:16 am
Yes. Well, any interval really, I would much prefer 1 minute intervals, the problem is query time which is why it's 5 minutes.
The source table looks something like this.
date, agent, exception, start, stop.
1/6/2010, John, Lunch, 12:30, 13:30
1/6/2010, John, OpenTime, 13:30, 15:00
I would want to send that to another table, which has an interval column, at an interval (1 min, 5 min whatever) and what exception they were in at that time.
December 23, 2010 at 4:38 am
Then i suspect something like this....
Create Table #Source
(
dateRaw datetime,
Agent varchar(20),
Exception varchar(20),
e_start datetime,
e_end datetime
)
go
Insert into #Source(dateRaw,Agent,Exception,e_start,e_end)
select '2010-01-06', 'John', 'Lunch', cast('12:30' as datetime), cast('13:30' as datetime)
union all
select '2010-01-06', 'John', 'OpenTime', '13:30', '15:00'
go
with cteTime
as
(
Select *,DATEDIFF(mi,e_start,e_end) as mins
from #source
)
select *,DATEADD(mi,vals.number,e_start) as ExceptionTime
from cteTime
join master..spt_values vals
on vals.number <= mins
where vals.type = 'P'
Hope you are able to break this down to understand the code, needless to say dont put anything live that you dont 100% understand.
Any questions , feel free to post back
December 23, 2010 at 4:52 am
what is the rowcount of the table?
December 23, 2010 at 5:02 am
23,000 rows ish.
with 1500 rows per day.
December 23, 2010 at 5:04 am
Dave Ballantyne (12/23/2010)
Then i suspect something like this....
Create Table #Source
(
dateRaw datetime,
Agent varchar(20),
Exception varchar(20),
e_start datetime,
e_end datetime
)
go
Insert into #Source(dateRaw,Agent,Exception,e_start,e_end)
select '2010-01-06', 'John', 'Lunch', cast('12:30' as datetime), cast('13:30' as datetime)
union all
select '2010-01-06', 'John', 'OpenTime', '13:30', '15:00'
go
with cteTime
as
(
Select *,DATEDIFF(mi,e_start,e_end) as mins
from #source
)
select *,DATEADD(mi,vals.number,e_start) as ExceptionTime
from cteTime
join master..spt_values vals
on vals.number <= mins
where vals.type = 'P'
Hope you are able to break this down to understand the code, needless to say dont put anything live that you dont 100% understand.
Any questions , feel free to post back
I will try this, this afternoon - but thanks in advance
December 23, 2010 at 9:26 am
To add to this, I had a similar issue not too long ago which was solved on this forum.
December 23, 2010 at 9:35 am
skcadavre (12/23/2010)
To add to this, I had a similar issue not too long ago which was solved on this forum.
Hmmmm... your last statement in the link you provided was
I then intend to do some testing on my real data to see the performance differences between the others.
I wonder if you could post the results of your testing and the code you finally came up with in the thread you just referred to. ;-):-)
I'm sure it'll help to make that other thread "well-formed"...
December 23, 2010 at 10:08 pm
How about this ?
;with cte
as
(
select e_start,e_end from #Source
union all
select DATEADD(minute,1,e_start),e_end from cte
where DATEADD(minute,1,e_start)<=e_end
)
select e_start from cte order by e_start
I think with Denali and introduction of SEQUENCES the above method will be a thing of the past.
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
December 23, 2010 at 11:49 pm
Sachin Nandanwar (12/23/2010)
How about this ?
;with cte
as
(
select e_start,e_end from #Source
union all
select DATEADD(minute,1,e_start),e_end from cte
where DATEADD(minute,1,e_start)<=e_end
)
select e_start from cte order by e_start
I think with Denali and introduction of SEQUENCES the above method will be a thing of the past.
This method should be a thing of the past already , have you tested performance of this over a large number of rows and compared that to a tally table ?
Plus sequences will be a dead end here. You would have to reinitialize the sequence to zero for each call of the routine. This would also mean that concurrent executions would have to be stopped as sequences are global
December 24, 2010 at 12:04 am
No I havent.
Can you post some link where it has been done?
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
December 24, 2010 at 1:23 am
Here you go , Very easy to prove.
drop table #Source
go
Create Table #Source
(
SourceID integer identity ,
e_start smalldatetime,
e_end smalldatetime
)
go
insert into #Source(e_start,e_end)
select top(1000) '01jan2010','01jan2010 01:00'
from sys.columns a cross join sys.columns b
go
--- CTE RBAR Method
;with cte
as
(
select SourceID,e_start,e_end from #Source
union all
select SourceID,DATEADD(minute,1,e_start),e_end from cte
where DATEADD(minute,1,e_start)<=e_end
)
select SourceID,e_start from cte order by e_start
go
--- Tally Table
with cteTime
as
(
Select *,DATEDIFF(mi,e_start,e_end) as mins
from #source
)
select *,DATEADD(mi,vals.number,e_start) as ExceptionTime
from cteTime
join master..spt_values vals
on vals.number <= mins
where vals.type = 'P'
Bear in mind this is over a relatively small dataset of 1000 rows...
December 24, 2010 at 1:45 am
Thanks.
I am to getting somewhat the same results after testing both the methods.
I am now gonna STOP using the CTE method in future 🙂
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
December 24, 2010 at 7:29 am
Dave Ballantyne (12/23/2010)
To restate your problem :You need a list of 5 minute intervals betweeen '07:45:00' and '20:15:00' that are between the start and stop time on wfm_actualactivity.
Is that correct ?
To confirm, your solution works perfectly, and wanted to thank you. Coming to forums such as this is a perfect way for people who learn like I do, through application in real life issues. I understand now with some research the solution - and has gone to widen my ideas when searching for other issues.
Thanks again.
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply