July 14, 2009 at 2:22 pm
Given the following
CREATE TABLE #log(
log_event int identity(1,1) not null PRIMARY KEY,
log_ticket int not null,
log_group int not null,
curr_group int not null,
log_entry varchar(100) )
INSERT INTO #log(log_ticket,log_group,curr_group,log_entry)
SELECT
1,
1,
1,
'test'
UNION ALL
SELECT
1,
1,
2,
'test2'
UNION ALL
SELECT
1,
1,
1,
'test3'
UNION ALL
SELECT
1,
1,
4,
'test4'
UNION ALL
SELECT
2,
2,
2,
'test5'
UNION ALL
SELECT
2,
2,
1,
'test6'
UNION ALL
SELECT
2,
2,
3,
'test7'
UNION ALL
SELECT
2,
2,
2,
'test8'
Count the number of times that curr_group changes for each log_ticket
Results should look something like
log_ticket number_of_changes
1 3
2 3
The above is a basic simulation of what I'm working with. My problem is I have about 500K records... my first thought was to LEFT JOIN the table to itself based on log_ticet = log_ticket and log_event + 1 = log_event and use that to check the next record to compare the groups.
It works... but it takes about 7 minutes on average to process... I was wondering if there was a better way.
July 14, 2009 at 2:32 pm
Greetings Mark,
Considering that you are wanting the changes and not the original, then a simple GROUP BY log_ticket and a COUNT of curr_group may work. Then you can just subtract 1 to remove the original record from the COUNT to find the number of changes. Add a HAVING clause then to only return the ones with changes.
SELECT
log_ticket,
COUNT(curr_group) - 1 AS ChangeCnt
FROM #log
GROUP BY log_ticket
HAVING
(COUNT(curr_group) - 1) > 0
Have a good day.
Terry Steadman
July 14, 2009 at 2:58 pm
Blah I should have spent more time on the test data.
Multiple entries can be entered in sequence for the same group, it doesn't always change from record to record. Just need to count the times it does change.
INSERT INTO #log(log_ticket,log_group,curr_group,log_entry)
SELECT
1,
1,
1,
'test'
UNION ALL
SELECT
1,
1,
2,
'test2'
UNION ALL
SELECT
1,
1,
1,
'test3'
UNION ALL
SELECT
1,
1,
4,
'test4'
UNION ALL
SELECT
2,
2,
2,
'test5'
UNION ALL
SELECT
2,
2,
1,
'test6'
UNION ALL
SELECT
2,
2,
3,
'test7'
UNION ALL
SELECT
2,
2,
2,
'test8'
UNION ALL
SELECT
1,
1,
4,
'test9'
UNION ALL
SELECT
1,
1,
4,
'test10'
UNION ALL
SELECT
1,
1,
3,
'test11'
The results should be
log_ticket change_count
1 4
2 3
July 15, 2009 at 12:54 am
Hope this will meet your requirement:
select log_ticket, count( distinct curr_group) from #log
group by log_ticket
Output of above query is same as you needed.
July 15, 2009 at 7:52 am
The output is the same, but that's coincidence.
I need a solution that even if I had 20 changes between groups on a single ticket say
1 to 2 to 1 to 3 to 1 to 4 to 1 to 2 to 3 to 1 to 4 to 1 to 2 to 3... etc
it would count every change, not every instance of the group.
count(curr_group) will count each group... which means that if three entries are made in sequence in the same group it will count too high.
count(distinct curr_group) will count each group once... which means that if the assigned group bounces back and forth between 4 groups 20 times it will only see 4 changes.
if there was a count(change of curr_group) I'd be set. which is why my present solution works.. it just takes too long.
Here's what I have that works...
create table #results(
results_id int identity(1,1) NOT NULL PRIMARY KEY,
log_event int,
log_ticket int,
log_group int,
curr_group int)
INSERT INTO #results(log_event,log_ticket,log_group,curr_group)
select
log_event,
log_ticket,
log_group,
curr_group
FROM
#log
ORDER BY
log_ticket,log_event
SELECT
a.log_ticket,
a.log_event,
change_count = case when a.curr_group != b.curr_group then 1 else 0 end
INTO #results_count
FROM
#results a JOIN #results b ON
a.results_id +1 = b.results_id and
a.log_ticket = b.log_ticket
select
a.log_ticket,
change_sum = sum(isnull(b.change_count,0))
FROM
#log a LEFT JOIN #results_count b ON
a.log_ticket = b.log_ticket and
a.log_event = b.log_event
GROUP BY
a.log_ticket
And on 11 records it works great... On 500,000 it takes 15 minutes... which is too long.
July 15, 2009 at 7:58 am
Not sure how well this will perform, but should give correct results
WITH CTE AS (
SELECT log_ticket,
ROW_NUMBER() OVER(PARTITION BY log_ticket ORDER BY log_event) -
ROW_NUMBER() OVER(PARTITION BY log_ticket,curr_group ORDER BY log_event) AS rnDiff
FROM #log)
SELECT log_ticket,
COUNT(DISTINCT rnDiff) AS change_count
FROM CTE
GROUP BY log_ticket
ORDER BY log_ticket
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537July 15, 2009 at 8:32 am
That's awesome...
Sadly I screwed up... been working in our external data centers too long and posted a SQL 2000 question in a SQL 2005 forum (our data centers for our customers run 2k5... internally we're a bit slower than that)... I'll report this and see if somebody can't move this to the T-SQL 2k forum instead.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply