Counting Changes

Question

Counting Changes

mtassin

SSC-Insane

Points: 23099
More actions
July 14, 2009 at 2:22 pm

#73241

Given the following
CREATE TABLE #log(
log_event int identity(1,1) not null PRIMARY KEY,
log_ticket int not null,
log_group int not null,
curr_group int not null,
log_entry varchar(100) )
INSERT INTO #log(log_ticket,log_group,curr_group,log_entry)
SELECT
1,
1,
1,
'test'
UNION ALL
SELECT
1,
1,
2,
'test2'
UNION ALL
SELECT
1,
1,
1,
'test3'
UNION ALL
SELECT
1,
1,
4,
'test4'
UNION ALL
SELECT
2,
2,
2,
'test5'
UNION ALL
SELECT
2,
2,
1,
'test6'
UNION ALL
SELECT
2,
2,
3,
'test7'
UNION ALL
SELECT
2,
2,
2,
'test8'
Count the number of times that curr_group changes for each log_ticket
Results should look something like
log_ticket number_of_changes
1 3
2 3
The above is a basic simulation of what I'm working with. My problem is I have about 500K records... my first thought was to LEFT JOIN the table to itself based on log_ticet = log_ticket and log_event + 1 = log_event and use that to check the next record to compare the groups.
It works... but it takes about 7 minutes on average to process... I was wondering if there was a better way.

--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link[/url]
For tips on how to post your problems[/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

terrance.steadman SSCrazy Points: 2097 More actions · Answer 1

Greetings Mark,

Considering that you are wanting the changes and not the original, then a simple GROUP BY log_ticket and a COUNT of curr_group may work. Then you can just subtract 1 to remove the original record from the COUNT to find the number of changes. Add a HAVING clause then to only return the ones with changes.

SELECT

log_ticket,

COUNT(curr_group) - 1 AS ChangeCnt

FROM #log

GROUP BY log_ticket

HAVING

(COUNT(curr_group) - 1) > 0

Have a good day.

Terry Steadman

mtassin SSC-Insane Points: 23099 More actions · Answer 2

Blah I should have spent more time on the test data.

Multiple entries can be entered in sequence for the same group, it doesn't always change from record to record. Just need to count the times it does change.

INSERT INTO #log(log_ticket,log_group,curr_group,log_entry)

SELECT

1,

'test'

UNION ALL

SELECT

1,

2,

'test2'

UNION ALL

SELECT

1,

'test3'

UNION ALL

SELECT

1,

4,

'test4'

UNION ALL

SELECT

2,

'test5'

UNION ALL

SELECT

2,

1,

'test6'

UNION ALL

SELECT

2,

3,

'test7'

UNION ALL

SELECT

2,

'test8'

UNION ALL

SELECT

1,

4,

'test9'

UNION ALL

SELECT

1,

4,

'test10'

UNION ALL

SELECT

1,

3,

'test11'

The results should be

log_ticket change_count

1 4

2 3

--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link[/url]
For tips on how to post your problems[/url]

PR-572948 Grasshopper Points: 21 More actions · Answer 3

Hope this will meet your requirement:

select log_ticket, count( distinct curr_group) from #log

group by log_ticket

Output of above query is same as you needed.

mtassin SSC-Insane Points: 23099 More actions · Answer 4

The output is the same, but that's coincidence.

I need a solution that even if I had 20 changes between groups on a single ticket say

1 to 2 to 1 to 3 to 1 to 4 to 1 to 2 to 3 to 1 to 4 to 1 to 2 to 3... etc

it would count every change, not every instance of the group.

count(curr_group) will count each group... which means that if three entries are made in sequence in the same group it will count too high.

count(distinct curr_group) will count each group once... which means that if the assigned group bounces back and forth between 4 groups 20 times it will only see 4 changes.

if there was a count(change of curr_group) I'd be set. which is why my present solution works.. it just takes too long.

Here's what I have that works...

create table #results(

results_id int identity(1,1) NOT NULL PRIMARY KEY,

log_event int,

log_ticket int,

log_group int,

curr_group int)

INSERT INTO #results(log_event,log_ticket,log_group,curr_group)

select

log_event,

log_ticket,

log_group,

curr_group

FROM

#log

ORDER BY

log_ticket,log_event

SELECT

a.log_ticket,

a.log_event,

change_count = case when a.curr_group != b.curr_group then 1 else 0 end

INTO #results_count

FROM

#results a JOIN #results b ON

a.results_id +1 = b.results_id and

a.log_ticket = b.log_ticket

select

a.log_ticket,

change_sum = sum(isnull(b.change_count,0))

FROM

#log a LEFT JOIN #results_count b ON

a.log_ticket = b.log_ticket and

a.log_event = b.log_event

GROUP BY

a.log_ticket

And on 11 records it works great... On 500,000 it takes 15 minutes... which is too long.

--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link[/url]
For tips on how to post your problems[/url]

Mark Cowne One Orange Chip Points: 26954 More actions · Answer 5

Not sure how well this will perform, but should give correct results

WITH CTE AS (

SELECT log_ticket,

ROW_NUMBER() OVER(PARTITION BY log_ticket ORDER BY log_event) -

ROW_NUMBER() OVER(PARTITION BY log_ticket,curr_group ORDER BY log_event) AS rnDiff

FROM #log)

SELECT log_ticket,

COUNT(DISTINCT rnDiff) AS change_count

FROM CTE

GROUP BY log_ticket

ORDER BY log_ticket

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

mtassin SSC-Insane Points: 23099 More actions · Answer 6

That's awesome...

Sadly I screwed up... been working in our external data centers too long and posted a SQL 2000 question in a SQL 2005 forum (our data centers for our customers run 2k5... internally we're a bit slower than that)... I'll report this and see if somebody can't move this to the T-SQL 2k forum instead.

--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link[/url]
For tips on how to post your problems[/url]