Counting Changes

  • Given the following

    CREATE TABLE #log(

    log_event int identity(1,1) not null PRIMARY KEY,

    log_ticket int not null,

    log_group int not null,

    curr_group int not null,

    log_entry varchar(100) )

    INSERT INTO #log(log_ticket,log_group,curr_group,log_entry)

    SELECT

    1,

    1,

    1,

    'test'

    UNION ALL

    SELECT

    1,

    1,

    2,

    'test2'

    UNION ALL

    SELECT

    1,

    1,

    1,

    'test3'

    UNION ALL

    SELECT

    1,

    1,

    4,

    'test4'

    UNION ALL

    SELECT

    2,

    2,

    2,

    'test5'

    UNION ALL

    SELECT

    2,

    2,

    1,

    'test6'

    UNION ALL

    SELECT

    2,

    2,

    3,

    'test7'

    UNION ALL

    SELECT

    2,

    2,

    2,

    'test8'

    Count the number of times that curr_group changes for each log_ticket

    Results should look something like

    log_ticket number_of_changes

    1 3

    2 3

    The above is a basic simulation of what I'm working with. My problem is I have about 500K records... my first thought was to LEFT JOIN the table to itself based on log_ticet = log_ticket and log_event + 1 = log_event and use that to check the next record to compare the groups.

    It works... but it takes about 7 minutes on average to process... I was wondering if there was a better way.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Greetings Mark,

    Considering that you are wanting the changes and not the original, then a simple GROUP BY log_ticket and a COUNT of curr_group may work. Then you can just subtract 1 to remove the original record from the COUNT to find the number of changes. Add a HAVING clause then to only return the ones with changes.

    SELECT

    log_ticket,

    COUNT(curr_group) - 1 AS ChangeCnt

    FROM #log

    GROUP BY log_ticket

    HAVING

    (COUNT(curr_group) - 1) > 0

    Have a good day.

    Terry Steadman

  • Blah I should have spent more time on the test data.

    Multiple entries can be entered in sequence for the same group, it doesn't always change from record to record. Just need to count the times it does change.

    INSERT INTO #log(log_ticket,log_group,curr_group,log_entry)

    SELECT

    1,

    1,

    1,

    'test'

    UNION ALL

    SELECT

    1,

    1,

    2,

    'test2'

    UNION ALL

    SELECT

    1,

    1,

    1,

    'test3'

    UNION ALL

    SELECT

    1,

    1,

    4,

    'test4'

    UNION ALL

    SELECT

    2,

    2,

    2,

    'test5'

    UNION ALL

    SELECT

    2,

    2,

    1,

    'test6'

    UNION ALL

    SELECT

    2,

    2,

    3,

    'test7'

    UNION ALL

    SELECT

    2,

    2,

    2,

    'test8'

    UNION ALL

    SELECT

    1,

    1,

    4,

    'test9'

    UNION ALL

    SELECT

    1,

    1,

    4,

    'test10'

    UNION ALL

    SELECT

    1,

    1,

    3,

    'test11'

    The results should be

    log_ticket change_count

    1 4

    2 3



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Hope this will meet your requirement:

    select log_ticket, count( distinct curr_group) from #log

    group by log_ticket

    Output of above query is same as you needed.

  • The output is the same, but that's coincidence.

    I need a solution that even if I had 20 changes between groups on a single ticket say

    1 to 2 to 1 to 3 to 1 to 4 to 1 to 2 to 3 to 1 to 4 to 1 to 2 to 3... etc

    it would count every change, not every instance of the group.

    count(curr_group) will count each group... which means that if three entries are made in sequence in the same group it will count too high.

    count(distinct curr_group) will count each group once... which means that if the assigned group bounces back and forth between 4 groups 20 times it will only see 4 changes.

    if there was a count(change of curr_group) I'd be set. which is why my present solution works.. it just takes too long.

    Here's what I have that works...

    create table #results(

    results_id int identity(1,1) NOT NULL PRIMARY KEY,

    log_event int,

    log_ticket int,

    log_group int,

    curr_group int)

    INSERT INTO #results(log_event,log_ticket,log_group,curr_group)

    select

    log_event,

    log_ticket,

    log_group,

    curr_group

    FROM

    #log

    ORDER BY

    log_ticket,log_event

    SELECT

    a.log_ticket,

    a.log_event,

    change_count = case when a.curr_group != b.curr_group then 1 else 0 end

    INTO #results_count

    FROM

    #results a JOIN #results b ON

    a.results_id +1 = b.results_id and

    a.log_ticket = b.log_ticket

    select

    a.log_ticket,

    change_sum = sum(isnull(b.change_count,0))

    FROM

    #log a LEFT JOIN #results_count b ON

    a.log_ticket = b.log_ticket and

    a.log_event = b.log_event

    GROUP BY

    a.log_ticket

    And on 11 records it works great... On 500,000 it takes 15 minutes... which is too long.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Not sure how well this will perform, but should give correct results

    WITH CTE AS (

    SELECT log_ticket,

    ROW_NUMBER() OVER(PARTITION BY log_ticket ORDER BY log_event) -

    ROW_NUMBER() OVER(PARTITION BY log_ticket,curr_group ORDER BY log_event) AS rnDiff

    FROM #log)

    SELECT log_ticket,

    COUNT(DISTINCT rnDiff) AS change_count

    FROM CTE

    GROUP BY log_ticket

    ORDER BY log_ticket

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • That's awesome...

    Sadly I screwed up... been working in our external data centers too long and posted a SQL 2000 question in a SQL 2005 forum (our data centers for our customers run 2k5... internally we're a bit slower than that)... I'll report this and see if somebody can't move this to the T-SQL 2k forum instead.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply