Selecting only rows with changed value

  • Hello,

    I have a table with three columns:

    1920 | ID16 | Value1

    1922 | ID16 | Value1

    1923 | ID19 | Value1

    1925 | ID10 | Value2

    1926 | ID16 | Value2

    1929 | ID20 | Value3

    1930 | ID16 | Value3

    1933 | ID16 | Value1

    1934 | ID30 | Value1

    Now I need the following result set:

    1920 | ID16 | Value1

    1925 | ID10 | Value2

    1929 | ID20 | Value3

    1933 | ID16 | Value1

    which means I need a Stored Procedure which gives me back only these rows, which have an altered value (3rd. column), independently of the other columns.

    And the best of all, it should be quite fast, because the table contains really a lot of rows.

    Has anybody an idea how to solve that?

    thanks, marcus

  • hard to follow what your asking?

    first off - these columns I'm presuming are the following:

    column 1 incremental numbers?

    column 2 ID's?

    first off you would need to know what the original value was because in your example you show:

    1920 | ID16 | Value1

    1925 | ID10 | Value2

    1929 | ID20 | Value3

    1933 | ID16 | Value1 

    which basically shows the ID16 having the same value?? 

     In cases where you have changing values it might be a good idea to create date/time column and have it default the date and time when data is entered.  this way no matter what ID is being pulled you could query it having the date column descending giving you the lateset updated value.

  • Hey Matt,

    I get it... when sorted in the order of the first column, you want to show only those rows whose value in the third column has changed from the previous row...

    Does your table have an auto-numbering ID or is the 1st column, by any chance, unique?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If the answer to Jeff's question is that column 1 is unique and ascending (as it appears from your example), I still can't think of a quick way of doing this ... seems like you're heading towards the dreaded CURSOR to me.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Yes Jeff is right, columns 1 is unique and ascending

  • Ugh... No cursors... although it can certainly be done with one I don't have an answer right now (on the way to work) but, I've been challenged.  I whack at this tonight...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This may not be the most elegant, but how about:

    -----------------------

    set nocount on

    drop table #tmp

    select 1920 as AutoID, 'ID16' as ID, 'Value1' as Value

    into #tmp

    insert into #tmp values (1922, 'ID16', 'Value1')

    insert into #tmp values (1923, 'ID19', 'Value1')

    insert into #tmp values (1925, 'ID10', 'Value2')

    insert into #tmp values (1926, 'ID16', 'Value2')

    insert into #tmp values (1929, 'ID20', 'Value3')

    insert into #tmp values (1930, 'ID16', 'Value3')

    insert into #tmp values (1933, 'ID16', 'Value1')

    insert into #tmp values (1934, 'ID30', 'Value1')

    print 'Here is our original data:'

    select * from #tmp

    print 'Here are the results we get of changed items for col 3'

    print '(This assumes that col1 is an autoid)'

    print '(Remember, tho, that natural order of a relation should, in theory, be irrelevant...hence, why do this?)'

    select *

    from #tmp

    where AutoID = (select min(AutoID) from #tmp)

    UNION

    select t1.*

    from #tmp t1

    where value <> (select value

      from #tmp t2

      where t2.autoid = (select max(autoid)

         from #tmp t3

         where t3.autoid < t1.autoid))

    print 'We want to see:'

    print '1920 | ID16 | Value1'

    print '1925 | ID10 | Value2'

    print '1929 | ID20 | Value3'

    print '1933 | ID16 | Value1'

    set nocount off

    -----------------------

    It returns:

    -----------------------

    Here is our original data:

    AutoID      ID   Value 

    ----------- ---- ------

    1922        ID16 Value1

    1923        ID19 Value1

    1925        ID10 Value2

    1926        ID16 Value2

    1929        ID20 Value3

    1930        ID16 Value3

    1933        ID16 Value1

    1934        ID30 Value1

    1920        ID16 Value1

    Here are the results we get of changed items for col 3

    (This assumes that col1 is an autoid)

    (Remember, tho, that natural order of a relation should, in theory, be irrelevant...hence, why do this?)

    AutoID      ID   Value 

    ----------- ---- ------

    1920        ID16 Value1

    1925        ID10 Value2

    1929        ID20 Value3

    1933        ID16 Value1

    We want to see:

    1920 | ID16 | Value1

    1925 | ID10 | Value2

    1929 | ID20 | Value3

    1933 | ID16 | Value1

    -----------------------

    Is this what you're looking for?

    Paul

  • This should work, but may be bad performance (depending on indexing) for a lot of rows

    select a.*

    from 

    a

    left outer join

    b

    on b.col1 = (select max(col1) as col1 from

    where col1 < a.col1)

    where isnull(b.col3,'') <> a.col3

    order by a.col1

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Okay, you guys are great.

    Thanks a lot for all the effort you put in. It seems that this is exactly what I was looking for.

    thanks again, marcus

  • Jeff,

    Well done, I do not want to steal your thunder but I must take issue with your post.

    First, my query does return the first value (ID=1920). 

    Second, I extended the data to 100,000 rows and put a PK on ID (otherwise query takes too long) and ran your query against mine and mine ran consistently faster.

    I know it was only a test but I did run the queries several times to check and was surprised at the results

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David,

    I'd say that I owe you an apology!  Your code does, in fact, return all of the rows expected!  I don't know what I did to your code late last night but I copied and modified it again this morning to match the test table I'd set up and your code ran as expected.

    Again, I sure am sorry... don't know what the heck happened.  I've deleted my previous erroneous post.

    Thank you for the feed back on the speed... Yeah, that really surprises the heck out of me, too!  The <> on my the machine I'm using seems to be a real killer.  Sent the same code samples I used to a buddy to run and he agrees with your test results.  Don't know what the difference is, yet.  Guess I have more testing to do 'cause speed will really be important on the project coming up.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No apology needed. 

    I liked your code, it is the way I like to do it, not sure why I did my differently other than it seemed to work.

    As for the speed, I did it out of curiosity more than anything. I suspect someone will find some test that will reverse the timings, who knows.

    I ran my tests on SQL2K on a box that is currently only used by our Finance people so performance may be relevant to server load. At one point I did notice my query used more cpu, so again may be relevant.

    The other thing that surprised me was the difference in the execution plans.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • David,

    Thank you for your kindness...

    I've sent the code samples to 4 different people and have run it on the server at work.  In every case, your code with the <> ran nearly twice as fast as the "postive" logic code I wrote.  And you're right... the differences in the Execution Plan were amazing.

    I still don't know why my machine was (is) indicating the reverse of our now mutual findings. 

    Rather than mislead anyone, I've deleted my posting because it was flat-out wrong!  Man, am I glad you did the followup speed test!  Thanks a million!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply