Compare two columns

  • Hi guys,
    I need the quantity from the previous row when the current row's prev_orderid matches the previous row's orderid.
    For example: if prev_orderid = 44086564 matches orderid = 44086564, then I need to select the qty of 1900.

    Please see the SQL below:
    create table orders (
    orderid varchar(20),
    prev_orderid varchar(20),
    flag char(1),
    cdate date,
    ord_typ varchar(10),
    quantity int,
    left_quantity int,
    seq int,
    symb varchar(10)
    );

    INSERT INTO orders (orderid, prev_orderid,flag, cdate,ord_typ,quantity,left_quantity,seq,symb)
    VALUES ('44086564', '0','N' ,'2015-01-01','new',2000,2000,1,'abc');
    INSERT INTO orders (orderid, prev_orderid,flag, cdate,ord_typ,quantity,left_quantity,seq,symb)
    VALUES ('44086564', '0', 'Y','2015-01-01','mod',1900,1900,2,'abc');
    INSERT INTO orders (orderid, prev_orderid,flag, cdate,ord_typ,quantity,left_quantity,seq,symb)
    VALUES ('44672901', '44086564','N', '2015-01-01','cancel/rep',2400,2400,3,'abc');
    INSERT INTO orders (orderid, prev_orderid,flag, cdate,ord_typ,quantity,left_quantity,seq,symb)
    VALUES ('44672901', '0', 'Y','2015-01-01','mod',2300,2300,4,'abc');
    INSERT INTO orders (orderid, prev_orderid, flag,cdate,ord_typ,quantity,left_quantity,seq,symb)
    VALUES ('44672901', '0', 'Y','2015-01-01','mod',2200,2200,5,'abc');

  • Where two rows have the same order Id, how do you define which one comes first?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • It depends on the seq number. Seq numbers are allotted as transactions come in, so they are in chronological order.

    Thanks,

  • How about using LAG?
    SELECT *,
           CASE WHEN LAG(orderid) OVER (ORDER BY orderid, seq) = prev_orderid
                THEN LAG(quantity) OVER (ORDER BY orderid, seq)
           END AS Prev_quantity
    FROM orders;
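
    If I've read the sample data right, that should return something along these lines (most columns trimmed for brevity):

    orderid   prev_orderid  seq  quantity  Prev_quantity
    44086564  0             1    2000      NULL
    44086564  0             2    1900      NULL
    44672901  44086564      3    2400      1900
    44672901  0             4    2300      NULL
    44672901  0             5    2200      NULL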

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Don't know if this satisfies the requirement. The code below is more efficient than the one above.

    select b.*,a.quantity from orders a
    right join orders b
    on a.orderid = b.prev_orderid
    and a.seq - b.seq = -1

    First solve the problem then write the code !

  • TheCTEGuy - Monday, October 9, 2017 8:00 AM

    Don't know if this satisfies the requirement. The code below is more efficient than the one above.

    select b.*,a.quantity from orders a
    right join orders b
    on a.orderid = b.prev_orderid
    and a.seq - b.seq = -1

    Well, take a good look at what has to happen.   EVERY value of a.seq has to have EVERY value of b.seq subtracted before the query can determine which rows to select.   Not necessarily good when LAG can be used instead.
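
    As an aside (untested, just a sketch): the same join can be written with the column alone on one side of the comparison, so that an index on (orderid, seq), if one existed, could actually be used for a seek:

    select b.*, a.quantity
    from orders a
    right join orders b
      on a.orderid = b.prev_orderid
     and a.seq = b.seq - 1;

    That still doesn't make it a better choice than LAG here, but it at least gives the optimizer something to work with.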

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • TheCTEGuy - Monday, October 9, 2017 8:00 AM

    Don't know if this satisfies the requirement. The code below is more efficient than the one above.

    select b.*,a.quantity from orders a
    right join orders b
    on a.orderid = b.prev_orderid
    and a.seq - b.seq = -1

    Looking at the query plan alone isn't everything (Jeff taught me that one!):
    Using Lag:
    (5 rows affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'orders'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using RIGHT JOIN:
    (5 rows affected)
    Table 'orders'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Notice that your query uses 2 scans and 6 logical reads, whereas LAG has 1 of each. On a larger table, using a RIGHT JOIN would be a lot more costly than using LAG.
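
    Those figures come from running each query with the IO statistics switched on and reading the Messages tab:

    SET STATISTICS IO ON;
    -- run each query in turn and compare the reported scan counts and logical reads
    SET STATISTICS IO OFF;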

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Monday, October 9, 2017 8:42 AM

    TheCTEGuy - Monday, October 9, 2017 8:00 AM

    Don't know if this satisfies the requirement. The code below is more efficient than the one above.

    select b.*,a.quantity from orders a
    right join orders b
    on a.orderid = b.prev_orderid
    and a.seq - b.seq = -1

    Looking at the query plan alone isn't everything (Jeff taught me that one!):
    Using Lag:
    (5 rows affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'orders'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using RIGHT JOIN:
    (5 rows affected)
    Table 'orders'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Notice that your query uses 2 scans and 6 logical reads, whereas LAG has 1 of each. On a larger table, using a RIGHT JOIN would be a lot more costly than using LAG.

    Can you prove that with an example?

    First solve the problem then write the code !

  • TheCTEGuy - Monday, October 9, 2017 8:50 AM

    Can you prove that with an example?

    What do you mean prove? That is pasted from SET STATISTICS IO ON. Is that not proof enough?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Monday, October 9, 2017 8:53 AM

    TheCTEGuy - Monday, October 9, 2017 8:50 AM

    Can you prove that with an example?

    What do you mean prove? That is pasted from SET STATISTICS IO ON. Is that not proof enough?

    You said for large tables. So can you give an example of the same with a large table?

    First solve the problem then write the code !

  • Using the vTally table, I created a table with 10,000,000 rows:
    SELECT TOP 10000000 *, ABS(CHECKSUM(NEWID()) % 200) AS RandNo, ABS(CHECKSUM(NEWID()) % 200) AS PrevRandNo
    INTO RandomNumbers
    FROM vTally V;
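
    (In case anyone following along doesn't have a vTally view: a hypothetical stand-in, sketched here with an ad-hoc tally, produces roughly the same shape of table. The actual test used vTally, not this.)

    WITH L0 AS (SELECT 1 AS c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS t(c)), -- 10 rows
         L1 AS (SELECT 1 AS c FROM L0 a CROSS JOIN L0 b),   -- 100 rows
         L2 AS (SELECT 1 AS c FROM L1 a CROSS JOIN L1 b),   -- 10,000 rows
         L3 AS (SELECT 1 AS c FROM L2 a CROSS JOIN L2 b)    -- 100,000,000 rows
    SELECT TOP (10000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N,
           ABS(CHECKSUM(NEWID()) % 200) AS RandNo,
           ABS(CHECKSUM(NEWID()) % 200) AS PrevRandNo
    INTO RandomNumbers
    FROM L3;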

    I then switched Statistics on:
    SET STATISTICS IO ON;
    I ran the following queries (separately) and timed them:
    select b.*,a.N from RandomNumbers a
    right join RandomNumbers b
    on a.RandNo = b.PrevRandNo
    and a.N - b.N = -1

    SELECT *,
           CASE WHEN LAG(RandNo) OVER (ORDER BY N) = PrevRandNo
                THEN LAG(N) OVER (ORDER BY N)
           END AS Prev_quantity
    FROM RandomNumbers;

    And then, for completeness, here's the clean-up (DROP commented out):
    SET STATISTICS IO OFF;

    --DROP TABLE RandomNumbers;

    These queries were run on a SQL Server 2012 Developer environment, with 8GB of max memory for the SQL Server instance (as it's our sandbox for messing around on, it doesn't have a lot of memory), and 4 x Intel Xeon E5 virtual cores @ 2.1GHz.

    The first query (using a RIGHT JOIN) was running for over 10 minutes before I cancelled it. Unfortunately this means I have no statistics to paste as the query was cancelled.

    The second query, on the other hand (using LAG), completed in 57 seconds! The statistics are below:
    (10000000 rows affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 28412, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'RandomNumbers'. Scan count 5, logical reads 30960, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    I am running the other query again now (at the time of making this post it has already exceeded 2 minutes 30 seconds again), and I will reply again with the other statistics once I finally have them.
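
    (Purely as an untested aside: SELECT INTO leaves RandomNumbers as a heap, so the join has nothing to seek on. A made-up index such as the one below, combined with writing the predicate as a.N = b.N - 1 rather than a.N - b.N = -1, would probably bring the join down to a sane runtime, though I'd still expect LAG to win comfortably.)

    CREATE INDEX IX_RandomNumbers_RandNo_N ON RandomNumbers (RandNo, N);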

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Monday, October 9, 2017 9:18 AM

    Using the vTally table, I created a table with 10,000,000 rows:
    SELECT TOP 10000000 *, ABS(CHECKSUM(NEWID()) % 200) AS RandNo, ABS(CHECKSUM(NEWID()) % 200) AS PrevRandNo
    INTO RandomNumbers
    FROM vTally V;

    I then switched Statistics on:
    SET STATISTICS IO ON;
    I ran the following queries (separately) and timed them:
    select b.*,a.N from RandomNumbers a
    right join RandomNumbers b
    on a.RandNo = b.PrevRandNo
    and a.N - b.N = -1

    SELECT *,
           CASE WHEN LAG(RandNo) OVER (ORDER BY N) = PrevRandNo
                THEN LAG(N) OVER (ORDER BY N)
           END AS Prev_quantity
    FROM RandomNumbers;

    And then, for completeness, here's the clean-up (DROP commented out):
    SET STATISTICS IO OFF;

    --DROP TABLE RandomNumbers;

    These queries were run on a SQL Server 2012 Developer environment, with 8GB of max memory for the SQL Server instance (as it's our sandbox for messing around on, it doesn't have a lot of memory), and 4 x Intel Xeon E5 virtual cores @ 2.1GHz.

    The first query (using a RIGHT JOIN) was running for over 10 minutes before I cancelled it. Unfortunately this means I have no statistics to paste as the query was cancelled.

    The second query, on the other hand (using LAG), completed in 57 seconds! The statistics are below:
    (10000000 rows affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 28412, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'RandomNumbers'. Scan count 5, logical reads 30960, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    I am running the other query again now (at the time of making this post it has already exceeded 2 minutes 30 seconds again), and I will reply again with the other statistics once I finally have them.

    That's a great explanation. Thanks 🙂

    First solve the problem then write the code !

  • TheCTEGuy - Monday, October 9, 2017 9:01 AM

    Thom A - Monday, October 9, 2017 8:53 AM

    TheCTEGuy - Monday, October 9, 2017 8:50 AM

    Can you prove that with an example?

    What do you mean prove? That is pasted from SET STATISTICS IO ON. Is that not proof enough?

    You said for large tables. So can you give an example of the same with a large table?

    The LAG function is O(n), but the JOIN is at best O(n*log(n)) and at worst O(n^2). The best case scenario is when it is able to do an index seek on the right table for each record in the left table; the worst case scenario is when it does an index/table scan on the right table for each record in the left table.
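
    To put rough numbers on that for the 10,000,000-row test above: n*log(n) works out to roughly 2 x 10^8 comparisons, while n^2 is 10^14, several hundred thousand times more work.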

    Is that proof enough for you?

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Monday, October 9, 2017 9:31 AM

    The LAG function is O(n), but the JOIN is at best O(n*log(n)) and at worst O(n^2). The best case scenario is when it is able to do an index seek on the right table for each record in the left table; the worst case scenario is when it does an index/table scan on the right table for each record in the left table.

    Is that proof enough for you?

    Drew

    If not, the query is still running (over 20 minutes now :blink:). Currently at around row 702,219. 🙂

    Edit (@ 17:07 BST): So, it's now been running for over 50 minutes and has processed approximately 1,726,700 rows. I'd hazard a guess that the total execution time is going to be around the 5 hour mark. I'm going to leave it running overnight (home time now), but considering that the LAG query took 57 seconds and we take 5 hours as the run time for the RIGHT JOIN, then the query using RIGHT JOIN is about 300 times slower. 😉

    Edit: Ok, the query failed, I had a time out (which I believe happens at 4 hours). Either way, point proved. 🙂

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • TheCTEGuy - Monday, October 9, 2017 8:50 AM

    Thom A - Monday, October 9, 2017 8:42 AM

    TheCTEGuy - Monday, October 9, 2017 8:00 AM

    Don't know if this satisfies the requirement. The code below is more efficient than the one above.

    select b.*,a.quantity from orders a
    right join orders b
    on a.orderid = b.prev_orderid
    and a.seq - b.seq = -1

    Looking at the query plan alone isn't everything (Jeff taught me that one!):
    Using Lag:
    (5 rows affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'orders'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Using RIGHT JOIN:
    (5 rows affected)
    Table 'orders'. Scan count 2, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Notice that your query uses 2 scans and 6 logical reads, whereas LAG has 1 of each. On a larger table, using a RIGHT JOIN would be a lot more costly than using LAG.

    Can you prove that with an example?

    Really? And what, exactly, is the logical basis for demanding proof of an exceedingly clear example? I realize you're trying to learn, but the last thing you should be doing is looking at something fairly obvious and saying "prove it". As you're still learning, you don't have the credentials to justify that statement. A better way to learn would have been to ask how you might construct a test so that you could see the impact on a larger table. Comments like the one you posted could be interpreted as condescending, insulting, or both.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)
