Query Help

  • Eirikur Eiriksson (8/23/2014)


    Shadab Shah (8/23/2014)


    Hi Eirikur,

    I have being trying to help OP with this question with the help of your solution.

    I think the grouping in your solution is incorrect.

    If we consider the first group the grouping should end at HardDrink, but that is not the case.

    Could you please guide if i had understand your solution.

    Thanks.

    Whether the HardDrink falls in the next group is irrelevant, the main thing is that it marks either a beginning or an end of a group. Since HardDrink is ignored in the result set, it doesn't really matter. The thing is that I have tried to point the OP in the right direction with few hints but it looks like there isn't much of an effort on the other end.

    😎

    A last effort on this; Looping through a set is not the way to go with SQL, working on a grouped set is!

    Here is a solution based on my previous post, one of many ways of doing this.

    CREATE TABLE #TEMP (T_ID INT IDENTITY(1,1),UserName varchar(20),Drink varchar(20),Value int,CreatedDate date)

    INSERT INTO #TEMP VALUES ('ABC','Coke',100,'2014-01-12') --Just one pepsi record in this loop would be displayed

    INSERT INTO #TEMP VALUES ('ABC','Fanta',50,'2014-01-13')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',50,'2014-01-14')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',75,'2014-01-18')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',36,'2014-01-20') -- No Pepsi in this loop so no pepsi record would be displayed

    INSERT INTO #TEMP VALUES ('ABC','XXX',45,'2014-01-28')

    INSERT INTO #TEMP VALUES ('ABC','XXX',75,'2014-02-04')

    INSERT INTO #TEMP VALUES ('ABC','XXX',96,'2014-02-06')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',12,'2014-02-10')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',93,'2014-02-22') -- 2 Pepsi in this loop but the one with the value 510 would be shown because it is latest

    INSERT INTO #TEMP VALUES ('ABC','XXX',101,'2014-02-22')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',150,'2014-02-23')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',510,'2014-02-24')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',175,'2014-02-25')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',136,'2014-02-26') --Since there is no end, Hence Either Fanta or Coke which ever latest record would be displayed. In this case Fanta

    INSERT INTO #TEMP VALUES ('ABC','XXX',145,'2014-02-27')

    INSERT INTO #TEMP VALUES ('ABC','XXX',175,'2014-02-28')

    INSERT INTO #TEMP VALUES ('ABC','XXX',196,'2014-03-01')

    INSERT INTO #TEMP VALUES ('ABC','XXX',112,'2014-03-02')

    INSERT INTO #TEMP VALUES ('ABC','XXX',193,'2014-03-03')

    ;WITH BASE_DATA AS

    (

    SELECT

    T_ID

    ,T.UserName

    ,T.Drink

    ,T.Value

    ,T.CreatedDate

    FROM #TEMP T

    )

    ,GROUPED_DATA AS

    (

    SELECT

    BD.T_ID

    ,ISNULL(X.T_ID,(SELECT MAX(T_ID) FROM BASE_DATA)) AS GROUP_ID

    ,BD.UserName

    ,BD.Drink

    ,BD.Value

    ,BD.CreatedDate

    FROM BASE_DATA BD

    OUTER APPLY

    (

    SELECT MIN(T_ID) AS T_ID FROM BASE_DATA B2

    WHERE B2.Drink = 'HardDrink'

    AND BD.T_ID < B2.T_ID

    ) AS X

    )

    ,DRINK_OF_THE_DAY AS

    (

    SELECT

    MAX(T_ID) AS T_ID

    FROM GROUPED_DATA GD

    WHERE GD.Drink NOT IN ('HardDrink','XXX')

    GROUP BY GD.GROUP_ID

    )

    SELECT

    GD.UserName

    ,GD.Drink

    ,GD.Value

    ,GD.CreatedDate

    FROM GROUPED_DATA GD

    INNER JOIN DRINK_OF_THE_DAY DOTD

    ON GD.T_ID = DOTD.T_ID

    DROP TABLE #TEMP;

    Results

    UserName Drink Value CreatedDate

    --------- ------- ------ -----------

    ABC Pepsi 50 2014-01-14

    ABC Fanta 36 2014-01-20

    ABC Pepsi 510 2014-02-24

    ABC Fanta 136 2014-02-26

    That's some very nice modularized code. My concern at this point would be the 4 table scans rather than just 2.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/23/2014)


    That's some very nice modularized code.

    I am striving to maintain the readability of my code;-)

    My concern at this point would be the 4 table scans rather than just 2.

    Or even one using the Window functions on SQL Server 2012 or later.

    😎

  • Eirikur Eiriksson (8/23/2014)


    Shadab Shah (8/23/2014)


    Hi Eirikur,

    I have being trying to help OP with this question with the help of your solution.

    I think the grouping in your solution is incorrect.

    If we consider the first group the grouping should end at HardDrink, but that is not the case.

    Could you please guide if i had understand your solution.

    Thanks.

    Whether the HardDrink falls in the next group is irrelevant, the main thing is that it marks either a beginning or an end of a group. Since HardDrink is ignored in the result set, it doesn't really matter. The thing is that I have tried to point the OP in the right direction with few hints but it looks like there isn't much of an effort on the other end.

    😎

    A last effort on this; Looping through a set is not the way to go with SQL, working on a grouped set is!

    Here is a solution based on my previous post, one of many ways of doing this.

    CREATE TABLE #TEMP (T_ID INT IDENTITY(1,1),UserName varchar(20),Drink varchar(20),Value int,CreatedDate date)

    INSERT INTO #TEMP VALUES ('ABC','Coke',100,'2014-01-12') --Just one pepsi record in this loop would be displayed

    INSERT INTO #TEMP VALUES ('ABC','Fanta',50,'2014-01-13')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',50,'2014-01-14')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',75,'2014-01-18')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',36,'2014-01-20') -- No Pepsi in this loop so no pepsi record would be displayed

    INSERT INTO #TEMP VALUES ('ABC','XXX',45,'2014-01-28')

    INSERT INTO #TEMP VALUES ('ABC','XXX',75,'2014-02-04')

    INSERT INTO #TEMP VALUES ('ABC','XXX',96,'2014-02-06')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',12,'2014-02-10')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',93,'2014-02-22') -- 2 Pepsi in this loop but the one with the value 510 would be shown because it is latest

    INSERT INTO #TEMP VALUES ('ABC','XXX',101,'2014-02-22')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',150,'2014-02-23')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',510,'2014-02-24')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',175,'2014-02-25')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',136,'2014-02-26') --Since there is no end, Hence Either Fanta or Coke which ever latest record would be displayed. In this case Fanta

    INSERT INTO #TEMP VALUES ('ABC','XXX',145,'2014-02-27')

    INSERT INTO #TEMP VALUES ('ABC','XXX',175,'2014-02-28')

    INSERT INTO #TEMP VALUES ('ABC','XXX',196,'2014-03-01')

    INSERT INTO #TEMP VALUES ('ABC','XXX',112,'2014-03-02')

    INSERT INTO #TEMP VALUES ('ABC','XXX',193,'2014-03-03')

    ;WITH BASE_DATA AS

    (

    SELECT

    T_ID

    ,T.UserName

    ,T.Drink

    ,T.Value

    ,T.CreatedDate

    FROM #TEMP T

    )

    ,GROUPED_DATA AS

    (

    SELECT

    BD.T_ID

    ,ISNULL(X.T_ID,(SELECT MAX(T_ID) FROM BASE_DATA)) AS GROUP_ID

    ,BD.UserName

    ,BD.Drink

    ,BD.Value

    ,BD.CreatedDate

    FROM BASE_DATA BD

    OUTER APPLY

    (

    SELECT MIN(T_ID) AS T_ID FROM BASE_DATA B2

    WHERE B2.Drink = 'HardDrink'

    AND BD.T_ID < B2.T_ID

    ) AS X

    )

    ,DRINK_OF_THE_DAY AS

    (

    SELECT

    MAX(T_ID) AS T_ID

    FROM GROUPED_DATA GD

    WHERE GD.Drink NOT IN ('HardDrink','XXX')

    GROUP BY GD.GROUP_ID

    )

    SELECT

    GD.UserName

    ,GD.Drink

    ,GD.Value

    ,GD.CreatedDate

    FROM GROUPED_DATA GD

    INNER JOIN DRINK_OF_THE_DAY DOTD

    ON GD.T_ID = DOTD.T_ID

    DROP TABLE #TEMP;

    Results

    UserName Drink Value CreatedDate

    --------- ------- ------ -----------

    ABC Pepsi 50 2014-01-14

    ABC Fanta 36 2014-01-20

    ABC Pepsi 510 2014-02-24

    ABC Fanta 136 2014-02-26

    Hi Eirikur ,

    Actually i was trying at my end and due to slow internet connectivity downtown, i was not able reply to you.

    Thanks for the solution :-).

    I got a little problem, i am working on it.

    I feel embarrassed to ask over and over again. I am a newbie and i think this grouped set would take time for me to understand.

    I would like to share my problem. It would be great if you could please help.The table on which i am working contains nearly 2 million records. If i go with your solution it has taken more then one hour with no result return.

    Could you please help me to optimised this query.

    Currently i am working on the below approach.

    1. Dump those record in temp table

    2. Apply your logic on the temp table and create the required indexes.

    Thanks.

  • JackTimber (8/24/2014)


    Hi Eirikur ,

    Actually i was trying at my end and due to slow internet connectivity downtown, i was not able reply to you.

    Thanks for the solution :-).

    I got a little problem, i am working on it.

    I feel embarrassed to ask over and over again. I am a newbie and i think this grouped set would take time for me to understand.

    I would like to share my problem. It would be great if you could please help.The table on which i am working contains nearly 2 million records. If i go with your solution it has taken more then one hour with no result return.

    Could you please help me to optimised this query.

    Currently i am working on the below approach.

    1. Dump those record in temp table

    2. Apply your logic on the temp table and create the required indexes.

    Thanks.

    No worries, we'll figure this out in the end. Like Jeff Moden mentioned, there is some room for improvement, I'll have a look later when I have time.

    😎

  • Eirikur Eiriksson (8/24/2014)


    Jeff Moden (8/23/2014)


    That's some very nice modularized code.

    I am striving to maintain the readability of my code;-)

    My concern at this point would be the 4 table scans rather than just 2.

    Or even one using the Window functions on SQL Server 2012 or later.

    😎

    Exactly. Lead/Lag could work very well here. I've found out from JackTimber that there are 20 million rows in the table. Because it's 2008 and not 2012, I'm seriously considering the "Quirky Update" on an extracted set of rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • JackTimber (8/24/2014)


    Eirikur Eiriksson (8/23/2014)


    Shadab Shah (8/23/2014)


    Hi Eirikur,

    I have being trying to help OP with this question with the help of your solution.

    I think the grouping in your solution is incorrect.

    If we consider the first group the grouping should end at HardDrink, but that is not the case.

    Could you please guide if i had understand your solution.

    Thanks.

    Whether the HardDrink falls in the next group is irrelevant, the main thing is that it marks either a beginning or an end of a group. Since HardDrink is ignored in the result set, it doesn't really matter. The thing is that I have tried to point the OP in the right direction with few hints but it looks like there isn't much of an effort on the other end.

    😎

    A last effort on this; Looping through a set is not the way to go with SQL, working on a grouped set is!

    Here is a solution based on my previous post, one of many ways of doing this.

    CREATE TABLE #TEMP (T_ID INT IDENTITY(1,1),UserName varchar(20),Drink varchar(20),Value int,CreatedDate date)

    INSERT INTO #TEMP VALUES ('ABC','Coke',100,'2014-01-12') --Just one pepsi record in this loop would be displayed

    INSERT INTO #TEMP VALUES ('ABC','Fanta',50,'2014-01-13')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',50,'2014-01-14')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',75,'2014-01-18')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',36,'2014-01-20') -- No Pepsi in this loop so no pepsi record would be displayed

    INSERT INTO #TEMP VALUES ('ABC','XXX',45,'2014-01-28')

    INSERT INTO #TEMP VALUES ('ABC','XXX',75,'2014-02-04')

    INSERT INTO #TEMP VALUES ('ABC','XXX',96,'2014-02-06')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',12,'2014-02-10')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',93,'2014-02-22') -- 2 Pepsi in this loop but the one with the value 510 would be shown because it is latest

    INSERT INTO #TEMP VALUES ('ABC','XXX',101,'2014-02-22')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',150,'2014-02-23')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',510,'2014-02-24')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',175,'2014-02-25')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',136,'2014-02-26') --Since there is no end, Hence Either Fanta or Coke which ever latest record would be displayed. In this case Fanta

    INSERT INTO #TEMP VALUES ('ABC','XXX',145,'2014-02-27')

    INSERT INTO #TEMP VALUES ('ABC','XXX',175,'2014-02-28')

    INSERT INTO #TEMP VALUES ('ABC','XXX',196,'2014-03-01')

    INSERT INTO #TEMP VALUES ('ABC','XXX',112,'2014-03-02')

    INSERT INTO #TEMP VALUES ('ABC','XXX',193,'2014-03-03')

    ;WITH BASE_DATA AS

    (

    SELECT

    T_ID

    ,T.UserName

    ,T.Drink

    ,T.Value

    ,T.CreatedDate

    FROM #TEMP T

    )

    ,GROUPED_DATA AS

    (

    SELECT

    BD.T_ID

    ,ISNULL(X.T_ID,(SELECT MAX(T_ID) FROM BASE_DATA)) AS GROUP_ID

    ,BD.UserName

    ,BD.Drink

    ,BD.Value

    ,BD.CreatedDate

    FROM BASE_DATA BD

    OUTER APPLY

    (

    SELECT MIN(T_ID) AS T_ID FROM BASE_DATA B2

    WHERE B2.Drink = 'HardDrink'

    AND BD.T_ID < B2.T_ID

    ) AS X

    )

    ,DRINK_OF_THE_DAY AS

    (

    SELECT

    MAX(T_ID) AS T_ID

    FROM GROUPED_DATA GD

    WHERE GD.Drink NOT IN ('HardDrink','XXX')

    GROUP BY GD.GROUP_ID

    )

    SELECT

    GD.UserName

    ,GD.Drink

    ,GD.Value

    ,GD.CreatedDate

    FROM GROUPED_DATA GD

    INNER JOIN DRINK_OF_THE_DAY DOTD

    ON GD.T_ID = DOTD.T_ID

    DROP TABLE #TEMP;

    Results

    UserName Drink Value CreatedDate

    --------- ------- ------ -----------

    ABC Pepsi 50 2014-01-14

    ABC Fanta 36 2014-01-20

    ABC Pepsi 510 2014-02-24

    ABC Fanta 136 2014-02-26

    Hi Eirikur ,

    Actually i was trying at my end and due to slow internet connectivity downtown, i was not able reply to you.

    Thanks for the solution :-).

    I got a little problem, i am working on it.

    I feel embarrassed to ask over and over again. I am a newbie and i think this grouped set would take time for me to understand.

    I would like to share my problem. It would be great if you could please help.The table on which i am working contains nearly 2 million records. If i go with your solution it has taken more then one hour with no result return.

    Could you please help me to optimised this query.

    Currently i am working on the below approach.

    1. Dump those record in temp table

    2. Apply your logic on the temp table and create the required indexes.

    Thanks.

    How long did Chris' solution take?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here is a very well performing solution which adds a covering filtered index, simply blazing fast.;-) Even without the index it is still more than three times faster than previous solutions. Of course the building of the index will take some time but my thought is that the building time will be recovered during the execution.

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    CREATE TABLE #TEMP (T_ID INT IDENTITY(1,1),UserName varchar(20),Drink varchar(20),Value int,CreatedDate date)

    INSERT INTO #TEMP VALUES ('ABC','Coke',100,'2014-01-12') --Just one pepsi record in this loop would be displayed

    INSERT INTO #TEMP VALUES ('ABC','Fanta',50,'2014-01-13')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',50,'2014-01-14')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',75,'2014-01-18')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',36,'2014-01-20') -- No Pepsi in this loop so no pepsi record would be displayed

    INSERT INTO #TEMP VALUES ('ABC','XXX',45,'2014-01-28')

    INSERT INTO #TEMP VALUES ('ABC','XXX',75,'2014-02-04')

    INSERT INTO #TEMP VALUES ('ABC','XXX',96,'2014-02-06')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',12,'2014-02-10')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',93,'2014-02-22') -- 2 Pepsi in this loop but the one with the value 510 would be shown because it is latest

    INSERT INTO #TEMP VALUES ('ABC','XXX',101,'2014-02-22')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',150,'2014-02-23')

    INSERT INTO #TEMP VALUES ('ABC','Pepsi',510,'2014-02-24')

    INSERT INTO #TEMP VALUES ('ABC','HardDrink',175,'2014-02-25')

    INSERT INTO #TEMP VALUES ('ABC','Fanta',136,'2014-02-26') --Since there is no end, Hence Either Fanta or Coke which ever latest record would be displayed. In this case Fanta

    INSERT INTO #TEMP VALUES ('ABC','XXX',145,'2014-02-27')

    INSERT INTO #TEMP VALUES ('ABC','XXX',175,'2014-02-28')

    INSERT INTO #TEMP VALUES ('ABC','XXX',196,'2014-03-01')

    INSERT INTO #TEMP VALUES ('ABC','XXX',112,'2014-03-02')

    INSERT INTO #TEMP VALUES ('ABC','XXX',193,'2014-03-03');

    CREATE INDEX TMP_TBL_CREATEDATE_ASC_INCL_ALL ON #TEMP (CreatedDate ASC) INCLUDE (UserName,Drink,Value) WHERE (Drink <> 'XXX');

    SET STATISTICS IO ON;

    ;WITH BASE_DATA AS

    (

    SELECT

    ROW_NUMBER() OVER (ORDER BY T.CreatedDate) AS T_RID

    ,T.UserName

    ,T.Drink

    ,T.Value

    ,T.CreatedDate

    FROM #TEMP T

    WHERE T.Drink <> 'XXX'

    )

    SELECT

    BD.UserName

    ,BD.Drink

    ,BD.Value

    ,BD.CreatedDate

    FROM BASE_DATA BD

    LEFT OUTER JOIN BASE_DATA BLEAD

    ON BD.T_RID = BLEAD.T_RID - 1

    WHERE BLEAD.Drink = 'HardDrink' OR BLEAD.T_RID IS NULL;

    SET STATISTICS IO OFF;

    DROP TABLE #TEMP;

    Edit: typo

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply