August 23, 2014 at 8:45 pm
Eirikur Eiriksson (8/23/2014)
Shadab Shah (8/23/2014)
Hi Eirikur,I have being trying to help OP with this question with the help of your solution.
I think the grouping in your solution is incorrect.
If we consider the first group the grouping should end at HardDrink, but that is not the case.
Could you please guide if i had understand your solution.
Thanks.
Whether the HardDrink falls in the next group is irrelevant, the main thing is that it marks either a beginning or an end of a group. Since HardDrink is ignored in the result set, it doesn't really matter. The thing is that I have tried to point the OP in the right direction with few hints but it looks like there isn't much of an effort on the other end.
😎
A last effort on this; Looping through a set is not the way to go with SQL, working on a grouped set is!
Here is a solution based on my previous post, one of many ways of doing this.
CREATE TABLE #TEMP (T_ID INT IDENTITY(1,1),UserName varchar(20),Drink varchar(20),Value int,CreatedDate date)
INSERT INTO #TEMP VALUES ('ABC','Coke',100,'2014-01-12') --Just one pepsi record in this loop would be displayed
INSERT INTO #TEMP VALUES ('ABC','Fanta',50,'2014-01-13')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',50,'2014-01-14')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',75,'2014-01-18')
INSERT INTO #TEMP VALUES ('ABC','Fanta',36,'2014-01-20') -- No Pepsi in this loop so no pepsi record would be displayed
INSERT INTO #TEMP VALUES ('ABC','XXX',45,'2014-01-28')
INSERT INTO #TEMP VALUES ('ABC','XXX',75,'2014-02-04')
INSERT INTO #TEMP VALUES ('ABC','XXX',96,'2014-02-06')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',12,'2014-02-10')
INSERT INTO #TEMP VALUES ('ABC','Fanta',93,'2014-02-22') -- 2 Pepsi in this loop but the one with the value 510 would be shown because it is latest
INSERT INTO #TEMP VALUES ('ABC','XXX',101,'2014-02-22')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',150,'2014-02-23')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',510,'2014-02-24')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',175,'2014-02-25')
INSERT INTO #TEMP VALUES ('ABC','Fanta',136,'2014-02-26') --Since there is no end, Hence Either Fanta or Coke which ever latest record would be displayed. In this case Fanta
INSERT INTO #TEMP VALUES ('ABC','XXX',145,'2014-02-27')
INSERT INTO #TEMP VALUES ('ABC','XXX',175,'2014-02-28')
INSERT INTO #TEMP VALUES ('ABC','XXX',196,'2014-03-01')
INSERT INTO #TEMP VALUES ('ABC','XXX',112,'2014-03-02')
INSERT INTO #TEMP VALUES ('ABC','XXX',193,'2014-03-03')
;WITH BASE_DATA AS
(
SELECT
T_ID
,T.UserName
,T.Drink
,T.Value
,T.CreatedDate
FROM #TEMP T
)
,GROUPED_DATA AS
(
SELECT
BD.T_ID
,ISNULL(X.T_ID,(SELECT MAX(T_ID) FROM BASE_DATA)) AS GROUP_ID
,BD.UserName
,BD.Drink
,BD.Value
,BD.CreatedDate
FROM BASE_DATA BD
OUTER APPLY
(
SELECT MIN(T_ID) AS T_ID FROM BASE_DATA B2
WHERE B2.Drink = 'HardDrink'
AND BD.T_ID < B2.T_ID
) AS X
)
,DRINK_OF_THE_DAY AS
(
SELECT
MAX(T_ID) AS T_ID
FROM GROUPED_DATA GD
WHERE GD.Drink NOT IN ('HardDrink','XXX')
GROUP BY GD.GROUP_ID
)
SELECT
GD.UserName
,GD.Drink
,GD.Value
,GD.CreatedDate
FROM GROUPED_DATA GD
INNER JOIN DRINK_OF_THE_DAY DOTD
ON GD.T_ID = DOTD.T_ID
DROP TABLE #TEMP;
Results
UserName Drink Value CreatedDate
--------- ------- ------ -----------
ABC Pepsi 50 2014-01-14
ABC Fanta 36 2014-01-20
ABC Pepsi 510 2014-02-24
ABC Fanta 136 2014-02-26
That's some very nice modularized code. My concern at this point would be the 4 table scans rather than just 2.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2014 at 12:27 am
Jeff Moden (8/23/2014)
That's some very nice modularized code.
I am striving to maintain the readability of my code;-)
My concern at this point would be the 4 table scans rather than just 2.
Or even one using the Window functions on SQL Server 2012 or later.
😎
August 24, 2014 at 4:10 am
Eirikur Eiriksson (8/23/2014)
Shadab Shah (8/23/2014)
Hi Eirikur,I have being trying to help OP with this question with the help of your solution.
I think the grouping in your solution is incorrect.
If we consider the first group the grouping should end at HardDrink, but that is not the case.
Could you please guide if i had understand your solution.
Thanks.
Whether the HardDrink falls in the next group is irrelevant, the main thing is that it marks either a beginning or an end of a group. Since HardDrink is ignored in the result set, it doesn't really matter. The thing is that I have tried to point the OP in the right direction with few hints but it looks like there isn't much of an effort on the other end.
😎
A last effort on this; Looping through a set is not the way to go with SQL, working on a grouped set is!
Here is a solution based on my previous post, one of many ways of doing this.
CREATE TABLE #TEMP (T_ID INT IDENTITY(1,1),UserName varchar(20),Drink varchar(20),Value int,CreatedDate date)
INSERT INTO #TEMP VALUES ('ABC','Coke',100,'2014-01-12') --Just one pepsi record in this loop would be displayed
INSERT INTO #TEMP VALUES ('ABC','Fanta',50,'2014-01-13')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',50,'2014-01-14')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',75,'2014-01-18')
INSERT INTO #TEMP VALUES ('ABC','Fanta',36,'2014-01-20') -- No Pepsi in this loop so no pepsi record would be displayed
INSERT INTO #TEMP VALUES ('ABC','XXX',45,'2014-01-28')
INSERT INTO #TEMP VALUES ('ABC','XXX',75,'2014-02-04')
INSERT INTO #TEMP VALUES ('ABC','XXX',96,'2014-02-06')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',12,'2014-02-10')
INSERT INTO #TEMP VALUES ('ABC','Fanta',93,'2014-02-22') -- 2 Pepsi in this loop but the one with the value 510 would be shown because it is latest
INSERT INTO #TEMP VALUES ('ABC','XXX',101,'2014-02-22')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',150,'2014-02-23')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',510,'2014-02-24')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',175,'2014-02-25')
INSERT INTO #TEMP VALUES ('ABC','Fanta',136,'2014-02-26') --Since there is no end, Hence Either Fanta or Coke which ever latest record would be displayed. In this case Fanta
INSERT INTO #TEMP VALUES ('ABC','XXX',145,'2014-02-27')
INSERT INTO #TEMP VALUES ('ABC','XXX',175,'2014-02-28')
INSERT INTO #TEMP VALUES ('ABC','XXX',196,'2014-03-01')
INSERT INTO #TEMP VALUES ('ABC','XXX',112,'2014-03-02')
INSERT INTO #TEMP VALUES ('ABC','XXX',193,'2014-03-03')
;WITH BASE_DATA AS
(
SELECT
T_ID
,T.UserName
,T.Drink
,T.Value
,T.CreatedDate
FROM #TEMP T
)
,GROUPED_DATA AS
(
SELECT
BD.T_ID
,ISNULL(X.T_ID,(SELECT MAX(T_ID) FROM BASE_DATA)) AS GROUP_ID
,BD.UserName
,BD.Drink
,BD.Value
,BD.CreatedDate
FROM BASE_DATA BD
OUTER APPLY
(
SELECT MIN(T_ID) AS T_ID FROM BASE_DATA B2
WHERE B2.Drink = 'HardDrink'
AND BD.T_ID < B2.T_ID
) AS X
)
,DRINK_OF_THE_DAY AS
(
SELECT
MAX(T_ID) AS T_ID
FROM GROUPED_DATA GD
WHERE GD.Drink NOT IN ('HardDrink','XXX')
GROUP BY GD.GROUP_ID
)
SELECT
GD.UserName
,GD.Drink
,GD.Value
,GD.CreatedDate
FROM GROUPED_DATA GD
INNER JOIN DRINK_OF_THE_DAY DOTD
ON GD.T_ID = DOTD.T_ID
DROP TABLE #TEMP;
Results
UserName Drink Value CreatedDate
--------- ------- ------ -----------
ABC Pepsi 50 2014-01-14
ABC Fanta 36 2014-01-20
ABC Pepsi 510 2014-02-24
ABC Fanta 136 2014-02-26
Hi Eirikur ,
Actually i was trying at my end and due to slow internet connectivity downtown, i was not able reply to you.
Thanks for the solution :-).
I got a little problem, i am working on it.
I feel embarrassed to ask over and over again. I am a newbie and i think this grouped set would take time for me to understand.
I would like to share my problem. It would be great if you could please help.The table on which i am working contains nearly 2 million records. If i go with your solution it has taken more then one hour with no result return.
Could you please help me to optimised this query.
Currently i am working on the below approach.
1. Dump those record in temp table
2. Apply your logic on the temp table and create the required indexes.
Thanks.
August 24, 2014 at 4:23 am
JackTimber (8/24/2014)
Hi Eirikur ,
Actually i was trying at my end and due to slow internet connectivity downtown, i was not able reply to you.
Thanks for the solution :-).
I got a little problem, i am working on it.
I feel embarrassed to ask over and over again. I am a newbie and i think this grouped set would take time for me to understand.
I would like to share my problem. It would be great if you could please help.The table on which i am working contains nearly 2 million records. If i go with your solution it has taken more then one hour with no result return.
Could you please help me to optimised this query.
Currently i am working on the below approach.
1. Dump those record in temp table
2. Apply your logic on the temp table and create the required indexes.
Thanks.
No worries, we'll figure this out in the end. Like Jeff Moden mentioned, there is some room for improvement, I'll have a look later when I have time.
😎
August 24, 2014 at 9:24 pm
Eirikur Eiriksson (8/24/2014)
Jeff Moden (8/23/2014)
That's some very nice modularized code.I am striving to maintain the readability of my code;-)
My concern at this point would be the 4 table scans rather than just 2.
Or even one using the Window functions on SQL Server 2012 or later.
😎
Exactly. Lead/Lag could work very well here. I've found out from JackTimber that there are 20 million rows in the table. Because it's 2008 and not 2012, I'm seriously considering the "Quirky Update" on an extracted set of rows.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2014 at 9:25 pm
JackTimber (8/24/2014)
Eirikur Eiriksson (8/23/2014)
Shadab Shah (8/23/2014)
Hi Eirikur,I have being trying to help OP with this question with the help of your solution.
I think the grouping in your solution is incorrect.
If we consider the first group the grouping should end at HardDrink, but that is not the case.
Could you please guide if i had understand your solution.
Thanks.
Whether the HardDrink falls in the next group is irrelevant, the main thing is that it marks either a beginning or an end of a group. Since HardDrink is ignored in the result set, it doesn't really matter. The thing is that I have tried to point the OP in the right direction with few hints but it looks like there isn't much of an effort on the other end.
😎
A last effort on this; Looping through a set is not the way to go with SQL, working on a grouped set is!
Here is a solution based on my previous post, one of many ways of doing this.
CREATE TABLE #TEMP (T_ID INT IDENTITY(1,1),UserName varchar(20),Drink varchar(20),Value int,CreatedDate date)
INSERT INTO #TEMP VALUES ('ABC','Coke',100,'2014-01-12') --Just one pepsi record in this loop would be displayed
INSERT INTO #TEMP VALUES ('ABC','Fanta',50,'2014-01-13')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',50,'2014-01-14')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',75,'2014-01-18')
INSERT INTO #TEMP VALUES ('ABC','Fanta',36,'2014-01-20') -- No Pepsi in this loop so no pepsi record would be displayed
INSERT INTO #TEMP VALUES ('ABC','XXX',45,'2014-01-28')
INSERT INTO #TEMP VALUES ('ABC','XXX',75,'2014-02-04')
INSERT INTO #TEMP VALUES ('ABC','XXX',96,'2014-02-06')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',12,'2014-02-10')
INSERT INTO #TEMP VALUES ('ABC','Fanta',93,'2014-02-22') -- 2 Pepsi in this loop but the one with the value 510 would be shown because it is latest
INSERT INTO #TEMP VALUES ('ABC','XXX',101,'2014-02-22')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',150,'2014-02-23')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',510,'2014-02-24')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',175,'2014-02-25')
INSERT INTO #TEMP VALUES ('ABC','Fanta',136,'2014-02-26') --Since there is no end, Hence Either Fanta or Coke which ever latest record would be displayed. In this case Fanta
INSERT INTO #TEMP VALUES ('ABC','XXX',145,'2014-02-27')
INSERT INTO #TEMP VALUES ('ABC','XXX',175,'2014-02-28')
INSERT INTO #TEMP VALUES ('ABC','XXX',196,'2014-03-01')
INSERT INTO #TEMP VALUES ('ABC','XXX',112,'2014-03-02')
INSERT INTO #TEMP VALUES ('ABC','XXX',193,'2014-03-03')
;WITH BASE_DATA AS
(
SELECT
T_ID
,T.UserName
,T.Drink
,T.Value
,T.CreatedDate
FROM #TEMP T
)
,GROUPED_DATA AS
(
SELECT
BD.T_ID
,ISNULL(X.T_ID,(SELECT MAX(T_ID) FROM BASE_DATA)) AS GROUP_ID
,BD.UserName
,BD.Drink
,BD.Value
,BD.CreatedDate
FROM BASE_DATA BD
OUTER APPLY
(
SELECT MIN(T_ID) AS T_ID FROM BASE_DATA B2
WHERE B2.Drink = 'HardDrink'
AND BD.T_ID < B2.T_ID
) AS X
)
,DRINK_OF_THE_DAY AS
(
SELECT
MAX(T_ID) AS T_ID
FROM GROUPED_DATA GD
WHERE GD.Drink NOT IN ('HardDrink','XXX')
GROUP BY GD.GROUP_ID
)
SELECT
GD.UserName
,GD.Drink
,GD.Value
,GD.CreatedDate
FROM GROUPED_DATA GD
INNER JOIN DRINK_OF_THE_DAY DOTD
ON GD.T_ID = DOTD.T_ID
DROP TABLE #TEMP;
Results
UserName Drink Value CreatedDate
--------- ------- ------ -----------
ABC Pepsi 50 2014-01-14
ABC Fanta 36 2014-01-20
ABC Pepsi 510 2014-02-24
ABC Fanta 136 2014-02-26
Hi Eirikur ,
Actually i was trying at my end and due to slow internet connectivity downtown, i was not able reply to you.
Thanks for the solution :-).
I got a little problem, i am working on it.
I feel embarrassed to ask over and over again. I am a newbie and i think this grouped set would take time for me to understand.
I would like to share my problem. It would be great if you could please help.The table on which i am working contains nearly 2 million records. If i go with your solution it has taken more then one hour with no result return.
Could you please help me to optimised this query.
Currently i am working on the below approach.
1. Dump those record in temp table
2. Apply your logic on the temp table and create the required indexes.
Thanks.
How long did Chris' solution take?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 25, 2014 at 12:29 am
Here is a very well performing solution which adds a covering filtered index, simply blazing fast.;-) Even without the index it is still more than three times faster than previous solutions. Of course the building of the index will take some time but my thought is that the building time will be recovered during the execution.
😎
USE tempdb;
GO
SET NOCOUNT ON;
CREATE TABLE #TEMP (T_ID INT IDENTITY(1,1),UserName varchar(20),Drink varchar(20),Value int,CreatedDate date)
INSERT INTO #TEMP VALUES ('ABC','Coke',100,'2014-01-12') --Just one pepsi record in this loop would be displayed
INSERT INTO #TEMP VALUES ('ABC','Fanta',50,'2014-01-13')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',50,'2014-01-14')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',75,'2014-01-18')
INSERT INTO #TEMP VALUES ('ABC','Fanta',36,'2014-01-20') -- No Pepsi in this loop so no pepsi record would be displayed
INSERT INTO #TEMP VALUES ('ABC','XXX',45,'2014-01-28')
INSERT INTO #TEMP VALUES ('ABC','XXX',75,'2014-02-04')
INSERT INTO #TEMP VALUES ('ABC','XXX',96,'2014-02-06')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',12,'2014-02-10')
INSERT INTO #TEMP VALUES ('ABC','Fanta',93,'2014-02-22') -- 2 Pepsi in this loop but the one with the value 510 would be shown because it is latest
INSERT INTO #TEMP VALUES ('ABC','XXX',101,'2014-02-22')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',150,'2014-02-23')
INSERT INTO #TEMP VALUES ('ABC','Pepsi',510,'2014-02-24')
INSERT INTO #TEMP VALUES ('ABC','HardDrink',175,'2014-02-25')
INSERT INTO #TEMP VALUES ('ABC','Fanta',136,'2014-02-26') --Since there is no end, Hence Either Fanta or Coke which ever latest record would be displayed. In this case Fanta
INSERT INTO #TEMP VALUES ('ABC','XXX',145,'2014-02-27')
INSERT INTO #TEMP VALUES ('ABC','XXX',175,'2014-02-28')
INSERT INTO #TEMP VALUES ('ABC','XXX',196,'2014-03-01')
INSERT INTO #TEMP VALUES ('ABC','XXX',112,'2014-03-02')
INSERT INTO #TEMP VALUES ('ABC','XXX',193,'2014-03-03');
CREATE INDEX TMP_TBL_CREATEDATE_ASC_INCL_ALL ON #TEMP (CreatedDate ASC) INCLUDE (UserName,Drink,Value) WHERE (Drink <> 'XXX');
SET STATISTICS IO ON;
;WITH BASE_DATA AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY T.CreatedDate) AS T_RID
,T.UserName
,T.Drink
,T.Value
,T.CreatedDate
FROM #TEMP T
WHERE T.Drink <> 'XXX'
)
SELECT
BD.UserName
,BD.Drink
,BD.Value
,BD.CreatedDate
FROM BASE_DATA BD
LEFT OUTER JOIN BASE_DATA BLEAD
ON BD.T_RID = BLEAD.T_RID - 1
WHERE BLEAD.Drink = 'HardDrink' OR BLEAD.T_RID IS NULL;
SET STATISTICS IO OFF;
DROP TABLE #TEMP;
Edit: typo
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply