May 26, 2016 at 1:29 pm
Wonder if you can help me with a query problem.
I have the following style table that runs into 10,000,000+ rows.
CREATE TABLE #example (
[customerId] [smallint] NOT NULL,
[window] [nvarchar](50) NULL,
[product] [smallint] NOT NULL,
[sales] [smallint] NULL)
Go
INSERT INTO #example
(customerId, window,product, sales)
values (1,'0-30', 1, 100)
,(1,'31-60', 2, 110)
,(1,'61-90', 3, 130)
,(2,'0-30', 1, null)
,(2,'31-60', 2, null)
,(2,'61-90', 4,null )
,(2,'91-120', 5, null)
,(3,'0-30', 3,120)
,(3,'31-60', 4, null)
,(3,'61-90', 6, null)
GO
I need to create a calculated column called Class that contains 0 if all the sales for each customer is null for all the windows else 1.
For instance, CustomerId 1, 3 would be 1, CustomerId 2 would be 0.
I suppose I can join to a grouped by query but wondered what the optimal query might be for a table this size.
Best
Lee
May 26, 2016 at 1:39 pm
SELECT *, CAST(COUNT(sales) OVER(PARTITION BY e.customerId) AS BIT)
FROM #example e
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 26, 2016 at 2:49 pm
This could be quite a bit more efficient. That COUNT gets especially expensive as the number of matching rows goes up.
SELECT *,
CASE WHEN s.sf = 1 THEN 1 ELSE 0 END AS sales_flag
FROM #example e
OUTER APPLY (SELECT TOP 1 1 AS sf
FROM #example e2
WHERE e.customerid = e2.customerid
AND e2.sales IS NOT NULL) AS s
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 26, 2016 at 4:18 pm
Since you are on a 2012 forum then why not use a window function?
😎
Note that a POC index like this would be very helpful
CREATE NONCLUSTERED INDEX NCLIDX_#EXAMPLE_CUST_SALES_INCL_WIN_PRODUCT ON #example
(customerId ASC, sales ASC) INCLUDE ( window,product);
SELECT
E.customerId
,E.window
,E.product
,E.sales
,SIGN(ISNULL(MAX(E.sales) OVER
(
PARTITION BY E.customerId
),0)) AS FLG
FROM #example E;
Output
customerId window product sales FLG
---------- -------- ------- ------ -----------
1 0-30 1 100 1
1 31-60 2 110 1
1 61-90 3 130 1
2 0-30 1 NULL 0
2 31-60 2 NULL 0
2 61-90 4 NULL 0
2 91-120 5 NULL 0
3 31-60 4 NULL 1
3 61-90 6 NULL 1
3 0-30 3 120 1
May 27, 2016 at 3:14 am
Thanks all for sharing. I'll let you know which has the highest performance. Haven't used the SIGN() function before 🙂
Best
Lee
May 27, 2016 at 11:09 am
SIGN is one of the things used in the '60s to develop characteristic functions (the precursor to the modern CASE statement), which allowed incredibly powerful data processing in a single pass through the data. If you think we are constrained by IO now just imagine what it was like then. 😀
Oh, I suspect that the MAX will suffer the same issue as COUNT when there isn't a POC index in place. It can't shortcut like TOP 1 (or EXISTS, which could possibly be employed here too) can and will have to touch all rows. Having a good index in place will most certainly help both MAX and TOP 1 (I think) though.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 27, 2016 at 6:57 pm
select customerId, window, product, sales,
( Case when count(sales) over(partition by customerid order by customerid) >0 then 1 else 0 end)as class
from #example
May 28, 2016 at 3:23 am
santiagoc93 (5/27/2016)
select customerId, window, product, sales,
( Case when count(sales) over(partition by customerid order by customerid) >0 then 1 else 0 end)as class
from #example
Very nice!
😎
The addition of the ORDER BY eliminates the table spool operators and the query turns into a single index scan with stream aggregate if there is a POC index on the table .
May 28, 2016 at 9:01 am
Eirikur Eiriksson (5/28/2016)
santiagoc93 (5/27/2016)
select customerId, window, product, sales,
( Case when count(sales) over(partition by customerid order by customerid) >0 then 1 else 0 end)as class
from #example
Very nice!
😎
The addition of the ORDER BY eliminates the table spool operators and the query turns into a single index scan with stream aggregate if there is a POC index on the table .
But it still suffers from the "count everything" problem which gets ugly as the number of matching rows increases per customerid.
It would be nice to see some performance comparisons of the methods for this since it is a very common request. Something tickles the back of my mind though that Itzik has already done this...
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 28, 2016 at 9:39 am
TheSQLGuru (5/28/2016)
Eirikur Eiriksson (5/28/2016)
santiagoc93 (5/27/2016)
select customerId, window, product, sales,
( Case when count(sales) over(partition by customerid order by customerid) >0 then 1 else 0 end)as class
from #example
Very nice!
😎
The addition of the ORDER BY eliminates the table spool operators and the query turns into a single index scan with stream aggregate if there is a POC index on the table .
But it still suffers from the "count everything" problem which gets ugly as the number of matching rows increases per customerid.
It would be nice to see some performance comparisons of the methods for this since it is a very common request. Something tickles the back of my mind though that Itzik has already done this...
You are absolutely right on the "count everything" problem which makes this method at least 4 times slower than the TOP 1 one.
😎
My thought is that pre-grouping without aggregation will be the fastest way of doing this, will peace together an example/test harness when I have the time.
May 28, 2016 at 10:28 am
Pre-grouping works quite well in this case, especially if there is a filtered index on the the table for customerID where sales is not null.
😎
;WITH CUSTOMERS_SALE AS
(
SELECT
SSF.customerId
FROM #example SSF
WHERE SSF.sales IS NOT NULL
GROUP BY SSF.customerId
)
SELECT
SF.customerId
,SF.window
,SF.product
,SF.sales
,SIGN(ISNULL(CSALE.customerId,0)) AS SALE_FLAG
FROM #example SF
LEFT OUTER JOIN CUSTOMERS_SALE CSALE
ON SF.customerId = CSALE.customerId;
Quick and simple test harness
USE tempdb;
GO
SET NOCOUNT ON;
--/*
IF OBJECT_ID(N'dbo.TBL_SAMPLE_SALE_FLAG') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_SALE_FLAG;
CREATE TABLE dbo.TBL_SAMPLE_SALE_FLAG
(
SSF_ID INT NOT NULL CONSTRAINT PK_DBO_TBL_SAMPLE_SALE_FLAG_SALE_SSF_ID PRIMARY KEY CLUSTERED
,CUST_ID INT NOT NULL
,WIND_ID INT NOT NULL
,PROD_ID INT NOT NULL
,SALE_VAL INT NULL
);
DECLARE @SAMPLE_SIZE INT = 1000000;
DECLARE @WINDOW_COUNT INT = 12;
DECLARE @CUSTOMER_COUNT INT = 500000;
DECLARE @PRODUCT_COUNT INT = 1000;
DECLARE @NO_SALE_SEED INT = 3;
DECLARE @MAX_SALE_VAL INT = 10000;
;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))
, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
INSERT INTO dbo.TBL_SAMPLE_SALE_FLAG WITH (TABLOCK) (SSF_ID,CUST_ID,WIND_ID,PROD_ID,SALE_VAL)
SELECT
NM.N AS SSF_ID
,ABS(CHECKSUM(NEWID())) % @CUSTOMER_COUNT AS CUST_ID
,ABS(CHECKSUM(NEWID())) % @WINDOW_COUNT AS WIND_ID
,ABS(CHECKSUM(NEWID())) % @PRODUCT_COUNT AS PROD_ID
,ABS(CHECKSUM(NEWID())) % @MAX_SALE_VAL
+ NULLIF(ABS(CHECKSUM(NEWID())) % @NO_SALE_SEED,0) AS SALE_VAL
FROM NUMS NM;
-- POC INDICES
CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_SAMPLE_SALE_FLAG_CUST_ID_SALE_VAL_INCL_WIN_ID_PROD_ID
ON dbo.TBL_SAMPLE_SALE_FLAG (CUST_ID ASC, SALE_VAL ASC) INCLUDE (WIND_ID,PROD_ID);
CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_SAMPLE_SALE_FLAG_CUST_ID_SALE_VAL
ON dbo.TBL_SAMPLE_SALE_FLAG (CUST_ID ASC,SALE_VAL ASC) WHERE (SALE_VAL IS NOT NULL);
CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_SAMPLE_SALE_FLAG_SALE_VAL_INCL_CUST_ID_WIND_ID
ON dbo.TBL_SAMPLE_SALE_FLAG (SALE_VAL ASC) INCLUDE (CUST_ID,WIND_ID) WHERE (SALE_VAL IS NOT NULL);
-- */
DECLARE @timer TABLE (T_TEXT VARCHAR(50) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME()));
DECLARE @INT_BUCKET01 INT = 0;
DECLARE @INT_BUCKET02 INT = 0;
DECLARE @INT_BUCKET03 INT = 0;
DECLARE @INT_BUCKET04 INT = 0;
DECLARE @BIT_BUCKET01 INT = 0;
-- First round
RAISERROR ('--- DRY RUN ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');
SELECT
@INT_BUCKET01 = E.CUST_ID
,@INT_BUCKET02 = E.WIND_ID
,@INT_BUCKET03 = E.PROD_ID
,@INT_BUCKET04 = E.SALE_VAL
FROM dbo.TBL_SAMPLE_SALE_FLAG E
ORDER BY CUST_ID ASC
,SALE_VAL ASC
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN');
RAISERROR ('--- MAX ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('MAX');
SELECT
@INT_BUCKET01 = E.CUST_ID
,@INT_BUCKET02 = E.WIND_ID
,@INT_BUCKET03 = E.PROD_ID
,@INT_BUCKET04 = E.SALE_VAL
,@BIT_BUCKET01 = SIGN(ISNULL(CONVERT(INT,MAX(E.SALE_VAL) OVER
(
PARTITION BY E.CUST_ID
ORDER BY E.CUST_ID
),0),0)) --AS FLG
FROM dbo.TBL_SAMPLE_SALE_FLAG E
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('MAX');
RAISERROR ('--- COUNT ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('COUNT');
SELECT
@INT_BUCKET01 = E.CUST_ID
,@INT_BUCKET02 = E.WIND_ID
,@INT_BUCKET03 = E.PROD_ID
,@INT_BUCKET04 = E.SALE_VAL
,@BIT_BUCKET01 = CASE WHEN COUNT(E.SALE_VAL) OVER
(
PARTITION BY E.CUST_ID
ORDER BY E.CUST_ID
) > 0 THEN 1 ELSE 0 END --AS FLG
FROM dbo.TBL_SAMPLE_SALE_FLAG E
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('COUNT');
RAISERROR ('--- TOP ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('TOP');
SELECT
@INT_BUCKET01 = e.CUST_ID
,@INT_BUCKET02 = e.WIND_ID
,@INT_BUCKET03 = e.PROD_ID
,@INT_BUCKET04 = e.SALE_VAL
,@BIT_BUCKET01 = CASE WHEN s.sf = 1 THEN 1 ELSE 0 END --AS sales_flag
FROM dbo.TBL_SAMPLE_SALE_FLAG e
OUTER APPLY (SELECT TOP 1 1 AS sf
FROM dbo.TBL_SAMPLE_SALE_FLAG e2
WHERE e.CUST_ID = e2.CUST_ID
AND e2.SALE_VAL IS NOT NULL) AS s
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('TOP');
RAISERROR ('--- PRE GROUP ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('PRE GROUP');
;WITH CUSTOMERS_SALE AS
(
SELECT
SSF.CUST_ID
FROM dbo.TBL_SAMPLE_SALE_FLAG SSF
WHERE SSF.SALE_VAL IS NOT NULL
GROUP BY SSF.CUST_ID
)
SELECT
@INT_BUCKET01 = SF.CUST_ID
,@INT_BUCKET02 = SF.WIND_ID
,@INT_BUCKET03 = SF.PROD_ID
,@INT_BUCKET04 = SF.SALE_VAL
,@BIT_BUCKET01 = SIGN(ISNULL(CSALE.CUST_ID,0)) --AS SALE_FLAG
FROM dbo.TBL_SAMPLE_SALE_FLAG SF
LEFT OUTER JOIN CUSTOMERS_SALE CSALE
ON SF.CUST_ID = CSALE.CUST_ID
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('PRE GROUP');
-- Second round
RAISERROR ('--- DRY RUN 02 ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN 02');
SELECT
@INT_BUCKET01 = E.CUST_ID
,@INT_BUCKET02 = E.WIND_ID
,@INT_BUCKET03 = E.PROD_ID
,@INT_BUCKET04 = E.SALE_VAL
FROM dbo.TBL_SAMPLE_SALE_FLAG E
ORDER BY CUST_ID ASC
,SALE_VAL ASC
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('DRY RUN 02');
RAISERROR ('--- MAX 02 ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('MAX 02');
SELECT
@INT_BUCKET01 = E.CUST_ID
,@INT_BUCKET02 = E.WIND_ID
,@INT_BUCKET03 = E.PROD_ID
,@INT_BUCKET04 = E.SALE_VAL
,@BIT_BUCKET01 = SIGN(ISNULL(CONVERT(INT,MAX(E.SALE_VAL) OVER
(
PARTITION BY E.CUST_ID
ORDER BY E.CUST_ID
),0),0)) --AS FLG
FROM dbo.TBL_SAMPLE_SALE_FLAG E
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('MAX 02');
RAISERROR ('--- COUNT 02 ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('COUNT 02');
SELECT
@INT_BUCKET01 = E.CUST_ID
,@INT_BUCKET02 = E.WIND_ID
,@INT_BUCKET03 = E.PROD_ID
,@INT_BUCKET04 = E.SALE_VAL
,@BIT_BUCKET01 = CASE WHEN COUNT(E.SALE_VAL) OVER
(
PARTITION BY E.CUST_ID
ORDER BY E.CUST_ID
) > 0 THEN 1 ELSE 0 END --AS FLG
FROM dbo.TBL_SAMPLE_SALE_FLAG E
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('COUNT 02');
RAISERROR ('--- TOP 02 ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('TOP 02');
SELECT
@INT_BUCKET01 = e.CUST_ID
,@INT_BUCKET02 = e.WIND_ID
,@INT_BUCKET03 = e.PROD_ID
,@INT_BUCKET04 = e.SALE_VAL
,@BIT_BUCKET01 = CASE WHEN s.sf = 1 THEN 1 ELSE 0 END --AS sales_flag
FROM dbo.TBL_SAMPLE_SALE_FLAG e
OUTER APPLY (SELECT TOP 1 1 AS sf
FROM dbo.TBL_SAMPLE_SALE_FLAG e2
WHERE e.CUST_ID = e2.CUST_ID
AND e2.SALE_VAL IS NOT NULL) AS s
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('TOP 02');
RAISERROR ('--- PRE GROUP 02 ---',0,0) WITH NOWAIT;
INSERT INTO @timer(T_TEXT) VALUES('PRE GROUP 02');
;WITH CUSTOMERS_SALE AS
(
SELECT
SSF.CUST_ID
FROM dbo.TBL_SAMPLE_SALE_FLAG SSF
WHERE SSF.SALE_VAL IS NOT NULL
GROUP BY SSF.CUST_ID
)
SELECT
@INT_BUCKET01 = SF.CUST_ID
,@INT_BUCKET02 = SF.WIND_ID
,@INT_BUCKET03 = SF.PROD_ID
,@INT_BUCKET04 = SF.SALE_VAL
,@BIT_BUCKET01 = SIGN(ISNULL(CSALE.CUST_ID,0)) --AS SALE_FLAG
FROM dbo.TBL_SAMPLE_SALE_FLAG SF
LEFT OUTER JOIN CUSTOMERS_SALE CSALE
ON SF.CUST_ID = CSALE.CUST_ID
OPTION (MAXDOP 1);
INSERT INTO @timer(T_TEXT) VALUES('PRE GROUP 02');
SELECT
T.T_TEXT
,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
FROM @timer T
GROUP BY T.T_TEXT
ORDER BY DURATION;
Results on 2nd Gen i5 / SQL Server 2014
T_TEXT DURATION
--------------- -----------
DRY RUN 02 190001
DRY RUN 200000
PRE GROUP 440000
PRE GROUP 02 440000
TOP 2530004
TOP 02 2570004
MAX 02 10675016
COUNT 10700015
MAX 10796016
COUNT 02 10810015
Edit: Corrected the wrong "pre-group" code snippet.
June 2, 2016 at 4:56 am
I had a the best results from this query. Works really well over the huge table. Thanks 🙂
June 4, 2016 at 3:07 pm
leehbi (6/2/2016)
I had a the best results from this query. Works really well over the huge table. Thanks 🙂
Which query is "this" query? A lot of them have been posted.
Never mind... it's gotta be the one marked as "This worked for the OP". :blush:
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2016 at 3:32 pm
Jeff Moden (6/4/2016)
leehbi (6/2/2016)
I had a the best results from this query. Works really well over the huge table. Thanks 🙂
Which query is "this" query? A lot of them have been posted.Never mind... it's gotta be the one marked as "This worked for the OP". :blush:
Good question, huge difference in performance but all suggestions return the correct results.
😎
June 5, 2016 at 12:51 am
Thanks again for sharing. Modern T-SQL gives us many ways to skin a cat.
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply