Windowing function puzzle

Question

Windowing function puzzle

Aaron N. Cutshall

SSCrazy Eights

Points: 9011
More actions
March 2, 2017 at 4:01 pm

#341278

I have a complex query that produces data for multiple sets. I need to ensure that for each set of criteria that I get data for each set. The following query works, but I'd rather use a windowing function to avoid the double execution of the complex query:
select a.*
from dbo.ComplexQuery a
inner join (select ColumnA, ColumnB, ColumnC, count(distinct ColumnD) as ColumnDCount
from dbo.ComplexQuery
group by ColumnA, ColumnB, ColumnC
) b
on b.ColumnA = a.ColumnA
and b.ColumnB = a.ColumnB
and b.ColumnC = a.ColumnC
and b.ColumnDCount = 2
order by a.ColumnA, a.ColumnB, a.ColumnC, a.ColumnD
This is about as simple of a query that I could produce which replicates the scenario I have. To complicate things, it's possible to have multiple records of each ColumnD value with some in one group, some in the other, but we're interested in those in both groups only. I'll have to work on some sample data, but I was hoping in the meantime someone had an idea.
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Eddie Wuerch SSChampion Points: 12535 More actions · Answer 1

You've basically hand-coded a DENSE_RANK() ... WHERE MAX(<DENSE_RANK() value>) = 2. The sample data will help roll a query to test, but you'll likely find a similar query plan with DENSE_RANK() . Window functions kick off a second trip through the data to prepare the results (for each distinct PARTITION BY clause), which are joined back to the base query results to produce the final output.

Eddie Wuerch
MCM: SQL

sgmunson SSC Guru Points: 110640 More actions · Answer 2

Why not use a windowed version of COUNT?

SELECT a.* FROM dbo.ComplexQuery AS a WHERE COUNT(DISTINCT ColumnD) OVER(PARTITION BY ColumnA, ColumnB, ColumnC) = 2

If it won't let you use that in the WHERE clause, you can always just SELECT it in a CTE and then use the WHERE against that new column name in the final SELECT.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Aaron N. Cutshall SSCrazy Eights Points: 9011 More actions · Answer 3

Eddie Wuerch - Thursday, March 2, 2017 8:27 PM
You've basically hand-coded a DENSE_RANK() ... WHERE MAX(<DENSE_RANK() value>) = 2. The sample data will help roll a query to test, but you'll likely find a similar query plan with DENSE_RANK() . Window functions kick off a second trip through the data to prepare the results (for each distinct PARTITION BY clause), which are joined back to the base query results to produce the final output.

Thanks Eddie, but I already tried the DENSE_RANK() function. Here's an example of my situation:

ColumnA	ColumnB	ColumnC	ColumnD	Dense_Rank	What I Need
123	ABC	987	A	1	2
123	ABC	987	B	2	2
456	DEF	654	A	1	1
789	GHI	321	A	1	2
789	GHI	321	B	2	2

I hope this is clear enough. Essentially, I need to ensure that I get both of the records as identified in ColumnD. In the above case, I should get the records with Column A of 123 and 789 but NOT the record of 456 because it didn't have both and A and B values for ColumnD. If it weren't for the need to return all columns of each record (I didn't display them all), I could simply deal with a GROUP BY. Having the total row count in the column "What I Need" allows me to properly filter out incomplete sets.

LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com

Aaron N. Cutshall SSCrazy Eights Points: 9011 More actions · Answer 4

sgmunson - Thursday, March 2, 2017 8:32 PM
Why not use a windowed version of COUNT?
SELECT a.* FROM dbo.ComplexQuery AS a WHERE COUNT(DISTINCT ColumnD) OVER(PARTITION BY ColumnA, ColumnB, ColumnC) = 2
If it won't let you use that in the WHERE clause, you can always just SELECT it in a CTE and then use the WHERE against that new column name in the final SELECT.

Unfortunately, the DISTINCT keyword is not allowed in a window function with OVER() and without OVER(), it's no longer a window function! This would have been the ideal solution!!

LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com

Aaron N. Cutshall SSCrazy Eights Points: 9011 More actions · Answer 5

Actually, let me clarify it a bit more and simplify the situation more succinctly:

ColumnA	ColumnB	OtherData1	OtherData2	Dense_Rank	What I Need
123	A	ABC	987	1	2
123	B	BCD	876	2	2
456	A	DEF	654	1	1
789	A	GHI	321	1	2
789	B	HIJ	210	2	2
789	B	IJK	312	3	2

The query that works is:
select a.ColumnA, a.ColumnB, a.OtherData1, a.OtherData2

from dbo.ComplexQuery a

inner join (select ColumnA, count(distinct ColumnB) as ColumnBCount

from dbo.ComplexQuery

group by ColumnA

) b

on b.ColumnA = a.ColumnA

and b.ColumnBCount = 2

order by a.ColumnA, a.ColumnB

So, while the above works, it's not very efficient. I had hoped to use something as suggested earlier with a COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA) but the DISTINCT is definitely not allowed. Thoughts?

LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com

Mark Cowne One Orange Chip Points: 26952 More actions · Answer 6

You can use

DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) + DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1
instead of

COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA)

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Jeff Atherton SSCrazy Points: 2663 More actions · Answer 7

Window functions are only allowed in the SELECT and ORDER BY clause. So if you really wanted to use one of them then the best option would be to put it in a CTE. If you're just looking for alternative ways to write your select stmt then here's one option:

SELECT * FROM ComplexQuery WHERE ColumnA IN ( SELECT ColumnA FROM ComplexQuery GROUP BY ColumnA HAVING count(distinct ColumnB) = 2 )

Aaron N. Cutshall SSCrazy Eights Points: 9011 More actions · Answer 8

Mark Cowne - Friday, March 3, 2017 7:03 AM
You can use
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) + DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1
instead of
COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA)

This works when the situation only calls for 2 values for ColumnB (which I showed in the examples). What about where there are 3, 4, or more?

LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com

Aaron N. Cutshall SSCrazy Eights Points: 9011 More actions · Answer 9

Jeff Atherton - Friday, March 3, 2017 7:12 AM
Window functions are only allowed in the SELECT and ORDER BY clause. So if you really wanted to use one of them then the best option would be to put it in a CTE. If you're just looking for alternative ways to write your select stmt then here's one option:
SELECT * FROM ComplexQuery WHERE ColumnA IN ( SELECT ColumnA FROM ComplexQuery GROUP BY ColumnA HAVING count(distinct ColumnB) = 2 )

This would also work, but it requires me to still have the complex query in there twice. Unfortunately, I can't make use of a CTE in my particular scenario as I only gave a simplified portion of my overall query structure. Even then, referencing a CTE twice still causes it to execute twice. Window functions, as I understand it, execute it once but will navigate the result set multiple times as needed.

LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com

Mark Cowne One Orange Chip Points: 26952 More actions · Answer 10

Aaron N. Cutshall - Friday, March 3, 2017 8:33 AM
Mark Cowne - Friday, March 3, 2017 7:03 AM
You can use
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) + DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1
instead of
COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA)
This works when the situation only calls for 2 values for ColumnB (which I showed in the examples). What about where there are 3, 4, or more?

Can you post some sample data.

This appears to work
DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1)) INSERT INTO @mytable(columnA, ColumnB) VALUES(123,'A'),(123,'B'),(456,'A'),(789,'A'),(789,'B'),(789,'B'),(888,'A'),(888,'B'),(888,'B'),(888,'D'),(888,'C'),(888,'C');

SELECT ColumnA, ColumnB, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) + DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount FROM @mytable ORDER BY ColumnA;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Jeff Atherton SSCrazy Points: 2663 More actions · Answer 11

Aaron, so is this a performance issue? How much data are we talking about here? How long does the query take to run with the ComplexQuery in there twice? How often will this query need to be run everyday? Do the users need up to the minute results or can you run some ETL overnight to a new table just for reporting?

Aaron N. Cutshall SSCrazy Eights Points: 9011 More actions · Answer 12

Mark Cowne - Friday, March 3, 2017 8:49 AM
Can you post some sample data.
This appears to work
DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1)) INSERT INTO @mytable(columnA, ColumnB) VALUES(123,'A'),(123,'B'),(456,'A'),(789,'A'),(789,'B'),(789,'B'),(888,'A'),(888,'B'),(888,'B'),(888,'D'),(888,'C'),(888,'C');
SELECT ColumnA, ColumnB, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) + DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount FROM @mytable ORDER BY ColumnA;

Mark,

This is PERFECT!! WOW!! I'm going to have to study this one to make sure I truly understand it. Here's your revised query with additional data:
DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1), OtherData1 CHAR(3), OtherData2 INT) INSERT INTO @mytable(columnA, ColumnB, OtherData1, OtherData2) VALUES (123,'A','ABC',987), (123,'B','BCD',876), (123,'C','CDE',765), (456,'A','DEF',654), (789,'A','GHI',321), (789,'B','HIJ',210), (789,'B','IJK',312), (258,'A','FDE',753), (258,'B','THE',951), (258,'C','PKH',873), (246,'A','LDJ',983), (246,'B','WKS',893), (246,'C','FUW',373), (246,'C','WLC',583), (246,'D','GKD',353);

SELECT ColumnA, ColumnB, OtherData1, OtherData2, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) AS DenseRankOnly, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) AS DenseRankDesc, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) + DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount FROM @mytable ORDER BY ColumnA, ColumnB;

It certainly presents the results just as I needed it in all situations. I'm totally impressed!!

LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com

Mark Cowne One Orange Chip Points: 26952 More actions · Answer 13

Aaron N. Cutshall - Friday, March 3, 2017 10:08 AM
Mark Cowne - Friday, March 3, 2017 8:49 AM
Can you post some sample data.
This appears to work
DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1)) INSERT INTO @mytable(columnA, ColumnB) VALUES(123,'A'),(123,'B'),(456,'A'),(789,'A'),(789,'B'),(789,'B'),(888,'A'),(888,'B'),(888,'B'),(888,'D'),(888,'C'),(888,'C');
SELECT ColumnA, ColumnB, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) + DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount FROM @mytable ORDER BY ColumnA;
Mark,
This is PERFECT!! WOW!! I'm going to have to study this one to make sure I truly understand it. Here's your revised query with additional data:
DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1), OtherData1 CHAR(3), OtherData2 INT) INSERT INTO @mytable(columnA, ColumnB, OtherData1, OtherData2) VALUES (123,'A','ABC',987), (123,'B','BCD',876), (123,'C','CDE',765), (456,'A','DEF',654), (789,'A','GHI',321), (789,'B','HIJ',210), (789,'B','IJK',312), (258,'A','FDE',753), (258,'B','THE',951), (258,'C','PKH',873), (246,'A','LDJ',983), (246,'B','WKS',893), (246,'C','FUW',373), (246,'C','WLC',583), (246,'D','GKD',353);
SELECT ColumnA, ColumnB, OtherData1, OtherData2, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) AS DenseRankOnly, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) AS DenseRankDesc, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) + DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount FROM @mytable ORDER BY ColumnA, ColumnB;
It certainly presents the results just as I needed it in all situations. I'm totally impressed!!

You're welcome.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Luis Cazares SSC Guru Points: 183706 More actions · Answer 14

Here's an alternative. I'm not sure which would perform better as both have issues that can slow them down.
WITH CTE AS( SELECT ColumnA, ColumnB, OtherData1, OtherData2, DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) AS DenseRank FROM @mytable ) SELECT ColumnA, ColumnB, OtherData1, OtherData2, MAX(DenseRank) OVER( PARTITION BY ColumnA) DistinctCount FROM CTE ORDER BY ColumnA, ColumnB;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2