March 2, 2017 at 4:01 pm
I have a complex query that produces data for multiple sets. I need to ensure that for each set of criteria that I get data for each set. The following query works, but I'd rather use a windowing function to avoid the double execution of the complex query:
select a.*
from dbo.ComplexQuery a
inner join (select ColumnA, ColumnB, ColumnC, count(distinct ColumnD) as ColumnDCount
from dbo.ComplexQuery
group by ColumnA, ColumnB, ColumnC
) b
on b.ColumnA = a.ColumnA
and b.ColumnB = a.ColumnB
and b.ColumnC = a.ColumnC
and b.ColumnDCount = 2
order by a.ColumnA, a.ColumnB, a.ColumnC, a.ColumnD
This is about as simple of a query that I could produce which replicates the scenario I have. To complicate things, it's possible to have multiple records of each ColumnD value with some in one group, some in the other, but we're interested in those in both groups only. I'll have to work on some sample data, but I was hoping in the meantime someone had an idea.
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
March 2, 2017 at 8:28 pm
You've basically hand-coded a DENSE_RANK() ... WHERE MAX(<DENSE_RANK() value>) = 2. The sample data will help roll a query to test, but you'll likely find a similar query plan with DENSE_RANK() . Window functions kick off a second trip through the data to prepare the results (for each distinct PARTITION BY clause), which are joined back to the base query results to produce the final output.
Eddie Wuerch
MCM: SQL
March 2, 2017 at 8:32 pm
Why not use a windowed version of COUNT?
SELECT a.*
FROM dbo.ComplexQuery AS a
WHERE COUNT(DISTINCT ColumnD) OVER(PARTITION BY ColumnA, ColumnB, ColumnC) = 2
If it won't let you use that in the WHERE clause, you can always just SELECT it in a CTE and then use the WHERE against that new column name in the final SELECT.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
March 3, 2017 at 5:27 am
Eddie Wuerch - Thursday, March 2, 2017 8:27 PMYou've basically hand-coded a DENSE_RANK() ... WHERE MAX(<DENSE_RANK() value>) = 2. The sample data will help roll a query to test, but you'll likely find a similar query plan with DENSE_RANK() . Window functions kick off a second trip through the data to prepare the results (for each distinct PARTITION BY clause), which are joined back to the base query results to produce the final output.
Thanks Eddie, but I already tried the DENSE_RANK() function. Here's an example of my situation:
ColumnA | ColumnB | ColumnC | ColumnD | Dense_Rank | What I Need |
123 | ABC | 987 | A | 1 | 2 |
123 | ABC | 987 | B | 2 | 2 |
456 | DEF | 654 | A | 1 | 1 |
789 | GHI | 321 | A | 1 | 2 |
789 | GHI | 321 | B | 2 | 2 |
I hope this is clear enough. Essentially, I need to ensure that I get both of the records as identified in ColumnD. In the above case, I should get the records with Column A of 123 and 789 but NOT the record of 456 because it didn't have both and A and B values for ColumnD. If it weren't for the need to return all columns of each record (I didn't display them all), I could simply deal with a GROUP BY. Having the total row count in the column "What I Need" allows me to properly filter out incomplete sets.
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
March 3, 2017 at 5:28 am
sgmunson - Thursday, March 2, 2017 8:32 PMWhy not use a windowed version of COUNT?
SELECT a.*
FROM dbo.ComplexQuery AS a
WHERE COUNT(DISTINCT ColumnD) OVER(PARTITION BY ColumnA, ColumnB, ColumnC) = 2If it won't let you use that in the WHERE clause, you can always just SELECT it in a CTE and then use the WHERE against that new column name in the final SELECT.
Unfortunately, the DISTINCT keyword is not allowed in a window function with OVER() and without OVER(), it's no longer a window function! This would have been the ideal solution!!
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
March 3, 2017 at 5:45 am
Actually, let me clarify it a bit more and simplify the situation more succinctly:
ColumnA | ColumnB | OtherData1 | OtherData2 | Dense_Rank | What I Need |
123 | A | ABC | 987 | 1 | 2 |
123 | B | BCD | 876 | 2 | 2 |
456 | A | DEF | 654 | 1 | 1 |
789 | A | GHI | 321 | 1 | 2 |
789 | B | HIJ | 210 | 2 | 2 |
789 | B | IJK | 312 | 3 | 2 |
The query that works is:select a.ColumnA, a.ColumnB, a.OtherData1, a.OtherData2
from dbo.ComplexQuery a
inner join (select ColumnA, count(distinct ColumnB) as ColumnBCount
from dbo.ComplexQuery
group by ColumnA
) b
on b.ColumnA = a.ColumnA
and b.ColumnBCount = 2
order by a.ColumnA, a.ColumnB
So, while the above works, it's not very efficient. I had hoped to use something as suggested earlier with a COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA) but the DISTINCT is definitely not allowed. Thoughts?
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
March 3, 2017 at 7:03 am
You can use
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1
instead of
COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA)
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 3, 2017 at 7:12 am
Window functions are only allowed in the SELECT and ORDER BY clause. So if you really wanted to use one of them then the best option would be to put it in a CTE. If you're just looking for alternative ways to write your select stmt then here's one option:
SELECT *
FROM ComplexQuery
WHERE ColumnA IN
(
SELECT ColumnA
FROM ComplexQuery
GROUP BY ColumnA
HAVING count(distinct ColumnB) = 2
)
March 3, 2017 at 8:33 am
Mark Cowne - Friday, March 3, 2017 7:03 AMYou can use
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1
instead of
COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA)
This works when the situation only calls for 2 values for ColumnB (which I showed in the examples). What about where there are 3, 4, or more?
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
March 3, 2017 at 8:36 am
Jeff Atherton - Friday, March 3, 2017 7:12 AMWindow functions are only allowed in the SELECT and ORDER BY clause. So if you really wanted to use one of them then the best option would be to put it in a CTE. If you're just looking for alternative ways to write your select stmt then here's one option:
SELECT *
FROM ComplexQuery
WHERE ColumnA IN
(
SELECT ColumnA
FROM ComplexQuery
GROUP BY ColumnA
HAVING count(distinct ColumnB) = 2
)
This would also work, but it requires me to still have the complex query in there twice. Unfortunately, I can't make use of a CTE in my particular scenario as I only gave a simplified portion of my overall query structure. Even then, referencing a CTE twice still causes it to execute twice. Window functions, as I understand it, execute it once but will navigate the result set multiple times as needed.
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
March 3, 2017 at 8:49 am
Aaron N. Cutshall - Friday, March 3, 2017 8:33 AMMark Cowne - Friday, March 3, 2017 7:03 AMYou can use
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1
instead of
COUNT(DISTINCT ColumnB) OVER(PARTITION BY ColumnA)This works when the situation only calls for 2 values for ColumnB (which I showed in the examples). What about where there are 3, 4, or more?
Can you post some sample data.
This appears to work
DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1))
INSERT INTO @mytable(columnA, ColumnB)
VALUES(123,'A'),(123,'B'),(456,'A'),(789,'A'),(789,'B'),(789,'B'),(888,'A'),(888,'B'),(888,'B'),(888,'D'),(888,'C'),(888,'C');
SELECT ColumnA, ColumnB,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
FROM @mytable
ORDER BY ColumnA;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 3, 2017 at 9:01 am
Aaron, so is this a performance issue? How much data are we talking about here? How long does the query take to run with the ComplexQuery in there twice? How often will this query need to be run everyday? Do the users need up to the minute results or can you run some ETL overnight to a new table just for reporting?
March 3, 2017 at 10:08 am
Mark Cowne - Friday, March 3, 2017 8:49 AMCan you post some sample data.This appears to work
DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1))
INSERT INTO @mytable(columnA, ColumnB)
VALUES(123,'A'),(123,'B'),(456,'A'),(789,'A'),(789,'B'),(789,'B'),(888,'A'),(888,'B'),(888,'B'),(888,'D'),(888,'C'),(888,'C');SELECT ColumnA, ColumnB,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
FROM @mytable
ORDER BY ColumnA;
Mark,
This is PERFECT!! WOW!! I'm going to have to study this one to make sure I truly understand it. Here's your revised query with additional data:DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1), OtherData1 CHAR(3), OtherData2 INT)
INSERT INTO @mytable(columnA, ColumnB, OtherData1, OtherData2)
VALUES (123,'A','ABC',987),
(123,'B','BCD',876),
(123,'C','CDE',765),
(456,'A','DEF',654),
(789,'A','GHI',321),
(789,'B','HIJ',210),
(789,'B','IJK',312),
(258,'A','FDE',753),
(258,'B','THE',951),
(258,'C','PKH',873),
(246,'A','LDJ',983),
(246,'B','WKS',893),
(246,'C','FUW',373),
(246,'C','WLC',583),
(246,'D','GKD',353);
SELECT ColumnA, ColumnB, OtherData1, OtherData2,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) AS DenseRankOnly,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) AS DenseRankDesc,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
FROM @mytable
ORDER BY ColumnA, ColumnB;
It certainly presents the results just as I needed it in all situations. I'm totally impressed!!
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
March 3, 2017 at 10:57 am
Aaron N. Cutshall - Friday, March 3, 2017 10:08 AMMark Cowne - Friday, March 3, 2017 8:49 AMCan you post some sample data.This appears to work
DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1))
INSERT INTO @mytable(columnA, ColumnB)
VALUES(123,'A'),(123,'B'),(456,'A'),(789,'A'),(789,'B'),(789,'B'),(888,'A'),(888,'B'),(888,'B'),(888,'D'),(888,'C'),(888,'C');SELECT ColumnA, ColumnB,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
FROM @mytable
ORDER BY ColumnA;Mark,
This is PERFECT!! WOW!! I'm going to have to study this one to make sure I truly understand it. Here's your revised query with additional data:
DECLARE @mytable TABLE(ColumnA INT, ColumnB CHAR(1), OtherData1 CHAR(3), OtherData2 INT)
INSERT INTO @mytable(columnA, ColumnB, OtherData1, OtherData2)
VALUES (123,'A','ABC',987),
(123,'B','BCD',876),
(123,'C','CDE',765),
(456,'A','DEF',654),
(789,'A','GHI',321),
(789,'B','HIJ',210),
(789,'B','IJK',312),
(258,'A','FDE',753),
(258,'B','THE',951),
(258,'C','PKH',873),
(246,'A','LDJ',983),
(246,'B','WKS',893),
(246,'C','FUW',373),
(246,'C','WLC',583),
(246,'D','GKD',353);SELECT ColumnA, ColumnB, OtherData1, OtherData2,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) AS DenseRankOnly,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) AS DenseRankDesc,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) +
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB DESC) - 1 AS DistinctCount
FROM @mytable
ORDER BY ColumnA, ColumnB;It certainly presents the results just as I needed it in all situations. I'm totally impressed!!
You're welcome.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 3, 2017 at 12:10 pm
Here's an alternative. I'm not sure which would perform better as both have issues that can slow them down.
WITH CTE AS(
SELECT ColumnA, ColumnB, OtherData1, OtherData2,
DENSE_RANK() OVER(PARTITION BY ColumnA ORDER BY ColumnB) AS DenseRank
FROM @mytable
)
SELECT ColumnA, ColumnB, OtherData1, OtherData2,
MAX(DenseRank) OVER( PARTITION BY ColumnA) DistinctCount
FROM CTE
ORDER BY ColumnA, ColumnB;
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply