November 23, 2013 at 7:39 am
Howdy,
Late last night I got a request for some data. The format was the biggest challenge, since the definitions of small, medium, and large businesses are different ranges for reach country, and the coded variables for them also end up being a different three number sequence for each country and business size. The format was:
country, total, total small, total medium, total large
So I wrote this, altered a bit to accommodate the test data. I'm fine with how and how fast it works, so you don't have to rack your brain. But if you see an obvious improvement I'd love to hear it.
with data (orc_country, orc_quotacell, statusflag) as (
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 18, 1
),
agg(country, sm, md, lg) as (
select s1.orc_country
, min(s2.orc_quotacell) as [Sm]
, min(s2.orc_quotacell) + 1 as [Med]
, max(s2.orc_quotacell) as [Large]
from data s1 left join data s2
on s1.orc_country = s2.orc_country
group by s1.orc_country
)
select s.orc_country
,sum(case when s.orc_country = a.country then 1 else 0 end) as [Total]
,sum(case when s.orc_quotacell = a.sm then 1 else 0 end) as [Small]
,sum(case when s.orc_quotacell = a.md then 1 else 0 end) as [Medium]
,sum(case when s.orc_quotacell = a.lg then 1 else 0 end) as [Large]
from data s,
agg a
where s.statusflag = 1
group by s.orc_country
order by orc_country
November 24, 2013 at 6:43 am
Here is my solution along with yours:
with data (orc_country, orc_quotacell, statusflag) as (
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 18, 1
),
agg(country, sm, md, lg) as (
select s1.orc_country
, min(s2.orc_quotacell) as [Sm]
, min(s2.orc_quotacell) + 1 as [Med]
, max(s2.orc_quotacell) as [Large]
from data s1 left join data s2
on s1.orc_country = s2.orc_country
group by s1.orc_country
)
select s.orc_country
,sum(case when s.orc_country = a.country then 1 else 0 end) as [Total]
,sum(case when s.orc_quotacell = a.sm then 1 else 0 end) as [Small]
,sum(case when s.orc_quotacell = a.md then 1 else 0 end) as [Medium]
,sum(case when s.orc_quotacell = a.lg then 1 else 0 end) as [Large]
from data s,
agg a
where s.statusflag = 1
group by s.orc_country
order by orc_country;
go
with data (orc_country, orc_quotacell, statusflag) as (
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 18, 1
),
BaseData as (
select
d.orc_country,
d.orc_quotacell,
d.statusflag,
rn = dense_rank() over (partition by d.orc_country order by d.orc_quotacell)
from
data d
)
select
bd.orc_country,
count(bd.orc_country) Total,
sum(case rn when 1 then 1 else 0 end) Small,
sum(case rn when 2 then 1 else 0 end) Medium,
sum(case rn when 3 then 1 else 0 end) Large
from
BaseData bd
group by
bd.orc_country;
Also attaching the actual execution plans generated when I ran them.
November 24, 2013 at 8:30 am
Hot damn, Lynn, that's awesome. Attached plans from running on my actual table. I'm expecting data for 13 more countries, so I'll compare again when I have everything.
Is there a specific reason you went with dense rank over regular rank or just row number?
Thanks
November 24, 2013 at 8:39 am
erikd (11/24/2013)
Hot damn, Lynn, that's awesome. Attached plans from running on my actual table. I'm expecting data for 13 more countries, so I'll compare again when I have everything.Is there a specific reason you went with dense rank over regular rank or just row number?
Thanks
Actually, I tried row_number() first and displaying the data generated from the BaseData cte quickly showed me that that wasn't what I wanted. I needed to rank the values representing small, medium, and large with the same values; in the case 1,2, and 3. Rank wold not accomplish this due to ties in the values, this left DENSE_RANK.
To see the differences just do a select from BaseData using row_number(), rank(), and dense_rank() to see the different return values for rn.
November 24, 2013 at 8:59 am
What's the difference in execution times?
November 24, 2013 at 9:11 am
Mine was 503ms, yours was 96ms :blush:
November 24, 2013 at 9:26 am
erikd (11/24/2013)
Mine was 503ms, yours was 96ms :blush:
Try something for me, move the WHERE statusflag = 1 into the BaseData cte and rerun the code.
Be sure to report the execution plan as well, I'd like to see if there is any difference. Not thinkingthere will be based on what I have alread seen in this plan. Looks like all the records may have a statusflag = 1, which means the filter on this column is not relevant.
November 24, 2013 at 12:48 pm
Hi Lynn,
They're all 1 in the test data, but not in my table:
statusflagDescription Counts
1 Released 40158
10 Duplicate within File75
30 Do Not Call List 20
45 Removed cell phones170
70 Invalid Area Code 1
So the filter isn't entirely irrelevant. Anyway, I moved the where. Here's the execution plan for it. 96ms still, but now the analyzer recommends an index. Interesting.
November 24, 2013 at 10:23 pm
erikd (11/24/2013)
Hi Lynn,They're all 1 in the test data, but not in my table:
statusflagDescription Counts
1 Released 40158
10 Duplicate within File75
30 Do Not Call List 20
45 Removed cell phones170
70 Invalid Area Code 1
So the filter isn't entirely irrelevant. Anyway, I moved the where. Here's the execution plan for it. 96ms still, but now the analyzer recommends an index. Interesting.
Looking at your counts, I don't really see a reason to filter on status while reading from the table. It wouldn't filter out a lot of rows of data, which is what I was looking for here. If you have queries that select data where the statusflag <> 1 would benefit from a filtered index where statusflag <> 1.
November 25, 2013 at 7:13 am
Is it generally better to keep filters out of CTEs? Since your suggestion to move it, I've tried it in some other queries I've written and the query plans no longer suggest indexes. I guess I falsely assumed that getting all your filtering and aggregation done in the CTE and then querying that smaller set of results would be faster.
November 25, 2013 at 8:35 am
erikd (11/25/2013)
Is it generally better to keep filters out of CTEs? Since your suggestion to move it, I've tried it in some other queries I've written and the query plans no longer suggest indexes. I guess I falsely assumed that getting all your filtering and aggregation done in the CTE and then querying that smaller set of results would be faster.
I'm going to give the standard answer, It depends. If the filter will greatly reduce the number of records to be processed then you should probably filter it as soon as possible. If it makes sense to add the index that sql server suggests, then it may be worthwhile especially if the index is a covering index and much smaller than then table itself.
All you can do is test, test, and test some more. One of the things you probably noticed when comparing the two execution plans is that my version eliminated two of your clustered index (table) scans.
November 25, 2013 at 8:54 am
That's sort of what I figured. Thanks again Lynn.
November 25, 2013 at 9:31 am
The key here will be having an index to support the ranking function and avoid a "sort" iterator in the execution plan.
I would filter first, then aggregate, then enumerate the result (rank) and then pivot.
DECLARE @T TABLE (
sk int NOT NULL IDENTITY(1, 1),
orc_country varchar(25) NOT NULL,
orc_quotacell int NOT NULL,
statusflag tinyint NOT NULL,
UNIQUE CLUSTERED (orc_country, orc_quotacell, sk)
);
with data (orc_country, orc_quotacell, statusflag) as (
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 3, 1 UNION ALL
select 'Canada', 1, 1 UNION ALL
select 'Canada', 2, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 4, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'India', 5, 1 UNION ALL
select 'India', 6, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 7, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Singapore', 8, 1 UNION ALL
select 'Singapore', 9, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 10, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'Thailand', 11, 1 UNION ALL
select 'Thailand', 12, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 13, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'UK', 14, 1 UNION ALL
select 'UK', 15, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 16, 1 UNION ALL
select 'United Arab Emirates', 17, 1 UNION ALL
select 'United Arab Emirates', 18, 1
)
INSERT INTO @T (orc_country, orc_quotacell, statusflag)
SELECT
*
FROM
data;
WITH C1 AS (
SELECT
orc_country,
orc_quotacell,
COUNT(*) AS cnt,
DENSE_RANK() OVER(PARTITION BY orc_country ORDER BY orc_quotacell) AS rn
FROM
@T
WHERE
statusflag = 1
GROUP BY
orc_country,
orc_quotacell
)
SELECT
P.orc_country,
ISNULL([1], 0) + ISNULL([2], 0) + ISNULL([3], 0) AS Total,
ISNULL([1], 0) AS Small,
ISNULL([2], 0) AS Medium,
ISNULL([3], 0) AS Large
FROM
(
SELECT orc_country, cnt, rn FROM C1
) AS T
PIVOT
(
SUM(cnt)
FOR rn IN ([1], [2], [3])
) AS P;
GO
This works because your sample data happens to have maximum three distinct values for each country. I wonder if this assumption is correct, otherwise can you describe the problem with words?
November 25, 2013 at 10:28 am
One, using a table variable isn't necessarily a good choice for testing. No matter how many rows of data exist, the optimizer will assume 1 row as there are no statistics kept on a table variable.
Two, looking at the execution plans based on the small data set, both your pivot solution and my solution are identical.
Three, based on the sample data and the code originally developed I am making an assumption that each of the 3 values for each country indicate small, medium, large. The purpose of the DENSE_RANK function is to conform the data to the same set of values that represents small, medium, and large.
Four, as far as indexing on the actual data source the OP may not have control over that as there may be other data values as well and the existing index(es) may be needed for other purposes.
Five, it would be interesting to see how your solution works when adapted to use the actual data source.
November 25, 2013 at 11:04 am
Lynn Pettis (11/25/2013)
Five, it would be interesting to see how your solution works when adapted to use the actual data source.
Ask and ye shall receive.
The first query plan is using a table variable. It took 386ms.
The second query skips the table variable and performs the query on the actual table. It took 26ms.
Which is funny, because I read a lot about pivots being slowpokes, and tend to avoid them.
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply