March 21, 2018 at 4:54 pm
Hi,
I have this dataset:
create table test1
(
[Category] char(1) null,
[CNT] int null
)
insert into test1 ([Category], [CNT])
values ('A', 5)
;
insert into test1 ([Category], [CNT])
values ('B', 3)
;
How can I end up with this dataset:
create table test2
([Category] char(1) null)
insert into test2 (category)
values ('A')
;
insert into test2 (category)
values ('A')
;
insert into test2 (category)
values ('A')
;
insert into test2 (category)
values ('A')
;
insert into test2 (category)
values ('A')
;
insert into test2 (category)
values ('B')
;
insert into test2 (category)
values ('B')
;
insert into test2 (category)
values ('B')
;
The idea is one separate row for each category, that corresponds to the count of the category.
Any help is appreciated!
(Sorry, I couldn't fix the line spacing in the code...)
-M
March 21, 2018 at 6:15 pm
Like this, for one method:
create table dbo.test1
([Category] char(1) null,
[CNT] int null
);
create table dbo.test2
([Category] char(1) null);
insert into dbo.test1 ([Category], [CNT])
values ('A', 5),('B', 3);
WITH eTally(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) n FROM [sys].[all_columns] AS [ac] CROSS JOIN [sys].[all_columns] AS [ac2])
INSERT INTO dbo.test2([Category])
SELECT t1.[Category]
FROM
dbo.test1 t1
CROSS APPLY (SELECT TOP(t1.[CNT]) n FROM eTally ORDER BY n) t;
SELECT * FROM dbo.test2;
DROP TABLE [dbo].[test2];
DROP TABLE [dbo].[test1];
March 22, 2018 at 10:48 am
I don't really see the need for the overhead and potential issues of reading the sys.all_columns view.
;WITH
cteTally10 AS (
SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
),
cteTally1000 AS (
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
FROM cteTally10 c1
CROSS JOIN cteTally10 c2
CROSS JOIN cteTally10 c3
)
--INSERT INTO dbo.test2 ( [Category] )
SELECT t1.[Category]
FROM dbo.test1 t1
INNER JOIN cteTally1000 t ON t.number BETWEEN 1 AND t1.CNT
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 22, 2018 at 1:14 pm
ScottPletcher - Thursday, March 22, 2018 10:48 AMI don't really see the need for the overhead and potential issues of reading the sys.all_columns view.
;WITH
cteTally10 AS (
SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
),
cteTally1000 AS (
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
FROM cteTally10 c1
CROSS JOIN cteTally10 c2
CROSS JOIN cteTally10 c3
)
--INSERT INTO dbo.test2 ( [Category] )
SELECT t1.[Category]
FROM dbo.test1 t1
INNER JOIN cteTally1000 t ON t.number BETWEEN 1 AND t1.CNT
It was a quick, throw together cte based tally table, one that Jeff has even used. I could have just as easily taken the time to type this:
with e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
e2(n) as (select 1 from e1 a cross join e1 b),
e4(n) as (select 1 from e2 a cross join e2 b),
etally(n) as (select n = row_number() over (order by (select null)) from e4 a cross join e4 b)
March 22, 2018 at 1:40 pm
To add a further tweak, nothing about the ROW_NUMBER is needed here, so there's no need to add the overhead of generating it.
Something like this is sufficient and avoids the unnecessary overhead:
WITH
n AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n(n)),
n3 AS (SELECT n=n1.n FROM n n1, n n2, n n3)
SELECT t1.[Category]
FROM dbo.test1 t1 CROSS APPLY (SELECT TOP (cnt) n FROM n3)n;
The overhead's pretty small, but real. 🙂
Cheers!
EDIT: I should point out that this isn't guaranteed (that there's additional overhead); it's been a while since I've run the full battery of tests, but I remember that sometimes the optimizer implemented the query forms in the same way. Still, I'd rather not just hope that the optimizer gets it right, and if it's unnecessary, there's no need to clutter up the query 🙂
March 26, 2018 at 10:01 am
Thanks everybody!
I ended up using the response by Mr. Jacob Wilkins.
There is something I'm not understanding though: The cte that is called 'n', why is it just a table of 10 zeros? I don't get how that makes this query work. Also, I noticed the query doesn't work for values over 1,000. What can I do to change that upper limit? Sorry if I'm being dense...
-m
March 26, 2018 at 10:09 am
That sort of CTE is part of a common construct to form an on-the-fly tally or numbers table.
The specific values don't really matter; it's just generating a certain number of rows. In this case, the first CTE generates 10 rows. The second CTE cross joins it to itself 3 times, forming 10^3=1000 rows.
Most commonly, a ROW_NUMBER is calculated over those rows to generate the tally or numbers table. In this case, we don't need a list of numbers; we just need enough rows to "explode" your original rows into the appropriate number of new rows. For example, if you have something with a count of 5 on one row, we need to make that result in 5 new rows, so we use APPLY to "match" that row to 5 rows from the CTE.
If you need more than 10^3, just add an appropriate number of cross joins in the second CTE.
Cheers!
March 26, 2018 at 10:47 am
Thank you sir, your explanation made me understand it perfectly. I added one more join for a 10K limit which should be enough.
-martin
March 27, 2018 at 11:14 am
Just thought I'd mention that I tried both cte tally schemes and they produce a different number of rows. Jacob, yours seems to produce a doubled result set.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
March 27, 2018 at 11:59 am
The Dixie Flatline - Tuesday, March 27, 2018 11:14 AMJust thought I'd mention that I tried both cte tally schemes and they produce a different number of rows. Jacob, yours seems to produce a doubled result set.
I'd need to see a repro script 🙂
IF OBJECT_ID('test1') IS NOT NULL DROP TABLE test1;
CREATE TABLE test1 ([Category] CHAR(1) NULL,[CNT] INT NULL);
INSERT INTO test1 ([Category], [CNT]) VALUES ('A', 5),('B', 3);
WITH
n AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n(n)),
n3 AS (SELECT n=n1.n FROM n n1, n n2, n n3)
SELECT t1.[Category]
FROM dbo.test1 t1 CROSS APPLY (SELECT TOP (cnt) n FROM n3)n;
WITH
cteTally10 AS (
SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
),
cteTally1000 AS (
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
FROM cteTally10 c1
CROSS JOIN cteTally10 c2
CROSS JOIN cteTally10 c3
)
SELECT t1.[Category]
FROM dbo.test1 t1
INNER JOIN cteTally1000 t ON t.number BETWEEN 1 AND t1.CNT;
Results:
Category
--------
A
A
A
A
A
B
B
B
(8 rows affected)
Category
--------
A
B
A
B
A
B
A
A
(8 rows affected)
If you mean the tally itself, and not its application to this particular query, I'd still need to see a repro script 🙂
IF OBJECT_ID('test1') IS NOT NULL DROP TABLE test1;
CREATE TABLE test1 ([Category] CHAR(1) NULL,[CNT] INT NULL);
INSERT INTO test1 ([Category], [CNT]) VALUES ('A', 5),('B', 3);
WITH
n AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))n(n)),
n3 AS (SELECT n=n1.n FROM n n1, n n2, n n3)
SELECT COUNT(*) FROM n3;
WITH
cteTally10 AS (
SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
),
cteTally1000 AS (
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
FROM cteTally10 c1
CROSS JOIN cteTally10 c2
CROSS JOIN cteTally10 c3
)
SELECT COUNT(*) FROM cteTally1000;
Results:
-----------
1000
(1 row affected)
-----------
1000
(1 row affected)
Cheers!
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply