March 24, 2009 at 2:50 pm
I have an interesting challenge to create duplicate rows based on a column value in a result set.
As an example, I have a resultset with 2 records.
---------------------------------
| id | Product | Description | Qty |
---------------------------------
| 1 | P1 | Product 1 | 2 |
| 2 | P2 | Product 2 | 3 |
---------------------------------
The resultset should look like below with 5 rows
---------------------------------
| id | Product | Description | Qty |
---------------------------------
| 1 | P1 | Product 1 | 2 |
| 1 | P1 | Product 1 | 2 |
| 2 | P2 | Product 2 | 3 |
| 2 | P2 | Product 2 | 3 |
| 2 | P2 | Product 2 | 3 |
---------------------------------
I thought about using temp tables and repeating the rows based on the qty value with recursive loops. I wasnt sure if there is a better option or possibly a query trick.
help is much appreciated!
March 24, 2009 at 2:58 pm
You should use a Tally (or Numbers) table for this. There are several articles about Tally tables on this site. You can join to your table to a Tally table, limiting the number of rows selected from the Tally table based on the value of your Qty column.
March 24, 2009 at 3:00 pm
Hi
You can use a recursive CTE with a counter like this:
DECLARE @t TABLE (id INT, product VARCHAR(100), qty INT)
INSERT INTO @t
SELECT 1, 'P1', 2
UNION SELECT 2, 'P2', 3
;WITH recurse (id, product, qty, cnt) AS
(
SELECT id, product, qty, qty
FROM @t
UNION ALL
SELECT id, product, qty, cnt - 1
FROM recurse
WHERE cnt - 1 != 0
)
SELECT id, product, qty
FROM recurse
ORDER BY id
Greets
Flo
March 24, 2009 at 3:05 pm
andrewd.smith (3/24/2009)
You should use a Tally (or Numbers) table for this. There are several articles about Tally tables on this site. You can join to your table to a Tally table, limiting the number of rows selected from the Tally table based on the value of your Qty column.
... sure you are right. The Tally Solution is the better approach!
SELECT id, product, qty
FROM @t t
JOIN Tally ON Tally.N <= t.qty
Thanks for correction! 🙂
Flo
March 27, 2009 at 8:52 am
Thanks guy .... tally table worked like a charm
December 4, 2009 at 8:33 pm
If the maximum value for quantity is relatively small, or if you don't have a Numbers Table available, something like the following provides an efficient alternative:
declare @data
table (
id integer primary key,
product char(2) not null,
[description] varchar(100) not null,
quantity tinyint not null
);
insert @data
(
id,
product,
[description],
quantity
)
values (
1,
'P1',
'Product 1',
2
);
insert @data
(
id,
product,
[description],
quantity
)
values (
2,
'P2',
'Product 2',
3
);
select d.id,
d.product,
d.[description],
d.quantity
from @data d
cross
apply (
-- Assuming MAX(quantity) <= 9
select convert(integer, 1) union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9
) ca (rn)
where ca.rn <= quantity
order by
d.id;
For larger values, a dynamic Numbers Table could be used (generated using TOP (n) and the ROW_NUMBER() function over some system table CROSS JOINed to itself a number of times:
select top (1000)
rn = ROW_NUMBER() over (order by a1.allocation_unit_id)
from master.sys.allocation_units a1,
master.sys.allocation_units a2,
master.sys.allocation_units a3,
master.sys.allocation_units a4
order by
rn;
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
December 5, 2009 at 9:54 am
Paul White (12/4/2009)
For larger values, a dynamic Numbers Table could be used (generated using TOP (n) and the ROW_NUMBER() function over some system table CROSS JOINed to itself a number of times:
select top (1000)
rn = ROW_NUMBER() over (order by a1.allocation_unit_id)
from master.sys.allocation_units a1,
master.sys.allocation_units a2,
master.sys.allocation_units a3,
master.sys.allocation_units a4
order by
rn;
Ooooo.... be careful. The idea of using a single cross join is a great idea because it's nasty fast and doesn't cost much. When you start adding addition cross-joins and you go over the boundry of one of the single cross-joins, it starts to become very expensive insofar as possible growth in the TempDB log file.
I built a "little" race between the Cross-Join method and the cascading CTE method that Itzik has in his book to build a Billion row numeric table just to see wha'hapuns. 😛 The cascading CTE method did nothing to TempDB... the multiple Cross-Join method cause the TempDB logfile to blow out to over 40GB.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2009 at 2:56 pm
Thanks Jeff,
Yes one should be careful. I wouldn't advocate using the method for billions of rows 😉
I had much smaller maximum values for 'Quantity' in mind - maybe thousands.
(And if a Tally/Numbers table is to hand that's obviously the way to go anyway)
I have heard you mention your test versus Itzik's method before - one day I'd like to look into it for myself. Is there anywhere I can find the code for your tests? Just if it's easy - it intrigues me slightly. I imagine the issue is the normal one (row count spools) but I'd be good to know for sure.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
December 6, 2009 at 7:08 pm
Paul White (12/6/2009)
Thanks Jeff,Yes one should be careful. I wouldn't advocate using the method for billions of rows 😉
I had much smaller maximum values for 'Quantity' in mind - maybe thousands.
(And if a Tally/Numbers table is to hand that's obviously the way to go anyway)
I have heard you mention your test versus Itzik's method before - one day I'd like to look into it for myself. Is there anywhere I can find the code for your tests? Just if it's easy - it intrigues me slightly. I imagine the issue is the normal one (row count spools) but I'd be good to know for sure.
Paul
Ah... you can do the tests for yourself... under 8k, the Tally table normally wins especially after it caches. For VARCHAR(MAX), believe it or not, a well written WHILE loop will many times win (of course, I'm thinking "splits" here). Of course, for splits, a very well written CLR will win (and I do mean very well written like the guy that FLO introduced on that very long split thread).
If you need something up to 16 million, I'd recommend the single cross join on sys.All_Columns in 2k5.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 21, 2023 at 8:44 am
This was removed by the editor as SPAM
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply