July 6, 2010 at 11:12 am
I have a table with 100k records of 10 different types. I need to extract any 2 of each type into a second table for sampling. I can do this with a cursor or loop but can't come up with the set based solution. For example the loop solution is shown below.
create table #testgrp
(
recid decimal(13,0),
typeid varchar(5)
CONSTRAINT [PK_#testgrp] PRIMARY KEY CLUSTERED
(
[recid]
) ON [PRIMARY]
) ON [PRIMARY]
insert into #testgrp(recid,typeid)
select 1,'N1' union
select 2,'N1' union
select 3,'N1' union
select 4,'N1' union
select 5,'N2' union
select 6,'N2' union
select 7,'N2' union
select 8,'N2' union
select 9,'N3' union
select 10,'N3' union
select 11,'N3' union
select 21,'N3'
select top 0 e.* into #seeds from #testgrp e
declare @rc int,@qc int
set @rc = 0
set @qc = 0
select typeid from #testgrp
group by typeid
set @rc = @@rowcount
while @qc < @rc
begin
insert into #seeds
select top 2 e.* from #testgrp e left join #seeds s on s.typeid=e.typeid
where s.typeid is null
set @qc = @qc + 1
end
select * from #seeds
drop table #testgrp
drop table #seeds
Results:
recidtypeid
1N1
2N1
5N2
6N2
9N3
10N3
July 6, 2010 at 11:53 am
mrpolecat, with windows 2005 and up, you could easily use the row_number() function to get what you are after; you probably knoew that already.
For SQL 2000 it's not so intuitive.
Here's a snippet I've adapted to your sample data that i had in my notes; it seems to do exactly what you were after:
SELECT T1.[recid],
T1.[typeid]
FROM #testgrp T1
WHERE T1.[recid] IN
(
SELECT TOP 2 --Change this number to vary # output
T2.[recid]
FROM #testgrp T2
WHERE T2.[typeid] = T1.[typeid]
ORDER BY T2.[recid] DESC
)
ORDER BY T1.[recid],
T1.[typeid] DESC
Lowell
July 6, 2010 at 12:14 pm
July 6, 2010 at 12:37 pm
i believe it is because of the correlated subquery, where the inner query getting the TOP 2 is connecting to the outer query:
SELECT TOP 2 --Change this number to vary # output
T2.[recid]
FROM #testgrp T2
WHERE T2.[typeid] = T1.[typeid] -- the correlating part of the query referencing the outer query
ORDER BY T2.[recid] DESC
if it was just a vanilla SELECT recid from the table, all rows would show up in the outer query...but because of the WHERE T2.[typeid] = T1.[typeid], only the rows that have a join show up, and hte TOP limits it to only a few rows for each recid;
Lowell
July 6, 2010 at 1:36 pm
July 7, 2010 at 1:53 pm
mrpolecat (7/6/2010)
I need to extract any 2 of each type into a second table for sampling.
mrpolecat, it may not be an issue in this case, but for a truly randomized statistical sample, you should probably stay away from ordering by id for extracting the two records from each type. For example, what if someone bulk loaded the data into the table at one time, and had ORDER BY cost, or age, or any other attribute that would cause the top two records to not be representative of the phenomenon. Just a thought.
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
July 7, 2010 at 2:04 pm
Thanks Greg, There is actually a whole other part of the query that deals with the randomization of the records that I left out to simplify the issue to the part I was having toruble with. I add a where clause to both the inner and outer queries which limits the records it can select from to a random set.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply