May 19, 2005 at 4:44 pm
Hi,
I want to be able to select any five rows in random, from a table that I have. How do I do this? I could not think of any way to do it.
Thanks.
May 19, 2005 at 5:09 pm
One possibility:
SELECT TOP 5 *
FROM table
ORDER BY NEWID()
May 19, 2005 at 5:24 pm
Chris, Thanks for the reply. It works great, just what I needed.
Can you tell me how it works? I read that NEWID will return an unique identifier. But to use it like 'ORDER BY NEWID()' every row in my table needs a NEWID column right? How does this work?
May 19, 2005 at 9:13 pm
Yup. That's right. If you have a few million rows, then this isn't a good way of doing it.
In SQL2005 with RowNumber, you'll be able to pick some random numbers and then do something like...
declare @row1 int, @row2 int, ...
declare @cnt int
select @cnt = count(*) from tbl
select @row1 = @cnt * rand(convert(int,(0.5+convert(float,getdate())-convert(int,getdate()))*1000100000)) + 1
select @row2 = @cnt * rand(convert(int,(0.5+convert(float,getdate())-convert(int,getdate()))*1000200000)) + 1
select @row3 = @cnt * rand(convert(int,(0.5+convert(float,getdate())-convert(int,getdate()))*1000300000)) + 1
...
select top 5 *
from tbl
where rownumber in (@row1, @row2, @row3,....
But however you get your random numbers is up to you. I just find that it's nice to use something based on the smaller fractions of getdate.
If you have an ID field, then you could use rownumber instead of rownumber. If you have gaps in your id field, then you could get extra random numbers (depending on how sparse it is), and then use 'top 5'. In fact, 'top 5' is good if you don't have many numbers, because you might get the same random number appearing twice.
RobF
Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au
May 20, 2005 at 1:44 am
What happensd is that SQL creates a temporary work table in TempDB for the columns returned from the query, plus a new column for the NewID, populates the work table and does the sort. The structure of your table won't be affected.
It's very nasty on large tables. (Took 2 min on a 6 million row table on a 4 processor Itanium server)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 20, 2005 at 3:24 am
Yep, like the others are saying it is not a good query performance-wise to run on any large table. In fact, I would not run it on any table at all, since the whole concept of returning 5 random rows from a set is strange to me. Why not simply return the 5 first rows that are found? Since sets have no ordering, any query that wants to return a subset of the rows in a set, but without using a filter to select that subset, would be returning 'random' rows. But then that query in itself would be a strange thing to want to have, since it would make no sense to use a set that way.
But it was an answer to the question here. A better description of the problem might lead to a different answer.
Rob, I do not have a Yukon machine here, so I cannot test it right now, but I believe you cannot do this in SQL Server 2005:
select top 5 *
from tbl
where row_number() in (@row1, @row2, @row3,....
Note that I changed your word "rownumber" in the query to a call to the ranking function row_number() since I assume that was what you meant. If so, then what should the where clause filter on? row_number() is used to add a ranking number to a resultset. Something like this should work however:
select top 5 *
from (select row_number() as rownumber, * from tbl)
where rownumber IN (@row1...)
But then I wonder what the execution plan for that query will be?
May 20, 2005 at 3:48 am
Yes, sorry. I was writing it the Oracle way, where you can use rownum like that. I haven't played with SQL2005 enough yet to work out some of the tricks with row_number().
Of course, in this example, best is still to use the ID field for it. Generate a bunch of random numbers, and then get the IDs that match them.
Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au
May 20, 2005 at 10:48 am
>> the whole concept of returning 5 random rows from a set is strange to me.
We have a "What's New" section on our website. We usually have like 10-15 items, but can only display 5 or less in the available space.
>> Why not simply return the 5 first rows that are found? Since sets have no ordering,
>> any query that wants to return a subset of the rows in a set, but without using a
>> filter to select that subset, would be returning 'random' rows.
It is not working like you mentioned. When I tried it, select top 5 * from tbl, it always returned the first five rows. The table has a primary key identity column, I don't know if that has anything to do with it.
What is row_number()? I searched books online, but there is no row_number.
And Thanks for all your replies.
May 20, 2005 at 3:10 pm
You could play this game:
Alter Table [???]
Add GUId as NewId()
...
Select TOP 5 * from [???]
where Binary_CheckSum(GUId)>0
The computed column doesn't really exist (not taking up any disk space) and it changes with every reference to the row! Tweak the WHERE clause logic a bit to stretch out the selection interval.
May 20, 2005 at 3:33 pm
What is row_number()? I searched books online, but there is no row_number.
that's SQL Server 2005
Does your table has a numeric pkey ?
if the answer is yes can use something like this :
create view v_rand
as
select rand() r
create function dbo.f_rand ()
returns int
as
begin
return 100*(select r from v_rand)
end
select *
from yourtable
where pk in (
select dbo.f_rand()
union all select dbo.f_rand()
union all select dbo.f_rand()
union all select dbo.f_rand()
union all select dbo.f_rand() ) -- This assumes all pk present and in 0-100
-- You can modify the function to give you the appropriate values should you need to
hth
* Noel
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply