Select any 5 rows in random

Question

Post reply

Select any 5 rows in random

Senthil T

Hall of Fame

Points: 3367
More actions
May 19, 2005 at 4:44 pm

#66967

Hi,
I want to be able to select any five rows in random, from a table that I have. How do I do this? I could not think of any way to do it.
Thanks.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 1

One possibility:

SELECT TOP 5 *

FROM table

ORDER BY NEWID()

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Senthil T Hall of Fame Points: 3367 More actions · Answer 2

Chris, Thanks for the reply. It works great, just what I needed.

Can you tell me how it works? I read that NEWID will return an unique identifier. But to use it like 'ORDER BY NEWID()' every row in my table needs a NEWID column right? How does this work?

Rob Farley SSCrazy Points: 2739 More actions · Answer 3

Yup. That's right. If you have a few million rows, then this isn't a good way of doing it.

In SQL2005 with RowNumber, you'll be able to pick some random numbers and then do something like...

declare @row1 int, @row2 int, ...

declare @cnt int

select @cnt = count(*) from tbl

select @row1 = @cnt * rand(convert(int,(0.5+convert(float,getdate())-convert(int,getdate()))*1000100000)) + 1

select @row2 = @cnt * rand(convert(int,(0.5+convert(float,getdate())-convert(int,getdate()))*1000200000)) + 1

select @row3 = @cnt * rand(convert(int,(0.5+convert(float,getdate())-convert(int,getdate()))*1000300000)) + 1

...

select top 5 *

from tbl

where rownumber in (@row1, @row2, @row3,....

But however you get your random numbers is up to you. I just find that it's nice to use something based on the smaller fractions of getdate.

If you have an ID field, then you could use rownumber instead of rownumber. If you have gaps in your id field, then you could get extra random numbers (depending on how sparse it is), and then use 'top 5'. In fact, 'top 5' is good if you don't have many numbers, because you might get the same random number appearing twice.

RobF

Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

What happensd is that SQL creates a temporary work table in TempDB for the columns returned from the query, plus a new column for the NewID, populates the work table and does the sort. The structure of your table won't be affected.

It's very nasty on large tables. (Took 2 min on a 6 million row table on a 4 processor Itanium server)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 5

Yep, like the others are saying it is not a good query performance-wise to run on any large table. In fact, I would not run it on any table at all, since the whole concept of returning 5 random rows from a set is strange to me. Why not simply return the 5 first rows that are found? Since sets have no ordering, any query that wants to return a subset of the rows in a set, but without using a filter to select that subset, would be returning 'random' rows. But then that query in itself would be a strange thing to want to have, since it would make no sense to use a set that way.

But it was an answer to the question here. A better description of the problem might lead to a different answer.

Rob, I do not have a Yukon machine here, so I cannot test it right now, but I believe you cannot do this in SQL Server 2005:

select top 5 *

from tbl

where row_number() in (@row1, @row2, @row3,....

Note that I changed your word "rownumber" in the query to a call to the ranking function row_number() since I assume that was what you meant. If so, then what should the where clause filter on? row_number() is used to add a ranking number to a resultset. Something like this should work however:

select top 5 *

from (select row_number() as rownumber, * from tbl)

where rownumber IN (@row1...)

But then I wonder what the execution plan for that query will be?

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Rob Farley SSCrazy Points: 2739 More actions · Answer 6

Yes, sorry. I was writing it the Oracle way, where you can use rownum like that. I haven't played with SQL2005 enough yet to work out some of the tricks with row_number().

Of course, in this example, best is still to use the ID field for it. Generate a bunch of random numbers, and then get the IDs that match them.

Rob Farley
LobsterPot Solutions & Adelaide SQL Server User Group
Company: http://www.lobsterpot.com.au
Blog: http://blogs.lobsterpot.com.au

Senthil T Hall of Fame Points: 3367 More actions · Answer 7

>> the whole concept of returning 5 random rows from a set is strange to me.

We have a "What's New" section on our website. We usually have like 10-15 items, but can only display 5 or less in the available space.

>> Why not simply return the 5 first rows that are found? Since sets have no ordering,

>> any query that wants to return a subset of the rows in a set, but without using a

>> filter to select that subset, would be returning 'random' rows.

It is not working like you mentioned. When I tried it, select top 5 * from tbl, it always returned the first five rows. The table has a primary key identity column, I don't know if that has anything to do with it.

What is row_number()? I searched books online, but there is no row_number.

And Thanks for all your replies.

Peter E. Kierstead SSCarpal Tunnel Points: 4114 More actions · Answer 8

You could play this game:

Alter Table [???]

Add GUId as NewId()

...

Select TOP 5 * from [???]

where Binary_CheckSum(GUId)>0

The computed column doesn't really exist (not taking up any disk space) and it changes with every reference to the row! Tweak the WHERE clause logic a bit to stretch out the selection interval.

PeteK
I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

noeld SSC Guru Points: 96590 More actions · Answer 9

What is row_number()? I searched books online, but there is no row_number.

that's SQL Server 2005

Does your table has a numeric pkey ?

if the answer is yes can use something like this :

create view v_rand

as

select rand() r

create function dbo.f_rand ()

returns int

as

begin

return 100*(select r from v_rand)

end

select *

from yourtable

where pk in (

select dbo.f_rand()

union all select dbo.f_rand()

union all select dbo.f_rand() ) -- This assumes all pk present and in 0-100

-- You can modify the function to give you the appropriate values should you need to

hth

* Noel