Select any 5 rows in random

  • Hi,

    I want to be able to select any five rows in random, from a table that I have. How do I do this? I could not think of any way to do it.

    Thanks.

  • One possibility:

    SELECT TOP 5 *

    FROM table

    ORDER BY NEWID()

  • Chris, Thanks for the reply. It works great, just what I needed.

    Can you tell me how it works? I read that NEWID will return an unique identifier. But to use it like 'ORDER BY NEWID()' every row in my table needs a NEWID column right? How does this work?

  • Yup. That's right. If you have a few million rows, then this isn't a good way of doing it.

    In SQL2005 with RowNumber, you'll be able to pick some random numbers and then do something like...

    declare @row1 int, @row2 int, ...

    declare @cnt int

    select @cnt = count(*) from tbl

    select @row1 = @cnt * rand(convert(int,(0.5+convert(float,getdate())-convert(int,getdate()))*1000100000)) + 1

    select @row2 = @cnt * rand(convert(int,(0.5+convert(float,getdate())-convert(int,getdate()))*1000200000)) + 1

    select @row3 = @cnt * rand(convert(int,(0.5+convert(float,getdate())-convert(int,getdate()))*1000300000)) + 1

    ...

    select top 5 *

    from tbl

    where rownumber in (@row1, @row2, @row3,....

    But however you get your random numbers is up to you. I just find that it's nice to use something based on the smaller fractions of getdate.

    If you have an ID field, then you could use rownumber instead of rownumber. If you have gaps in your id field, then you could get extra random numbers (depending on how sparse it is), and then use 'top 5'. In fact, 'top 5' is good if you don't have many numbers, because you might get the same random number appearing twice.

    RobF

    Rob Farley
    LobsterPot Solutions & Adelaide SQL Server User Group
    Company: http://www.lobsterpot.com.au
    Blog: http://blogs.lobsterpot.com.au

  • What happensd is that SQL creates a temporary work table in TempDB for the columns returned from the query, plus a new column for the NewID, populates the work table and does the sort. The structure of your table won't be affected.

    It's very nasty on large tables. (Took 2 min on a 6 million row table on a 4 processor Itanium server)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Yep, like the others are saying it is not a good query performance-wise to run on any large table. In fact, I would not run it on any table at all, since the whole concept of returning 5 random rows from a set is strange to me. Why not simply return the 5 first rows that are found? Since sets have no ordering, any query that wants to return a subset of the rows in a set, but without using a filter to select that subset, would be returning 'random' rows. But then that query in itself would be a strange thing to want to have, since it would make no sense to use a set that way.

    But it was an answer to the question here. A better description of the problem might lead to a different answer.

    Rob, I do not have a Yukon machine here, so I cannot test it right now, but I believe you cannot do this in SQL Server 2005:

    select top 5 *

    from tbl

    where row_number() in (@row1, @row2, @row3,....

    Note that I changed your word "rownumber" in the query to a call to the ranking function row_number() since I assume that was what you meant. If so, then what should the where clause filter on? row_number() is used to add a ranking number to a resultset. Something like this should work however:

    select top 5 *

    from (select row_number() as rownumber, * from tbl)

    where rownumber IN (@row1...)

    But then I wonder what the execution plan for that query will be?

  • Yes, sorry. I was writing it the Oracle way, where you can use rownum like that. I haven't played with SQL2005 enough yet to work out some of the tricks with row_number().

    Of course, in this example, best is still to use the ID field for it. Generate a bunch of random numbers, and then get the IDs that match them.

    Rob Farley
    LobsterPot Solutions & Adelaide SQL Server User Group
    Company: http://www.lobsterpot.com.au
    Blog: http://blogs.lobsterpot.com.au

  • >> the whole concept of returning 5 random rows from a set is strange to me.

    We have a "What's New" section on our website. We usually have like 10-15 items, but can only display 5 or less in the available space.

    >> Why not simply return the 5 first rows that are found? Since sets have no ordering,

    >> any query that wants to return a subset of the rows in a set, but without using a

    >> filter to select that subset, would be returning 'random' rows.

    It is not working like you mentioned. When I tried it, select top 5 * from tbl, it always returned the first five rows. The table has a primary key identity column, I don't know if that has anything to do with it.

    What is row_number()? I searched books online, but there is no row_number.

    And Thanks for all your replies.

  • You could play this game:

    Alter Table [???]

    Add GUId as NewId()

    ...

    Select TOP 5 * from [???]

    where Binary_CheckSum(GUId)>0

    The computed column doesn't really exist (not taking up any disk space) and it changes with every reference to the row! Tweak the WHERE clause logic a bit to stretch out the selection interval.



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • What is row_number()? I searched books online, but there is no row_number.

    that's SQL Server 2005

    Does your table has a numeric pkey ?

    if the answer is yes can use something like this :

    create view v_rand

    as

     select rand() r

    create function dbo.f_rand ()

     returns int

    as

    begin

     return 100*(select r from v_rand)

    end

    select *

    from yourtable

    where pk in (

    select dbo.f_rand()

    union all select dbo.f_rand()

    union all select dbo.f_rand()

    union all select dbo.f_rand()

    union all select dbo.f_rand() )  -- This assumes all pk present and in 0-100

    -- You can modify the function  to give you the appropriate values should you need to

     

    hth


    * Noel

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply