Sample every nth record

  • Hi everyone-

    Prob an easy question, is it possible to sample every nth record in SQL 2005? I have a database with over 4 million records and wish to sample every 100th record? Thanks!

  • Assuming you want a random sampling:

    declare @n int

    set @n=100

    select t.* from

    (select *, row_number() over (Order by newID() ) as 'ranknumber' from table1) t

    where ranknumber%@n=0

    Or...(in the case one out of 100)

    select top 1 percent * from

    table1

    order by newID()

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • This is unlikely to run very fast, 43 seconds on my machine against 54 million rows, but it'll work.

    WITH x AS (SELECT b.[ID]

    ,b.[Date]

    ,b.[Value]

    ,ROW_NUMBER() OVER(ORDER BY b.[ID]) AS RowNum

    FROM [BigTable] b

    )

    SELECT * FROM x

    WHERE (x.rownum % 100 = 0)

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Try this:

    SELECT *

    FROM your_table

    TABLESAMPLE SYSTEM (1 PERCENT) ;

  • Thanks everyone!

  • Suppose i have 250 records. If i tried by the above suggested queries i am getting 3 .All fine. But how to confirm those are the recoeds in 0th , 100th and 200th positions. I tried but getting different records. Please clear my confusion

  • deb (11/8/2007)


    Suppose i have 250 records. If i tried by the above suggested queries i am getting 3 .All fine. But how to confirm those are the recoeds in 0th , 100th and 200th positions. I tried but getting different records. Please clear my confusion

    I can't address all the examples, but from the example I supplied, with no ORDER BY statement, you can't guarantee getting the same values every time. Adding the ORDER BY would guarantee that.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I came across this in my notes, but haven't tried it lately ... (too tired & busy right now)

    +++++++++

    --This selects every 500 th record, using an identity field

    SELECT fielda, fieldb, fieldc

    FROM Table_Name

    WHERE Identityfield % 500 = 0

  • That's right there with my take on it, but I tried supplying the ID field in case the table didn't have one, or there were gaps or whatever. "Every 'X'" pretty much equated to modulo in my mind.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • If you have an identity column then % operator should be the quickest mechanism to get every N records back.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Except if there are gaps (which there usually are) and you really need exactly every nth record from some particular order. Then you still have to do some kind of processing to arrive at the solution.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • That was the same line of thinking I used when I threw in the ROW_NUMBER() solution. At least you know you won't get a gap that way.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I took it that if there was an identity column and the Nth record was absent then that Nth record shouldn't be returned. 😛

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • True - but if records with ID 40-60 are gone, then record with ID=100 isn't the 100th record anymore...

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply