SELECT 1 random row query help, please

  • Hi all,

    While the query below works as expected, it's very slow when I run it on a table with 1million+ rows. Is there an alternative to ORDER BY NEWID()?

    DECLARE @T TABLE (Id INT IDENTITY(1,1) PRIMARY KEY, CategoryId INT NULL, SKU VARCHAR(60) NOT NULL)

    INSERT INTO @T ( CategoryId, SKU)

    SELECT 3200, '978-654-321' UNION ALL

    SELECT 3200, '000-258-963' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT NULL, '888-555-010' UNION ALL

    SELECT NULL, '53-555-64' UNION ALL

    SELECT NULL, '888-555-66' UNION ALL

    SELECT NULL, '23-555-222' UNION ALL

    SELECT NULL, '77765-555-0990'

    SELECT TOP 1 Id,

    CategoryId,

    SKU

    FROM @t

    ORDER BY NEWID()

    I also tried the below code (taken from https://msdn.microsoft.com/en-us/library/cc441928.aspx) but it sometimes returns nothing so is unreliable and changing "< 10" to "> 0" always returns the first row. Please help me getting one single & random row from the above data set efficiently. Thank you.

    SELECT TOP 1 Id,

    CategoryId,

    SKU

    FROM @t

    WHERE (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) AS INT)) % 100) < 10

  • Here is something to play with

    DECLARE @T TABLE (Id INT IDENTITY(1,1) PRIMARY KEY, CategoryId INT NULL, SKU VARCHAR(60) NOT NULL)

    INSERT INTO @T ( CategoryId, SKU)

    SELECT 3200, '978-654-321' UNION ALL

    SELECT 3200, '000-258-963' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT NULL, '888-555-010' UNION ALL

    SELECT NULL, '53-555-64' UNION ALL

    SELECT NULL, '888-555-66' UNION ALL

    SELECT NULL, '23-555-222' UNION ALL

    SELECT NULL, '77765-555-0990'

    DECLARE @RandomRowNumber BIGINT = (SELECT CAST(ROUND((COUNT(*) - 1) * Rand() + 1, 0) AS BIGINT) FROM @T) ;

    ;WITH dataCTE AS

    (

    SELECT *

    , RN = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM @T

    )

    SELECT *

    FROM dataCTE

    WHERE RN = @RandomRowNumber ;

  • Aditya Daruka (12/8/2015)


    Here is something to play with

    DECLARE @T TABLE (Id INT IDENTITY(1,1) PRIMARY KEY, CategoryId INT NULL, SKU VARCHAR(60) NOT NULL)

    INSERT INTO @T ( CategoryId, SKU)

    SELECT 3200, '978-654-321' UNION ALL

    SELECT 3200, '000-258-963' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT NULL, '888-555-010' UNION ALL

    SELECT NULL, '53-555-64' UNION ALL

    SELECT NULL, '888-555-66' UNION ALL

    SELECT NULL, '23-555-222' UNION ALL

    SELECT NULL, '77765-555-0990'

    DECLARE @RandomRowNumber BIGINT = (SELECT CAST(ROUND((COUNT(*) - 1) * Rand() + 1, 0) AS BIGINT) FROM @T) ;

    ;WITH dataCTE AS

    (

    SELECT *

    , RN = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM @T

    )

    SELECT *

    FROM dataCTE

    WHERE RN = @RandomRowNumber ;

    Replace one of those two clustered index scans with a seek:

    SELECT *

    FROM (SELECT RandomRowNumber = 1+ABS(CHECKSUM(NEWID()))%COUNT(*) FROM @T) t

    CROSS APPLY (SELECT TOP 1 * FROM @T WHERE RandomRowNumber > Id ORDER BY Id DESC) x

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • See my next post, don't know how to delete this error posting.

  • Most probably there were a lot of DELETEs an INSERTs on a table with 1million+ rows. So IDs hardly are evenly distributed on (1 .. count(*)) interval or even on (Min(id) ..Max(id) ) interval. Output will not be really random.

    For example

    DECLARE @T TABLE (Id INT IDENTITY(1,1) PRIMARY KEY, CategoryId INT NULL, SKU VARCHAR(60) NOT NULL)

    INSERT INTO @T ( CategoryId, SKU)

    SELECT 3200, '978-654-321' UNION ALL

    SELECT 3200, '000-258-963' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT NULL, '888-555-010' UNION ALL

    SELECT NULL, '53-555-64' UNION ALL

    SELECT NULL, '888-555-66' UNION ALL

    SELECT NULL, '23-555-222' UNION ALL

    SELECT NULL, '77765-555-0990'

    ;

    delete from @t where id>3 and CategoryId=3200;

    SELECT *

    FROM (SELECT RandomRowNumber = min(id)+1+ABS(CHECKSUM(NEWID()))%(max(id)-min(id)+1) FROM @T) t

    CROSS APPLY (SELECT TOP 1 * FROM @T WHERE RandomRowNumber > Id ORDER BY Id DESC) x

    This will return Id=3 far frequently then others.

  • You simply MUST NOT order all rows to get one!! That is an immensely costly operation!

    One quick-and-easy way is to get min and max ID values and create a WHILE loop and generate a random variable int value between those (variety of ways to accomplish this) and then seek on that value with a WHERE id = @id filter. This will be a VERY efficient PK index seek operation. If you don't ever do any deletes and you start with 1 and go up you simply need to get the max value and you will be almost guaranteed to get a hit on the first call, only missing if someone rolled back an insert. Even if you have fairly large ranges of missing records this will almost certainly be faster than scanning all rows into a temp table and doing the row_number, NEWID() thing.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • serg-52 (12/9/2015)


    Most probably there were a lot of DELETEs an INSERTs on a table with 1million+ rows. So IDs hardly are evenly distributed on (1 .. count(*)) interval or even on (Min(id) ..Max(id) ) interval. Output will not be really random.

    For example

    DECLARE @T TABLE (Id INT IDENTITY(1,1) PRIMARY KEY, CategoryId INT NULL, SKU VARCHAR(60) NOT NULL)

    INSERT INTO @T ( CategoryId, SKU)

    SELECT 3200, '978-654-321' UNION ALL

    SELECT 3200, '000-258-963' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT NULL, '888-555-010' UNION ALL

    SELECT NULL, '53-555-64' UNION ALL

    SELECT NULL, '888-555-66' UNION ALL

    SELECT NULL, '23-555-222' UNION ALL

    SELECT NULL, '77765-555-0990'

    ;

    delete from @t where id>3 and CategoryId=3200;

    SELECT *

    FROM (SELECT RandomRowNumber = min(id)+1+ABS(CHECKSUM(NEWID()))%(max(id)-min(id)+1) FROM @T) t

    CROSS APPLY (SELECT TOP 1 * FROM @T WHERE RandomRowNumber > Id ORDER BY Id DESC) x

    This will return Id=3 far frequently then others.

    Yes it will. You could mitigate that somewhat by using TOP(n), whatever value you want for n, and choosing one of those rows at random - but see Kevin's post below for an alternative method where ID's have been deleted.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • TheSQLGuru (12/9/2015)


    You simply MUST NOT order all rows to get one!! That is an immensely costly operation!

    One quick-and-easy way is to get min and max ID values and create a WHILE loop and generate a random variable int value between those (variety of ways to accomplish this) and then seek on that value with a WHERE id = @id filter. This will be a VERY efficient PK index seek operation. If you don't ever do any deletes and you start with 1 and go up you simply need to get the max value and you will be almost guaranteed to get a hit on the first call, only missing if someone rolled back an insert. Even if you have fairly large ranges of missing records this will almost certainly be faster than scanning all rows into a temp table and doing the row_number, NEWID() thing.

    I think it totally depends on how much "random" this random row is required to be. If (p(id) >0 for any id) is quite enough then i agree, no need to enumerate rows. If more strict (p(id1)=p(id2) for any id1,id2) is required then i can't see how to achieve this without row_numbering ids first in general case.

  • Thank you everyone for the replies, very helpful!!

    Unfortunately there's no guarantee that there would be no gaps. Also, I need ti implement a logic similar to this. E.g. pick a random row if the declared CategoryId exists, otherwise pick a random row from the set where categoryId is null. I know it's ugly but I just want to demonstrate the logic and expected results. Thank you.

    DECLARE @T TABLE (Id INT IDENTITY(1,1) PRIMARY KEY, CategoryId INT NULL, SKU VARCHAR(60) NOT NULL)

    INSERT INTO @T ( CategoryId, SKU)

    SELECT 3200, '978-654-321' UNION ALL

    SELECT NULL, '53-555-64' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT NULL, '888-555-010' UNION ALL

    SELECT NULL, '888-555-66' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT NULL, '23-555-222' UNION ALL

    SELECT 3200, '000-258-963' UNION ALL

    SELECT 3200, '969-555-010' UNION ALL

    SELECT 6565, '969-555-010' UNION ALL

    SELECT 6565, '23-555-222' UNION ALL

    SELECT 6565, '000-258-963' UNION ALL

    SELECT NULL, '888-555-010' UNION ALL

    SELECT 55, '888-555-66' UNION ALL

    SELECT 55, '969-555-010' UNION ALL

    SELECT 33, '969-555-010' UNION ALL

    SELECT NULL, '77765-555-0990'

    DECLARE @DS1MinValue INT

    DECLARE @DS1MaxValue INT

    DECLARE @DS2MinValue INT

    DECLARE @DS2MaxValue INT

    DECLARE @CategoryId INT = 6565

    CREATE TABLE #TBL (Id INT, Rn INT, CategoryId INT, SKU VARCHAR(50))

    INSERT INTO #TBL(Id, Rn, CategoryId, SKU)

    SELECT Id, row_number() over(partition by case when categoryId is null then 0 else categoryId end order by case when categoryId is null then 0 else categoryId end ) AS Rn, CategoryId, SKU

    from @t

    SELECT @DS1MinValue= (select MIN(Rn) from #TBL where categoryId = @CategoryId)

    SELECT @DS1MaxValue= (select MAX(Rn) from #TBL where categoryId = @CategoryId)

    SELECT @DS2MinValue= (select MIN(Rn) from #TBL where categoryId is null)

    SELECT @DS2MaxValue= (select MAX(Rn) from #TBL where categoryId is null)

    DECLARE @DS1Rnd INT = (SELECT ROUND(((@DS1MaxValue - @DS1MinValue) * RAND() + @DS1MinValue), 0))

    DECLARE @DS2Rnd INT = (SELECT ROUND(((@DS2MaxValue - @DS2MinValue) * RAND() + @DS2MinValue), 0))

    if exists (select Id from #TBL where CategoryId = @CategoryId)

    begin

    -- pick a random row from data set where CategoryId IS NOT NULL

    SELECT * fROM #TBL WHERE CategoryId = @CategoryId AND Rn = @DS1Rnd

    end

    else

    begin

    -- pick a random row from data set where CategoryId IS NULL

    SELECT * fROM #TBL WHERE CategoryId IS NULL AND Rn = @DS2Rnd

    end

    DROP TABLE #TBL

  • Create a separate index of just the IDs. Then select a random ID using ORDER BY NEWID(), then use that ID to get the rest of the columns. You certainly don't want to have to sort the entire row.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Handling gaps with my mechanism really is simple and efficient. Been there, done that, got the tshirt.

    But your new requirement about category makes it much more likely you will need to do the sort thing, although possibly with an exists check first. If row exists with your category, do the sort/rownumber thing for that category. If it doesn't exist do the sort/rownumber thing for null categories. As Scott said, these would hopefully all be narrow-index hits.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (12/9/2015)


    Handling gaps with my mechanism really is simple and efficient. Been there, done that, got the tshirt.

    But your new requirement about category makes it much more likely you will need to do the sort thing, although possibly with an exists check first. If row exists with your category, do the sort/rownumber thing for that category. If it doesn't exist do the sort/rownumber thing for null categories. As Scott said, these would hopefully all be narrow-index hits.

    Hi Kevin, Many thanks for your help. So basically you're saying I should use the script I attached a couple of comments above? Any way I can make that faster? That's very slow when I run it on my db. Thanks

  • ScottPletcher (12/9/2015)


    Create a separate index of just the IDs. Then select a random ID using ORDER BY NEWID(), then use that ID to get the rest of the columns. You certainly don't want to have to sort the entire row.

    Thanks for your response. Can you provide an example, please? Index on the Ids? That's my primary key (clustered index) so not sure what you mean. Thanks!!

  • kiril.lazarov.77 (12/10/2015)


    ScottPletcher (12/9/2015)


    Create a separate index of just the IDs. Then select a random ID using ORDER BY NEWID(), then use that ID to get the rest of the columns. You certainly don't want to have to sort the entire row.

    Thanks for your response. Can you provide an example, please? Index on the Ids? That's my primary key (clustered index) so not sure what you mean. Thanks!!

    Depends on the width of the row. If it's much more than 4 bytes -- and usually it is -- you'l need the extra index to prevent a full table scan. In fact, SQL loves clustered indexes so much, you might even have to force it to use the nonclus index on Id.

    CREATE UNIQUE NONCLUSTERED INDEX [index_name] ON [table_name] ( Id ) WITH ( FILLFACTOR = 99, SORT_IN_TEMPDB = ON ) ON [filegroupname];

    DECLARE @Id int

    SELECT TOP (1) @Id = Id

    FROM [table_name] WITH (INDEX([index_name]))

    ORDER BY NEWID()

    SELECT * --col,col,col...

    FROM [table_name]

    WHERE Id = @Id

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • kiril.lazarov.77 (12/10/2015)


    ScottPletcher (12/9/2015)


    Create a separate index of just the IDs. Then select a random ID using ORDER BY NEWID(), then use that ID to get the rest of the columns. You certainly don't want to have to sort the entire row.

    Thanks for your response. Can you provide an example, please? Index on the Ids? That's my primary key (clustered index) so not sure what you mean. Thanks!!

    Depends on the width of the row. If it's much more than 4 bytes -- and usually it is -- you'l need the extra index to prevent a full table scan. In fact, SQL loves clustered indexes so much, you might even have to force it to use the nonclus index on Id.

    CREATE UNIQUE NONCLUSTERED INDEX [index_name] ON [table_name] ( Id ) WITH ( FILLFACTOR = 99, SORT_IN_TEMPDB = ON ) ON [filegroupname];

    DECLARE @Id int

    SELECT TOP (1) @Id = Id

    FROM [table_name] WITH (INDEX([index_name]))

    ORDER BY NEWID()

    SELECT * --col,col,col...

    FROM [table_name]

    WHERE Id = @Id

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply