Best way to randomly select rows?

  • select SUM(c_amount) from Transactions
    where s_credit_tran_type = 'cash' and b_cancel = 0 and b_credit = 0 and b_failed = 0 and dt_when between '2017-01-01 04:09:54.213' and '2017-01-31 02:09:54.213'

    The query gives me SUM(c_amount) equal to $25590.60 from 1993 rows

    I need to write something that will randomly pull $4000 from the 1993 generated rows from the above code.

    Not sure how to tackle or approach. Thanks in advance for any help.

  • You can easily select rows in random order like so. However, I'm not clear about the part of $4000 though.

    SELECT * FROM MyTable ORDER BY NEWID();

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • If there was no random requirement, this sounds like a bin packing problem...grab an unknown number of rows until they add up to either exactly $4000, or as close as possible without going over.

    the random thing make sme think that if there are , say, 20 solution groups, you grab a random set of the 20.

    that requirement sounds difficult, because there's no real limits...  if you said find exactly five elements, and order them by max amount that is close to $4000, for example, i could find an example of that.
    but the unknown quantity makes it hard. you have to cross joinall possible values,, then filter...very slow.

    what are you trying to accomplish?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Eric M Russell - Wednesday, February 1, 2017 9:57 AM

    You can easily select rows in random order like so. However, I'm not clear about the part of $4000 though.

    SELECT * FROM MyTable ORDER BY NEWID();

    Perhaps something like this:
    ;
    WITH CTE AS
    (
        SELECT *, SUM(YourAmount) OVER(ORDER BY NEWID() ROWS UNBOUNDED PRECEDING) AS RunningTotal
        FROM YourTable
    )
    SELECT *
    FROM CTE
    WHERE CTE.RunningTotal <= 4000

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • OK, the following will:
    - create a sample table similar to what you're working with
    - insert the table with rows totaling 5975
    - select TOP X rows whose sum is nearest to 4000 without going over


    CREATE TABLE Transactions
    (
        id INT NOT NULL IDENTITY(1,1),
        c_amount INT NOT null
    );

    INSERT INTO Transactions ( c_amount )
    VALUES (100),(170),(90),(125),(110),(155),(160),(180),(175),(100),(185),(70)
        , (170),(90),(125),(100),(155),(160),(80),(175),(100),(185),(70), (190)
        , (60),(190),(170),(30),(55),(80),(140),(75),(125),(30),(170), (150)
        , (170),(210),(40),(130),(70),(150),(95),(175),(70),(130),(195), (45);


    WITH T (random_id, c_amount)
    AS
    (
     SELECT NEWID() random_id, c_amount
     FROM Transactions
    )
    SELECT random_id, c_amount
    FROM T AS A
    WHERE
     (SELECT SUM(c_amount)
     FROM T AS B
     WHERE B.random_id <= A.random_id) <= 4000
    ORDER BY A.random_id;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thank you for all the kind responses! I will take a look at all the code.

    Chef

  • Eric M Russell - Wednesday, February 1, 2017 11:36 AM

    OK, the following will:
    - create a sample table similar to what you're working with
    - insert the table with rows totaling 5975
    - select TOP X rows whose sum is nearest to 4000 without going over


    CREATE TABLE Transactions
    (
        id INT NOT NULL IDENTITY(1,1),
        c_amount INT NOT null
    );

    INSERT INTO Transactions ( c_amount )
    VALUES (100),(170),(90),(125),(110),(155),(160),(180),(175),(100),(185),(70)
        , (170),(90),(125),(100),(155),(160),(80),(175),(100),(185),(70), (190)
        , (60),(190),(170),(30),(55),(80),(140),(75),(125),(30),(170), (150)
        , (170),(210),(40),(130),(70),(150),(95),(175),(70),(130),(195), (45);


    WITH T (random_id, c_amount)
    AS
    (
     SELECT NEWID() random_id, c_amount
     FROM Transactions
    )
    SELECT random_id, c_amount
    FROM T AS A
    WHERE
     (SELECT SUM(c_amount)
     FROM T AS B
     WHERE B.random_id <= A.random_id) <= 4000
    ORDER BY A.random_id;

    This will not work as expected, because the CTE is run every time it is referenced, so the random IDs in the main query will not match the random IDs in the subquery.  This can easily be seen by running the fol
    WITH CTE AS
    (
        SELECT NEWID() AS id
    )
    SELECT a.id, b.id
    FROM CTE AS a
    FULL OUTER JOIN CTE AS b
        ON a.id = b.id

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I'll admit I don't resort to using CTEs often. I thought the whole idea of a CTE is that it can be referenced multiple times within the same query while only being executed once. If that's not the case, then we'll need to use a temp table instead.


    SELECT NEWID() random_id, c_amount
    INTO #T
    FROM Transactions;

    SELECT random_id, c_amount
    FROM #T AS A
    WHERE
    (SELECT SUM(c_amount)
    FROM #T AS B
    WHERE B.random_id <= A.random_id) <= 4000
    ORDER BY A.random_id;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell - Thursday, February 2, 2017 1:02 PM

    I'll admit I don't resort to using CTEs often. I thought the whole idea of a CTE is that it can be referenced multiple times within the same query while only being executed once. If that's not the case, then we'll need to use a temp table instead.


    SELECT NEWID() random_id, c_amount
    INTO #T
    FROM Transactions;

    SELECT random_id, c_amount
    FROM #T AS A
    WHERE
    (SELECT SUM(c_amount)
    FROM #T AS B
    WHERE B.random_id <= A.random_id) <= 4000
    ORDER BY A.random_id;

    I first ran into this problem with a ROW_NUMBER() function where the ORDER BY clause wasn't specified well enough to be completely deterministic.

    This updated solution still suffers from the triangular "join".  I posted a solution with the windowed version of the SUM() which will almost certainly be faster.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Eric M Russell - Wednesday, February 1, 2017 11:36 AM

    OK, the following will:
    - create a sample table similar to what you're working with
    - insert the table with rows totaling 5975
    - select TOP X rows whose sum is nearest to 4000 without going over


    CREATE TABLE Transactions
    (
        id INT NOT NULL IDENTITY(1,1),
        c_amount INT NOT null
    );

    INSERT INTO Transactions ( c_amount )
    VALUES (100),(170),(90),(125),(110),(155),(160),(180),(175),(100),(185),(70)
        , (170),(90),(125),(100),(155),(160),(80),(175),(100),(185),(70), (190)
        , (60),(190),(170),(30),(55),(80),(140),(75),(125),(30),(170), (150)
        , (170),(210),(40),(130),(70),(150),(95),(175),(70),(130),(195), (45);


    WITH T (random_id, c_amount)
    AS
    (
     SELECT NEWID() random_id, c_amount
     FROM Transactions
    )
    SELECT random_id, c_amount
    FROM T AS A
    WHERE
     (SELECT SUM(c_amount)
     FROM T AS B
     WHERE B.random_id <= A.random_id) <= 4000
    ORDER BY A.random_id;

    I already have a table called Transactions   ???

    Shoudlnt that be a temp table ?

  • Transactions represents your user table which I'm loading with mockup rows.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • chef423 - Thursday, February 2, 2017 1:31 PM

    I already have a table called Transactions   ???

    Shoudlnt that be a temp table ?

    No, the purpose of that table is to provide sample data, since most of us don't have access to your actual table and you did not provide any sample data.  Since you DO have access to your actual table, you would skip the first part where he is setting up the sample data and only use the second part.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply