Calculating Prime Numbers With One Query

  • Jonathan AC Roberts wrote:

    If they are implementing things like shift-left and shift-right maybe they should also implement unsigned integers.

    It's coming... According to the way they handled things like GENERATE_SERIES(), etc, it should be out in as little as sometime in the next 20 to 30 years or so.  😀

    At least they made Columns_Updated() a binary.  GET_BIT() is going to "help a bit" there. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Jonathan AC Roberts wrote:

    If they are implementing things like shift-left and shift-right maybe they should also implement unsigned integers.

    It's coming... According to the way they handled things like GENERATE_SERIES(), etc, it should be out in as little as sometime in the next 20 to 30 years or so.  😀

    At least they made Columns_Updated() a binary.  GET_BIT() is going to "help a bit" there. 😀

    That quickly? 😁

    I'm a bit behind on my reading too, I didn't know about GET_BIT() or Columns_Updated(), GET_BIT() and SET_BIT() could be useful for a prime sieve if it is also fast.

  • @steve-2 Collins,

    Sorry... I tried your latest code on a max value of  "just" 1 million.  I stopped it at 2 hours and 8 minutes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jonathan AC Roberts wrote:

    Jeff Moden wrote:

    Jonathan AC Roberts wrote:

    If they are implementing things like shift-left and shift-right maybe they should also implement unsigned integers.

    It's coming... According to the way they handled things like GENERATE_SERIES(), etc, it should be out in as little as sometime in the next 20 to 30 years or so.  😀

    At least they made Columns_Updated() a binary.  GET_BIT() is going to "help a bit" there. 😀

    That quickly? 😁

    I'm a bit behind on my reading too, I didn't know about GET_BIT() or Columns_Updated(), GET_BIT() and SET_BIT() could be useful for a prime sieve if it is also fast.

    COLUMNS_UPDATED() has been out since I can remember.  You just wouldn't know about it unless you needed some extra insanity in your life when it comes to triggers.  Obviously, I needed more insanity in my life a while back. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    @Steve Collins,

    Sorry... I tried your latest code on a max value of  "just" 1 million.  I stopped it at 2 hours and 8 minutes.

    Well that's disappointing.  Could it be improved?  At least the other code runs in 2 seconds but it doesn't have nearly the name cachet

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • This was removed by the editor as SPAM

  • Despite its appearance, this process seems slow at first glance; however, it actually exhibits impressive speed, computing all prime numbers below one million in just two seconds.

    SET STATISTICS IO, TIME OFF;
    SET NOCOUNT ON;
    DROP TABLE IF EXISTS #Primes;
    CREATE TABLE #Primes(Prime int NOT NULL PRIMARY KEY);

    DECLARE @maxNumber int = 1000000;

    INSERT INTO #Primes(Prime)
    SELECT t.value
    FROM (values (2),(3)) t(value)
    WHERE t.value <= @maxNumber
    UNION ALL
    SELECT value
    FROM GENERATE_SERIES(5, @maxNumber, 2)
    WHERE value%6 IN(1, 5);

    DECLARE @Start int = 3,
    @Stop int = CAST(SQRT(@maxNumber) AS int);

    WHILE @Start <= @Stop BEGIN

    DELETE #Primes
    WHERE Prime > @Start
    AND Prime % @Start = 0;

    SELECT @Start=MIN(Prime)
    FROM #Primes
    WHERE Prime > @Start;
    END

    SELECT *
    FROM #Primes
    ORDER BY 1;
  • That's pretty amazing.  Now someone just needs to verify that it's actually producing the correct results using a brute force method.

    Also, I've not looked into it, is there a link somewhere that explains the Mod 6(1,5) optimization?

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    That's pretty amazing.  Now someone just needs to verify that it's actually producing the correct results using a brute force method.

    It's basically a primes sieve, but instead of ticking off values that are multiples of each prime to indicate they are not prime it deletes them. This means as it progresses there are fewer and fewer rows on the "primes" table. The lowest number on the table that is higher than the last row tested is guaranteed to be prime as it would have already been deleted if it were not prime. When it has got to the prime that is less or equal to the square root of the maximum number to be found it stops as all divisors greater than the square root have to be multiplied by a number less than the square root to give the number. It's not really a very efficient way to do it as the algorithm has to insert all prime candidates then delete all the ones that are not prime. But it just shows how efficient prime sieves are.

    Jeff Moden wrote:

    is there a link somewhere that explains the Mod 6(1,5) optimization? 

    It's just used to eliminate numbers that are not prime from the table of numbers to be tested. So instead of just selecting odd numbers (n/2) it selects (n/3) numbers which is a significant saving especially as the extra numbers in the odd selection would have to be inserted then deleted from a database table.

    https://math.stackexchange.com/questions/756547/prove-that-every-prime-larger-than-3-gives-a-remainder-of-1-or-5-if-divide

  • Here's another non improvement over other methods afaik.  In theory this one seemed to hold a lot of promise imo.  When I gave it to ChatGPT to craft an explanation it told me this method already had a name

    declare @n           int=100;
    declare @sqrt_n int=sqrt(@n);
    declare @interval int=(select ceiling((@n - (@sqrt_n + 1))/2)+1);
    --select @interval

    /* factors of sqrt primes in cross applied interval */with sqrt_primes_cte(prime) as (
    select fn.N*2+1
    from dbo.fnTally(1, @sqrt_n/2-1) fn
    where not exists(select 1
    from dbo.fnTally(1, sqrt(fn.N/2)) ffn
    where (fn.N*2+1)%(ffn.N*2+1)=0
    and (fn.N*2+1)<>(ffn.N*2+1)))
    select v.prime from (values (2),(3)) v(prime)
    union all
    select sq.prime from sqrt_primes_cte sq
    union all
    select @sqrt_n+fn.n*2+1
    from dbo.fnTally(1, @interval-1) fn
    where not exists (select 1
    from sqrt_primes_cte sq
    cross apply (select fn.n
    from dbo.fnTally(1, @interval/sq.prime+1) fn
    where (fn.n*2+1)*prime>@sqrt_n
    and (fn.n*2+1)*prime<=@n) i(n)
    where (i.n*2+1)*prime = @sqrt_n+fn.n*2+1);

    This query implements the Sieve of Sundaram algorithm to generate prime numbers up to a given value @n. Here's a simple explanation for each of its salient aspects:

    1. The query starts by declaring variables @n, @sqrt_n, and @interval. @n is the upper limit for prime numbers, @sqrt_n is the square root of @n, and @interval is calculated based on these values.

    2. The sqrt_primes_cte Common Table Expression (CTE) generates prime numbers up to the square root of @n. It uses the dbo.fnTally function to generate a sequence of odd numbers and filters out non-prime numbers using a nested subquery. The subquery checks if the current odd number is divisible by any smaller odd number (except itself).

    3. The query then selects the prime number 2 as the first prime number.

    4. The query unions the prime numbers generated by the sqrt_primes_cte CTE.

    5. The query calculates the remaining prime numbers greater than the square root of @n using another subquery. It generates a sequence of odd numbers using the dbo.fnTally function and filters out non-prime numbers by checking if they are divisible by any prime number from the sqrt_primes_cte CTE.

    6. The final result is a list of prime numbers up to the given value @n.

    The Sieve of Sundaram algorithm is an efficient method for generating prime numbers, and this query demonstrates its implementation using SQL Server and the dbo.fnTally function for generating sequences.

    According to my tests this approach is a bit slower than (what I'm still calling) the 2 tally method

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Jonathan AC Roberts wrote:

    Jeff Moden wrote:

    That's pretty amazing.  Now someone just needs to verify that it's actually producing the correct results using a brute force method.

    It's basically a primes sieve, but instead of ticking off values that are multiples of each prime to indicate they are not prime it deletes them. This means as it progresses there are fewer and fewer rows on the "primes" table. The lowest number on the table that is higher than the last row tested is guaranteed to be prime as it would have already been deleted if it were not prime. When it has got to the prime that is less or equal to the square root of the maximum number to be found it stops as all divisors greater than the square root have to be multiplied by a number less than the square root to give the number. It's not really a very efficient way to do it as the algorithm has to insert all prime candidates then delete all the ones that are not prime. But it just shows how efficient prime sieves are.

    Jeff Moden wrote:

    is there a link somewhere that explains the Mod 6(1,5) optimization? 

    It's just used to eliminate numbers that are not prime from the table of numbers to be tested. So instead of just selecting odd numbers (n/2) it selects (n/3) numbers which is a significant saving especially as the extra numbers in the odd selection would have to be inserted then deleted from a database table.

    https://math.stackexchange.com/questions/756547/prove-that-every-prime-larger-than-3-gives-a-remainder-of-1-or-5-if-divide

    Awesome.  Yep... I'm familiar with a few of the different sieves.  I was really interested in the proof for the Mod 6(1,5).  Interesting how they did that.  I REALLY appreciate that link!

    Another interesting thing is that you simulated the manual process of crossing out numbers on paper by deleting them.

    It's a bloody shame that the use of GENERATE_SERIES() defeats minimal logging during inserts.  I smell a "special" fnTallyX2 in the air.  I don't yet know if that'll make up any time yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Another interesting thing is that you simulated the manual process of crossing out numbers on paper by deleting them.

    It's a bloody shame that the use of GENERATE_SERIES() defeats minimal logging during inserts.  I smell a "special" fnTallyX2 in the air.  I don't yet know if that'll make up any time yet.

    I just improved the process a bit. You can use GENERATE_SERIES to specify the multiples of the prime that you want to delete. Also, another thing I learnt today is you can start testing from the square of the prime you are testing (@Start*@Start) in GENERATE_SERIES(@Start*@Start, @maxNumber, @Start), I've not worked out why yet but it is in the pseudocode in the Wikipedia article on Sieve of Eratosthenes. It finds all primes less than 10 million in 10 seconds on my machine:

    SET STATISTICS TIME OFF
    DROP TABLE IF EXISTS #Primes;
    CREATE TABLE #Primes(Prime int NOT NULL PRIMARY KEY);

    DECLARE @maxNumber int = 10000000;

    SET NOCOUNT OFF; -- See how many rows are inserted into #Prime
    SET STATISTICS TIME ON; -- See how long it takes to insert rows into #Prime

    INSERT INTO #Primes(Prime)
    SELECT t.value
    FROM (VALUES(2),(3),(5),(7),(11),(13),(17),(19),(23),(29),(31),(37),(41),(43),(47),(53),(59),(61),(67),(71),(73),(79),(83),(89),(97),(101)) t(value)
    WHERE t.value <= @maxNumber
    UNION ALL
    SELECT value
    FROM GENERATE_SERIES(103, @maxNumber, 2) -- Odd numbers
    WHERE value%6 IN(1, 5)
    AND 0 NOT IN (value%5,value%7,value%11,value%13,value%17,value%19,value%23,value%29,value%31,value%37,value%41,value%43,value%47,value%53,value%59,value%61,value%67,value%71,value%73,value%79,value%83,value%89,value%97,value%101)

    SET STATISTICS IO, TIME OFF; -- Don't output lines inside while loop
    SET NOCOUNT ON; -- Don't output lines inside while loop

    DECLARE @Start int = 103,
    @Stop int = CAST(SQRT(@maxNumber) AS int),
    @RowsDeleted int,
    @StartTime datetime2(7);

    WHILE @Start <= @Stop BEGIN

    SET @StartTime = SYSDATETIME();

    DELETE #Primes
    WHERE Prime >= @Start*@Start
    AND EXISTS(SELECT 1
    FROM GENERATE_SERIES(@Start*@Start, @maxNumber, @Start) x
    WHERE x.value = Prime)

    SET @RowsDeleted = @@ROWCOUNT

    PRINT CONCAT('Prime:', @Start,' Deleted:', @RowsDeleted, ' Time(ms):', DATEDIFF_BIG(ms, @StartTime, SYSDATETIME()));

    SELECT @Start=MIN(Prime)
    FROM #Primes
    WHERE Prime > @Start;

    END

    GO

    SELECT COUNT(*)
    FROM #Primes
    ;

     

    Jeff Moden wrote:

    It's a bloody shame that the use of GENERATE_SERIES() defeats minimal logging during inserts.  I smell a "special" fnTallyX2 in the air.  I don't yet know if that'll make up any time yet.

    Guess what? There's a possible "fnTally 2.0" floating around somewhere (if you're cool with it, haha!): https://www.sqlservercentral.com/scripts/user-created-generate_series-function-for-older-sql-server-versions

  • p.s.  I just tested the results of Jonathan's code against a 3rd party generator and it passed.

    https://numbergenerator.org/numberlist/prime-numbers/1-1000000#!low=1&high=1000000&csv=nl&oddeven=&step=1&addfilters=prime_numbers-val-

    Jonathan's code takes about 39 seconds for a max value of 10 million without doing a slice'n'dice trick (which I've not yet tried).

    I do have to go back and check Peter Larsson's method for performance.  It's been a long time since I've done so and the old machine I tested it on way back then had only 32 bit processors running in spinning rust.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @ Jonathan,

    Sorry... My previous post above was referring to the 10 million row run prior to your additional (@Start*@Start) optimization.  I have to try your new optimization.  I didn't look to see if there were any new posts while I was working on that post.

    And, nope... I don't have a problem with the new article you published that came out 3 days ago.  Haven't read it yet, though.

    The fnTallyX2 function that I was talking about would have been only counting by odd numbers.  Not useful for much but primes.  It's kind of like the special purpose inline one I made for my Hierarchy articles.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jonathan AC Roberts wrote:

    I just improved the process a bit. You can use GENERATE_SERIES to specify the multiples of the prime that you want to delete. Also, another thing I learnt today is you can start testing from the square of the prime you are testing (@Start*@Start) in GENERATE_SERIES(@Start*@Start, @maxNumber, @Start), I've not worked out why yet but it is in the pseudocode in the Wikipedia article on Sieve of Eratosthenes. It finds all primes less than 10 million in 10 seconds on my machine:

    SET STATISTICS TIME OFF
    DROP TABLE IF EXISTS #Primes;
    CREATE TABLE #Primes(Prime int NOT NULL PRIMARY KEY);

    DECLARE @maxNumber int = 10000000;

    SET NOCOUNT OFF; -- See how many rows are inserted into #Prime
    SET STATISTICS TIME ON; -- See how long it takes to insert rows into #Prime

    INSERT INTO #Primes(Prime)
    SELECT t.value
    FROM (VALUES(2),(3),(5),(7),(11),(13),(17),(19),(23),(29),(31),(37),(41),(43),(47),(53),(59),(61),(67),(71),(73),(79),(83),(89),(97),(101)) t(value)
    WHERE t.value <= @maxNumber
    UNION ALL
    SELECT value
    FROM GENERATE_SERIES(103, @maxNumber, 2) -- Odd numbers
    WHERE value%6 IN(1, 5)
    AND 0 NOT IN (value%5,value%7,value%11,value%13,value%17,value%19,value%23,value%29,value%31,value%37,value%41,value%43,value%47,value%53,value%59,value%61,value%67,value%71,value%73,value%79,value%83,value%89,value%97,value%101)

    SET STATISTICS IO, TIME OFF; -- Don't output lines inside while loop
    SET NOCOUNT ON; -- Don't output lines inside while loop

    DECLARE @Start int = 103,
    @Stop int = CAST(SQRT(@maxNumber) AS int),
    @RowsDeleted int,
    @StartTime datetime2(7);

    WHILE @Start <= @Stop BEGIN

    SET @StartTime = SYSDATETIME();

    DELETE #Primes
    WHERE Prime >= @Start*@Start
    AND EXISTS(SELECT 1
    FROM GENERATE_SERIES(@Start*@Start, @maxNumber, @Start) x
    WHERE x.value = Prime)

    SET @RowsDeleted = @@ROWCOUNT

    PRINT CONCAT('Prime:', @Start,' Deleted:', @RowsDeleted, ' Time(ms):', DATEDIFF_BIG(ms, @StartTime, SYSDATETIME()));

    SELECT @Start=MIN(Prime)
    FROM #Primes
    WHERE Prime > @Start;

    END

    GO

    SELECT COUNT(*)
    FROM #Primes
    ;

    The measurements for the code above includes the table allocation as well as the WHILE loop and the final SELECT?  In the code above the STATISTICS TIME, IO only seem to be applied to the INSERT INTO #Primes(Prime)

    "you can start testing from the square of the prime you are testing (@Start*@Start)" 

    Yep that's right and I had missed this too and it's a nice improvement to the Sieve of Sundaram but not enough.  My Sieve of Sundaram code is not a general solution because it's off by 1 in some cases and it doesn't work at all if the square root of the upper limit N is an odd number.  If it was fast enough I would fix it.  On my laptop the 2 tally method takes 6 seconds to generate up to 1,000,000 and SoS took 12-14 seconds before the @Start*@Start interval improvement.  After it takes 9-11 seconds.  Still not enough to pass 2 tally afaik

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 15 posts - 61 through 75 (of 77 total)

You must be logged in to reply to this topic. Login to reply