Set-based query

  • I have a table with rows where the value in the Period column is 1 - these always exist – DDL/DML below

    create table tblMain (period int, rfreq int, pfreq int)

    insert tblMain (period , rfreq , pfreq )

    select 1 , 1 , 1

    union all

    select 1 , 1 , 3

    union all

    select 1 , 1 , 6

    union all

    select 1 , 3 , 3

    union all

    select 1 , 3 , 6

    union all

    select 1 , 6 , 6

    union all

    select 1 , 6 , 12

    I need to INSERT additional rows into this table with incremented period values (the rfreq and pfreq columns to have same values as for Period 1) up to and including

    a Period number stored in table, tblMaxPeriods (DDL/DML below)

     
    The tw0 tables need to JOIN on their respective rfreq columns

    CREATE TABLE tblMaxPeriods (rfreq INT , MaxPeriod INT)

    insert tblMaxPeriods (rfreq , MaxPeriod)

    select 1 , 5

    union all

    select 3 , 3

    union all

    select 6 , 2

    I could put together a loop solution but would much rather use a set-based approach but can't get my head around it

    The desired result is set out below ie those rows with period >= 2

    select * from tblMain order by period, rfreq, pfreq

    period rfreq pfreq

    -- the next 7 rows are already in the table

    1 1 1

    1 1 3

    1 1 6

    1 3 3

    1 3 6

    1 6 6

    1 6 12

    -- the rows below to be produced by required INSERT code

    2 1 1

    2 1 3

    2 1 6

    2 3 3

    2 3 6

    2 6 6

    2 6 12

    -- no period 3 or higher for rfreq value 6 because table tblMaxPeriods holds a MaxPeriod value of 2 where MaxPeriods.rfreq = 6

    3 1 1

    3 1 3

    3 1 6

    3 3 3

    3 3 6

    -- no period 4 or higher for rfreq value 3 because table tblMaxPeriods holds a MaxPeriod value of 3 where MaxPeriods.rfreq = 3

    4 1 1

    4 1 3

    4 1 6

    5 1 1

    5 1 3

    5 1 6

    In summary, I need to insert new rows into tblMain with incremented period values from 2 up to the value set in table tblMaxPeriods matching MaxPeriods.rfreq to tblmain.rfreq

    so using the data posted in tblMaxPeriods , I will INSERT rows from Period 2 to Period 5 for tblMain.rfreq value 1, Period 2 to Period 3 for tblMain.rfreq value 3 and just Period 2 for tblMain.rfreq value 6

    Thanks in advance

  • Try this :

    select n.num, m.rfreq, m.pfreq

    from tblmain m inner join MaxPeriods p

    onm.rfreq= p.rfreq

    cross join

    (

    select1 as num union all select 2 union all select 3 union all

    select 4 union all select 5 union all select 6 union all select 7

    ) as n

    wherenum<= p.MaxPeriod

    andnot exists (select * from tblmain x where x.period = n.num)

    order by n.num, m.rfreq, m.pfreq

  • Nice solution, KH, I have only a minor improvement. Instead of joining on

    (

    select 1 as num union all select 2 union all select 3 union all

    select 4 union all select 5 union all select 6 union all select 7

    ) as n

    you might want to join on a fixed Numbers table instead - especially if the required max number is larger. Such a table (with e.g 10000 numbers) can be created once and for all with the following statement:

    SELECT TOP 10000 n = IDENTITY(INT, 1, 1) INTO Numbers

    FROM

        sysobjects a1

      CROSS JOIN

        sysobjects a2

      CROSS JOIN

        sysobjects a3

    ALTER TABLE Numbers

            ADD CONSTRAINT Index_Numbers PRIMARY KEY CLUSTERED(n)

    GO

  • Depending on the maximum possible value of MaxPeriod. If it is relatively small like < 10 then there is no harm using a derived table with union all. If the Max is undetermine or dynamic, can try the Number Table Function for better efficent & performance. MVJ has comes up with a neat & fast way to do this. Refer to the link below.

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685

  • Why use a Number Table Function instead of a Number table? Using a Number table is simpler (in my opinion). I also think it is faster.

     

  • Records in Number Table required pre-insert however Number Table Function is dynamic. Altough you can pre-insert up to the maximum of the data type used (int or bigint) but anyway it up to your preference.

  • You do have a point there - you might have to store a lot of data in a Numbers table. I ran the following (simple and stupid) test, and my conclusion is that a Numbers table is about 10 times faster, probably because the Numbers table is indexed:

    select count(*) from Numbers a1 inner join Numbers a2 on a1.n = a2.n + 1 where a1.n <= 20000

    select count(*) from dbo.F_TABLE_NUMBER_RANGE(1,20000) a1 inner join dbo.F_TABLE_NUMBER_RANGE(1,20000) a2

    on a1.Number = a2.Number + 1

    But in "real life", it's a matter of taste in most cases - and I agree that if you only need, say, numbers up to 10, you could probably just as well use a derived table, as you suggested at first.

  • I believe the reason why the Numbers or Tally table solution is so much faster is two-fold.

    1. As Jesper pointed out, the Numbers table will be indexed on the "N" column... in fact, it will probably be a Clustered Primary Key which will make it very fast because, if used properly, will allow the coveted Clustered Index Seek to occur.  You can put such an index on a Table Variable, however....
    2. Table Variables are incapable of using Statistics where Temp Tables do. (Ref: http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k) Statistics are the "other" speed producing demon lurking in the background.

    It's also real easy to quickly make a monster size temporary numbers table at the beginning of a procedure and it only takes about 2 seconds on my little ol' 1.8 Ghx desktop machine (beats all other methods I've seen, so far)...

    SELECT TOP 1000000

           IDENTITY(INT,1,1) AS N

      INTO #TempTally

      FROM Master.dbo.SysComments sc1,

           Master.dbo.SysComments sc2

    Of course, you should add that primary key I was talking about... that takes about 2 additional seconds regardless of whether a Temp Table or Table Variable is used...

     ALTER TABLE #TempTally

           ADD PRIMARY KEY CLUSTERED (N)

    Since Temp Tables and Table Variables both start out in memory and spill over into TempDB (Ref: http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k), I'm thinking that using a function to create a Table Variable to hold a Numbers Table is clearly a loosing proposition compared to the use of permanent or temporary Numbers Tables.  But, to each their own

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • KH/Jesper/Jeff

     

    Thanks Guys for the response

    KH's code is great for up to 7 but the practical Max will be a couple of hundred - Jesper's additional code re the numbers table did the trick - just one problem - here in Production  SELECT INTO is turned off.

    I tried using   a table Variable but am missing something on the INSERT syntax

    DECLARE @Numbers TABLE

           (

           n  INT

           PRIMARY KEY CLUSTERED

                (n)

          &nbsp

      INSERT @Numbers (n)

      SELECT TOP 10000 AS n 

      FROM    sysobjects a1

      CROSS JOIN sysobjects a2

      CROSS JOIN sysobjects a3

    Thanks for your patience

  • >> SELECT INTO is turned off.

    Get the DBA to pre-create the fixed number table with the required number of records. If it is not possible, try the number table function that i posted

  • "SELECT INTO" is actually a setting called "SELECT INTO/BULK COPY" and it does NOT disable the ability to use the SELECT/INTO syntax.  It just means that it will be logged.  Did you try SELECT/INTO?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Or use the number table function to populate the Numbers table (I that's what you meant, KH, I apologize...)

  • Thx KH - solving one thing leads to another prob

    tblMain has an additional column startdate which I need to populate with the startdate from the row with previous Period Number + the freq value (of months) seel below

    create table dbo.tblMain (period int, rfreq int, pfreq int , startdate datetime)

    insert tblMain (period , rfreq , pfreq ,startdate  )

    select 1 , 1 , 1 , '01-Apr-2004'

    union all

    select 1 , 1 , 3 , '01-Apr-2004'

    union all

    select 1 , 1 , 6 , '01-Apr-2004'

    union all

    select 1 , 3 , 3 , '01-Apr-2004'

    union all

    select 1 , 3 , 6 , '01-Apr-2004'

    union all

    select 1 , 6 , 6 , '01-Jan-2004'

    union all

    select 1 , 6 , 12, '01-Jan-2004'

    the maxperiods table is as before in this post and Numbers table exists and is populated

    I tried a couple of things but both failed see below in bold

    select n.num, m.rfreq, m.pfreq --,

    --DATEADD(mm, m.rfreq, m2.StartDate) -- JOIN m2 returned no rows

    -- DATEADD(mm, m.rfreq, SELECT StartDate FROM tblmain WHERE rfreq = m.rfreq AND Period = m.Period - 1) -- only works for Period 2 - NULLS fir the rest

    from tblmain m

    inner join MaxPeriods p on m.rfreq = p.rfreq

    -- inner join tblmain m2 on m.rfreq = p.rfreq and m2.period = m.period + 1

    cross join Numbers as n

    where num <= p.MaxPeriod

    and not exists (select * from tblmain x where x.period = n.num)

    order by n.num, m.rfreq, m.pfreq

    Hope this makes sense and thanks again

  • Oops - sorry should read (modifications in italics)

    select n.num, m.rfreq, m.pfreq --,

    --DATEADD(mm, m.rfreq, m2.StartDate) -- JOIN m2 returned no rows

    -- DATEADD(mm, m.rfreq, (SELECT StartDate FROM tblmain WHERE rfreq = m.rfreq AND pfreq = m.pfreq AND Period = m.Period - 1)) -- only works for Period 2 - NULLS fir the rest

    from tblmain m

    inner join MaxPeriods p on m.rfreq = p.rfreq

    -- inner join tblmain m2 on m.rfreq = m2.rfreq and m.pfreq = m2.pfreq and m2.period = m.period - 1

    cross join Numbers as n

    where num <= p.MaxPeriod

    and not exists (select * from tblmain x where x.period = n.num)

    order by n.num, m.rfreq, m.pfreq

  • >> Or use the number table function to populate the Numbers table (I that's what you meant, KH, I apologize...)

    You are right. Using a number table is definately faster than other methods. For performance and huge sets of numbers required, the number table is recommended.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply