Performance: Every nth item

  • @ Jeff:

    Just need RowNum on the middle select don't you (as opposed to *)?:

    ...

    SELECT t.*

    FROM dbo.Test t,

    (

    SELECT TOP (@PageSize) RowNum

    FROM dbo.Test WITH (NOLOCK)

    WHERE RowNum NOT IN (SELECT TOP (@Skip) RowNum

    FROM dbo.Test

    ORDER BY RowNum)

    ORDER BY CustID

    d

    WHERE t.RowNum = d.RowNum

    It's a shame that this is the best solution for paging in TSQL (and it is) - I wish there was a linear-performing algorithm. The larger @Skip gets the more the performance suffers...

    Just curious - does anyone know if either Oracle or MySQL have a better solution?

    Regards,

    Jacob

  • Sure, you could sub RowNum for *... not sure it matters for performance here as it might in other places. * allows it to pick the best index it can here.

    Yep... as the pages progress, it takes longer... still pretty happy with getting the last 100 line page in a 1.2 million row table in about 485 milliseconds. The only way I can think of to get a true linear reaction is to keep a copy of the table with a properly numbered column. Using the Running Count method in the following URL, it would only take 7 seconds or so...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    But, that doesn't solve the problem of which order you want to see things in. Could do that with several abbreviated sister tables... They could be created using SELECT/INTO and IDENTITY in a matter of seconds.

    Still, it's not a real time update and unless you use "twin sister" tables and repoint either a view or a synonym, users will see the delay.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/8/2008)


    Sure, you could sub RowNum for *... not sure it matters for performance here as it might in other places. * allows it to pick the best index it can here.

    Yep... as the pages progress, it takes longer... still pretty happy with getting the last 100 line page in a 1.2 million row table in about 485 milliseconds. The only way I can think of to get a true linear reaction is to keep a copy of the table with a properly numbered column. Using the Running Count method in the following URL, it would only take 7 seconds or so...

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    But, that doesn't solve the problem of which order you want to see things in. Could do that with several abbreviated sister tables... They could be created using SELECT/INTO and IDENTITY in a matter of seconds.

    Still, it's not a real time update and unless you use "twin sister" tables and repoint either a view or a synonym, users will see the delay.

    Keep in mind that if you have a "properly numbered table" (meaning - no gaps) - your query becomes quite a bit easier:

    SELECT t.*

    FROM dbo.Test t,

    (

    SELECT TOP (@PageSize) RowNum

    FROM dbo.Test WITH (NOLOCK)

    WHERE RowNum >@Skip

    ORDER BY CustID)

    d

    WHERE t.RowNum = d.RowNum

    and THAT should be damn near linear....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • @ Jeff: Is this

    ...still pretty happy with getting the last 100 line page in a 1.2 million row table in about 485 milliseconds.

    a real-world figure? I put together a quick-and-dirty test on one of our less capable boxes based on your example code. DDL:

    CREATE TABLE [dbo].[Test](

    [RowNum] [int] IDENTITY(1,1) NOT NULL,

    [CustID] [int] NOT NULL,

    [EmailAddress] [varchar](255) NOT NULL,

    [FirstName] [varchar](50) NOT NULL,

    [LastName] [varchar](50) NOT NULL,

    CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED

    (

    [RowNum] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    go

    CREATE NONCLUSTERED INDEX IX_Test_CustID ON dbo.Test

    (

    CustID

    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    go

    Test code:

    set statistics time on

    --===== Declare the local variables

    DECLARE @PageSize INT --How many rows to appear per page

    DECLARE @PageNum INT --What page number to appear

    DECLARE @Skip INT --Working variable for where to start for page

    --===== Set the local variables for pagesize and page

    -- PageSize and PageNum would be parameters in a stored proc

    SET @PageSize = 100

    SET @PageNum = 40

    SET @Skip = @PageSize*@PageNum

    SELECT t.*

    FROM dbo.Test t,

    (

    SELECT TOP (@PageSize) RowNum

    FROM dbo.Test WITH (NOLOCK)

    WHERE RowNum NOT IN (SELECT TOP (@Skip) RowNum

    FROM dbo.Test

    ORDER BY RowNum)

    ORDER BY CustID

    ) d

    WHERE t.RowNum = d.RowNum

    set statistics time off

    and populated it with just under 3 million sanitized rows from one of our prod DBs. Ran your test query against it a few times whilst varying @PageNum. Totally non-scientific results (no buffer flushes, manual runs eyeballing STATISTICS TIME results, 5 runs per test):

    @PageSize = 100, @PageNum = 40: avg 1.2 sec

    @PageSize = 100, @PageNum = 400: avg 41.4 sec

    @PageSize = 100, @PageNum = 4000: 6123 sec (I only ran this one once 😛 )

    Granted this is an underspecced server with 2GB RAM, but still...

    Regards,

    Jacob

  • @ Matt:

    and THAT should be damn near linear....

    Yep - ran the same test as above in my reply to Jeff, same data, same server except using your test code. Avg execution time even at @PageNum = 8000 was < 45 ms. Certainly linear and at such small response times may as well be constant time.

    Depends on the actual task at hand but sometimes it might be better to materialise the inner select query with a contiguous RowNum sequence as Jeff suggests (especially with his quick Running Count method) and use your approach to page through it. I guess it depends on the expected ratio of initial searches to paging requests.

    Regards,

    Jacob

  • Wow... lemme say that again... WOW! SQL Server 2005 sucks for this! Yeah... it was a real world example.... In the real world, we have SQL Server 2000 and we had to use dynamic SQL because there is no programmable TOP in 2k (as all of you know). Our server is the Enterprise Edition running on 16 3.2 GHz CPU's and enough memory to choke a horse with some awesome harddisks.

    I ran the TOP() code on my (humble) desktop machine at the 4000th 100 line page and never did let it finish. I run dynamic sql using just plain ol' EXEC and concatenating the values for page size, etc in, and BLAM!!! It runs in only a half a second on my desktop... talk about parameter sniffing... 9000th page takes 1.2 seconds. Of course, my desktop box is no match for the server at work.

    Also, I made a couple of mistakes modifying the code for the forum... I left CustID in the ORDER BY instead of using RowNum... And, I used SELECT * instead of RowNum.

    Here's the code I just used...

    [font="Courier New"]

    set statistics time on

    --===== Declare the local variables

    DECLARE @PageSize INT --How many rows to appear per page

    DECLARE @PageNum  INT --What page number to appear

    DECLARE @Skip     INT --Working variable for where to start for page

    DECLARE @SQL      VARCHAR(8000)

    --===== Set the local variables for pagesize and page

         -- PageSize and PageNum would be parameters in a stored proc

        SET @PageSize = 100

        SET @PageNum  = 9000

        SET @Skip    = @PageSize*@PageNum

    --SELECT t.RowNum

    --   FROM dbo.Test t,

    --        (

    --         SELECT TOP (@PageSize) RowNum

    --           FROM dbo.Test WITH (NOLOCK)

    --          WHERE RowNum NOT IN (SELECT TOP (@Skip) RowNum

    --                                 FROM dbo.Test

    --                                ORDER BY RowNum)

    --          ORDER BY RowNum

    --         ) d

    --  WHERE t.RowNum = d.RowNum

    SET @SQL = '

    SELECT t.RowNum

       FROM dbo.Test t,

            (

             SELECT TOP '+STR(@PageSize)+' RowNum

               FROM dbo.Test WITH (NOLOCK)

              WHERE RowNum NOT IN (SELECT TOP '+STR(@Skip)+' RowNum

                                     FROM dbo.Test

                                    ORDER BY RowNum)

              ORDER BY RowNum

             ) d

      WHERE t.RowNum = d.RowNum'

    EXEC (@SQL)

    set statistics time off[/font]

    And, get this... on my box, running the same dynamic code on 2k instead of 2k5 only took about half the time... what did they do to the bloody optimizer in 2k5? And, yes, I'm running sp2 on 2k5 although I don't have anything beyond that (was holding out for sp3).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I can verify your results: changing to the dynamic SQL version runs at 1.2 sec at @PageNum = 8000 on my test data/server.

    Now to speculate on why...

    I can't compare the query plans directly atm as the server is now busy, but looking at the dynamic version it's similar to the other plan - with one major difference: both want to do a Left Anti Semi Join, but the normal version uses a Nested Loops operator for this whereas the dynamic version uses a Merge Join.

    I wonder if it's the variable-based TOP implementation?

    Regards,

    Jacob

  • Jacob Luebbers (4/8/2008)


    I can verify your results: changing to the dynamic SQL version runs at 1.2 sec at @PageNum = 8000 on my test data/server.

    Now to speculate on why...

    I can't compare the query plans directly atm as the server is now busy, but looking at the dynamic version it's similar to the other plan - with one major difference: both want to do a Left Anti Semi Join, but the normal version uses a Nested Loops operator for this whereas the dynamic version uses a Merge Join.

    I wonder if it's the variable-based TOP implementation?

    Regards,

    Jacob

    Jacob - you hit it on the nose. Forcing the join hint makes the difference.

    I was getting there from the other side, in that the variable top(@variable) in the subquery was throwing things off, horribly.

    interestingly enough - with the hint in place - 2005 actually beats the dynamic SQL. Not much consolation, I know...

    set statistics time on

    --===== Declare the local variables

    DECLARE @PageSize INT --How many rows to appear per page

    DECLARE @PageNum INT --What page number to appear

    DECLARE @Skip INT --Working variable for where to start for page

    DECLARE @sql VARCHAR(8000)

    --===== Set the local variables for pagesize and page

    -- PageSize and PageNum would be parameters in a stored proc

    SET @PageSize = 100

    SET @PageNum = 9000

    SET @Skip = 900000

    SELECT t.RowNum

    FROM dbo.Test t,

    (

    SELECT TOP (@PageSize) test.RowNum

    FROM dbo.Test WITH (NOLOCK)

    left outer merge join (SELECT TOP(@skip) RowNum

    FROM dbo.Test WITH (NOLOCK)

    ORDER BY RowNum) r

    on test.rownum=r.rownum

    where r.rownum is null

    ORDER BY test.RowNum

    ) d

    where t.RowNum = d.RowNum

    set statistics time off

    print replicate('=',50)

    set statistics time on

    SET @sql = '

    SELECT t.RowNum

    FROM dbo.Test t,

    (

    SELECT TOP '+STR(@PageSize)+' RowNum

    FROM dbo.Test WITH (NOLOCK)

    WHERE RowNum NOT IN (SELECT TOP '+STR(@Skip)+' RowNum

    FROM dbo.Test

    ORDER BY RowNum)

    ORDER BY RowNum

    ) d

    WHERE t.RowNum = d.RowNum'

    EXEC (@SQL)

    set statistics time off

    2005 solution: 1063 ms

    2000 solution: 670+670=1340 ms

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Ouch! Yep, that does the trick. I wonder if there will ever be a situation where the merge join would be suboptimal. Thinking about it, probably only if there were a small number of rows in the table - probably safe to leave it on always if not using the dynamic version.

    Regards,

    Jacob

  • one of these days - I'm going to figure out how to colorize my code without turning it into a wreck........hehe

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4/8/2008)


    Jacob - you hit it on the nose. Forcing the join hint makes the difference.

    Crud... SQL Server is getting more like Oracle every day 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Matt Miller (4/8/2008)


    one of these days - I'm going to figure out how to colorize my code without turning it into a wreck........hehe

    :hehe: How'z 'bout taday 😛

    http://extras.sqlservercentral.com/prettifier/prettifier.aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/9/2008)


    Matt Miller (4/8/2008)


    one of these days - I'm going to figure out how to colorize my code without turning it into a wreck........hehe

    :hehe: How'z 'bout taday 😛

    http://extras.sqlservercentral.com/prettifier/prettifier.aspx

    [font="Courier New"]SET STATISTICS time ON

    --===== Declare the local variables

    DECLARE @PageSize INT --How many rows to appear per page

    DECLARE @PageNum  INT --What page number to appear

    DECLARE @Skip     INT --Working variable for where to start for page

    DECLARE @SQL      VARCHAR(8000)

    --===== Set the local variables for pagesize and page

         -- PageSize and PageNum would be parameters in a stored proc

        SET @PageSize = 100

        SET @PageNum  = 9000

        SET @Skip    = 900000

    SELECT t.RowNum

       FROM dbo.Test t,

            (

             SELECT TOP (@PageSize) test.RowNum

               FROM dbo.Test WITH (NOLOCK)

              LEFT OUTER merge JOIN (SELECT TOP(@skip) RowNum

                                     FROM dbo.Test WITH (NOLOCK)

                                    ORDER BY RowNum) r

                                    ON test.rownum=r.rownum

                    WHERE r.rownum IS NULL

              ORDER BY test.RowNum

             ) d

    WHERE t.RowNum = d.RowNum

    SET STATISTICS time OFF

    PRINT REPLICATE('=',50)

    SET STATISTICS time ON

    SET @SQL = '

    SELECT t.RowNum

       FROM dbo.Test t,

            (

             SELECT TOP '+STR(@PageSize)+' RowNum

               FROM dbo.Test WITH (NOLOCK)

              WHERE RowNum NOT IN (SELECT TOP '+STR(@Skip)+' RowNum

                                     FROM dbo.Test

                                    ORDER BY RowNum)

              ORDER BY RowNum

             ) d

      WHERE t.RowNum = d.RowNum'

    EXEC (@SQL)

    SET STATISTICS time OFF

    [/font]

    Cool! The last time I tried it - I got nothing but, well, markup, and the code available couldn't be copied, etc... It was either ugly when posted or ugly when copied out of the post. Both seem to be working now though!

    Must be those fancy changes Steve made to the rendering of code. Thanks!

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff Moden (4/9/2008)


    Matt Miller (4/8/2008)


    Jacob - you hit it on the nose. Forcing the join hint makes the difference.

    Crud... SQL Server is getting more like Oracle every day 😛

    You mean the optimizer gets dumber and dumber? grumble grumble grumble.....

    For what it's worth - simply throwing

    OPTION (RECOMPILE)

    on the end of your version of the 2005 query would also have the same effect (speeding it up and surpassing the 2000 version). It just seems like another place parameter sniffing rears its ugly little head.....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Dumber and dumber is right... they should have left it alone... 2k had a fine optimizer... well, the idiocy started at SP4 which is why I still run sp3a on my desktop.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply