Help us tune a query

  • Christopher Stobbs (7/14/2009)


    but creating the clustered Index on #TopScores after the actual insert?

    Would this help at all?

    If the cluster goes on after the insert then first SQL has to insert 100 000 rows into a heap, then it has to reorganise all 100 000 rows into the cluster. It's nonclustered indexes that are sometimes better built after the insert (and that has to be tested for each case, I've seen both ways). Build the cluster after and you;re forcing SQL to almost do twice the work

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I had a feeling I should I have done some testing before opening my big mouth.

    Thanks for clearing that up for Gail, I must have missunderstood completely.

    🙂

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Hi Mel,

    Was this section of code still needed?

    PointsCategory = ISNULL(NULL, PointsCategory)

    I thought it was decided that the catch -all approach would be performance problem and that perhaps calling a "Master" stored procedure to then work out which procedure to call, was going to be a better option?

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Chris,

    When I tried both options, there wasn't actually much difference in the performance of the two.

    Mel HarbourProject ManagerRed Gate Software

  • Mel Harbour (7/14/2009)


    Chris,

    When I tried both options, there wasn't actually much difference in the performance of the two.

    Thing it, it may give us the option to tweak indexes better. With the ISNULL there, SQL's unlikely to seek on an index on PointsCategory, even if one exists.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/14/2009)


    Christopher Stobbs (7/14/2009)


    but creating the clustered Index on #TopScores after the actual insert?

    Would this help at all?

    If the cluster goes on after the insert then first SQL has to insert 100 000 rows into a heap, then it has to reorganise all 100 000 rows into the cluster. It's nonclustered indexes that are sometimes better built after the insert (and that has to be tested for each case, I've seen both ways). Build the cluster after and you;re forcing SQL to almost do twice the work

    Building the clustered index afterwards does cut down on page splits if the inserts are in no particular order. Building the index afterwards would also allow for a very high speed SELECT/INTO. I wouldn't worry about recompiles on this because they're likely going to happen just do to changes in the data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jack Corbett (7/14/2009)


    Was Jeff referring to the Order By in the ROW_NUMBER() function?

    Yes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/14/2009)


    Building the clustered index afterwards does cut down on page splits if the inserts are in no particular order.

    But on a single statement insert the rows will be sorted before they're inserted if they're inserted into a table with a cluster, so there won't be page splits.

    Building the index afterwards would also allow for a very high speed SELECT/INTO.

    Cluster before insert.

    CREATE TABLE #Test (

    ID INT,

    SomeString VARCHAR(50)

    )

    CREATE CLUSTERED INDEX idx_Something ON #Test (ID)

    INSERT INTO #Test (ID, SomeString)

    SELECT TOP (250000) a.column_id + b.column_id, a.name

    FROM master.sys.columns a CROSS JOIN master.sys.columns b

    /*

    Table 'Worktable'. Scan count 1, logical reads 2459, physical reads 0

    Table 'syscolpars'. Scan count 2, logical reads 15, physical reads 0

    SQL Server Execution Times:

    CPU time = 5922 ms, elapsed time = 6301 ms.

    */

    Cluster after insert.

    CREATE TABLE #Test (

    ID INT,

    SomeString VARCHAR(50)

    )

    INSERT INTO #Test (ID, SomeString)

    SELECT TOP (250000) a.column_id + b.column_id, a.name

    FROM master.sys.columns a CROSS JOIN master.sys.columns b

    CREATE CLUSTERED INDEX idx_Something ON #Test (ID)

    /*

    Table 'Worktable'. Scan count 1, logical reads 2459, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syscolpars'. Scan count 2, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 12421 ms, elapsed time = 12740 ms.

    (250000 row(s) affected)

    SQL Server parse and compile time:

    CPU time = 4485 ms, elapsed time = 4530 ms.

    Table '#Test___________________________________________00000000000D'. Scan count 3, logical reads 755, physical reads 0

    SQL Server Execution Times:

    CPU time = 11172 ms, elapsed time = 5905 ms.

    */

    Cluster created after a SELECT INTO.

    SELECT TOP (250000) a.column_id + b.column_id AS ID, a.name AS SomeString INTO #Test

    FROM master.sys.columns a CROSS JOIN master.sys.columns b

    CREATE CLUSTERED INDEX idx_Something ON #Test (ID)

    /*

    Table 'Worktable'. Scan count 1, logical reads 2459, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syscolpars'. Scan count 2, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 422 ms, elapsed time = 586 ms.

    (250000 row(s) affected)

    SQL Server parse and compile time:

    CPU time = 4359 ms, elapsed time = 4434 ms.

    Table '#Test_____________________________________________00000000000E'. Scan count 3, logical reads 984, physical reads 0

    SQL Server Execution Times:

    CPU time = 10751 ms, elapsed time = 5851 ms.

    */

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden (7/14/2009)


    Jack Corbett (7/14/2009)


    Was Jeff referring to the Order By in the ROW_NUMBER() function?

    Yes.

    Ok, misunderstood you earlier. Are you suggesting adding a second temp table, one to store the pre-aggregated then one to store the data with the rownumber, or just moving where the temp table is used to a bit earlier?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/14/2009)


    Jeff Moden (7/14/2009)


    Jack Corbett (7/14/2009)


    Was Jeff referring to the Order By in the ROW_NUMBER() function?

    Yes.

    Ok, misunderstood you earlier. Are you suggesting adding a second temp table, one to store the pre-aggregated then one to store the data with the rownumber, or just moving where the temp table is used to a bit earlier?

    Just moving where the temp table is used to a bit earlier. Haven't played with it yet so I don't know for sure but it seems that an ORDER BY on a SUM can be a bit expensive. Certainly could be wrong, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GilaMonster (7/14/2009)


    But on a single statement insert the rows will be sorted before they're inserted if they're inserted into a table with a cluster, so there won't be page splits.

    Ach... you're right on that of course. I keep thinking about the big ol' test table I wrote that frags the heck out of the clustered index by doing multiple inserts. Still, according to profiler, inserting into a clustered index is pretty expensive... it would appear that not everything shows up in STATISTICS IO or TIME... I wonder why such a disparity...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Interesting. I tested with profiler and the durations were much the same as I saw with stats time. The insert into with the cluster existing was faster by a factor of 3.

    This is SQL 2008 SP1. It might be because of the minimal logging enhancements in 2008. But then this is tempDB and that's got different logging rules anyway.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Something else is going on as well. Look at the difference in durations on your machine compared to mine. The machine I have at work is just a desktop dual core box running 2k5 sp3 Dev Ed. I'm not sure but I believe it's running at something like 2.8Gig and has 2Gig of ram.

    Heh... more proof that the words "It Depends" live on and on. 😛

    Lemme try it on my home machine (1.8Ghz single core p4) and see what I get.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's the trace from my home machine...

    And, here's the code I'm running... All I did was take your code, remove your comments, add mine, make each query dump to a different temp table, and add a couple of GO's to make it all come out pretty on the profiler...

    --====================================================

    GO

    --===== Cluster before insert.

    CREATE TABLE #Test1 (

    ID INT,

    SomeString VARCHAR(50)

    )

    CREATE CLUSTERED INDEX idx_Something ON #Test1 (ID)

    INSERT INTO #Test1 (ID, SomeString)

    SELECT TOP (250000) a.column_id + b.column_id, a.name

    FROM master.sys.columns a CROSS JOIN master.sys.columns b

    GO

    --===== Cluster after insert.

    CREATE TABLE #Test2 (

    ID INT,

    SomeString VARCHAR(50)

    )

    INSERT INTO #Test2 (ID, SomeString)

    SELECT TOP (250000) a.column_id + b.column_id, a.name

    FROM master.sys.columns a CROSS JOIN master.sys.columns b

    CREATE CLUSTERED INDEX idx_Something ON #Test2 (ID)

    GO

    --===== Cluster created after a SELECT INTO.

    SELECT TOP (250000) a.column_id + b.column_id AS ID, a.name AS SomeString INTO #Test3

    FROM master.sys.columns a CROSS JOIN master.sys.columns b

    CREATE CLUSTERED INDEX idx_Something ON #Test3 (ID)

    GO

    --====================================================

    DROP TABLE #Test1, #Test2, #Test3

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Mel Harbour (7/13/2009)


    The UserPoints table currently has about 185,000 rows. Execution plans are attached.

    Mel... I'm trying to setup the proverbial parallel universe for testing and need just a wee bit more info on this... how many unique user ID's does that 185 k rows cover and what's the date range on the rows?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 46 through 60 (of 103 total)

You must be logged in to reply to this topic. Login to reply