Performance Tuning 40,000,000

  • All,

    I involved in a fine tuning work. The table has around 40,000,000 records and the table one composite clustered index as below.

    use emp_eql

    go

    insert into combined_stats_agg

    SELECT mf_cd+2000 as mf_cd, r.mkt_id,

    r.mf_dt, c.mf_nm, AVG(r.mf_val) as mf_val

    FROM

    mj..mj_reg_stats r,

    mj..stat_type c

    WHERE r.mf_cd = c.mf_cd

    GROUP BY r.mf_cd, c.mf_nm, r.mkt_id, r.mf_dt

    Table Structure: mj_reg_stats

    create table mj_reg_stats

    (

    fund_id int,

    mf_dt datetime,

    mkt_id int,

    mf_cd int,

    mf_val float

    cncy_cd char(3)

    )

    clustered index fund_id, mf_dt, mkt_id, mf_cd

    mj_reg_stats - 40,000,000 records

    stat_type - 52 records

    If i execute the above query, it is running for 30 minutes.

    Appreciate your help to tune this query.

    karthik

  • 1. Use ANSI joins: "FROM T1, T2 WHERE ..." should be changed to

    "FROM T1 JOIN T2 ON ..."

    2. Do insert in batches. Try 2,000,000 rows at the time. You can play with batch size to find the optimal one.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Output of the select query is

    (153148 rows affected)

    karthik

  • I have created a nonclustered index on mf_cd column..I see some improvement. yes...the execution time is reduced from 30 minutes to 15 minutes...

    karthik

  • is there any other approach to get the AVG() values?

    how should i reduce the execution time in seconds? give me some tips...

    karthik

  • Do you have an index on mf_cd in mj..stat_type?

    You can try using "OPTION (MAXDOP 1)" and see if you have any performance gain (just add it at the end of your query).

    Can you attach the execution plan (its better to be real one - not estimated). Also, statistcs would be helpsull as well. To get time and IO statistics you can use:

    SET STATISTICS TIME ON

    SET STATISTICS IO ON

    statistic will be diplayed in Messages tab if you use Result-To-Grid option.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • When last time statistic was updated on the tables?

    UPDATE STATISTICS table_or_indexed_view_name

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Add non-clustered index on mf_cd in mj_reg_stats table!

    If you check the current query plan, I believe you will see index scan there. Adding the above index together with the index advised in prevuious post, would hopefully result in index seek and therfore in much more acceptable performance.

    Ther is another tip. Drive your queries from smaller tables - it helps some time:

    insert into combined_stats_agg

    SELECT r.mf_cd+2000 as mf_cd, r.mkt_id,

    r.mf_dt, c.mf_nm, AVG(r.mf_val) as mf_val

    FROM mj..stat_type c

    JOIN mj..mj_reg_stats r

    ON r.mf_cd = c.mf_cd

    GROUP BY r.mf_cd, c.mf_nm, r.mkt_id, r.mf_dt

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (8/11/2010)


    When last time statistic was updated on the tables?

    UPDATE STATISTICS table_or_indexed_view_name

    It ran last friday only in DEV.

    karthik

  • Please provide all table defs including all indexes on those tables as well as the actual execution plan for the query. You've been around long enough to know how to ask questions properly, right??

    As a side note: since there is no index that actually supports your join condition you might want to consider preaggregating the data and join the result to your second query. (You'd still benefit from a covering index on r.mf_cd, r.mkt_id, r.mf_dt and r.mf_val though...)

    Also, I'm guessing you're aware of getting unprecise results due to the aggregation of a float data type.

    And, finally, I'm guessing you're aware of getting inconsistant mf_cd values between the tables combined_stats_agg and mj_reg_stats (meaning, the same value will have a different meaning when selecting from the two tables). I truly hope the columns aren't named the same.... :pinch:



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • You could try this and see what it does:

    insert into combined_stats_agg

    SELECT r.mf_cd+2000 as mf_cd, r.mkt_id,

    r.mf_dt, c.mf_nm, r.mf_val

    FROM (

    SELECT mf_dt, mkt_id, mf_cd, AVG(mf_val) AS mf_val

    FROM mj..mj_reg_stats

    GROUP BY mf_dt, mkt_id, mf_cd

    ) AS r

    INNER JOIN mj..stat_type c ON r.mf_cd = c.mf_cd

    Scott Pletcher, SQL Server MVP 2008-2010

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply