SELECT Slow in production

  • I'm self-trained in SQL as a part of my job and trying to do remote support of a system in a developing country, i.e. no other support.  Their application has a procedure in SQL Server 2014 - Express that runs for 30 minutes once a day to select about 4000 rows from a 27,000,000 row table.  On my notebook it runs in 30 seconds with a table of 2,300,000 rows.  This piece of the procedure by itself takes almost all of the 30 minutes execution time.

    DECLARE @DateFrom AS DATETIME;

    Set @DateFrom = (select getdate()-2day);

    DECLARE @DateTo AS DATETIME;

    Set @DateTo = (select getdate()+1day);

    INSERT into dmhdb.dbo.CalcEvap (Level_id, Monitoring_datetime, A)

    (Select Level_id, Monitoring_datetime, VALUE_NUMERIC from dmhdb.dbo.Data Where VARIABLE_ID = 2 and monitoring_datetime between @DateFrom and @DateTo);

    GO

    Update DMHDB.dbo.CalcEvap set CalcEvap.B = D.VALUE_NUMERIC

    FROM CalcEvap Inner Join Data D

    ON CalcEvap.Monitoring_datetime = D.MONITORING_DATETIME and CalcEvap.LEVEL_ID = D.LEVEL_ID

    Where D.VARIABLE_ID = 1 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

    and three more similar UPDATEs for other Variable_IDs.

    The primary key for the Data table is Level_id, Monitoring_datetime, & Variable_id and for the CalcEvap table PK is Level_id & Monitoring_datetime.

    What could I do to get it to run faster or find where the bottleneck is?  The system is running at 10% CPU disk usage, 30% memory usage, Disk queue length 0.00

  • We need the full DDL for all tables involved, including indexes, but for now:

    1 How is the dmhdb.dbo.Data clustered?

    2 The syntax on your update is not best practice and you may be causing additional join(s) when it runs.  Try this instead:

    Update CE --<<-- it's CRITICAL when doing an UPDATE with a join to reference the table alias here --<<--

    set CE.B = D.VALUE_NUMERIC

    FROM DMHDB.dbo.CalcEvap CE

    Inner Join Data D

    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID

    Where D.VARIABLE_ID = 1 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ian 29141 wrote:

    ...The primary key for the Data table is Level_id, Monitoring_datetime, & Variable_id and for the CalcEvap table PK is Level_id & Monitoring_datetime...

    Is that the order of the columns in the indexes?  Are there any other indexes besides the primary keys?

    One problem I see if this is the order of the columns and there are no other indexes, is the WHERE clauses on the Data table do not use Level_id, so this will always have to do a full table scan to solve "Where VARIABLE_ID = 2 and monitoring_datetime between @DateFrom and @DateTo"

    Is it possible to change the order of the columns in the index to be Monitoring_datetime, Variable_id, Level_id?

    For any further advice we'd probably have to look at the execution plan of these statements.

  • What is your disk I/O?

    One thing that can help is to change your "between" to a > and <.  Between comparisons have been known to have performance issues.

    On top of that, I'd start by looking at your execution plan to determine where the bottleneck is.  Also, make sure your statistics are up to date.  This doesn't appear to be a stored procedure, but an ad-hoc query; changing it to a stored procedure (especially if it is going to be run multiple times) may offer performance boost, but be careful of parameter sniffing problems.

    if you have too many indexes that can slow inserts/updates down, and if you have duplicate or unused indexes, it can slow everything down.

    If you can archive some of the data in "data", you will get performance benefits as well as you will have less data to look through.  Depending on the setup, you may even get a performance boost by making temp tables/table variables and splitting the data up from "data" by variable ID.

    The first step is to figure out which part of the query is "slow" and needs tuning.

    Also, looking at that query, the UPDATE will not run; it will error out because @DateFrom and @DateTo are undefined.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Assuming the PKs are the clustering keys [*], for best performance you'd need to change the order of the keys in the Data table.

    If you always (with perhaps only extremely rare exceptions) specify a Variable_ID = <value> when reading the table, then the table should be clustered on:

    ( Variable_ID, Monitoring_datetime, Level_id )

    If you always specify Monitoring_datetime and don't always specify Variable_ID, then the clustering key should be:

    ( Monitoring_datetime, Variable_ID, Level_id )

    For the CalcEvap table, the existing clustering key of ( Level_id, Monitoring_datetime) is ok.

    [*] A PK does not have to be the clustering key in SQL Server, although by default it will be if you don't explicitly specify a different clustering key and don't explicitly specify NONCLUSTERED when creating the PK.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • The order of the columns in the index can not be changed.  The PK is clustered and there are indexes for Level_id, Monitoring_datetime, and for variable_ID.

  • First of all, you define some variables and then do an insert and then you break the batch with a GO.  That should mean that when you get to the UPDATE after that, you should be getting errors about how you need to declare the @DateFrom and @DateTo variables in the criteria for  that update.  You need to fix that first.

    After that, you're using local variables in your WHERE clause whose values aren't known at run time.  Considering how long it's taking, my next shot would be to use OPTION(RECOMPILE) on the queries and see if that helps.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The run took just as long when using the CE alias in the update.

  • Which is the slow portion of that query?  Are ALL of the UPDATEs and INSERTs slow and taking roughly the same time to complete?

    Is this a "run-once" query or is this run multiple times?

    is this an ad-hoc query or a stored procedure? (thinking ad-hoc as you have a "GO" in your statement).

    If you run this query a lot, you may want to put it in a stored procedure which could offer performance benefits.  If it is a "run-once" query, optimizing it may not be worth the time and effort.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • ian 29141 wrote:

    The run took just as long when using the CE alias in the update.

    If you're running what you posted, it's not actually running.  Like I said in my previous post, you have a GO batch separator in the code that's killing the followup use of variables.

    It would be helpful if you posted the actual code you're using.  I don't want to assume, at this point, that merely removing the GO is what you're actually running.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the guidance in cleaning up the SQL.  It is attached but I haven't been able to run it because the connection to the remote system is down right now.

  • Uploading as txt

    Attachments:
    You must be logged in to view attached files.
  • People tend to be nervous of opening attached files. Can you post it into a code block as you did in your first post?

  • USE [DMHDB]
    SET ANSI_NULLS ON
    SET QUOTED_IDENTIFIER ON

    IF EXISTS (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'CalcEvap')
    DROP TABLE CalcEvap;

    CREATE TABLE [dbo].[CalcEvap](
    [ID] [decimal](38, 0) IDENTITY(1,1) NOT NULL,
    [LEVEL_ID] [int] NOT NULL,
    [MONITORING_DATETIME] [datetime] NOT NULL,
    [EvapVar_ID] int,
    --A = Air temperature 2
    [A] Float,
    --B = Air pressure 1
    Float,
    --C = Relative humidity 12
    [C] Float,
    --D = Net radiation 9
    [D] Float,
    --E = Wind speed 18
    [E] Float,
    --V = Evap 22
    [V] decimal(18,6),
    --CV = Evap as characters
    [CV] nvarchar(max)
    ,
    CONSTRAINT [PK_CalcEvap_1] PRIMARY KEY CLUSTERED
    (
    [LEVEL_ID] ASC,
    [MONITORING_DATETIME] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]


    -- Set the date range
    DECLARE @DateFrom AS DATETIME;
    Set @DateFrom = (select getdate()-2day)
    --Set @DateFrom = '2016-09-26' --used for local testing
    DECLARE @DateTo AS DATETIME;
    Set @DateTo = (select getdate()+1day)
    --SET @DateTo = '2016-09-29' --used for local testing

    --Load the temp table with all the valid air temperature values
    INSERT into dmhdb.dbo.CalcEvap (Level_id, Monitoring_datetime, A)
    (Select Level_id, Monitoring_datetime, VALUE_NUMERIC from Data Where VARIABLE_ID = 2 and monitoring_datetime between @DateFrom and @DateTo);

    --The CalcEvap table will have only about 4000 rows

    -- Now add the air pressure (1) for those dates
    Update CE set CE.B = D.VALUE_NUMERIC
    FROM CalcEvap CE Inner Join Data D
    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID
    Where D.VARIABLE_ID = 1 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

    -- Now add the relative humidity (12) for those dates
    Update CE set CE.C = D.VALUE_NUMERIC
    FROM CalcEvap CE Inner Join Data D
    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID
    Where D.VARIABLE_ID = 12 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

    -- Now add the Net radiation (9) for those dates
    Update CE set CE.D = D.VALUE_NUMERIC
    FROM CalcEvap CE Inner Join Data D
    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID
    Where D.VARIABLE_ID = 9 and D.MONITORING_DATETIME between @DateFrom and @DateTo ;

    -- Now add the wind (18) for those dates
    Update CE set CE.E = D.VALUE_NUMERIC
    FROM CalcEvap CE Inner Join Data D
    ON CE.Monitoring_datetime = D.MONITORING_DATETIME and CE.LEVEL_ID = D.LEVEL_ID
    Where D.VARIABLE_ID = 18 and D.MONITORING_DATETIME between @DateFrom and @DateTo;

    --Now clean up the table by removing the invalid readings
    Delete from CalcEvap where A not between -10 and 50; --A = Air temperature 2
    Delete from CalcEvap where B not between 500 and 1500; --B = Air pressure 1
    Delete from CalcEvap where C not between 5 and 110;--C = Relative humidity 12
    Delete from CalcEvap where D not between -500 and 2000;--D = Net radiation 9
    Delete from CalcEvap where E not between 0 and 100;--E = Wind speed 18

    Update DMHDB.dbo.CalcEvap set DMHDB.dbo.CalcEvap.V =
    ((0.408*((4098*POWER(0.6108,((17.27*A)/(A+237.3))))/(POWER((A+237.3),2)))*((D*600)/1000000))+((0.0000665*B)*(6.25/(A+273))*(E*0.748)*((Power(0.6108,((17.27*A)/(A+237.3))))-((Power(0.6108,((17.27*A)/(A+237.3)))*(C/100)))))) /(((4098*POWER(0.6108,((17.27*A)/(A+237.3))))/(POWER((A+237.3),2)))+((0.0000665*B)*(1+0.34*(E*0.748))))

    -- convert to varchar to go into Value field in Data
    Update DMHDB.dbo.CalcEvap set DMHDB.dbo.CalcEvap.CV = V;

    --Clear out the rows that didn't calculate due to missing data
    DELETE DMHDB.dbo.CalcEvap where DMHDB.dbo.CalcEvap.V IS NULL;

    -- Set the variableID for Evap (22) (to be used in the delete stateemnt)
    UPDATE CalcEvap set EvapVar_ID = 22;

    --Make room in Data for the rows
    --("DT" is to delete rows by reference)
    DELETE DT FROM Data DT INNER JOIN CalcEvap CE
    ON DT.MONITORING_DATETIME = CE.MONITORING_DATETIME AND DT.LEVEL_ID = CE.LEVEL_ID AND DT.VARIABLE_ID = CE.EvapVar_ID;

    --Now put the rows in the temp table into the Data table
    INSERT into Data (Level_id, Monitoring_datetime, Variable_id, value)
    (Select Level_ID, Monitoring_datetime,EvapVar_ID, CV from CalcEvap)

     

  • Thank you for the guidance, especially Chris.  The solution was to include the Index_ID in the first SELECT as shown below.  The run time is now 3 seconds on the 27 million rows.

    INSERT into dmhdb.dbo.CalcEvap (Level_id, Monitoring_datetime, A) 
    (Select Level_id, Monitoring_datetime, VALUE_NUMERIC from Data Where VARIABLE_ID = 2 and monitoring_datetime > @DateFrom
    and Level_id in (Select Distinct (Level_ID) from data where variable_id = 9 and monitoring_datetime > @DateFrom));

    I also dropped the @DateTo as unnecessary.  (VARIABLE_ID =2 is used just because it was at the top of the list of all of the variables and VARIABLE_ID = 9 used in the sub-select because it is the rarest parameter.)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply