Index Tuning Wizard On Multiple Databases, and Joins From Calculations

  • You have to find a way to make sure they use the datarange. a scan of 1 M rows just can't be fast.

  • Perhaps not.  But 135K has to be doable in under 30 seconds of CPU time.    If online stores can inventory millions of items for millions of customers and get info back in 200ms then surely I can do something this relatively simple... 

  • olap systems are not the same... they have report tables that are refreshed every x minutes... so you don't wait for the query to run.

    Can you put the function code in a sp. execute the sp and give me the plan... maybe I missed something.

  • Now it's getting weirder...  I moved the VehicleStartDate and VehicleStartMileage to [Common].dbo.tblVehicle so now the join to tblVehicleStart has been eliminated.  Now the straight query (no UDF or SP) runs in .78sec CPU time.

    dbcc dropcleanbuffers

    go

    SELECT

     [Common].dbo.tblVehicle.VehicleID,

     SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,

     MAX(tblMerged.StopOdometer) -

      (CASE WHEN ((null IS NULL) AND (NOT [Common].dbo.tblVehicle.StartMileage IS NULL))

       THEN [Common].dbo.tblVehicle.StartMileage

       ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,

     MAX(tblMerged.StopOdometer) AS LastOdometer

    FROM tblMerged

     INNER JOIN [Common].dbo.tblVehicle ON

      (tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR

      tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)

    WHERE (tblMerged.StartOdometer <> 0) AND (tblMerged.StopOdometer <> 0)

     AND (tblMerged.StartTime >= null OR null IS NULL)

     AND (tblMerged.StopTime <= null OR null IS NULL)

    GROUP BY [Common].dbo.tblVehicle.VehicleID, [Common].dbo.tblVehicle.StartMileage

    ...(42 row(s) affected)

    Table 'tblMerged'. Scan count 4, logical reads 3976, physical reads 67, read-ahead reads 3924.

    Table 'Worktable'. Scan count 2, logical reads 160, physical reads 0, read-ahead reads 0.

    Table 'tblVehicle'. Scan count 4, logical reads 12, physical reads 0, read-ahead reads 0.

    Table 'Worktable'. Scan count 14, logical reads 172, physical reads 0, read-ahead reads 0.

    Table 'Worktable'. Scan count 9, logical reads 167, physical reads 0, read-ahead reads 0.

    Table 'Worktable'. Scan count 13, logical reads 171, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times:

       CPU time = 782 ms,  elapsed time = 584 ms.

      |--Compute Scalar(DEFINE: ([Expr1007]=[Expr1005]-If ([tblVehicle].[StartMileage]<>NULL) then [tblVehicle].[StartMileage] else [Expr1006]))

           |--Parallelism(Gather Streams)

                |--Hash Match(Aggregate, HASH: ([tblVehicle].[VehicleID]), RESIDUAL: ([tblVehicle].[VehicleID]=[tblVehicle].[VehicleID]) DEFINE: ([Expr1004]=SUM([partialagg1009]), [Expr1005]=MAX([partialagg1010]), [Expr1006]=MIN([partialagg1011]), [tblVehicle].[StartMileage]=ANY([tblVehicle].[StartMileage])))

                     |--Parallelism(Repartition Streams, PARTITION COLUMNS: ([tblVehicle].[VehicleID]))

                          |--Nested Loops(Inner Join, WHERE: ([tblMerged].[VIN]=[tblVehicle].[VIN] OR [tblMerged].[Right6VIN]=[tblVehicle].[VehicleID]))

                               |--Hash Match(Aggregate, HASH: ([tblMerged].[Right6VIN], [tblMerged].[VIN]), RESIDUAL: ([tblMerged].[Right6VIN]=[tblMerged].[Right6VIN] AND [tblMerged].[VIN]=[tblMerged].[VIN]) DEFINE: ([partialagg1009]=SUM([tblMerged].[StopOdometer]-[tblMerged].[StartOdometer]), [partialagg1010]=MAX([tblMerged].[StopOdometer]), [partialagg1011]=MIN([tblMerged].[StartOdometer])))

                               |    |--Parallelism(Repartition Streams, PARTITION COLUMNS: ([tblMerged].[Right6VIN], [tblMerged].[VIN]))

                               |         |--Clustered Index Scan(OBJECT: ([].[dbo].[tblMerged].[PK_tblMerged]), WHERE: ([tblMerged].[StartOdometer]<>0 AND [tblMerged].[StopOdometer]<>0))

                               |--Table Spool

                                    |--Clustered Index Scan(OBJECT: ([Common].[dbo].[tblVehicle].[PK_tblTruck]))

     

    *********************

    BUT.  In the stored procedure, it's still 28 seconds (slightly better than before, at least)... 

    CREATE  PROCEDURE dbo.sp_VehicleDistances

    (

     @StartTime datetime = null,

     @StopTime datetime = null

    )

    AS

    SELECT

     [Common].dbo.tblVehicle.VehicleID,

     SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,

     MAX(tblMerged.StopOdometer) -

      (CASE WHEN ((@StopTime IS NULL) AND (NOT [Common].dbo.tblVehicle.StartMileage IS NULL))

       THEN [Common].dbo.tblVehicle.StartMileage

       ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,

     MAX(tblMerged.StopOdometer) AS LastOdometer

    FROM tblMerged

     INNER JOIN [Common].dbo.tblVehicle ON

      (tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR

      tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)

    WHERE (tblMerged.StartOdometer <> 0) AND (tblMerged.StopOdometer <> 0)

     AND (tblMerged.StartTime >= @StartTime OR @StartTime IS NULL)

     AND (tblMerged.StopTime <= @StopTime OR @StopTime IS NULL)

    GROUP BY [Common].dbo.tblVehicle.VehicleID, [Common].dbo.tblVehicle.StartMileage

    GO

    Now I run it......

    dbcc dropcleanbuffers

    go

    exec sp_VehicleDistances null,null

    ...(42 row(s) affected)

    Table 'tblVehicle'. Scan count 114235, logical reads 342705, physical reads 0, read-ahead reads 0.

    Table 'tblMerged'. Scan count 1, logical reads 3976, physical reads 2, read-ahead reads 3985.

    SQL Server Execution Times:

       CPU time = 27938 ms,  elapsed time = 28327 ms.

    SQL Server Execution Times:

       CPU time = 27938 ms,  elapsed time = 28327 ms.

        |--Compute Scalar(DEFINE: ([Expr1007]=[Expr1005]-If ([@StopTime]=NULL AND [tblVehicle].[StartMileage]<>NULL) then [tblVehicle].[StartMileage] else [Expr1006]))

                |--Stream Aggregate(GROUP BY: ([tblVehicle].[VehicleID]) DEFINE: ([Expr1004]=SUM([tblMerged].[StopOdometer]-[tblMerged].[StartOdometer]), [Expr1005]=MAX([tblMerged].[StopOdometer]), [Expr1006]=MIN([tblMerged].[StartOdometer]), [tblVehicle].[StartMileage]=ANY([tblVehicle].[StartMileage])))

                     |--Sort(ORDER BY: ([tblVehicle].[VehicleID] ASC))

                          |--Nested Loops(Inner Join, WHERE: ([tblMerged].[VIN]=[tblVehicle].[VIN] OR [tblMerged].[Right6VIN]=[tblVehicle].[VehicleID]))

                               |--Clustered Index Scan(OBJECT: ([].[dbo].[tblMerged].[PK_tblMerged]), WHERE: ((([tblMerged].[StartOdometer]<>0 AND [tblMerged].[StopOdometer]<>0) AND ([tblMerged].[StartTime]>=[@StartTime] OR [@StartTime]=NULL)) AND ([tblMerged].[StopTime]<=[@StopTime] OR [@StopTime]=NULL)))

                               |--Clustered Index Scan(OBJECT: ([Common].[dbo].[tblVehicle].[PK_tblTruck]))

    I am thoroughly confused.    All I notice is that the work tables are not in the stats i/o.

     

  • well you still have you try what I proposed for the where condition and the join (with indexed calculated column). That's speed up the thing a lot.

    Try that then I'll see if a covering index can be added to further the improvement... in combinaison with the new clustered index I proposed...

  • Changing the clustered key is not a good option for us, but I will play with indexes some more.

    And pardon if this is being a newb, but I don't understand what these things have to do with the UDF/SP taking 28 seconds when the pure SQL query takes less than one second. 

  • try that with the new sp :

    declare one variable by parameter and transfer the parameters values into those variables, then use those variables in the select. then compare the time it takes to run this new version and the old one with the same parameters.

  • Ok I guess we're getting somewhere...  It's not the UDF or the SP that's causing the problem. 

    When I use constants for @StartTime/@StopTime (either nulls, or dates like '1/1/1960' and '1/1/2050') even the UDF/SP runs at under 1 second.  But if I stay with the Variable names, even if I explicitly set them to null, it's back up to 30 seconds. 

    I don't understand why it's so much faster with constants than with non-changing variables...?

    What do you think?

  • It's called parameter sniffing. There's a few articles on that subject on this site. Basically when a sp is first used, a plan is created and saved, but every once in a while you get a new 'unlikely' set of parameters that would require a totally different execution plan, but since there's a cached plan, the old one is used and performance goes down the drain. Use that trick and you should be fine... untill the table is larger and HAVE TO use the other tricks I gave you.

  • Look at

    http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx

    Or try to add a WITH RECOMPILE to your stored proc (not sure if you can add it to your UDF - I suppose you can) so that it will be recompiled each time...

    Or, from memory, the trick to overcome param sniffing issues is to add two dummy parameters to your stored proc - eg @dummyStartTime & @dummyFinishTime.  You give these values TYPICAL default values and use these variables in your proc rather than your @startTime and @FinishTime parameters.  Before you execute the proc, you set @dummyStartTime = @startTime and ditto for your finish time.

    That way, SQL creates a nice query plan using typical values because it will use the defaults as set in your dummy parameters.. But, you adjust these parameters using the real parameter values from startTime & finishTime so you get the query you want with an appropriate query plan.

    If you put the dummy params at the end of the parameter list you won't need to specify their values as they have defaults - if this is for the UDF (I haven't been paying that much attention to the volumes of code you've posted - sorry) then you will have to specify DEFAULT as a parameter value when calling the UDF....

    Hope that helps

    But you should really try the computed column too!

  • I find it to work without the defaults on the params... are they absolutely necessary, or just for so other odd occurances?

  • How else could SQL "sniff" the parameters?  It would need something to work with as a basis for forming the query plan.  I haven't really had need to make use of it (although I have a feeling that in a couple of areas of our app it might benefit) - but from what I have read around the place - on here and elsewhere - you need to "trick" it to make a plan using the typical parameter values...

    Defaults could be left off unless it is a UDF - then you need to use the DEFAULT keyword when calling the UDF...

    Am I missing something?  

  • Maybe it's just me... but I never had to make the guys use defaults (in 5 cases). The problem was consistant and with the use of a local variable the problem was always solved. I think it's because the server must recheck the value of the local variable to create the plan because the params mean nothing for that proc. Maybe we should start a new thread for this one??

Viewing 13 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply