Index Tuning Wizard On Multiple Databases, and Joins From Calculations

  • The problem is not the right() operation itself. The problem is that wrapping the column in that operation forces the server to scan ALL ROWS to check for a match. If you can use the value without altering it, then an index seek (essentially a binary search) can be used to find the data, greatly reducing the workload.
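
    For illustration, a minimal sketch of the difference (the table and column names match the ones used later in this thread; the literal value is only an example):

    -- Non-sargable: RIGHT() must be evaluated against every row, so an index on VIN cannot be seeked
    SELECT * FROM tblMerged WHERE RIGHT(VIN, 6) = '123456'

    -- Sargable: the stored value is compared unaltered, so an index on Right6VIN can be seeked directly
    SELECT * FROM tblMerged WHERE Right6VIN = '123456'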

    Can the vehicle start table be merged with the vehicle table? It seems like it doesn't need to be in a separate table. Also, decimal(7,1) seems like a better datatype choice for the mileage (assuming you want to keep the 0.1 km resolution). If you don't want the decimals, then an int would do the job just fine.

    On tblMerged, make the unique index on the id NONclustered, and make the clustered index VIN, StartTime (unique as well).
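
    A sketch of what that could look like (the constraint and index names here are made up, and any existing clustered index or primary key on the id would have to be dropped first):

    -- assumed names; adjust to whatever already exists on tblMerged
    ALTER TABLE tblMerged ADD CONSTRAINT UQ_tblMerged_ID UNIQUE NONCLUSTERED (id)

    CREATE UNIQUE CLUSTERED INDEX IX_tblMerged_VIN_StartTime ON tblMerged (VIN, StartTime)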

    Make sure you do this on a test server first; it'll take a lot of time to reorganize those indexes. Also, the StartTime and StopTime columns can be converted to smalldatetime (unless you need the 1/300 second precision of datetime). Same thing for the mileage data: convert it to decimal. The only drawback I see from this is that the clustered index would be quite large because of the VIN, which is varchar(17). But it's worth a try.
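
    A sketch of those datatype changes (the column names are taken from the query later in the thread; nullability is not specified here and should match the existing definitions, and a column that is part of an index can't be altered until that index is dropped):

    -- smalldatetime is accurate to the minute; datetime to roughly 1/300 of a second
    ALTER TABLE tblMerged ALTER COLUMN StartTime smalldatetime
    ALTER TABLE tblMerged ALTER COLUMN StopTime smalldatetime

    -- decimal(7,1) keeps the 0.1 km resolution mentioned above
    ALTER TABLE tblMerged ALTER COLUMN StartOdometer decimal(7,1)
    ALTER TABLE tblMerged ALTER COLUMN StopOdometer decimal(7,1)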

  • And, as Remi suggested, create a computed column to store the 6 rightmost characters (see the sketch below).  Consider what you would do if faced with the same situation using a phonebook's index.  If you had to find all surnames ending in ITH, you would have to scan the entire index!  But if you had an identical index which was automatically kept in sync (i.e., using a computed column with an index on it), and that index was just on the last 3 characters, you could go directly to the ITH entry and use it to get the page numbers (i.e., database bookmarks) for the data you wanted...

    Sure, there is a very slight (read: negligible) penalty in the extra disk space for your 6 characters, but I am sure that you are willing to suffer it to improve the query.  Remember: "disk space is cheap!" (well, compared to a duplicate server, anyway).
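
    A sketch of that approach for the VIN (the query later in this thread already references a Right6VIN column on tblMerged, so only the index may actually be missing; the index name here is made up):

    -- computed column on the 6 rightmost characters, kept in sync automatically
    ALTER TABLE tblMerged ADD Right6VIN AS RIGHT(VIN, 6)

    -- indexing it is what allows a direct seek instead of a full scan
    CREATE INDEX IX_tblMerged_Right6VIN ON tblMerged (Right6VIN)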

  • Yep, especially since another server without a drive is kind of useless.

  • Is there a good reason why the UDF takes 30 seconds to run when I pass in @StartTime=null and @StopTime=null, but it only takes 14 seconds when I copy the code out of the UDF, paste it into Query Analyzer, and replace @StartTime and @StopTime with null?  (The UDF returns a table with 42 rows and 4 columns (1 varchar(6) and 3 floats).)

  • Send the execution plan(s) again please, and identify the slowest one clearly.

  • Alright, here's what you asked for. 

    Method #1, 14 seconds: Here is the entire UDF.  What I did to get 14 seconds was pull out the entire SELECT statement and replace every @StartTime and @StopTime with null.

    CREATE  FUNCTION dbo.fn_VehicleDistances

    (

     @StartTime datetime = null,

     @StopTime datetime = null

    )

    RETURNS @VehicleDistances TABLE (VehicleID varchar(6), MilesCollected float, MilesDriven float, LastOdometer float)

    AS

    BEGIN

    INSERT INTO @VehicleDistances

    SELECT

     [Common].dbo.tblVehicle.VehicleID,

     SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,

     MAX(tblMerged.StopOdometer) -

      (CASE WHEN ((@StopTime IS NULL) AND (NOT tblVehicleStart.StartMileage IS NULL))

       THEN dbo.tblVehicleStart.StartMileage

       ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,

     MAX(tblMerged.StopOdometer) AS LastOdometer

    FROM tblMerged

     INNER JOIN [Common].dbo.tblVehicle ON

      (tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR

      tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)

     LEFT OUTER JOIN dbo.tblVehicleStart ON

      [Common].dbo.tblVehicle.VIN = dbo.tblVehicleStart.VIN

    WHERE (tblMerged.StartOdometer > 0) AND (tblMerged.StopOdometer > 0)

     AND (tblMerged.StartTime >= @StartTime OR @StartTime IS NULL)

     AND (tblMerged.StopTime <= @StopTime OR @StopTime IS NULL)

    GROUP BY [Common].dbo.tblVehicle.VehicleID, dbo.tblVehicleStart.StartMileage

    RETURN

    END

    So here is what I actually run in QA:

    SELECT

     [Common].dbo.tblVehicle.VehicleID,

     SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,

     MAX(tblMerged.StopOdometer) -

      (CASE WHEN ((null IS NULL) AND (NOT tblVehicleStart.StartMileage IS NULL))

       THEN dbo.tblVehicleStart.StartMileage

       ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,

     MAX(tblMerged.StopOdometer) AS LastOdometer

    FROM tblMerged

     INNER JOIN [Common].dbo.tblVehicle ON

      (tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR

      tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)

     LEFT OUTER JOIN dbo.tblVehicleStart ON

      [Common].dbo.tblVehicle.VIN = dbo.tblVehicleStart.VIN

    WHERE (tblMerged.StartOdometer <> 0) AND (tblMerged.StopOdometer <> 0)

     AND (tblMerged.StartTime >= null OR null IS NULL)

     AND (tblMerged.StopTime <= null OR null IS NULL)

    GROUP BY [Common].dbo.tblVehicle.VehicleID, dbo.tblVehicleStart.StartMileage

    Execution Plan #1:

      |--Compute Scalar(DEFINE: ([Expr1009]=[Expr1007]-If ([tblVehicleStart].[StartMileage]<>NULL) then [tblVehicleStart].[StartMileage] else [Expr1008]))

           |--Parallelism(Gather Streams)

                |--Stream Aggregate(GROUP BY: ([tblVehicleStart].[StartMileage], [tblVehicle].[VehicleID]) DEFINE: ([Expr1006]=SUM([partialagg1015]), [Expr1007]=MAX([partialagg1016]), [Expr1008]=MIN([partialagg1017])))

                     |--Sort(ORDER BY: ([tblVehicleStart].[StartMileage] ASC, [tblVehicle].[VehicleID] ASC))

                          |--Parallelism(Repartition Streams, PARTITION COLUMNS: ([tblVehicleStart].[StartMileage], [tblVehicle].[VehicleID]))

                               |--Hash Match(Partial Aggregate, HASH: ([tblVehicle].[VehicleID], [tblVehicleStart].[StartMileage]), RESIDUAL: ([tblVehicle].[VehicleID]=[tblVehicle].[VehicleID] AND [tblVehicleStart].[StartMileage]=[tblVehicleStart].[StartMileage]) DEFINE: ([partialagg1015]=SUM([tblMerged].[StopOdometer]-[tblMerged].[StartOdometer]), [partialagg1016]=MAX([tblMerged].[StopOdometer]), [partialagg1017]=MIN([tblMerged].[StartOdometer])))

                                    |--Hash Match(Right Outer Join, HASH: ([tblVehicleStart].[VIN])=([tblVehicle].[VIN]), RESIDUAL: ([tblVehicle].[VIN]=[tblVehicleStart].[VIN]))

                                         |--Parallelism(Distribute Streams, PARTITION COLUMNS: ([tblVehicleStart].[VIN]))

                                         |    |--Clustered Index Scan(OBJECT: ([].[dbo].[tblVehicleStart].[PK_tblVehicleStart]))

                                         |--Parallelism(Repartition Streams, PARTITION COLUMNS: ([tblVehicle].[VIN]))

                                              |--Nested Loops(Inner Join, WHERE: ([tblMerged].[VIN]=[tblVehicle].[VIN] OR [tblMerged].[Right6VIN]=[tblVehicle].[VehicleID]))

                                                   |--Index Scan(OBJECT: ([].[dbo].[tblMerged].[IX_test5]),  WHERE: (([tblMerged].[StartOdometer]<0 OR [tblMerged].[StartOdometer]>0) AND ([tblMerged].[StopOdometer]<0 OR [tblMerged].[StopOdometer]>0)))

                                                   |--Table Spool

                                                        |--Index Scan(OBJECT: ([Common].[dbo].[tblVehicle].[IX_tblVehicle_VIN]))

     

    Method #2, ~30 seconds:  Simply call the UDF with @StartTime = @StopTime = null.

    select * from fn_vehicledistances(null,null)

    Execution Plan #2 (Of course, when calling the UDF, this doesn't tell us much.):

      |--Table Scan(OBJECT: (fn_vehicledistances))

  • Well, how many rows are returned with each version?

  • 42 rows each.  They both return exactly the same results.  This is why I don't understand how one can possibly be twice as fast since it's the same query.

  • Can you send the number of rows from that, the timings, and the plans too:

    set statistics io on

    set statistics time on

    dbcc dropcleanbuffers

    go

    Select * from dbo.fn(null, null)

    go

    dbcc dropcleanbuffers

    go

    Select * from dbo.fn(@date1, @date2)

    go

    set statistics io off

    set statistics time off

  • set statistics io on

    set statistics time on

    dbcc dropcleanbuffers

    go

    select * from fn_vehicledistances(null,null)

    dbcc dropcleanbuffers

    go

    select * from fn_vehicledistances('1/1/1960','1/1/2010')

    set statistics io off

    set statistics time off

    go

    ---------------

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

       CPU time = 15 ms,  elapsed time = 22 ms.

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 22 ms.

    (42 row(s) affected)

    Table '#6F357288'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times:

       CPU time = 32438 ms,  elapsed time = 36222 ms.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

       CPU time = 15 ms,  elapsed time = 23 ms.

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 1 ms.

    (42 row(s) affected)

    Table '#711DBAFA'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times:

       CPU time = 30344 ms,  elapsed time = 32399 ms.

    SQL Server Execution Times:

       CPU time = 0 ms,  elapsed time = 0 ms.

    ----------------------

     select * from fn_vehicledistances(null,null)

      |--Table Scan(OBJECT: (fn_vehicledistances))

     select * from fn_vehicledistances('1/1/1960','1/1/2010')

      |--Table Scan(OBJECT: (fn_vehicledistances))

  • Actually... here are the statistics from running the query itself (not inside the UDF):

    Table 'tblVehicleStart'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0.

    Table 'Worktable'. Scan count 30238, logical reads 30396, physical reads 0, read-ahead reads 0.

    Table 'tblVehicle'. Scan count 4, logical reads 4, physical reads 1, read-ahead reads 0.

    Table 'tblMerged'. Scan count 4, logical reads 3976, physical reads 67, read-ahead reads 3924.

    Table 'Worktable'. Scan count 28752, logical reads 28910, physical reads 0, read-ahead reads 0.

    Table 'Worktable'. Scan count 28153, logical reads 28311, physical reads 0, read-ahead reads 0.

    Table 'Worktable'. Scan count 27088, logical reads 27246, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times:

       CPU time = 34470 ms,  elapsed time = 9329 ms.

    CPU time is still as high, but elapsed is only about 30% of that.  I appear to be losing parallelism in the UDF.  The database server has 2 physical CPUs (4 logical), which I suppose would explain this.  I don't completely understand why it can't multiprocess when it's contained in a UDF... sigh.
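
    One possible explanation: a multi-statement function has to INSERT into its table variable, and that insert may be forcing a serial plan, whereas an inline (single-statement) table-valued function is expanded into the calling query and can keep the parallel plan. A sketch of an inline version (the function name is made up; the body is just the same SELECT wrapped in RETURN):

    CREATE FUNCTION dbo.fn_VehicleDistancesInline
    (
     @StartTime datetime = null,
     @StopTime datetime = null
    )
    RETURNS TABLE
    AS
    RETURN
    (
     SELECT
      [Common].dbo.tblVehicle.VehicleID,
      SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,
      MAX(tblMerged.StopOdometer) -
       (CASE WHEN ((@StopTime IS NULL) AND (NOT tblVehicleStart.StartMileage IS NULL))
        THEN dbo.tblVehicleStart.StartMileage
        ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,
      MAX(tblMerged.StopOdometer) AS LastOdometer
     FROM tblMerged
      INNER JOIN [Common].dbo.tblVehicle ON
       (tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR
       tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)
      LEFT OUTER JOIN dbo.tblVehicleStart ON
       [Common].dbo.tblVehicle.VIN = dbo.tblVehicleStart.VIN
     WHERE (tblMerged.StartOdometer > 0) AND (tblMerged.StopOdometer > 0)
      AND (tblMerged.StartTime >= @StartTime OR @StartTime IS NULL)
      AND (tblMerged.StopTime <= @StopTime OR @StopTime IS NULL)
     GROUP BY [Common].dbo.tblVehicle.VehicleID, dbo.tblVehicleStart.StartMileage
    )

    Calling it the same way (select * from dbo.fn_VehicleDistancesInline(null, null)) would also let the plan for the underlying SELECT show up directly in Query Analyzer.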

  • What happens if you shorten the range of the search to, let's say, 2000 to 2001, so that the # of rows returned is different?

  • dbcc dropcleanbuffers

    go

    select * from fn_vehicledistances('6/1/2005','6/15/2005')

    ------------

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 0 ms.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

       CPU time = 16 ms,  elapsed time = 30 ms.

    SQL Server parse and compile time:

       CPU time = 0 ms, elapsed time = 1 ms.

    (31 row(s) affected)

    Table '#0307610B'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times:

       CPU time = 4344 ms,  elapsed time = 4467 ms.

     

    Much faster.  But remember this is only aggregating over 19K rows because of the WHERE filtering for @StartTime and @StopTime, while with (null,null), there are 135K. 

    And running the SELECT statement without the UDF and substituting in the proper dates gives about the same CPU time but the elapsed time is only 1.74 seconds.

  • So how exactly are you using this function in the main select (join to it, or use in a subquery)?

    It will primarily be used as the record source for an Access report (where the default timeout is 30 seconds, plus people will get fidgety if it takes much longer than that).  And, unfortunately, even though tblMerged is only 135K rows now, it will likely grow to well over 1M over the next few months.
