Index Tuning Wizard On Multiple Databases, and Joins From Calculations

Question

Index Tuning Wizard On Multiple Databases, and Joins From Calculations

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

The problem is not the right() operation. The problem is that this operation forces the server to do a scan of ALL ROWS to check for a match. If you can use the value without altering it, then a binary search algorithm can be applied to search for the data, greatly reducing the workload.

Can the vehicule start table be merge with the vehicule table. This seems like it doesn't need to be in a separate table. Also decimal (7,1) seems like it would be a better datatype choice (assuming you want to keep the 0.1 km) of the milage. If you don't want the decimals then a int would do the job just fine.

On tblMerged, make the id unique index NON clustered, then the clustered index : VIN, StartTime, unique as well .

Make sure you do this on a test server.. it'll take a lot of time to reorgonize those indexes. Also the starttime and endtime can be converted to smallint (unless you need 1/300 second precision). Same thing for the milage data to decimal. The only drawback I see from this is that the clustered index would be quite large because of the vin which is varchar(17). But it's worth a try.

Ian Yates SSCoach Points: 19738 More actions · Answer 2

And as Remi suggested, created a computed column to store the 6 rightmost characters. Consider what you would do if faced with the same situation using a phonebook's index. If you had to find all surnames ending in ITH, you would have to scan the entire index! But, if you had an identical index which was automatically kept in sync (ie, using a computed column with an index on it), and the index was just on the last 3 characters, you could directly find the ITH entry and use it to get the page numbers (ie, database bookmarks) to get the data you wanted...

Sure, there is a very slight (read negligable) penalty for keeping the extra disk space of your 6 characters, but I am sure that you are willing to suffer it to improve the query Remember: "disk space is cheap!" (well compared to a duplicate server anyway)

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

Yep, especially since another server without a drive is kind of useless .

Steve Fontaine SSC-Addicted Points: 415 More actions · Answer 4

Is there a good reason why the the UDF takes 30 seconds to run when pass in @StartTime=null and @Stoptime=null, but it only takes 14 seconds to run when I copy the code out of the UDF and paste it into Query Analyzer and replace @StartTime and @StopTime with null? (The UDF returns a table with 42 rows and 4 columns (1 varchar(6) and 3 floats).)

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

send the execution plan(S) again please, and identity the slowest clearly.

Steve Fontaine SSC-Addicted Points: 415 More actions · Answer 6

Alright, here's what you asked for.

Method #1, 14 seconds: Here is the entire UDF. What I did to get 14 seconds is pull out the entire SELECT statement and replace every @StartTime and @StopTime with null.

CREATE FUNCTION dbo.fn_VehicleDistances

(

@StartTime datetime = null,

@StopTime datetime = null

)

RETURNS @VehicleDistances TABLE (VehicleID varchar(6), MilesCollected float, MilesDriven float, LastOdometer float)

AS

BEGIN

INSERT INTO @VehicleDistances

SELECT

[Common].dbo.tblVehicle.VehicleID,

SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,

MAX(tblMerged.StopOdometer) -

(CASE WHEN ((@StopTime IS NULL) AND (NOT tblVehicleStart.StartMileage IS NULL))

THEN dbo.tblVehicleStart.StartMileage

ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,

MAX(tblMerged.StopOdometer) AS LastOdometer

FROM tblMerged

INNER JOIN [Common].dbo.tblVehicle ON

(tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR

tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)

LEFT OUTER JOIN dbo.tblVehicleStart ON

[Common].dbo.tblVehicle.VIN = dbo.tblVehicleStart.VIN

WHERE (tblMerged.StartOdometer > 0) AND (tblMerged.StopOdometer > 0)

AND (tblMerged.StartTime >= @StartTime OR @StartTime IS NULL)

AND (tblMerged.StopTime <= @StopTime OR @StopTime IS NULL)

GROUP BY [Common].dbo.tblVehicle.VehicleID, dbo.tblVehicleStart.StartMileage

So here is what I actually run in QA:

SELECT

[Common].dbo.tblVehicle.VehicleID,

SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,

MAX(tblMerged.StopOdometer) -

(CASE WHEN ((null IS NULL) AND (NOT tblVehicleStart.StartMileage IS NULL))

THEN dbo.tblVehicleStart.StartMileage

ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,

MAX(tblMerged.StopOdometer) AS LastOdometer

FROM tblMerged

INNER JOIN [Common].dbo.tblVehicle ON

(tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR

tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)

LEFT OUTER JOIN dbo.tblVehicleStart ON

[Common].dbo.tblVehicle.VIN = dbo.tblVehicleStart.VIN

WHERE (tblMerged.StartOdometer <> 0) AND (tblMerged.StopOdometer <> 0)

AND (tblMerged.StartTime >= null OR null IS NULL)

AND (tblMerged.StopTime <= null OR null IS NULL)

GROUP BY [Common].dbo.tblVehicle.VehicleID, dbo.tblVehicleStart.StartMileage

Execution Plan #1:

|--Compute Scalar(DEFINE: ([Expr1009]=[Expr1007]-If ([tblVehicleStart].[StartMileage]<>NULL) then [tblVehicleStart].[StartMileage]

else [Expr1008]))

|--Parallelism(Gather Streams)

|--Stream Aggregate(GROUP BY: ([tblVehicleStart].[StartMileage], [tblVehicle].[VehicleID]) DEFINE:

([Expr1006]=SUM([partialagg1015]), [Expr1007]=MAX([partialagg1016]), [Expr1008]=MIN([partialagg1017])))

|--Sort(ORDER BY: ([tblVehicleStart].[StartMileage] ASC, [tblVehicle].[VehicleID] ASC))

|--Parallelism(Repartition Streams, PARTITION COLUMNS: ([tblVehicleStart].[StartMileage],

[tblVehicle].[VehicleID]))

|--Hash Match(Partial Aggregate, HASH: ([tblVehicle].[VehicleID], [tblVehicleStart].[StartMileage]),

RESIDUAL: ([tblVehicle].[VehicleID]=[tblVehicle].[VehicleID] AND [tblVehicleStart].[StartMileage]=[tblVehicleStart].[StartMileage])

DEFINE: ([partialagg1015]=SUM([tblMerged].[StopOdometer]-[tblMerged].[StartOdometer]),

[partialagg1016]=MAX([tblMerged].[StopOdometer]), [partialagg1017]=MIN([tblMerged].[StartOdometer])))

|--Hash Match(Right Outer Join, HASH: ([tblVehicleStart].[VIN])=([tblVehicle].[VIN]), RESIDUAL:

([tblVehicle].[VIN]=[tblVehicleStart].[VIN]))

|--Parallelism(Distribute Streams, PARTITION COLUMNS: ([tblVehicleStart].[VIN]))

| |--Clustered Index Scan(OBJECT: ([].[dbo].[tblVehicleStart].[PK_tblVehicleStart]))

|--Parallelism(Repartition Streams, PARTITION COLUMNS: ([tblVehicle].[VIN]))

|--Nested Loops(Inner Join, WHERE: ([tblMerged].[VIN]=[tblVehicle].[VIN] OR

[tblMerged].[Right6VIN]=[tblVehicle].[VehicleID]))

|--Index Scan(OBJECT: ([].[dbo].[tblMerged].[IX_test5]), WHERE:

(([tblMerged].[StartOdometer]<0 OR [tblMerged].[StartOdometer]>0) AND ([tblMerged].[StopOdometer]<0 OR

[tblMerged].[StopOdometer]>0)))

|--Table Spool

|--Index Scan(OBJECT: ([Common].[dbo].[tblVehicle].[IX_tblVehicle_VIN]))

Method #2, ~30 seconds: Simply call the UDF with @StartTime = @StopTime = null.

select * from fn_vehicledistances(null,null)

Execution Plan #2 (Of course, when calling the UDF, this doesn't tell us much.):

|--Table Scan(OBJECT: (fn_vehicledistances))

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 7

Well how many rows are returned with each versions??

Steve Fontaine SSC-Addicted Points: 415 More actions · Answer 8

42 rows each. They both return exactly the same results. This is why I don't understand how one can possibly be twice as fast since it's the same query.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 9

Can you send the number of rows from that, time, and the plans too :

set statistics io on

set statistics time on

dbcc dropcleanbuffers

go

Select * from dbo.fn(null, null)

go

dbcc dropcleanbuffers

go

Select * from dbo.fn(@date1, @date2)

go

set statistics io off

set statistics time off

Steve Fontaine SSC-Addicted Points: 415 More actions · Answer 10

set statistics io on

set statistics time on

dbcc dropcleanbuffers

go

select * from fn_vehicledistances(null,null)

dbcc dropcleanbuffers

go

select * from fn_vehicledistances('1/1/1960','1/1/2010')

set statistics io off

set statistics time off

go

---------------

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

SQL Server Execution Times:

CPU time = 15 ms, elapsed time = 22 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 22 ms.

(42 row(s) affected)

Table '#6F357288'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.

SQL Server Execution Times:

CPU time = 32438 ms, elapsed time = 36222 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

SQL Server Execution Times:

CPU time = 15 ms, elapsed time = 23 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 1 ms.

(42 row(s) affected)

Table '#711DBAFA'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.

SQL Server Execution Times:

CPU time = 30344 ms, elapsed time = 32399 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

----------------------

select * from fn_vehicledistances(null,null)

|--Table Scan(OBJECTfn_vehicledistances))

select * from fn_vehicledistances('1/1/1960','1/1/2010')

|--Table Scan(OBJECTfn_vehicledistances))

Steve Fontaine SSC-Addicted Points: 415 More actions · Answer 11

Actually... here is the plan from the query (not inside the UDF):

Table 'tblVehicleStart'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0.

Table 'Worktable'. Scan count 30238, logical reads 30396, physical reads 0, read-ahead reads 0.

Table 'tblVehicle'. Scan count 4, logical reads 4, physical reads 1, read-ahead reads 0.

Table 'tblMerged'. Scan count 4, logical reads 3976, physical reads 67, read-ahead reads 3924.

Table 'Worktable'. Scan count 28752, logical reads 28910, physical reads 0, read-ahead reads 0.

Table 'Worktable'. Scan count 28153, logical reads 28311, physical reads 0, read-ahead reads 0.

Table 'Worktable'. Scan count 27088, logical reads 27246, physical reads 0, read-ahead reads 0.

SQL Server Execution Times:

CPU time = 34470 ms, elapsed time = 9329 ms.

CPU time is still as high but elapsed is only about 30% of that. I appear to be losing parallelism in the UDF. The database server has 2 physical CPUs (4 logical) which I suppose would explain this. I don't completely understand why it can't multiprocess when it's contained in a UDF... sigh.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 12

What happens if you shortent the range of the search to let's say 2000 to 2001 so that the # of rows returned is different?

Steve Fontaine SSC-Addicted Points: 415 More actions · Answer 13

dbcc dropcleanbuffers

go

select * from fn_vehicledistances('6/1/2005','6/15/2005')

------------

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

SQL Server Execution Times:

CPU time = 16 ms, elapsed time = 30 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 1 ms.

(31 row(s) affected)

Table '#0307610B'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0.

SQL Server Execution Times:

CPU time = 4344 ms, elapsed time = 4467 ms.

Much faster. But remember this is only aggregating over 19K rows because of the WHERE filtering for @StartTime and @StopTime, while with (null,null), there are 135K.

And running the SELECT statement without the UDF and substituting in the proper dates gives about the same CPU time but the elapsed time is only 1.74 seconds.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 14

So how exactly are you using this function in the main select (join to it, or use in a subquery)?

Steve Fontaine SSC-Addicted Points: 415 More actions · Answer 15

It will primarily be used as the record source for an Access report (where the default timeout is 30 seconds, plus people will get fidgety if it takes much longer than that). And, unfortunately, even that tblMerged is only 135K rows now, it will likely grow to well over 1M over the next few months.