August 18, 2005 at 12:01 pm
You have to find a way to make sure the queries use the date range. A scan of 1M rows just can't be fast.
August 18, 2005 at 12:08 pm
Perhaps not. But 135K has to be doable in under 30 seconds of CPU time. If online stores can inventory millions of items for millions of customers and get info back in 200ms then surely I can do something this relatively simple...
August 18, 2005 at 12:13 pm
OLAP systems are not the same... they have report tables that are refreshed every X minutes, so you don't wait for the query to run.
Can you put the function code in a stored procedure, execute it, and post the plan? Maybe I missed something.
August 18, 2005 at 12:24 pm
Now it's getting weirder... I moved VehicleStartDate and VehicleStartMileage into [Common].dbo.tblVehicle, so the join to tblVehicleStart has been eliminated. Now the straight query (no UDF or SP) runs in 0.78 seconds of CPU time.
dbcc dropcleanbuffers
go
SELECT
[Common].dbo.tblVehicle.VehicleID,
SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,
MAX(tblMerged.StopOdometer) -
(CASE WHEN ((null IS NULL) AND (NOT [Common].dbo.tblVehicle.StartMileage IS NULL))
THEN [Common].dbo.tblVehicle.StartMileage
ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,
MAX(tblMerged.StopOdometer) AS LastOdometer
FROM tblMerged
INNER JOIN [Common].dbo.tblVehicle ON
(tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR
tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)
WHERE (tblMerged.StartOdometer <> 0) AND (tblMerged.StopOdometer <> 0)
AND (tblMerged.StartTime >= null OR null IS NULL)
AND (tblMerged.StopTime <= null OR null IS NULL)
GROUP BY [Common].dbo.tblVehicle.VehicleID, [Common].dbo.tblVehicle.StartMileage
...(42 row(s) affected)
Table 'tblMerged'. Scan count 4, logical reads 3976, physical reads 67, read-ahead reads 3924.
Table 'Worktable'. Scan count 2, logical reads 160, physical reads 0, read-ahead reads 0.
Table 'tblVehicle'. Scan count 4, logical reads 12, physical reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 14, logical reads 172, physical reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 9, logical reads 167, physical reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 13, logical reads 171, physical reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 782 ms, elapsed time = 584 ms.
|--Compute Scalar(DEFINE: ([Expr1007]=[Expr1005]-If ([tblVehicle].[StartMileage]<>NULL) then [tblVehicle].[StartMileage] else [Expr1006]))
|--Parallelism(Gather Streams)
|--Hash Match(Aggregate, HASH: ([tblVehicle].[VehicleID]), RESIDUAL: ([tblVehicle].[VehicleID]=[tblVehicle].[VehicleID]) DEFINE: ([Expr1004]=SUM([partialagg1009]), [Expr1005]=MAX([partialagg1010]), [Expr1006]=MIN([partialagg1011]), [tblVehicle].[StartMileage]=ANY([tblVehicle].[StartMileage])))
|--Parallelism(Repartition Streams, PARTITION COLUMNS: ([tblVehicle].[VehicleID]))
|--Nested Loops(Inner Join, WHERE: ([tblMerged].[VIN]=[tblVehicle].[VIN] OR [tblMerged].[Right6VIN]=[tblVehicle].[VehicleID]))
|--Hash Match(Aggregate, HASH: ([tblMerged].[Right6VIN], [tblMerged].[VIN]), RESIDUAL: ([tblMerged].[Right6VIN]=[tblMerged].[Right6VIN] AND [tblMerged].[VIN]=[tblMerged].[VIN]) DEFINE: ([partialagg1009]=SUM([tblMerged].[StopOdometer]-[tblMerged].[StartOdometer]), [partialagg1010]=MAX([tblMerged].[StopOdometer]), [partialagg1011]=MIN([tblMerged].[StartOdometer])))
| |--Parallelism(Repartition Streams, PARTITION COLUMNS: ([tblMerged].[Right6VIN], [tblMerged].[VIN]))
| |--Clustered Index Scan(OBJECT: ([].[dbo].[tblMerged].[PK_tblMerged]), WHERE: ([tblMerged].[StartOdometer]<>0 AND [tblMerged].[StopOdometer]<>0))
|--Table Spool
|--Clustered Index Scan(OBJECT: ([Common].[dbo].[tblVehicle].[PK_tblTruck]))
*********************
BUT. In the stored procedure, it's still 28 seconds (slightly better than before, at least)...
CREATE PROCEDURE dbo.sp_VehicleDistances
(
@StartTime datetime = null,
@StopTime datetime = null
)
AS
SELECT
[Common].dbo.tblVehicle.VehicleID,
SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,
MAX(tblMerged.StopOdometer) -
(CASE WHEN ((@StopTime IS NULL) AND (NOT [Common].dbo.tblVehicle.StartMileage IS NULL))
THEN [Common].dbo.tblVehicle.StartMileage
ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,
MAX(tblMerged.StopOdometer) AS LastOdometer
FROM tblMerged
INNER JOIN [Common].dbo.tblVehicle ON
(tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR
tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)
WHERE (tblMerged.StartOdometer <> 0) AND (tblMerged.StopOdometer <> 0)
AND (tblMerged.StartTime >= @StartTime OR @StartTime IS NULL)
AND (tblMerged.StopTime <= @StopTime OR @StopTime IS NULL)
GROUP BY [Common].dbo.tblVehicle.VehicleID, [Common].dbo.tblVehicle.StartMileage
GO
Now I run it......
dbcc dropcleanbuffers
go
exec sp_VehicleDistances null,null
...(42 row(s) affected)
Table 'tblVehicle'. Scan count 114235, logical reads 342705, physical reads 0, read-ahead reads 0.
Table 'tblMerged'. Scan count 1, logical reads 3976, physical reads 2, read-ahead reads 3985.
SQL Server Execution Times:
CPU time = 27938 ms, elapsed time = 28327 ms.
SQL Server Execution Times:
CPU time = 27938 ms, elapsed time = 28327 ms.
|--Compute Scalar(DEFINE: ([Expr1007]=[Expr1005]-If ([@StopTime]=NULL AND [tblVehicle].[StartMileage]<>NULL) then [tblVehicle].[StartMileage] else [Expr1006]))
|--Stream Aggregate(GROUP BY: ([tblVehicle].[VehicleID]) DEFINE: ([Expr1004]=SUM([tblMerged].[StopOdometer]-[tblMerged].[StartOdometer]), [Expr1005]=MAX([tblMerged].[StopOdometer]), [Expr1006]=MIN([tblMerged].[StartOdometer]), [tblVehicle].[StartMileage]=ANY([tblVehicle].[StartMileage])))
|--Sort(ORDER BY: ([tblVehicle].[VehicleID] ASC))
|--Nested Loops(Inner Join, WHERE: ([tblMerged].[VIN]=[tblVehicle].[VIN] OR [tblMerged].[Right6VIN]=[tblVehicle].[VehicleID]))
|--Clustered Index Scan(OBJECT: ([].[dbo].[tblMerged].[PK_tblMerged]), WHERE: ((([tblMerged].[StartOdometer]<>0 AND [tblMerged].[StopOdometer]<>0) AND ([tblMerged].[StartTime]>=[@StartTime] OR [@StartTime]=NULL)) AND ([tblMerged].[StopTime]<=[@StopTime] OR [@StopTime]=NULL)))
|--Clustered Index Scan(OBJECT: ([Common].[dbo].[tblVehicle].[PK_tblTruck]))
I am thoroughly confused. All I notice is that the worktables don't appear in the STATISTICS IO output.
August 18, 2005 at 12:38 pm
Well, you still have to try what I proposed for the WHERE condition and the join (with the indexed computed column). That should speed things up a lot.
Try that, and then I'll see if a covering index can be added to improve things further... in combination with the new clustered index I proposed...
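The proposed computed column itself isn't quoted in this part of the thread, so the expression and names below are placeholders, but the general technique being suggested looks like this:

```sql
-- Sketch of the indexed-computed-column technique only; the expression
-- RIGHT(VIN, 6) and both names are assumptions, not the original proposal.
-- Materializing an expression used in a join/WHERE as a computed column
-- and indexing it lets the optimizer seek instead of scanning.
ALTER TABLE dbo.tblVehicle
    ADD Right6VIN_c AS RIGHT(VIN, 6)

-- On SQL Server 2000 this needs a deterministic expression and the
-- right SET options (e.g. QUOTED_IDENTIFIER and ARITHABORT ON).
CREATE INDEX IX_tblVehicle_Right6VIN_c
    ON dbo.tblVehicle (Right6VIN_c)
```

The payoff is that a predicate or join written against `Right6VIN_c` can use the index, instead of forcing a scan to evaluate `RIGHT(VIN, 6)` row by row.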
August 18, 2005 at 12:42 pm
Changing the clustered key is not a good option for us, but I will play with indexes some more.
And pardon me if this is a newbie question, but I don't understand what these things have to do with the UDF/SP taking 28 seconds when the pure SQL query takes less than one second.
August 18, 2005 at 12:47 pm
try that with the new sp :
Declare one local variable per parameter and copy the parameter values into those variables, then use the variables in the SELECT. Then compare the time it takes to run this new version against the old one with the same parameters.
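Applied to the proc posted above, that suggestion looks like this (a sketch; only the declarations and the predicates change):

```sql
ALTER PROCEDURE dbo.sp_VehicleDistances
(
    @StartTime datetime = null,
    @StopTime datetime = null
)
AS
-- Copy the parameters into local variables. The optimizer cannot
-- sniff local variables, so the cached plan is built for "unknown"
-- values instead of whatever was passed on the first execution.
DECLARE @LocalStartTime datetime, @LocalStopTime datetime
SET @LocalStartTime = @StartTime
SET @LocalStopTime = @StopTime

SELECT
    [Common].dbo.tblVehicle.VehicleID,
    SUM(tblMerged.StopOdometer - tblMerged.StartOdometer) AS MilesCollected,
    MAX(tblMerged.StopOdometer) -
        (CASE WHEN ((@LocalStopTime IS NULL) AND (NOT [Common].dbo.tblVehicle.StartMileage IS NULL))
              THEN [Common].dbo.tblVehicle.StartMileage
              ELSE MIN(tblMerged.StartOdometer) END) AS MilesDriven,
    MAX(tblMerged.StopOdometer) AS LastOdometer
FROM tblMerged
INNER JOIN [Common].dbo.tblVehicle ON
    (tblMerged.VIN = [Common].dbo.tblVehicle.VIN OR
     tblMerged.Right6VIN = [Common].dbo.tblVehicle.VehicleID)
WHERE (tblMerged.StartOdometer <> 0) AND (tblMerged.StopOdometer <> 0)
    AND (tblMerged.StartTime >= @LocalStartTime OR @LocalStartTime IS NULL)
    AND (tblMerged.StopTime <= @LocalStopTime OR @LocalStopTime IS NULL)
GROUP BY [Common].dbo.tblVehicle.VehicleID, [Common].dbo.tblVehicle.StartMileage
GO
```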
August 18, 2005 at 12:56 pm
OK, I guess we're getting somewhere... it's not the UDF or the SP that's causing the problem.
When I use constants for @StartTime/@StopTime (either nulls, or dates like '1/1/1960' and '1/1/2050'), even the UDF/SP runs in under 1 second. But if I stick with the variables, even if I explicitly set them to null, it's back up to 30 seconds.
I don't understand why it's so much faster with constants than with non-changing variables...?
What do you think?
August 18, 2005 at 1:09 pm
It's called parameter sniffing. There are a few articles on that subject on this site. Basically, when an sp is first used, a plan is created and cached; every once in a while you get a new 'unlikely' set of parameters that would require a totally different execution plan, but since there's a cached plan, the old one is used and performance goes down the drain. Use that trick and you should be fine... until the table is larger and you have to use the other tricks I gave you.
August 19, 2005 at 7:01 pm
Look at
http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
Or try to add a WITH RECOMPILE to your stored proc (not sure if you can add it to your UDF - I suppose you can) so that it will be recompiled each time...
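A sketch of both recompile options for the stored proc (note WITH RECOMPILE is valid on procedures but not on UDFs; it trades a compile on every execution for a plan matched to the actual values):

```sql
-- Option 1: mark the proc so a fresh plan is built on every execution.
ALTER PROCEDURE dbo.sp_VehicleDistances
(
    @StartTime datetime = null,
    @StopTime datetime = null
)
WITH RECOMPILE
AS
-- ...body of the procedure unchanged...

-- Option 2: force a recompile for just one call, leaving the
-- procedure definition alone.
EXEC dbo.sp_VehicleDistances null, null WITH RECOMPILE
```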
Or, from memory, the trick to overcome parameter sniffing issues is to add two dummy parameters to your stored proc - e.g. @dummyStartTime and @dummyFinishTime. You give these TYPICAL default values and use these variables in your proc rather than your @startTime and @FinishTime parameters. At the top of the proc, before the query runs, you set @dummyStartTime = @startTime, and ditto for your finish time.
That way, SQL creates a nice query plan using typical values because it will use the defaults as set in your dummy parameters.. But, you adjust these parameters using the real parameter values from startTime & finishTime so you get the query you want with an appropriate query plan.
If you put the dummy params at the end of the parameter list you won't need to specify their values as they have defaults - if this is for the UDF (I haven't been paying that much attention to the volumes of code you've posted - sorry) then you will have to specify DEFAULT as a parameter value when calling the UDF....
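A sketch of that dummy-parameter arrangement; the "typical" defaults below are assumed for illustration, since what counts as typical depends on the actual data:

```sql
ALTER PROCEDURE dbo.sp_VehicleDistances
(
    @StartTime datetime = null,
    @StopTime datetime = null,
    -- Dummy parameters with TYPICAL defaults (values assumed here).
    -- Callers omit them, so the optimizer sniffs the defaults and
    -- compiles a plan for representative values.
    @dummyStartTime datetime = '19600101',
    @dummyStopTime datetime = '20500101'
)
AS
-- Overwrite the dummies with the real values before the query runs;
-- the plan was already compiled using the sniffed defaults.
SET @dummyStartTime = @StartTime
SET @dummyStopTime = @StopTime

-- The SELECT is the same as before, but it references
-- @dummyStartTime/@dummyStopTime in place of @StartTime/@StopTime.
```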
Hope that helps
But you should really try the computed column too!
August 19, 2005 at 8:57 pm
I find it works without the defaults on the params... are they absolutely necessary, or just for some other odd occurrences?
August 20, 2005 at 2:26 am
How else could SQL "sniff" the parameters? It would need something to work with as a basis for forming the query plan. I haven't really had need to make use of it (although I have a feeling that in a couple of areas of our app it might benefit) - but from what I have read around the place - on here and elsewhere - you need to "trick" it to make a plan using the typical parameter values...
Defaults could be left off unless it is a UDF - then you need to use the DEFAULT keyword when calling the UDF...
Am I missing something?
August 20, 2005 at 6:37 am
Maybe it's just me... but I never had to make the guys use defaults (in 5 cases). The problem was consistent, and using a local variable always solved it. I think it's because the server must recheck the value of the local variable to create the plan, because the params mean nothing to that proc. Maybe we should start a new thread for this one?