June 23, 2011 at 11:57 pm
I just added 750,000 Canadian zip code records to my existing U.S. zip code table.
Query execution time for a U.S. zip code radius search: ~150 milliseconds
Query execution time for new table radius search with Canadian zip codes: ~1000 milliseconds
So adding all the Canadian zip codes makes the query take 6x longer. Is this what you would expect? Just checking. Thanks.
June 24, 2011 at 12:04 am
June 24, 2011 at 12:41 am
matt6749 (6/23/2011)
I just added 750,000 Canadian zip code records to my existing U.S. zip code table.
Query execution time for a U.S. zip code radius search: ~150 milliseconds
Query execution time for new table radius search with Canadian zip codes: ~1000 milliseconds
So adding all the Canadian zip codes makes the query take 6x longer. Is this what you would expect? Just checking. Thanks.
Read Gail's article linked in my signature.
Muthukkumaran Kaliyamoorthy
https://www.sqlserverblogforum.com/
June 24, 2011 at 1:16 am
Re: Gail's article, it's very good, but I don't think it's necessary for this general question.
June 24, 2011 at 1:32 am
Sriram.RM (6/24/2011)
Did you reindex the table after inserting the records? What's the difference in the execution plan?
Yes, I rebuilt the indexes. My guess is that the increased query time makes sense because I just added another 750,000 records, no?
June 24, 2011 at 1:45 am
Ideally, adding 750,000 records shouldn't cause much of a problem... Did you check the execution plan?
If you want any of us to analyze the issue further, I suggest you follow the process described in Gail's article mentioned above.
June 24, 2011 at 1:56 am
Sriram.RM (6/24/2011)
Ideally, adding 750,000 records shouldn't cause much of a problem... Did you check the execution plan?
If you want any of us to analyze the issue further, I suggest you follow the process described in Gail's article mentioned above.
OK - I've attached the execution plan. It looks like the clustered index scan accounts for most of the cost... which is good, I think?
June 24, 2011 at 2:09 am
Looking at the execution plan, query 2 accounts for 76% of your cost, and a significant part of that is the index scan.
You might want to see how to make this plan more efficient by getting the optimizer to do an index seek, especially since you're only filtering down to roughly 5,000 rows.
For the reasons why, see:
http://blog.sqlauthority.com/2007/03/30/sql-server-index-seek-vs-index-scan-table-scan/
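To see the difference in miniature, here's a sketch (it assumes an index on Latitude, which may not match your actual table):
-- Sargable: a plain range on the indexed column lets the optimizer seek
-- straight to the qualifying rows.
SELECT ZipCode
FROM dbo.ZipCodesTest
WHERE Latitude BETWEEN 42.0 AND 43.0
-- Non-sargable: the indexed column is buried inside an expression, so the
-- engine has to evaluate it for every row, which means a scan.
SELECT ZipCode
FROM dbo.ZipCodesTest
WHERE ABS(Latitude - 42.5) <= 0.5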
June 24, 2011 at 2:34 am
Yes, I will look at the query plan more. However, my general question remains: a radius search against 750k zip records is going to be slower than one against 44k zip records. There is obviously going to be some decrease in speed. My question is whether the query time increase from ~150ms to ~850ms is reasonable.
June 24, 2011 at 2:36 am
From past experience, NO. The data volume will result in slower query performance, but in no way are they proportional, i.e. double the data <> half the speed.
June 24, 2011 at 2:52 am
Jayanth_Kurup (6/24/2011)
From past experience, NO. The data volume will result in slower query performance, but in no way are they proportional, i.e. double the data <> half the speed.
I see... With this zip code radius search, though, there are a bunch of INNER JOINs with the zip code table, so (for me) it's pretty involved.
My data shows the zip table size increased 15x while the query time increased 5.5x.
-- NOTE: @zip, @radius, @startRow, and @rowsPerPage are parameters of the
-- surrounding stored procedure; they are declared here (with assumed types
-- and sample values) only so the fragment runs on its own.
DECLARE @zip VARCHAR(10) = '48326',
        @radius FLOAT = 15,
        @startRow INT = 1,
        @rowsPerPage INT = 20

DECLARE @start_lat FLOAT
DECLARE @MaxNorthLat FLOAT
DECLARE @MaxSouthLat FLOAT

SELECT @start_lat = (SELECT Z.Latitude FROM dbo.ZipCodesTest Z WHERE Z.ZipCode = @zip)

-- 360/24859.82 converts miles to degrees of latitude (24859.82 = Earth's circumference in miles)
SET @MaxNorthLat = @start_lat + (@radius*360/24859.82)
SET @MaxSouthLat = @start_lat - (@radius*360/24859.82)

SELECT * FROM
(
    SELECT
        TotalRows = COUNT(*) OVER(),
        RowNum = ROW_NUMBER() OVER (ORDER BY dbo.fnDistance(z1.Latitude, z1.Longitude, z2.Latitude, z2.Longitude)),
        P.ProviderID, P.OccupationID, P.LastName, P.FirstName, P.Designations, P.NumRecommendations,
        P.City, P.Zip, P.Phone, z1.ZipCode, z1.State,
        DistanceInMiles = dbo.fnDistance(z1.Latitude, z1.Longitude, z2.Latitude, z2.Longitude)
    FROM dbo.Providers P WITH (NOLOCK)
    INNER JOIN dbo.ZipCodesTest z1 ON z1.ZipCode = P.Zip
    INNER JOIN dbo.ZipCodesTest z2 ON dbo.fnDistance(z1.Latitude, z1.Longitude, z2.Latitude, z2.Longitude) < @radius
        AND z1.Latitude BETWEEN @MaxSouthLat AND @MaxNorthLat
        AND (@zip IS NULL OR z2.ZipCode = @zip)
) AS XYZ -- the derived table requires an alias
WHERE RowNum BETWEEN @startRow AND (@startRow + @rowsPerPage) - 1
ORDER BY RowNum ASC
June 24, 2011 at 8:10 am
Please post the code for fnDistance and then I'll tell you why there's such a time difference. I might also be able to tell you how to fix it, as well.
To summarize, if an INDEX SEEK were being used, this increase in record volume would have virtually no impact on performance. Since you have guaranteed that nothing less than an INDEX SCAN will be involved, you have also guaranteed the decrease in performance by increasing the volume.
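In miniature, the difference looks like this (a sketch with made-up coordinates, and it assumes an index on Latitude):
-- The join criteria in your query wrap the columns in a scalar UDF, which
-- hides them from the optimizer; every row must be read and the function
-- evaluated per row, so you get a scan at best.
SELECT z.ZipCode
FROM dbo.ZipCodesTest z
WHERE dbo.fnDistance(42.5, -83.0, z.Latitude, z.Longitude) < 15
-- Pre-filtering on the indexed column with a plain range lets the optimizer
-- seek into just the qualifying latitude band before the function runs.
SELECT z.ZipCode
FROM dbo.ZipCodesTest z
WHERE z.Latitude BETWEEN 42.28 AND 42.72
  AND dbo.fnDistance(42.5, -83.0, z.Latitude, z.Longitude) < 15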
--Jeff Moden
Change is inevitable... Change for the better is not.
June 24, 2011 at 11:30 am
Jeff Moden (6/24/2011)
@matt6749, Please post the code for fnDistance and then I'll tell you why there's such a time difference. I might also be able to tell you how to fix it, as well.
To summarize, if an INDEX SEEK were being used, this increase in record volume would have virtually no impact on performance. Since you have guaranteed that nothing less than an INDEX SCAN will be involved, you have also guaranteed the decrease in performance by increasing the volume.
Thanks Jeff. I can't figure out how to make it use an Index Seek rather than a Scan.
Here is the function:
CREATE FUNCTION [dbo].[fnDistance]
(
    -- returns the distance in miles when passed two points, latitude and longitude
    @Lat1 FLOAT,
    @Long1 FLOAT,
    @Lat2 FLOAT,
    @Long2 FLOAT
)
RETURNS FLOAT
AS
BEGIN
    DECLARE @Temp FLOAT
    -- 57.2957795130823 = degrees per radian, so each division converts degrees to radians
    SET @Temp = SIN(@Lat1/57.2957795130823) * SIN(@Lat2/57.2957795130823)
              + COS(@Lat1/57.2957795130823) * COS(@Lat2/57.2957795130823)
              * COS(@Long2/57.2957795130823 - @Long1/57.2957795130823)
    -- clamp rounding errors so ACOS never receives a value outside [-1, 1]
    IF @Temp > 1
        SET @Temp = 1
    ELSE IF @Temp < -1
        SET @Temp = -1
    -- 3958.75586574 = radius of the Earth in miles
    RETURN (3958.75586574 * ACOS(@Temp))
END
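As a quick sanity check (coordinates made up for the example): at 42°N, one degree of longitude spans roughly 51.5 miles, and the function agrees.
SELECT dbo.fnDistance(42.0, -83.0, 42.0, -82.0) -- returns approximately 51.5 (miles)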
June 24, 2011 at 6:56 pm
Just so you know, I'm working on this and I've solved the problem of the Clustered Index Scan. I've just got some more to do on it.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 24, 2011 at 9:50 pm
Ok... here's the way I'd do this job so that I can get the best speed possible out of it.
First, you have to dump that scalar function for Distance and replace it with a nice, high performance iTVF (inline Table Valued Function)...
CREATE FUNCTION dbo.fnDistance
--===== Returns distance when passed two points, latitude and longitude
-- Converted to a high performance iTVF by Jeff Moden - 24 Jun 2011
(
@Lat1 FLOAT,
@Lon1 FLOAT,
@Lat2 FLOAT,
@Lon2 FLOAT
)
RETURNS TABLE
AS
RETURN WITH
cteFirstPass AS
( --=== Calculate the cosine of the GREAT CIRCLE central angle using the Spherical Law of Cosines
SELECT fp1Temp = (SIN(RADIANS(@Lat1)) * SIN(RADIANS(@Lat2)))
+ (COS(RADIANS(@Lat1)) * COS(RADIANS(@Lat2)))
* COS(RADIANS(@Lon2) - RADIANS(@Lon1))
),
cteSecondPass AS
( --=== This clamps floating-point rounding errors so ACOS never receives a value outside [-1, 1]
SELECT fp2Temp = CASE WHEN fp1Temp > 1 THEN 1 WHEN fp1Temp < -1 THEN -1 ELSE fp1Temp END
FROM cteFirstPass
) --=== This does the final calculation for distance
-- where 3958.75586574 is the Radius of the Earth in miles (as some see it)
SELECT Distance = 3958.75586574 * ACOS(fp2Temp)
FROM cteSecondPass
GO
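Unlike the scalar version, an iTVF is referenced like a table, through a FROM clause or APPLY. A quick check with the same made-up coordinates as before:
SELECT Distance FROM dbo.fnDistance(42.0, -83.0, 42.0, -82.0) -- approximately 51.5 miles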
The next thing is indexing. I don't know what the indexes are on your ZipCode table because I'm using SQL Server 2005 and couldn't open your 2008 execution plan. With that in mind, I strongly recommend the following 2 indexes. If your clustered index isn't the one I recommend, then drop it for mine. Of course, you'll need to change the table name and columns to match yours. I didn't want to change my table to match yours.
CREATE CLUSTERED INDEX [IX_ZipCodes_Lat_Lon] ON [dbo].[ZipCodes]
(
[Lat] ASC,
[Lon] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_ZipCodes_ZipCode] ON [dbo].[ZipCodes]
(
[ZipCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
And, for those that may be wondering... NO! I wasn't the one who created that table-name as a Plural. :sick:
Up next is the code that solves most of the rest of the problem, and it does so very, very quickly. The details are all in the code except for one: you might not want to join this with another table, because I think it'll make for some rather badly-performing code. You can try it, but I'd direct the output of this code into a Temp Table (using SELECT/INTO) and then join that to the other tables you want (a sketch of that follows the query below).
--=================================================================================================
-- This section simulates parameters for a stored proc or iTVF
--=================================================================================================
--===== Declare the parameters for a stored proc or iTVF
DECLARE @pZip CHAR(5), --Change the data-type to match your table
@pRadius INT
;
--===== Preset the parameters as if they were passed to a stored proc or iTVF
SELECT @pZip = '48326',
@pRadius = 15 --Miles assumed because of the 3958.75586574 constant in the function.
; --As a sidebar, you really should document these things in the code.
--=================================================================================================
-- Setup and presets
--=================================================================================================
--===== Declare some obviously-named local variables
DECLARE @StartLat FLOAT,
@StartLon FLOAT,
@MaxNorthLat FLOAT,
@MaxSouthLat FLOAT,
@DegreesPerMile FLOAT
;
--===== Precalculate the Lat/Lon for the ZipCode we're looking for and then
-- figure out the Latitude optimization from that using the search radius
SELECT @DegreesPerMile = 360/24859.82, --24859.82 = Circumference of the Earth in miles
@StartLat = Lat, --Change Lat to the same as in your table
@StartLon = Lon, --Change Lon to the same as in your table
@MaxNorthLat = Lat + (@pRadius*@DegreesPerMile), --Change Lat to the same as in your table
@MaxSouthLat = Lat - (@pRadius*@DegreesPerMile) --Change Lat to the same as in your table
FROM dbo.ZipCodes --Change this to your table name
WHERE ZipCode = @pZip
;
--=================================================================================================
-- This finds all the Zip Codes and actual distances between the provided Zip Code and the
-- Zip Codes in the Zip Code table. Most people make the mistake of trying to calculate and
-- use the Distance in the Join Criteria which guarantees the best you'll do is a Clustered
-- Index Scan which, in truth, is nothing more than a full table scan. I did it the other
-- way around so I could get the Clustered Index Seek on the indexes I built. That's another
-- story. The Clustered Index should NOT be on the ZipCode... it should be on Lat/Lon and
-- then have a NonClustered Index on ZipCode. It makes for Index Seeks on just about any
-- query you could run.
--
-- If you have the indexes I recommended in place, this is NASTY fast.
--=================================================================================================
--===== Using the Max/Min Latitude optimization, return a list of ZipCodes within @Radius of the
-- given ZipCode (@pZip)
WITH
cteFindDistances AS
( --=== Find the optimized band of ZipCodes and calculate their distance using the hi-speed iTVF.
SELECT d.Distance, z.ZipCode
FROM dbo.ZipCodes z
CROSS APPLY dbo.fnDistance(@StartLat, @StartLon, z.Lat, z.Lon) d
WHERE z.Lat >= @MaxSouthLat
AND z.Lat <= @MaxNorthLat
) --=== Return only those within the desired radius.
SELECT Distance, ZipCode
FROM cteFindDistances
WHERE Distance <= @pRadius
ORDER BY Distance
;
That works on all US 5-digit ZipCodes (43,602 in this version) in 36 milliseconds on a 9-year-old, single 1.8GHz CPU with just 1 GB of RAM. Imagine how fast it'll run on your server.
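And if you do need the other tables' columns, the Temp Table approach I mentioned would look something like this (a sketch; it assumes the variables declared above are still in scope, and it borrows the Provider column names from the earlier query):
--===== Capture the small set of qualifying ZipCodes and distances first...
IF OBJECT_ID('tempdb..#NearbyZips') IS NOT NULL DROP TABLE #NearbyZips
;
WITH
cteFindDistances AS
( --=== Same optimized latitude band and hi-speed iTVF as the query above
 SELECT d.Distance, z.ZipCode
   FROM dbo.ZipCodes z
  CROSS APPLY dbo.fnDistance(@StartLat, @StartLon, z.Lat, z.Lon) d
  WHERE z.Lat >= @MaxSouthLat
    AND z.Lat <= @MaxNorthLat
)
 SELECT Distance, ZipCode
   INTO #NearbyZips
   FROM cteFindDistances
  WHERE Distance <= @pRadius
;
--===== ...then join that small result set to the other table(s)
 SELECT p.ProviderID, p.LastName, p.FirstName, p.City, p.Zip, n.Distance
   FROM #NearbyZips n
  INNER JOIN dbo.Providers p ON p.Zip = n.ZipCode
  ORDER BY n.Distance
;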
--Jeff Moden
Change is inevitable... Change for the better is not.