Quick question on zip code searches: U.S. and Canada

  • I just added 750,000 Canadian zip code records to my existing U.S. zip code table.

    Query execution time for a U.S. zip code radius search: ~150 milliseconds

    Query execution time for new table radius search with Canadian zip codes: ~1000 milliseconds

    So adding all the Canadian zip codes makes the query take 6x longer. Is this what you would expect? Just checking. Thanks.

  • Did you re index the table after inserting the records? whats the difference in the execution plan?

    [font="Times New Roman"]For better assistance in answering your questions
    Click Here[/url][/font]

  • matt6749 (6/23/2011)


    I just added 750,000 Canadian zip code records to my existing U.S. zip code table.

    Query execution time for a U.S. zip code search: ~150 milliseconds

    Query execution time for new table with Canadian zip codes: ~1000 milliseconds

    So adding all the Canadian zip codes makes the query take 6x longer. Is this what you would expect? Just checking. Thanks.

    Read the gail's article added in my signature.

    Muthukkumaran Kaliyamoorthy
    https://www.sqlserverblogforum.com/

  • RE: Gail's article, very good, but I don't think it's necessary for this general question.

  • Sriram.RM (6/24/2011)


    Did you re index the table after inserting the records? whats the difference in the execution plan?

    Yes, rebuilt indexes.. My guess is that the increased query time makes sense because I just added another 750,000 records. ?

  • Ideally Addition of 750000 records shouldn cause much problems.... Did you check the execution plan?

    If you want any of us to analyze further into the issue i suggest you to follow the process mentioned in Gail's article as stated above

    [font="Times New Roman"]For better assistance in answering your questions
    Click Here[/url][/font]

  • Sriram.RM (6/24/2011)


    Ideally Addition of 750000 records shouldn cause much problems.... Did you check the execution plan?

    If you want any of us to analyze further into the issue i suggest you to follow the process mentioned in Gail's article as stated above

    Ok - I've attached the execution plan. Looks like the clustered index scan is most of the cost.. which is good I think..?

  • looking at the execution plan , the query 2 accounts for 76 % of your cost and a significant part of it is the index scan.

    You might want to see how to make this plan more efficient by getting the optimizer to do an index seek , esp since your only filtering for 5000 approx.

    For reasons why see below :-

    http://blog.sqlauthority.com/2007/03/30/sql-server-index-seek-vs-index-scan-table-scan/

    Jayanth Kurup[/url]

  • Yes, I will look at the query plan more. However, my general question remains - a radius search with 44k zip records is going to be slower that a radius search with 750k zip records. There is obviously going to be some decrease in speed. My question is whether the query time increase from ~150ms to ~850ms is reasonable.

  • Form past experience NO , the data volume will result in slower query performance , but in no way are the proportional , i.e double the data <> 1/2 the speed.

    Jayanth Kurup[/url]

  • Jayanth_Kurup (6/24/2011)


    Form past experience NO , the data volume will result in slower query performance , but in no way are the proportional , i.e double the data <> 1/2 the speed.

    I see.. With this zip code radius search, though, there are a bunch of INNER JOINS with the zip code table, so (for me) it's pretty involved.

    My data shows: zip table size increased 15x, query time increased 5.5x.

    DECLARE @start_lat float

    DECLARE @MaxNorthLat float

    DECLARE @MaxSouthLat float

    SELECT @start_lat = (SELECT Z.Latitude FROM dbo.ZipCodesTest Z WHERE Z.ZipCode = @zip)

    SET @MaxNorthLat = @start_lat + (@radius*360/24859.82)

    SET @MaxSouthLat = @start_lat - (@radius*360/24859.82)

    SELECT * FROM

    (

    SELECT

    TotalRows = COUNT(*) OVER(),

    RowNum = ROW_NUMBER()OVER (ORDER BY dbo.fnDistance(z1.Latitude, z1.Longitude, z2.Latitude, z2.Longitude)),

    P.ProviderID, P.OccupationID, P.LastName, P.FirstName, P.Designations, P.NumRecommendations, P.City, P.Zip, P.Phone, z1.ZipCode, z1.State, DistanceInMiles = dbo.fnDistance(z1.Latitude, z1.Longitude, z2.Latitude, z2.Longitude)

    FROM dbo.Providers P WITH (NOLOCK)

    INNER JOIN dbo.ZipCodesTest z1 ON z1.ZipCode = P.Zip

    INNER JOIN dbo.ZipCodesTest z2 ON dbo.fnDistance(z1.Latitude, z1.Longitude, z2.Latitude, z2.Longitude) < @radius

    AND z1.Latitude BETWEEN @MaxSouthLat AND @MaxNorthLat

    AND (@zip IS NULL OR z2.ZipCode = @zip)

    )

    AS XYZ -- you need this AS XYZ

    WHERE RowNum BETWEEN @startRow AND (@startRow + @rowsPerPage) - 1

    ORDER BY RowNum ASC

  • @matt6749,

    Please post the code for fnDistance and then I'll tell you why there's such a time difference. I might also be able to tell you how to fix it, as well.

    To summarize, if an INDEX SEEK is being used, then this increase in record volumn should have virtually no impact on performance. Since you have guaranteed that nothing less than an INDEX SCAN will be involved, you have also guaranteed the decrease in performance by increasing the volume.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/24/2011)


    @matt6749,

    Please post the code for fnDistance and then I'll tell you why there's such a time difference. I might also be able to tell you how to fix it, as well.

    To summarize, if an INDEX SEEK is being used, then this increase in record volumn should have virtually no impact on performance. Since you have guaranteed that nothing less than an INDEX SCAN will be involved, you have also guaranteed the decrease in performance by increasing the volume.

    Thanks Jeff. I can't seem to be able to figure out how to make it use an Index Seek rather than Scan.

    Here is the function:

    FUNCTION [dbo].[fnDistance]

    (

    -- returns distance when passed two points, latitude and longitude

    @Lat1 float,

    @Long1 float,

    @Lat2 float,

    @Long2 float

    )

    RETURNS float

    AS

    BEGIN

    DECLARE @Temp FLOAT

    SET @Temp = SIN(@Lat1/57.2957795130823) * SIN(@Lat2/57.2957795130823) + COS(@Lat1/57.2957795130823)

    * COS(@Lat2/57.2957795130823) * COS(@Long2/57.2957795130823 - @Long1/57.2957795130823)

    IF @Temp > 1

    SET @Temp = 1

    ELSE IF @Temp < -1

    SET @Temp = -1

    RETURN (3958.75586574 * ACOS(@Temp) )

    END

  • Just so you know, I'm working on this and I've solved the problem of the Clustered Index Scan. I've just got some more to do on it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... here's the way I'd do this job so that I can get the best speed possible out of it.

    First, you have to dump that scalar function for Distance and replace it with a nice, high performance iTVF (inline Table Valued Function)...

    CREATE FUNCTION dbo.fnDistance

    --===== Returns distance when passed two points, latitude and longitude

    -- Converted to a high performance iTVF by Jeff Moden - 24 Jun 2011

    (

    @Lat1 FLOAT,

    @Lon1 FLOAT,

    @Lat2 FLOAT,

    @Lon2 FLOAT

    )

    RETURNS TABLE

    AS

    RETURN WITH

    cteFirstPass AS

    ( --=== Calculate the GREAT CIRCLE distance in Radians using the Haversine Formula

    SELECT fp1Temp = (SIN(RADIANS(@Lat1)) * SIN(RADIANS(@Lat2)))

    + (COS(RADIANS(@Lat1)) * COS(RADIANS(@Lat2)))

    * COS(RADIANS(@Lon2) - RADIANS(@Lon1))

    ),

    cteSecondPass AS

    ( --=== This makes a correction not accounted for in the Haversine Formula above

    SELECT fp2Temp = CASE WHEN fp1Temp > 1 THEN 1 WHEN fp1Temp < -1 THEN -1 ELSE fp1Temp END

    FROM cteFirstPass

    ) --=== This does the final calculation for distance

    -- where 3958.75586574 is the Radius of the Earth in miles (as some see it)

    SELECT Distance = 3958.75586574 * ACOS(fp2Temp)

    FROM cteSecondPass

    GO

    The next thing is indexing. I don't know what the indexes are on your ZipCode table because I'm using SQL Server 2005 and couldn't open your 2008 execution plan. With that in mind, I strongly recommend the following 2 indexes. If your clustered index isn't the one I recommend, then drop it for mine. Of course, you'll need to change the table name and columns to match yours. I didn't want to change my table to match yours.

    CREATE CLUSTERED INDEX [IX_ZipCodes_Lat_Lon] ON [dbo].[ZipCodes]

    (

    [Lat] ASC,

    [Lon] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    CREATE NONCLUSTERED INDEX [IX_ZipCodes_ZipCode] ON [dbo].[ZipCodes]

    (

    [ZipCode] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    And, for those that may be wondering... NO! I wasn't the one who created that table-name as a Plural. :sick:

    Up next is the code to solve most of the rest of the problem and it does so very, very quickly. The details are all in the code except for one. You might not want to join this with another table because I think it'll make for some rather bad performing code. You can try it but I'd direct the output of this code to a Temp Table (using SELECT/INTO) and then join that to the other tables that you want.

    --=================================================================================================

    -- This section simulates parameters for a stored proc or iTVF

    --=================================================================================================

    --===== Declare the parameters for a stored proc or iTVF

    DECLARE @pZip CHAR(5), --Change the data-type to match your table

    @pRadius INT

    ;

    --===== Preset the parameters as if they were passed to a stored proc or iTVF

    SELECT @pZip = '48326',

    @pRadius = 15 --Miles assumed because of the 3958.75586574 constant in the function.

    ; --As a sidebar, you really should document these things in the code.

    --=================================================================================================

    -- Setup and presets

    --=================================================================================================

    --===== Declare some obviously-named local variables

    DECLARE @StartLat FLOAT,

    @StartLon FLOAT,

    @MaxNorthLat FLOAT,

    @MaxSouthLat FLOAT,

    @DegreesPerMile FLOAT

    ;

    --===== Precalculate the Lat/Lon for the ZipCode we're looking for and then

    -- figure out the Latitude optimization from that using the search radius

    SELECT @DegreesPerMile = 360/24859.82, --24859.82 = Circumference of the Earth

    @StartLat = Lat, --Change Lat to the same as in your table

    @StartLon = Lon, --Change Lon to the same as in your table

    @MaxNorthLat = Lat + (@pRadius*@DegreesPerMile), --Change Lat to the same as in your table

    @MaxSouthLat = Lat - (@pRadius*@DegreesPerMile) --Change Lon to the same as in your table

    FROM dbo.ZipCodes --Change this to your table name

    WHERE ZipCode = @pZip

    ;

    --=================================================================================================

    -- This finds all the Zip Codes and actual distances between the provided Zip Code and the

    -- Zip Codes in the Zip Code table. Most people make the mistake of trying to calculate and

    -- use the Distance in the Join Criteria which guarantees the best you'll do is a Clustered

    -- Index Scan which, in truth, is nothing more than a full table scan. I did it the other

    -- way around so I could get the Clustered Index Seek on the indexes I built. That's another

    -- story. The Clustered Index should NOT be on the ZipCode... it should be on Lat/Lon and

    -- then have a NonClustered Index on ZipCode. It makes for Index Seeks on just about any

    -- query you could run.

    --

    -- If you have the indexes I recommended in place, this is NASTY fast.

    --=================================================================================================

    --===== Using the Max/Min Latitude optimization, return a list of ZipCodes within @Radius of the

    -- given ZipCode (@pZip)

    WITH

    cteFindDistances AS

    ( --=== Find the optimized band of ZipCodes and calculate their distance using the hi-speed iTVF.

    SELECT d.Distance, z.ZipCode

    FROM dbo.ZipCodes z

    CROSS APPLY dbo.fnDistance(@StartLat, @StartLon, z.Lat, z.Lon) d

    WHERE z.Lat >= @MaxSouthLat

    AND z.Lat <= @MaxNorthLat

    ) --=== Return only those within the desired radius.

    SELECT Distance, ZipCode

    FROM cteFindDistances

    WHERE Distance <= @pRadius

    ORDER BY Distance

    ;

    That works on all US 5 digit ZipCodes (43,602 in this version) in 36 milliseconds on a 9 year old, single 1.8Ghz CPU with just 1 Gig of RAM. Imagine how fast it'll run on your server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply