November 24, 2014 at 4:03 pm
This is code for doing a zip code radius search that seems to work well. I wonder if it is possible to eliminate using the #Temp table in it though to make it quicker? My table sizes are growing and I want to avoid having performance issues in the future. Thanks.
--===== Declare some obviously-named local variables
DECLARE @StartLat FLOAT,
@StartLon FLOAT,
@MaxNorthLat FLOAT,
@MaxSouthLat FLOAT,
@DegreesPerMile FLOAT
;
--===== Precalculate the Lat/Long for the ZipCode we're looking for and then
-- figure out the Latitude optimization from that using the search radius
SELECT @DegreesPerMile = 360/24859.82, --24859.82 = Circumference of the Earth
@StartLat = Latitude,
@StartLon = Longitude,
@MaxNorthLat = Latitude + (@radius*@DegreesPerMile),
@MaxSouthLat = Latitude - (@radius*@DegreesPerMile)
FROM dbo.ZipCodes
WHERE ZipCode = @zip;
--========== done precalculating
-- Create CTE
WITH
cteFindDistances AS
( --=== Find the optimized band of ZipCodes and calculate their distance using the hi-speed iTVF. - This code thanks to Jeff Moden
SELECT d.Distance, z.ZipCode, z.State
FROM dbo.ZipCodes z
CROSS APPLY dbo.fnDistanceITVF(@StartLat, @StartLon, z.Latitude, z.Longitude) d
WHERE z.Latitude >= @MaxSouthLat
AND z.Latitude <= @MaxNorthLat
) --=== Return only those within the desired radius
-- Use the CTE
SELECT Distance, ZipCode
INTO #TempZip
FROM cteFindDistances
WHERE Distance <= @radius
SELECT * FROM
(
SELECT
TotalRows = COUNT(*) OVER(),
RowNum = ROW_NUMBER()OVER (ORDER BY T.Distance),
P.ProviderID, P.Title, P.Company, P.City, P.State, DistanceInMiles = T.Distance
FROM dbo.Providers P WITH (NOLOCK)
INNER JOIN #TempZip T ON T.ZipCode = P.Zip
)
AS XYZ
WHERE RowNum BETWEEN @startRow AND (@startRow + @rowsPerPage) - 1
ORDER BY RowNum ASC
DROP TABLE #TempZip
November 24, 2014 at 4:10 pm
Would have to see dbo.fnDistanceITVF to be able to determine if it would wrap in well.
In some ways, the bigger your underlying data sets the better the #tmp performs, not worse. However, using an index of some form on the intermediary table may be effective for you. Would want to see the .sqlplan before advising that though.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 24, 2014 at 4:12 pm
This is the ITVF:
[dbo].[fnDistanceITVF]
--===== Returns distance when passed two points, latitude and longitude
-- Converted to a high performance iTVF by Jeff Moden - 24 Jun 2011
(
@Lat1 FLOAT,
@Lon1 FLOAT,
@Lat2 FLOAT,
@Lon2 FLOAT
)
RETURNS TABLE
AS
RETURN WITH
cteFirstPass AS
( --=== Calculate the GREAT CIRCLE distance in Radians using the Haversine Formula
SELECT fp1Temp = (SIN(RADIANS(@Lat1)) * SIN(RADIANS(@Lat2)))
+ (COS(RADIANS(@Lat1)) * COS(RADIANS(@Lat2)))
* COS(RADIANS(@Lon2) - RADIANS(@Lon1))
),
cteSecondPass AS
( --=== This makes a correction not accounted for in the Haversine Formula above
SELECT fp2Temp = CASE WHEN fp1Temp > 1 THEN 1 WHEN fp1Temp < -1 THEN -1 ELSE fp1Temp END
FROM cteFirstPass
)
--=== This does the final calculation for distance
-- where 3958.75586574 is the Radius of the Earth in miles (as some see it)
SELECT Distance = 3958.75586574 * ACOS(fp2Temp)
FROM cteSecondPass
November 24, 2014 at 4:36 pm
So the basic idea of this is to find the # of applicable businesses within a radius from a zip code. Got it.
... <deletes a bunch of stuff>
Can we see the schema for dbo.Providers and dbo.ZipCodes?
A lot of my recommendations will depend on your index setups on the Latitude information and the provider's table's ability to connect to the indexes on ZipCodes.
General Thought: You can with some reasonability pre-calc the Latitude boundaries and use that as a filter after a join to providers. That will thin down the possible entries significanly, but not overly significantly. You've just picked up every provider in a (ex: 25 mile) stripe through the country.
From there you're filtering again based on distance to remove a bunch of stuff after a distance calcualtion to deal with the longitude. Depending on the multipliers here (let's say 10 providers/zip code) you're reducing from 10,000 zipcodes vs. 100,000 providers in a mid-step.
Really... I'd have to tinker with it. See some .sqlplans. Check the data heuristics.
However, if Providers is the one growing (I can't see Zip Codes changing much)... I'd avoid removing that temp table, actually. It's saving you from a lot of headaches. Pre-Build #TempZip to have a Unique Clustered Index on ZipCode and make sure your other tables have a covering index on ZipCode as the leading column to assist in seeks.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 24, 2014 at 4:46 pm
Thanks Craig, that's helpful to know that the temp table may be saving me headaches.
Here are the structures/indexes:
[dbo].[Providers](
[ProviderID] [int] IDENTITY(1,1) NOT NULL, -- PK, Unique, indexed, non-clustered
[FirstName] [nvarchar](100) NULL,
[LastName] [nvarchar](100) NULL,
[Address1] [nvarchar](100) NULL,
[Address2] [nvarchar](100) NULL,
[City] [nvarchar](100) NULL,
[State] [nvarchar](100) NULL,
[Zip] [nvarchar](10) NULL -- Indexed, Non-unique, clustered
[dbo].[ZipCodes](
[ZipCode] [nvarchar](10) NOT NULL,
[Latitude] [decimal](18, 8) NULL,
[Longitude] [decimal](18, 8) NULL,
[City] [nvarchar](200) NULL,
[State] [nvarchar](200) NULL,
[County] [nvarchar](200) NULL,
Index: Latitude (ASC), Longitude (ASC) -- Non-unique, Clustered
Index: ZipCode -- PK, unique, non-clustered
November 24, 2014 at 4:54 pm
You clustered ZipCodes on Lat and Lon? You don't use a zipcode shortcut for city/state addressing I take it. Most of your access to that table is by Latitude search, then, I suppose?
More surprising to me is Providers is already clustered on ZipCode. That's a very unique data pattern to expect, where your searches by zip code hit the tables more often than other read. However, if it's your pattern, excellent setup. 🙂
Yeah, in this case, I'd personally just look at indexing #TempZip. If you need to optimize before the Providers search even further, I'd review ways to speed up the pre-#temp reduction by distance, such as squaring it off with @maxLon/@minLon too so you dealt with a square instead of a stripe of ZipCodes, and then used distance to remove the corners. My guess is if you wrap everything in time controls, though, that's a minimal concern.
Under normal circumstances there's a problem with a query when we look to optimize it. In this case you're pre-optimizing. There's such a thing as over-engineering, and we may be approaching that. What's the runtime on this query? Are you seeing specific wait types that have you concerned? Have you tried going into dev and duplicating your providers table x10, or x100, and seen if the result return time becomes unacceptable? Is there something in the Execution Plan that's triggering a concern?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 24, 2014 at 5:13 pm
Thanks very much Craig. Your knowledge is very impressive. - Matt
- The providers table has 60k records. Performance now is good, but I may increase the table size to 500k records.
- Most searches on the Providers table are done by ZipCode (radius search) or by City/State. If searching by City/State, I lookup the corresponding zipcode and then search on that. Occasional searches on Providers are by LastName.
A) Not sure where the clustered index should be on ZipCodes. Maybe not on Lat/Long?
B) >> "My guess is if you wrap everything in time controls, though, that's a minimal concern." Can you explain what you mean by "time controls?"
C) Pre-indexing #TempZip .... not sure how to do that? Create an index on it once the table is created?
November 24, 2014 at 5:26 pm
matt6749 (11/24/2014)
Thanks very much Craig. Your knowledge is very impressive. - Matt
Only until one of the big kids around here steps in and reminds me I'm a noob. 😉
- The providers table has 60k records. Performance now is good, but I may increase the table size to 500k records.
Then start there. In dev generate up a 500k table, even if you just replicate the exact same records 10x. Then side by side create equivalent queries and test them. See if you get a significant performance degradation. If you do, snag the .sqlplan (see the link in my signature for index/tuning help for a walkthrough how if you're unfamiliar) and we can work against that.
- Most searches on the Providers table are done by ZipCode (radius search) or by City/State. If searching by City/State, I lookup the corresponding zipcode and then search on that. Occasional searches on Providers are by LastName.
That makes sense, then. Always go for your most common approach. You're working around the concern with a zip code list so your city/state searches shouldn't be too bad. You'll still probably end up with scans but that's a weight of upkeep vs. usage value that you'll need to determine downstream, once your data load levels off and you can get usage method details from the logs.
A) Not sure where the clustered index should be on ZipCodes. Maybe not on Lat/Long?
This is a question for after reviewing your most common reasons for accessing the ZipCodes table. If you usually attach ZipCodes after searching for stuff as a join to bring along extra data (for example, you only store zip codes in your provider's address tables) then it being the clustered would make sense. If you only usually come to the ZipCodes table to search for ZipCodes by distances, then leave it right where it is.
B) >> "My guess is if you wrap everything in time controls, though, that's a minimal concern." Can you explain what you mean by "time controls?"
C) Pre-indexing #TempZip .... not sure how to do that? Create an index on it once the table is created?
Sorry, example for both. Notice the Print statements and the adjustment in the middle for CREATE TABLE #TempZip:
--===== Declare some obviously-named local variables
DECLARE @StartLat FLOAT,
@StartLon FLOAT,
@MaxNorthLat FLOAT,
@MaxSouthLat FLOAT,
@DegreesPerMile FLOAT
;
-- Timer Tests
DECLARE @StartTime DATETIME2
SET @StartTime = GETDATE()
--===== Precalculate the Lat/Long for the ZipCode we're looking for and then
-- figure out the Latitude optimization from that using the search radius
SELECT @DegreesPerMile = 360/24859.82, --24859.82 = Circumference of the Earth
@StartLat = Latitude,
@StartLon = Longitude,
@MaxNorthLat = Latitude + (@radius*@DegreesPerMile),
@MaxSouthLat = Latitude - (@radius*@DegreesPerMile)
FROM dbo.ZipCodes
WHERE ZipCode = @zip;
PRINT 'Done Determining MaxLatitudes time in ms:' + CONVERT( VARCHAR(30), DATEDIFF( ms, @StartTime, GETDATE()))
SET @StartTime = GETDATE()
IF OBJECT_ID('tempdb..#TempZip') IS NOT NULL
DROP TABLE #TempZip
CREATE TABLE #TempZip
(ZipCode INT NOT NULL UNIQUE CLUSTERED,
Distance NUMERIC (38,20) NOT NULL -- Or whatever the datatype should be
)
/*Alternate Create Table:
CREATE TABLE #TempZip
(ZipCode INT NOT NULL,
Distance NUMERIC (38,20) NOT NULL -- Or whatever the datatype should be
)
CREATE UNIQUE CLUSTERED INDEX idx_TempZip ON #TempZip
(ZipCode)
*/
--========== done precalculating
PRINT 'Create table #TempZip time in ms:' + CONVERT( VARCHAR(30), DATEDIFF( ms, @StartTime, GETDATE()))
SET @StartTime = GETDATE()
-- Create CTE
WITH
cteFindDistances AS
( --=== Find the optimized band of ZipCodes and calculate their distance using the hi-speed iTVF. - This code thanks to Jeff Moden
SELECT d.Distance, z.ZipCode, z.State
FROM dbo.ZipCodes z
CROSS APPLY dbo.fnDistanceITVF(@StartLat, @StartLon, z.Latitude, z.Longitude) d
WHERE z.Latitude >= @MaxSouthLat
AND z.Latitude <= @MaxNorthLat
) --=== Return only those within the desired radius
-- Use the CTE
INSERT INTO #TempZip
SELECT ZipCode, Distance
FROM cteFindDistances
WHERE Distance <= @radius
PRINT 'DataLoad into #TempZip time in ms:' + CONVERT( VARCHAR(30), DATEDIFF( ms, @StartTime, GETDATE()))
SET @StartTime = GETDATE()
SELECT * FROM
(
SELECT
TotalRows = COUNT(*) OVER(),
RowNum = ROW_NUMBER()OVER (ORDER BY T.Distance),
P.ProviderID, P.Title, P.Company, P.City, P.State, DistanceInMiles = T.Distance
FROM dbo.Providers P WITH (NOLOCK)
INNER JOIN #TempZip T ON T.ZipCode = P.Zip
)
AS XYZ
WHERE RowNum BETWEEN @startRow AND (@startRow + @rowsPerPage) - 1
ORDER BY RowNum ASC
PRINT 'DataReturn time in ms:' + CONVERT( VARCHAR(30), DATEDIFF( ms, @StartTime, GETDATE()))
SET @StartTime = GETDATE()
DROP TABLE #TempZip
EDIT: Forgot the VARCHAR(30) converts
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 24, 2014 at 5:49 pm
This is immensely helpful. Thanks a lot Craig. - Matt
November 24, 2014 at 6:30 pm
No problem. If you get stuck after doing some experimentation just come on back and we'll help you with whatever specific point is causing you a headache.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply