April 2, 2012 at 1:59 pm
I have a query that I built to get a list of companies within a certain distance from a list of zip codes. Right now, with a list of 1056 zip codes, the query takes 17 minutes. I'm sure that I'm doing something wrong, but I don't know how to make the query fast enough to be used real-time (if even possible). So here's the overview: the query pulls a zipcode from the Location table, joins that zipcode to a table with zipcodes and their respective latitude and longitude, then uses that to find out the number of businesses within a 2-, 5-, 10- and 20-mile radius using a function that determines distance based on latitude and longitude.
Here is the query:
SELECT
Location.zipcode AS ZipCode,
(SELECT count(V.vendorid)
FROM Vendor AS V
WHERE dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 2
AND V.Active = 1
) AS Within2Miles,
(SELECT count(V.vendorid)
FROM Vendor AS V
WHERE dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 5
AND V.Active = 1
) AS Within5Miles,
(SELECT count(V.vendorid)
FROM Vendor AS V
WHERE dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 10
AND V.Active = 1
) AS Within10Miles,
(SELECT count(V.vendorid)
FROM Vendor AS V
WHERE dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 20
AND V.Active = 1
) AS Within20Miles
FROM Location
INNER JOIN zip AS LZ
ON LZ.zip = Location.zipcode
Here is the function:
[dbo].[GetGeoDistance]
(@inLon1 float,
@inLat1 float,
@inLon2 float,
@inLat2 float
)
RETURNS float
AS
BEGIN
-- Declare the return variable here
DECLARE @Result float
DECLARE @a float
DECLARE @C float
DECLARE @radLon float
DECLARE @radLat float
DECLARE @lat1 float
DECLARE @lat2 float
DECLARE @lon1 float
DECLARE @lon2 float
DECLARE @Pi float
SELECT @PI = PI()
SET @lat1 = (@inLat1 * @PI)/180
SET @lat2 = (@inLat2 * @PI)/180
SET @lon1 = (@inLon1 * @PI)/180
SET @lon2 = (@inLon2 * @PI)/180
SET @radLat = ((@inLat2 - @inLat1) * @PI)/180
SET @radLon = ((@inLon2 - @inLon1) * @PI)/180
SET @a = SIN(@radLat/2) * SIN(@radLat/2) + SIN(@radLon/2) * SIN(@radLon/2) * COS(@lat1) * COS(@lat2)
SET @C = 2 * ATN2(SQRT(@a), SQRT(1-@a))
SET @Result = (6371/1.609344) * @C
-- Return the result of the function
RETURN @Result
END
The only things I can change are: the query (obviously), the function, and the Location table. I have used the database tuning advisor and created the suggested statistics and indexes. That got me from 59 minutes to 17 minutes. The one other thing I thought might help is to add the lat and long to the Location table so that I could eliminate that join. The only (minor) problem is that the table is repopulated each time the report is run, so I'd have to insert the lat and long at that point.
Can anyone offer any advice? If I can't get this down to a reasonable time, I'm going to have to set it up as a job to run at night, but I don't really want to do that unless I have to.
Thanks in advance for any help!
April 2, 2012 at 2:05 pm
It is most likely that scalar function that is causing you issues. Scalar functions can absolutely destroy your server.
If you can post some ddl and sample data I think you could call that function once instead of 4. If you can also post execution plans as they exist now that would be a big help.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 2, 2012 at 2:34 pm
If you run a trace on this query you will see that for every pair in the join it is calling the function. On top of that you are doing the calculation 4 times, once for each of the ranges. Given you must be doing a cross join to check each pair you have 4 460 544 calls to the function (1056 X 1056 X 4). Even if each call only takes 2/1000 of a second you are talking about 14 minutes just for the calls to the function.
You could probably cut run time to about 25% just by changing this to a single query returning:
Location.zipcode AS 'ZipCode', V.vendorid,
dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) as 'distance'
Include as single: where dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 20
to reduce the result set.
Then work through the result to get your final information.
If you can provide the schema we could advise a bit more.
Cheers
Leo
Leo
Nothing in life is ever so complicated that with a little work it can't be made more complicated.
April 2, 2012 at 3:19 pm
With nothing to use for testing, here is a shot in the dark:
create function [dbo].[itvf_GetGeoDistance] (
@inLon1 float,
@inLat1 float,
@inLon2 float,
@inLat2 float
)
RETURNS table with schemabinding
AS RETURN
SELECT (6371/1.609344) * (2 * ATN2(SQRT((SIN((((@inLat2 - @inLat1) * PI())/180)/2) * SIN((((@inLat2 - @inLat1) * PI())/180)/2) + SIN((((@inLon2 - @inLon1) * PI())/180)/2) * SIN((((@inLon2 - @inLon1) * PI())/180)/2) * COS(((@inLat1 * PI())/180)) * COS(((@inLat2 * PI())/180)))), SQRT(1-(SIN((((@inLat2 - @inLat1) * PI())/180)/2) * SIN((((@inLat2 - @inLat1) * PI())/180)/2) + SIN((((@inLon2 - @inLon1) * PI())/180)/2) * SIN((((@inLon2 - @inLon1) * PI())/180)/2) * COS(((@inLat1 * PI())/180)) * COS(((@inLat2 * PI())/180)))))) as GeoDistance;
GO
select
loc.zipcode AS ZipCode,
sum(case when ggd.GeoDistance <= 2 then 1 else 0 end) as Within2Miles,
sum(case when ggd.GeoDistance <= 5 then 1 else 0 end) as Within5Miles,
sum(case when ggd.GeoDistance <= 10 then 1 else 0 end) as Within10Miles,
sum(case when ggd.GeoDistance <= 20 then 1 else 0 end) as Within20Miles
from
dbo.Location loc
inner join dbo.zip lz
on (loc.zip = lz.zipcode)
cross join dbo.Vendor v
cross apply dbo.itvf_GetGeoDistance(v.longitude, v.latitude, lz.longitude, lz.latitude) ggd
where
v.Active = 1
group by
loc.zipcode;
April 3, 2012 at 12:19 am
All great suggestions!
Leo, I can't use the GetGeoDistance function once because I need four different distances (unless I'm misunderstanding your post).
Lynn, I tried your functionality and that cut the response time to 1/3 (5-ish minutes), but that's still a long ways from being web app friendly.
I have a zip file with the schemas, some sample data and the current execution plan (before Lynn's suggestions), I just attached it to the main post.
Thanks again!
April 3, 2012 at 7:36 am
It would be a lot easier if your sample data was insert statements. The idea here is for you to make it as easy as possible for the people helping. A lot of people will not even download a zip file.
Here I have posted your ddl. Notice over on the left when you are posting the IFCode shortcuts. The code one is very useful when posting lengthy code.
CREATE TABLE [dbo].[Location](
[locationID] [int] IDENTITY(1,1) NOT NULL,
[zipcode] [varchar](10) NULL,
CONSTRAINT [PK_Location] PRIMARY KEY CLUSTERED
(
[locationID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
CREATE TABLE [dbo].[Vendor](
[VendorID] [int] IDENTITY(1,1) NOT NULL,
[Vendor] [varchar](50) NOT NULL,
[Company] [varchar](100) NULL,
[Department] [varchar](50) NULL,
[Prefix] [varchar](10) NULL,
[FirstName] [varchar](50) NULL,
[MiddleName] [varchar](20) NULL,
[LastName] [varchar](50) NULL,
[Suffix] [varchar](10) NULL,
[Title] [varchar](50) NULL,
[Address] [text] NULL,
[City] [varchar](50) NULL,
[State] [varchar](50) NULL,
[Zip] [varchar](20) NULL,
[Country] [varchar](50) NULL,
[PhoneWork] [varchar](20) NULL,
[PhoneHome] [varchar](20) NULL,
[PhoneFax] [varchar](20) NULL,
[Domain] [varchar](255) NULL,
[Website_URL] [varchar](255) NULL,
[Email] [varchar](255) NULL,
[Team] [varchar](50) NULL,
[Person] [varchar](255) NULL,
[PayableCode] [varchar](50) NULL,
[ReceivableCode] [varchar](50) NULL,
[Currency] [char](3) NULL,
[DateFormat] [varchar](50) NULL,
[Active] [tinyint] NULL,
[Flag0] [varchar](50) NULL,
[Flag1] [varchar](50) NULL,
[Flag2] [varchar](50) NULL,
[Flag3] [varchar](50) NULL,
[Flag4] [varchar](50) NULL,
[Flag5] [varchar](50) NULL,
[Flag6] [varchar](50) NULL,
[Flag7] [varchar](50) NULL,
[Flag8] [varchar](50) NULL,
[Flag9] [varchar](50) NULL,
[County] [varchar](50) NULL,
[PhoneMobile] [varchar](20) NULL,
[TaxRef] [varchar](255) NULL,
[VendorRef] [varchar](255) NULL,
[SubscriberID] [varchar](255) NULL,
[AcceptedInsurance] [text] NULL,
[PreferredProvider] [varchar](1) NULL,
[TraumaResponseFee] [money] NULL,
[OnSiteFee] [money] NULL,
[Latitude] [varchar](255) NULL,
[Longitude] [varchar](255) NULL,
CONSTRAINT [PK_Vendor] PRIMARY KEY NONCLUSTERED
(
[VendorID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY],
CONSTRAINT [IX_Vendor] UNIQUE CLUSTERED
(
[Vendor] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[Vendor] WITH NOCHECK ADD CONSTRAINT [FK_Vendor_Person] FOREIGN KEY([Person])
REFERENCES [dbo].[Person] ([Person])
GO
ALTER TABLE [dbo].[Vendor] CHECK CONSTRAINT [FK_Vendor_Person]
GO
ALTER TABLE [dbo].[Vendor] WITH CHECK ADD CONSTRAINT [FK_Vendor_Team] FOREIGN KEY([Team])
REFERENCES [dbo].[Team] ([Team])
GO
ALTER TABLE [dbo].[Vendor] CHECK CONSTRAINT [FK_Vendor_Team]
GO
/****** Object: Table [dbo].[zip] Script Date: 04/02/2012 20:32:30 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[zip](
[zip] [nvarchar](5) NULL,
[state] [nvarchar](2) NULL,
[city] [nvarchar](22) NULL,
[longitude] [nvarchar](6) NULL,
[latitude] [nvarchar](6) NULL
) ON [PRIMARY]
GO
--Edit--
I posted your ddl without first trying it, there are a couple foreign key references in the vendor table to tables we don't have. They seem to work just fine after removing those.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 3, 2012 at 7:52 am
The problem you are facing is that you are calculating the number of vendors that are 2,5,10, and 20 miles away from each zipcode in the Location table. At it looks like you are doing this every time you need this information.
If you really need this information, preaggregate it and then use that on the web site. How often do the Location, Zip, and Vendor tables change?
April 3, 2012 at 7:54 am
So just to clarify, you want a list of the counts of all vendors within each distance range for every single zip code on your list? You said something about a web page, are you displaying the total count of all vendors within those distances as they relate to each zip code? Seems like storing this in a permanent table would yield much better performance since this data really doesn't change very often.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 3, 2012 at 7:55 am
Seems that Lynn and I both agree that storing this data would be better than calculating it on the fly...didn't see the response as I was typing mine. 😛
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 3, 2012 at 9:36 am
some thoughts,
1) could you calculate the extremis lat/long @ 2,5,10,20 miles for each of the compass points (up to you whether you use [4 point: NSEW], [8 points: N,NE,E,SE,S,SW,S,NW] or [16 points: N,NNW,NW,WNW,W,WSW,SW,SSW,S etc..] or more. and once you have this data set choose zip codes which are within at least one of these extremis pairs. It won't be a perfect radius but it will be close (closer, the more points you use)
2) Could you precalculate the distances between all zip codes (there are approx 43000 of them: this would give you a reference table of 1.85 Billion combinations of which only a very small percentage woudl be within 20 miles of each other)
3) if you can upgrade to SQL2008 R2 you may be able to make use of the geospatial extenders
4) can you consume a web resource to provide you with the list of nearby zip codes
http://www.freemaptools.com/find-zip-codes-inside-radius.htm
5) invest in some commercial software to do it for you?
http://www.zip-codes.com/zip-code-radius-app.asp
(no affiliation, never used it, don't know how good it is...)
April 3, 2012 at 10:13 am
Thanks again for the suggestions! Sean, I didn't do Insert statements because I wanted to give a larger dataset than I was willing to type out manually and I didn't think there was a way to export data into insert statements (is there?). Also, I used a zip file because I read Gail Shaw's suggestions on "How to Post Performance Problems" and that is what Gail suggested. I'm sure I didn't do it perfectly, but I did try.
Lynn & Sean, the Vendor and the Zip tables do not change very often, the location table changes every time the query is run. Here's an example of how it works:
- User uploads spreadsheet containing between 1 and 10000 zip codes
- Zip codes are inserted into Location table
- Query runs and gets the distances for all of the zip codes in the Location table.
- Excel report is generated for user to download.
Aaron, I like your idea about precompiling the data into a table, then querying that table to get the distances for the report. I'll also have to look into the geospatial extenders since I'm using 2008 R2.
Again, thank you all for your help. Having other people's perspectives helps a lot.
Edit: I did try to use the IFCODE shortcut and it was giving me grief. I don't remember exactly what the problem was, just that I was frustrated and figured I must be trying to put a square peg in a round hole. I think it had to do with the fact that I originally started to post the question in the Ask SQL Server Central area, realized my mistake and pasted it here. That's also why my formatting was all wonky. Sorry for both of those things. 🙂
April 3, 2012 at 10:56 am
shermola1 (4/3/2012)
Thanks again for the suggestions! Sean, I didn't do Insert statements because I wanted to give a larger dataset than I was willing to type out manually and I didn't think there was a way to export data into insert statements (is there?). Also, I used a zip file because I read Gail Shaw's suggestions on "How to Post Performance Problems" and that is what Gail suggested. I'm sure I didn't do it perfectly, but I did try.Lynn & Sean, the Vendor and the Zip tables do not change very often, the location table changes every time the query is run. Here's an example of how it works:
- User uploads spreadsheet containing between 1 and 10000 zip codes
- Zip codes are inserted into Location table
- Query runs and gets the distances for all of the zip codes in the Location table.
- Excel report is generated for user to download.
Aaron, I like your idea about precompiling the data into a table, then querying that table to get the distances for the report. I'll also have to look into the geospatial extenders since I'm using 2008 R2.
Again, thank you all for your help. Having other people's perspectives helps a lot.
Edit: I did try to use the IFCODE shortcut and it was giving me grief. I don't remember exactly what the problem was, just that I was frustrated and figured I must be trying to put a square peg in a round hole. I think it had to do with the fact that I originally started to post the question in the Ask SQL Server Central area, realized my mistake and pasted it here. That's also why my formatting was all wonky. Sorry for both of those things. 🙂
Please forgive my nasty tone this morning, must have been in a mood when I got to work today. 🙂 My apologies if I sounded like I was being critical. And certainly when posting sqlplans you should zip them as Gail suggested because they can get pretty big.
They moved the ability to script data in 2008. You can still get to it but it is kind of hidden.
Right click the database --> Tasks --> Generate Scripts...
This will bring up the wizard. When you get to the "Set Scripting Options" you can click the Advanced button. Then scroll down to "Types of data to script". by default it will be "Schema only". You can change this to "Schema and data". It will generate your ddl and all the inserts.
Sometimes those code shortcuts can be a pain. I typically only really use the code ones and I usually just type an empty code one which saves me some headaches. I just use
[ code ] content stuff here [ / code ] (minus the spaces of course)
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 3, 2012 at 11:38 am
shermola1 (4/3/2012)
Thanks again for the suggestions! Sean, I didn't do Insert statements because I wanted to give a larger dataset than I was willing to type out manually and I didn't think there was a way to export data into insert statements (is there?). Also, I used a zip file because I read Gail Shaw's suggestions on "How to Post Performance Problems" and that is what Gail suggested. I'm sure I didn't do it perfectly, but I did try.Lynn & Sean, the Vendor and the Zip tables do not change very often, the location table changes every time the query is run. Here's an example of how it works:
- User uploads spreadsheet containing between 1 and 10000 zip codes
- Zip codes are inserted into Location table
- Query runs and gets the distances for all of the zip codes in the Location table.
- Excel report is generated for user to download.
Aaron, I like your idea about precompiling the data into a table, then querying that table to get the distances for the report. I'll also have to look into the geospatial extenders since I'm using 2008 R2.
Again, thank you all for your help. Having other people's perspectives helps a lot.
Edit: I did try to use the IFCODE shortcut and it was giving me grief. I don't remember exactly what the problem was, just that I was frustrated and figured I must be trying to put a square peg in a round hole. I think it had to do with the fact that I originally started to post the question in the Ask SQL Server Central area, realized my mistake and pasted it here. That's also why my formatting was all wonky. Sorry for both of those things. 🙂
Aaron's idea is right inline with mine about preagregating the data. If you add the four distance columns to the Zip table, calculate the number of vendors in each range, all you have to do is join the Zip table and Location table to get the information you require.
When you update the Zip or Vendor table, you recalculate the data. THis could be done at night when the web site is hopefully quieter.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply