Nested Query Performance Help

  • I have a query that I built to get a list of companies within a certain distance from a list of zip codes. Right now, with a list of 1056 zip codes, the query takes 17 minutes. I'm sure that I'm doing something wrong, but I don't know how to make the query fast enough to be used real-time (if even possible). So here's the overview: the query pulls a zipcode from the Location table, joins that zipcode to a table with zipcodes and their respective latitude and longitude, then uses that to find out the number of businesses within a 2-, 5-, 10- and 20-mile radius using a function that determines distance based on latitude and longitude.

    Here is the query:

    SELECT

    Location.zipcode AS ZipCode,

    (SELECT count(V.vendorid)

    FROM Vendor AS V

    WHERE dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 2

    AND V.Active = 1

    ) AS Within2Miles,

    (SELECT count(V.vendorid)

    FROM Vendor AS V

    WHERE dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 5

    AND V.Active = 1

    ) AS Within5Miles,

    (SELECT count(V.vendorid)

    FROM Vendor AS V

    WHERE dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 10

    AND V.Active = 1

    ) AS Within10Miles,

    (SELECT count(V.vendorid)

    FROM Vendor AS V

    WHERE dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 20

    AND V.Active = 1

    ) AS Within20Miles

    FROM Location

    INNER JOIN zip AS LZ

    ON LZ.zip = Location.zipcode

    Here is the function:

    [dbo].[GetGeoDistance]

    (@inLon1 float,

    @inLat1 float,

    @inLon2 float,

    @inLat2 float

    )

    RETURNS float

    AS

    BEGIN

    -- Declare the return variable here

    DECLARE @Result float

    DECLARE @a float

    DECLARE @C float

    DECLARE @radLon float

    DECLARE @radLat float

    DECLARE @lat1 float

    DECLARE @lat2 float

    DECLARE @lon1 float

    DECLARE @lon2 float

    DECLARE @Pi float

    SELECT @PI = PI()

    SET @lat1 = (@inLat1 * @PI)/180

    SET @lat2 = (@inLat2 * @PI)/180

    SET @lon1 = (@inLon1 * @PI)/180

    SET @lon2 = (@inLon2 * @PI)/180

    SET @radLat = ((@inLat2 - @inLat1) * @PI)/180

    SET @radLon = ((@inLon2 - @inLon1) * @PI)/180

    SET @a = SIN(@radLat/2) * SIN(@radLat/2) + SIN(@radLon/2) * SIN(@radLon/2) * COS(@lat1) * COS(@lat2)

    SET @C = 2 * ATN2(SQRT(@a), SQRT(1-@a))

    SET @Result = (6371/1.609344) * @C

    -- Return the result of the function

    RETURN @Result

    END

    The only things I can change are: the query (obviously), the function, and the Location table. I have used the database tuning advisor and created the suggested statistics and indexes. That got me from 59 minutes to 17 minutes. The one other thing I thought might help is to add the lat and long to the Location table so that I could eliminate that join. The only (minor) problem is that the table is repopulated each time the report is run, so I'd have to insert the lat and long at that point.

    Can anyone offer any advice? If I can't get this down to a reasonable time, I'm going to have to set it up as a job to run at night, but I don't really want to do that unless I have to.

    Thanks in advance for any help!

  • It is most likely that scalar function that is causing you issues. Scalar functions can absolutely destroy your server.

    If you can post some ddl and sample data I think you could call that function once instead of 4. If you can also post execution plans as they exist now that would be a big help.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • If you run a trace on this query you will see that for every pair in the join it is calling the function. On top of that you are doing the calculation 4 times, once for each of the ranges. Given you must be doing a cross join to check each pair you have 4 460 544 calls to the function (1056 X 1056 X 4). Even if each call only takes 2/1000 of a second you are talking about 14 minutes just for the calls to the function.

    You could probably cut run time to about 25% just by changing this to a single query returning:

    Location.zipcode AS 'ZipCode', V.vendorid,

    dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) as 'distance'

    Include as single: where dbo.GetGeoDistance(V.longitude, V.latitude, LZ.longitude, LZ.latitude) <= 20 to reduce the result set.

    Then work through the result to get your final information.

    If you can provide the schema we could advise a bit more.

    Cheers

    Leo

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • With nothing to use for testing, here is a shot in the dark:

    create function [dbo].[itvf_GetGeoDistance] (

    @inLon1 float,

    @inLat1 float,

    @inLon2 float,

    @inLat2 float

    )

    RETURNS table with schemabinding

    AS RETURN

    SELECT (6371/1.609344) * (2 * ATN2(SQRT((SIN((((@inLat2 - @inLat1) * PI())/180)/2) * SIN((((@inLat2 - @inLat1) * PI())/180)/2) + SIN((((@inLon2 - @inLon1) * PI())/180)/2) * SIN((((@inLon2 - @inLon1) * PI())/180)/2) * COS(((@inLat1 * PI())/180)) * COS(((@inLat2 * PI())/180)))), SQRT(1-(SIN((((@inLat2 - @inLat1) * PI())/180)/2) * SIN((((@inLat2 - @inLat1) * PI())/180)/2) + SIN((((@inLon2 - @inLon1) * PI())/180)/2) * SIN((((@inLon2 - @inLon1) * PI())/180)/2) * COS(((@inLat1 * PI())/180)) * COS(((@inLat2 * PI())/180)))))) as GeoDistance;

    GO

    select

    loc.zipcode AS ZipCode,

    sum(case when ggd.GeoDistance <= 2 then 1 else 0 end) as Within2Miles,

    sum(case when ggd.GeoDistance <= 5 then 1 else 0 end) as Within5Miles,

    sum(case when ggd.GeoDistance <= 10 then 1 else 0 end) as Within10Miles,

    sum(case when ggd.GeoDistance <= 20 then 1 else 0 end) as Within20Miles

    from

    dbo.Location loc

    inner join dbo.zip lz

    on (loc.zip = lz.zipcode)

    cross join dbo.Vendor v

    cross apply dbo.itvf_GetGeoDistance(v.longitude, v.latitude, lz.longitude, lz.latitude) ggd

    where

    v.Active = 1

    group by

    loc.zipcode;

  • All great suggestions!

    Leo, I can't use the GetGeoDistance function once because I need four different distances (unless I'm misunderstanding your post).

    Lynn, I tried your functionality and that cut the response time to 1/3 (5-ish minutes), but that's still a long ways from being web app friendly.

    I have a zip file with the schemas, some sample data and the current execution plan (before Lynn's suggestions), I just attached it to the main post.

    Thanks again!

  • It would be a lot easier if your sample data was insert statements. The idea here is for you to make it as easy as possible for the people helping. A lot of people will not even download a zip file.

    Here I have posted your ddl. Notice over on the left when you are posting the IFCode shortcuts. The code one is very useful when posting lengthy code.

    CREATE TABLE [dbo].[Location](

    [locationID] [int] IDENTITY(1,1) NOT NULL,

    [zipcode] [varchar](10) NULL,

    CONSTRAINT [PK_Location] PRIMARY KEY CLUSTERED

    (

    [locationID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    CREATE TABLE [dbo].[Vendor](

    [VendorID] [int] IDENTITY(1,1) NOT NULL,

    [Vendor] [varchar](50) NOT NULL,

    [Company] [varchar](100) NULL,

    [Department] [varchar](50) NULL,

    [Prefix] [varchar](10) NULL,

    [FirstName] [varchar](50) NULL,

    [MiddleName] [varchar](20) NULL,

    [LastName] [varchar](50) NULL,

    [Suffix] [varchar](10) NULL,

    [Title] [varchar](50) NULL,

    [Address] [text] NULL,

    [City] [varchar](50) NULL,

    [State] [varchar](50) NULL,

    [Zip] [varchar](20) NULL,

    [Country] [varchar](50) NULL,

    [PhoneWork] [varchar](20) NULL,

    [PhoneHome] [varchar](20) NULL,

    [PhoneFax] [varchar](20) NULL,

    [Domain] [varchar](255) NULL,

    [Website_URL] [varchar](255) NULL,

    [Email] [varchar](255) NULL,

    [Team] [varchar](50) NULL,

    [Person] [varchar](255) NULL,

    [PayableCode] [varchar](50) NULL,

    [ReceivableCode] [varchar](50) NULL,

    [Currency] [char](3) NULL,

    [DateFormat] [varchar](50) NULL,

    [Active] [tinyint] NULL,

    [Flag0] [varchar](50) NULL,

    [Flag1] [varchar](50) NULL,

    [Flag2] [varchar](50) NULL,

    [Flag3] [varchar](50) NULL,

    [Flag4] [varchar](50) NULL,

    [Flag5] [varchar](50) NULL,

    [Flag6] [varchar](50) NULL,

    [Flag7] [varchar](50) NULL,

    [Flag8] [varchar](50) NULL,

    [Flag9] [varchar](50) NULL,

    [County] [varchar](50) NULL,

    [PhoneMobile] [varchar](20) NULL,

    [TaxRef] [varchar](255) NULL,

    [VendorRef] [varchar](255) NULL,

    [SubscriberID] [varchar](255) NULL,

    [AcceptedInsurance] [text] NULL,

    [PreferredProvider] [varchar](1) NULL,

    [TraumaResponseFee] [money] NULL,

    [OnSiteFee] [money] NULL,

    [Latitude] [varchar](255) NULL,

    [Longitude] [varchar](255) NULL,

    CONSTRAINT [PK_Vendor] PRIMARY KEY NONCLUSTERED

    (

    [VendorID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY],

    CONSTRAINT [IX_Vendor] UNIQUE CLUSTERED

    (

    [Vendor] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    ALTER TABLE [dbo].[Vendor] WITH NOCHECK ADD CONSTRAINT [FK_Vendor_Person] FOREIGN KEY([Person])

    REFERENCES [dbo].[Person] ([Person])

    GO

    ALTER TABLE [dbo].[Vendor] CHECK CONSTRAINT [FK_Vendor_Person]

    GO

    ALTER TABLE [dbo].[Vendor] WITH CHECK ADD CONSTRAINT [FK_Vendor_Team] FOREIGN KEY([Team])

    REFERENCES [dbo].[Team] ([Team])

    GO

    ALTER TABLE [dbo].[Vendor] CHECK CONSTRAINT [FK_Vendor_Team]

    GO

    /****** Object: Table [dbo].[zip] Script Date: 04/02/2012 20:32:30 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[zip](

    [zip] [nvarchar](5) NULL,

    [state] [nvarchar](2) NULL,

    [city] [nvarchar](22) NULL,

    [longitude] [nvarchar](6) NULL,

    [latitude] [nvarchar](6) NULL

    ) ON [PRIMARY]

    GO

    --Edit--

    I posted your ddl without first trying it, there are a couple foreign key references in the vendor table to tables we don't have. They seem to work just fine after removing those.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • The problem you are facing is that you are calculating the number of vendors that are 2,5,10, and 20 miles away from each zipcode in the Location table. At it looks like you are doing this every time you need this information.

    If you really need this information, preaggregate it and then use that on the web site. How often do the Location, Zip, and Vendor tables change?

  • So just to clarify, you want a list of the counts of all vendors within each distance range for every single zip code on your list? You said something about a web page, are you displaying the total count of all vendors within those distances as they relate to each zip code? Seems like storing this in a permanent table would yield much better performance since this data really doesn't change very often.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Seems that Lynn and I both agree that storing this data would be better than calculating it on the fly...didn't see the response as I was typing mine. 😛

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • some thoughts,

    1) could you calculate the extremis lat/long @ 2,5,10,20 miles for each of the compass points (up to you whether you use [4 point: NSEW], [8 points: N,NE,E,SE,S,SW,S,NW] or [16 points: N,NNW,NW,WNW,W,WSW,SW,SSW,S etc..] or more. and once you have this data set choose zip codes which are within at least one of these extremis pairs. It won't be a perfect radius but it will be close (closer, the more points you use)

    2) Could you precalculate the distances between all zip codes (there are approx 43000 of them: this would give you a reference table of 1.85 Billion combinations of which only a very small percentage woudl be within 20 miles of each other)

    3) if you can upgrade to SQL2008 R2 you may be able to make use of the geospatial extenders

    4) can you consume a web resource to provide you with the list of nearby zip codes

    http://www.freemaptools.com/find-zip-codes-inside-radius.htm

    5) invest in some commercial software to do it for you?

    http://www.zip-codes.com/zip-code-radius-app.asp

    (no affiliation, never used it, don't know how good it is...)

  • Thanks again for the suggestions! Sean, I didn't do Insert statements because I wanted to give a larger dataset than I was willing to type out manually and I didn't think there was a way to export data into insert statements (is there?). Also, I used a zip file because I read Gail Shaw's suggestions on "How to Post Performance Problems" and that is what Gail suggested. I'm sure I didn't do it perfectly, but I did try.

    Lynn & Sean, the Vendor and the Zip tables do not change very often, the location table changes every time the query is run. Here's an example of how it works:

    - User uploads spreadsheet containing between 1 and 10000 zip codes

    - Zip codes are inserted into Location table

    - Query runs and gets the distances for all of the zip codes in the Location table.

    - Excel report is generated for user to download.

    Aaron, I like your idea about precompiling the data into a table, then querying that table to get the distances for the report. I'll also have to look into the geospatial extenders since I'm using 2008 R2.

    Again, thank you all for your help. Having other people's perspectives helps a lot.

    Edit: I did try to use the IFCODE shortcut and it was giving me grief. I don't remember exactly what the problem was, just that I was frustrated and figured I must be trying to put a square peg in a round hole. I think it had to do with the fact that I originally started to post the question in the Ask SQL Server Central area, realized my mistake and pasted it here. That's also why my formatting was all wonky. Sorry for both of those things. 🙂

  • shermola1 (4/3/2012)


    Thanks again for the suggestions! Sean, I didn't do Insert statements because I wanted to give a larger dataset than I was willing to type out manually and I didn't think there was a way to export data into insert statements (is there?). Also, I used a zip file because I read Gail Shaw's suggestions on "How to Post Performance Problems" and that is what Gail suggested. I'm sure I didn't do it perfectly, but I did try.

    Lynn & Sean, the Vendor and the Zip tables do not change very often, the location table changes every time the query is run. Here's an example of how it works:

    - User uploads spreadsheet containing between 1 and 10000 zip codes

    - Zip codes are inserted into Location table

    - Query runs and gets the distances for all of the zip codes in the Location table.

    - Excel report is generated for user to download.

    Aaron, I like your idea about precompiling the data into a table, then querying that table to get the distances for the report. I'll also have to look into the geospatial extenders since I'm using 2008 R2.

    Again, thank you all for your help. Having other people's perspectives helps a lot.

    Edit: I did try to use the IFCODE shortcut and it was giving me grief. I don't remember exactly what the problem was, just that I was frustrated and figured I must be trying to put a square peg in a round hole. I think it had to do with the fact that I originally started to post the question in the Ask SQL Server Central area, realized my mistake and pasted it here. That's also why my formatting was all wonky. Sorry for both of those things. 🙂

    Please forgive my nasty tone this morning, must have been in a mood when I got to work today. 🙂 My apologies if I sounded like I was being critical. And certainly when posting sqlplans you should zip them as Gail suggested because they can get pretty big.

    They moved the ability to script data in 2008. You can still get to it but it is kind of hidden.

    Right click the database --> Tasks --> Generate Scripts...

    This will bring up the wizard. When you get to the "Set Scripting Options" you can click the Advanced button. Then scroll down to "Types of data to script". by default it will be "Schema only". You can change this to "Schema and data". It will generate your ddl and all the inserts.

    Sometimes those code shortcuts can be a pain. I typically only really use the code ones and I usually just type an empty code one which saves me some headaches. I just use

    [ code ] content stuff here [ / code ] (minus the spaces of course)

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • shermola1 (4/3/2012)


    Thanks again for the suggestions! Sean, I didn't do Insert statements because I wanted to give a larger dataset than I was willing to type out manually and I didn't think there was a way to export data into insert statements (is there?). Also, I used a zip file because I read Gail Shaw's suggestions on "How to Post Performance Problems" and that is what Gail suggested. I'm sure I didn't do it perfectly, but I did try.

    Lynn & Sean, the Vendor and the Zip tables do not change very often, the location table changes every time the query is run. Here's an example of how it works:

    - User uploads spreadsheet containing between 1 and 10000 zip codes

    - Zip codes are inserted into Location table

    - Query runs and gets the distances for all of the zip codes in the Location table.

    - Excel report is generated for user to download.

    Aaron, I like your idea about precompiling the data into a table, then querying that table to get the distances for the report. I'll also have to look into the geospatial extenders since I'm using 2008 R2.

    Again, thank you all for your help. Having other people's perspectives helps a lot.

    Edit: I did try to use the IFCODE shortcut and it was giving me grief. I don't remember exactly what the problem was, just that I was frustrated and figured I must be trying to put a square peg in a round hole. I think it had to do with the fact that I originally started to post the question in the Ask SQL Server Central area, realized my mistake and pasted it here. That's also why my formatting was all wonky. Sorry for both of those things. 🙂

    Aaron's idea is right inline with mine about preagregating the data. If you add the four distance columns to the Zip table, calculate the number of vendors in each range, all you have to do is join the Zip table and Location table to get the information you require.

    When you update the Zip or Vendor table, you recalculate the data. THis could be done at night when the web site is hopefully quieter.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply