April 28, 2012 at 5:48 am
Hi guys. Good day to all.
This is the first time I've post in this forum though I was for sometime now looking at the posts of others to gain knowledge in this great RDBMS and to learn from the best on this field.
I wanted to ask some help from the community about optimizing the query I've made in finding the nearest airport based on latitude and longitude using STDistance method of the Geography datatype.
Here's the scenario, I've imported a csv file that contains all the airports in the world which is around 40k plus records. Also, I've imported another csv file which contain the logs from an airplane. It contains the lat/long, fuel used, altitude, etc. The size of this file will depend on how long the airplane is flying. One file I have imported consists of 12k rows. The table is structured something like this:
Log_Details table:
Log_Detail_ID,Log_Detail_Latitude,Log_Detail_Longitude,Log_Detail_Altitude
1, 40.3215, -112.35, 2000
2, 39.5423, -111.45, 1500
3, 41.2563, -113.53, 1500
......
Airport table:
Airport_Ident,Airport_Name,Airport_Latitude,Airport_Longitude,Airport_Geog
MCIAA, Mactan Cebu Airport, 10.6543, -15.1233, 0x00013213112
LAX, Los Angeles, 75.1235, -55.2135, 0x000132654546
NAIA, Ninoy Airport, 15.4566, -26.5463, 0x4564789797
....
Based on the Log_Details table records, I wanted to find the nearest airport based on the Latitude and Longitude. I made a test query to see how long does it take get the nearest airport based on the Log_Details. This is the only set based query I can think of to get the nearest airport.
SELECT Log_Detail_ID,Log_Detail_Latitude,Log_Detail_Longitude
FROM dbo.Log_Details l
CROSS APPLY (SELECT TOP(1) Airport_Ident,Airport_Name,Airport_Latitude,Airport_Longitude FROM dbo.Airport
ORDER BY Airport_Geog.STDistance((GEOGRAPHY::Point(Log_Detail_Latitude, Log_Detail_Longitude, 4326))));
When I ran this query, after it reached 1 min plus, I cancelled it.
That's why I wanted the help of the community if there is any other way that I could speed up the performance of the query considering there are many records in both table? Or is there another way of getting the record without using STDistance method. I don't if this method is slow. Just wanted to know if there is any faster alternatives.
Thank you in advance. 🙂
April 28, 2012 at 11:38 am
The "nearest" airport anywhere in a given flight will never be more than half the distance between the start and end airports of the flight. One very easy and effective optimization to make is to calculate the min and max latitude based on that knowledge and limit the airports you look at to that. A secondary and slightly more complex calulation is to do the same with longitude.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 28, 2012 at 8:43 pm
Hello Jeff.
Thanks for the immediate reply. Sorry it this long for me get back, don't have a computer at home. Have to go to an internet shop.
I kind of get a blur picture of your suggestion. Can you give me an example? Sorry my analytical skills are not really that good :crying:
April 28, 2012 at 9:02 pm
Hello Jeff.
I was thinking about your suggestion. Is it somehow similar to this?
I will get the min/max lat and long based on the flight log. Get the distance between the min/max lat/long and limit the airports based on that distance?
Thank you.
April 28, 2012 at 9:09 pm
Here is a link to a blog by Isaac Kunen discussing "nearest neighbor" solutions using spatial functions in SQL 2008. (This query will actually become fairly easy in SQL 2012.)
http://blogs.msdn.com/b/isaac/archive/2008/10/23/nearest-neighbors.aspx
Edited to drop a lot of nonsense.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 29, 2012 at 12:44 pm
facturan.antonio (4/28/2012)
Hello Jeff.I was thinking about your suggestion. Is it somehow similar to this?
I will get the min/max lat and long based on the flight log. Get the distance between the min/max lat/long and limit the airports based on that distance?
Thank you.
Ummm... kind of but a little bit differently.
1. Get the distance between the two airports for the flight and divide that distance by two.
2. Convert that distance to degrees of Latitude.
3. Subtract and Add that those degrees of Latitude from the Latitude for the given flight point. Those should be your new min an max Latitudes to search for closest airports for points during the flight. It will greatly increase the speed of your original calculation because it can eliminate virtually all other airports in the world for shorter flights.
If you create the same limits for Longitude (taking curvature of the Earth into consideration), you can quickly focus in on a pretty tight window.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 29, 2012 at 12:52 pm
Jeff (and OP) if I read this problem correctly, he doesn't have an endpoint yet and is trying to find said endpoint. Also using that technique you could miss an airport near the center of the flightline if you're looking to locate emergency landing areas.
You're basically just trying to find the closest point in the list from a selected point. This means you'll always have to calculate all of the rows against that point, sort the result, and pick the minimum. That's going to be a very heavy handed process. What Jeff's trying to point out is you need to restrict the list of what you're comparing so you can get a better query, and it needs to be precomputed or an existing attribute.
Try restricting it to airports in the same country, or those on the same side of the equator, or similar. You'll find the process will take a lot less time. In the meanwhile I'll see if I can come up with anything. Is that table you described the FULL table and the only attributes you have available, or was it simplified for the purposes of this example? If there's more attributes we may have some other ideas.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 29, 2012 at 1:04 pm
Evil Kraig F (4/29/2012)
Jeff (and OP) if I read this problem correctly, he doesn't have an endpoint yet and is trying to find said endpoint. Also using that technique you could miss an airport near the center of the flightline if you're looking to locate emergency landing areas.You're basically just trying to find the closest point in the list from a selected point. This means you'll always have to calculate all of the rows against that point, sort the result, and pick the minimum. That's going to be a very heavy handed process. What Jeff's trying to point out is you need to restrict the list of what you're comparing so you can get a better query, and it needs to be precomputed or an existing attribute.
Try restricting it to airports in the same country, or those on the same side of the equator, or similar. You'll find the process will take a lot less time. In the meanwhile I'll see if I can come up with anything. Is that table you described the FULL table and the only attributes you have available, or was it simplified for the purposes of this example? If there's more attributes we may have some other ideas.
Actually, it's simpler than all of that for restrictions if you don't know what the end point is. You can't fly much farther than you have fuel for and, hopefully, you know how much fuel you have so could estimate the range left. Convert that range to degrees of Latitutde and use the same suggestion as before.
So far as any "mid point" error goes, how would there be an error unless you were off course? You can't be more than half way away the total distance from either end point unless you're fairly well off course.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 29, 2012 at 1:20 pm
Jeff Moden (4/29/2012)
So far as any "mid point" error goes, how would there be an error unless you were off course? You can't be more than half way away the total distance from either end point unless you're fairly well off course.
Unless I'm misunderstanding what you mean, a graphical example:
This is rough and isn't exactly computed (heck, the blue areas overlap a bit) so I fudged in the larger direction. However, with the blue circles being 'the halfway distances' as I understood what you were describing, notice that Chicago gets left out of the discussion. Depending on where you are on your flight line between Cleveland and Kansas City, there are positions in the flight line where that is the closest airport. At exact calculations I believe Indianapolis would get left out too, but it's a very close thing and GIMP doesn't give me exactly what I want for computations there.
That's more what I meant. If Indianapolis wasn't part of the discussion, the closest airport is definately Chicago, and it's left out of the search. Either that or I *really* didn't understand what you meant.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 29, 2012 at 2:51 pm
Nope... not the way I meant it. For the first and second optimizations, you would limit the lat/lons to the following yellow square.
That would be a bit conservative and would cover even the adventures of "Wrongway Peachfuzz" :-D. If you really want to narrow it down quickly, you can use the more radical "mid point" approach as follows (again, use the yellow square limits):
--Jeff Moden
Change is inevitable... Change for the better is not.
April 29, 2012 at 2:55 pm
Jeff Moden (4/29/2012)
Nope... not the way I meant it. For the first and second optimizations, you would limit the lat/lons to the following yellow square.
D'oh, okay, now I getcha. 🙂 Sometimes you just need to have it drawn out in pictures! :w00t:
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
April 29, 2012 at 3:11 pm
Actually, I'm amazed. You couldn't have picked a better region to make your point about the "half way" mark. Well done.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 30, 2012 at 3:02 am
Anytime you start asking questions about performance, it helps to have some test data.
So I created a table of Airports from the attachment. It's not a very good table because it only has about 2700 airports with geospatial coordinates but it will do. I got it here: http://www.partow.net/miscellaneous/airportdatabase/
Next I had to create data for a flight. I did this by assuming the plane flies in a straight line (from Manila to Pune, India) and captures 1000 data points.
DECLARE @ORILat FLOAT
,@ORILongFLOAT
,@DESLatFLOAT
,@DESLongFLOAT
,@segmentsINT
SELECT @ORILat= 14.50861111-- MNL (Manila)
,@ORILong= 121.0194444
,@DESLat= 18.58194444-- PNQ (Pune, India)
,@DESLong= 73.91944444
,@segments= 1000
DECLARE @Log_Details TABLE
(Log_Detail_ID INT,Log_Detail_Latitude FLOAT,Log_Detail_Longitude FLOAT,Log_Detail_Altitude INT)
;WITH Nbrs_3(n) AS (SELECT 1 UNION SELECT 0)
,Nbrs_2 (n) AS (SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2)
,Nbrs_1 (n) AS (SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2)
,Nbrs_0 (n) AS (SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2)
,Nbrs (n) AS (SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2)
,Tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n) As n FROM Nbrs)
INSERT @Log_Details
SELECT TOP (@segments) 1
,@ORILat + (@DESLat - @OriLat) * n / @segments
,@ORILong - (@ORILong - @DESLong) * n / @segments
,1000 + (1000*(n % 15))
FROM Tally
The next step, of which I will not bore you with the gory details was to simulate the haversine formula for calcuating distance from spatial coordinates because I'm running in SQL 2005 so don't have access to that neat STDistance function. You'll see that on the ORDER BY clause below (it is close but not perfect).
The code below runs the 1000 points against all airports and then vs. airports only within a narrow range of latitude and longitude.
-- Convert degrees to radians
DECLARE @b-2 FLOAT
SELECT @b-2 = ACOS(-1.)/180.
SET STATISTICS TIME ON
SELECT *
FROM @Log_Details
CROSS APPLY (
SELECT TOP(1) ICAO, [Name],LatDec,LongDec
FROM Airports
ORDER BY 6731*2*ATN2(SQRT(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.))), SQRT(1.-(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.)))))) x
SET STATISTICS TIME OFF
DECLARE @LatDiff FLOAT, @LongDiff FLOAT
SELECT @LatDiff = 0.5*ABS(@OriLat - @DesLat), @LongDiff = 0.5*ABS(@OriLong - @DesLong)
SET STATISTICS TIME ON
SELECT *
FROM @Log_Details
CROSS APPLY (
SELECT TOP(1) ICAO, [Name], LatDec, LongDec
FROM Airports
WHERE LatDec BETWEEN Log_Detail_Latitude - @LatDiff AND Log_Detail_Latitude + @LatDiff and
LongDec BETWEEN Log_Detail_Longitude - @LongDiff AND Log_Detail_Longitude + @LongDiff
ORDER BY 6731*2*ATN2(SQRT(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.))), SQRT(1.-(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.)))))) x
SET STATISTICS TIME OFF
-- Now create an index and time it again
CREATE INDEX latlong ON Airports (LatDec, LongDec)
SET STATISTICS TIME ON
SELECT *
FROM @Log_Details
CROSS APPLY (
SELECT TOP(1) ICAO, [Name], LatDec, LongDec
FROM Airports
WHERE LatDec BETWEEN Log_Detail_Latitude - @LatDiff AND Log_Detail_Latitude + @LatDiff and
LongDec BETWEEN Log_Detail_Longitude - @LongDiff AND Log_Detail_Longitude + @LongDiff
ORDER BY 6731*2*ATN2(SQRT(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.))), SQRT(1.-(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.)))))) x
SET STATISTICS TIME OFF
The last step creates an index on Airports latitude and longitude and runs the second query again.
The timing results are:
(1000 row(s) affected)
SQL Server Execution Times:
CPU time = 7613 ms, elapsed time = 7676 ms.
(1000 row(s) affected)
SQL Server Execution Times:
CPU time = 686 ms, elapsed time = 687 ms.
(1000 row(s) affected)
SQL Server Execution Times:
CPU time = 265 ms, elapsed time = 609 ms.
As you can see, limiting the airports in the range of the search helped significantly. I think I did it more or less the way Jeff was suggesting. The index helped even more.
No doubt you'll want to check that the airports along the path are correct but they start with MNL and end at PNQ.
This version appears to operate just slightly faster and you could use it if you don't need to return the rest of the stuff about the closest airport (just the name).
SELECT l.*
,(SELECT TOP(1) ICAO
FROM Airports
WHERE LatDec BETWEEN Log_Detail_Latitude - @LatDiff AND Log_Detail_Latitude + @LatDiff and
LongDec BETWEEN Log_Detail_Longitude - @LongDiff AND Log_Detail_Longitude + @LongDiff
ORDER BY 6731*2*ATN2(SQRT(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.))), SQRT(1.-(SQUARE(SIN(@b*(Log_Detail_Latitude-LatDec)/2.)) + COS(LatDec*@b)*COS(Log_Detail_Latitude*@b)*SQUARE(SIN(@b*(Log_Detail_Longitude-LongDec)/2.)))))
) as Airport
FROM @Log_Details l
Query plan cost is just slightly better than the others.
Edit: Note that the longitude difference is not corrected properly for trans-Pacific crossings. That is the added complexity that Jeff mentioned. Just trying to show the performance comparison here.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
April 30, 2012 at 5:48 am
Even the Latitude calculations can get a bit screwy on things like trans-Pacific flights because the actual flight path will usually follow the "Great Circle Path" which is hardly ever parallel to lines of Latitude. For those, you'd need to calculate the Lat/Lon Limits using the half-distance thing for every point. That'll still be lightning quick compared to trying to compare to all the airports of the world.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 30, 2012 at 7:05 am
Hi guys.
I'm really grateful for all the help you've shared. Analyzing the formulas takes a lot of time for me to somehow grasp what does it do. I'm not really good in mathematics 🙂 I'm still trying to understand what do these functions do. Guess have to read more about this. Anyway, earlier today, I've tried The Dixie Flatline's post but somehow it takes so long, maybe it's because I don't have an index in place in the Airports table for Lat and Long fields. I will try dwain.c's post and hopefully it work for me. As for Jeff's suggestion for trans-Pacific flights :-), I still have to read about this as I don't have any clue.
Thanks you guys.
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply