April 8, 2010 at 6:28 am
I have a table with about 2800 rows in it that contains the outbound part of UK postcodes - and the longitude and latitude of (I guess) the centre of that postcode.
I have a formula that will calculate the distance between two postcodes.
Here's the requirement. I'm building an application for a manufacturer that has hundreds of franchised dealers around the country. They all have an ever changing list of used products for sale.
On their web site they want people to be able to enter their postcode, and show them the 5 nearest (to them geographically) products that match their requirements.
So, the logic involved is (I think) ...
find the nearest dealer to the entered postcode and see if they have any of that product in stock
if they have less than 5 in stock, find the next nearest dealer and see if thave any of that product in stock
Repeat until you have 5 products to show.
Would you have a look up table that already has all the calculated distances in between every postcode? I could create a table by looping through the list of postcodes and calculating all the distances - but this table would end up with over 8 million rows. However the data would be well indexed as it would just be two Ints to hold the PostcodeIDs and another field to hold the distance.
So, running a query that says 'Put the dealers in order of distance between the entered postcode and their postcode' would be fast?
An alternative would be to select all the dealers postcodes into a cursor, looping through this building a temporary table containing the distances from the entered postcode (doing the distance calculation on the fly), sorting the temporary table by distance and then looping through that checking if they have product and building another table to contain the product data that will be returned.
It all sounds a bit clunky and heavy-handed - yet loads of websites let you enter your postcode and show you a list of things nearby, in distance order. How is it done?
I'd appreciate your take on how you would do this.
Thanks for any input.
April 8, 2010 at 6:34 am
I had a website that I maintained (after it was written) that included postal code look-up for nearest Liquor Store (no jokes now:-P). He used Google Maps API to do the calculations. Although I would imagine with SQL Server 2008 R2 you will be able to do this within SQL. It will have some new data types for dealing with mapping data.
Shawn Melton
Twitter: @wsmelton
Blog: wsmelton.github.com
Github: wsmelton
April 8, 2010 at 6:43 am
Hi and thanks for your reply.
I did look at Google Maps API recently for another thing - but it hurt my head - getting too old to learn new stuff. Also, it's unsupported so if they decide to change things in the future ... or start charging ...
I'd rather do this all within SQL Server so I can control what's going on.
Cheers
April 8, 2010 at 7:34 am
there is an article "zip code Radius search" that can help get you started:
http://www.sqlservercentral.com/scripts/Miscellaneous/30878/
it basically requires a table of postalcode/latitude/longitude, and finds all the records that match within the inputed distance;
there is another thread, can't find it at the moment, that makes a smarter calculation, basically by finding the "current" latitude, adding +/- ten miles(for example), and finding all the lattitudes between the two calculated values.
then it repeats for longitude;
with an index on either column, that makes it pretty quick with index seeks i believe.
Lowell
April 8, 2010 at 9:13 am
That's great. Thanks very much.
April 8, 2010 at 10:18 am
I would use a precomputed table of distances between all zipcodes and all stores.
Since you only have a few hundred stores this would give a much smaller table than the full zipcode to zipcode table - only about 500.000 rows.
Queries against this table will be extremely fast.
You could also use the new 2008 geography datatype with a spatial index. This is also a good solution, but probably not quite as fast in this simple case as the precomputed alternative.
A spatial index really shines if you have millions of elements in your database, like the position of every house in the UK.
/SG
April 8, 2010 at 12:58 pm
Thanks for your reply.
If I wanted to use the pre-computed table for other things - do you think the 8 million row version would be manageable?
April 9, 2010 at 12:58 am
Sure, 8 million rows should be no problem. Queries against such a table with an appropriate index will still be blindingly fast.
/SG
April 9, 2010 at 2:39 am
Thanks again to everyone for your help.
One last question. Say I had four postcodes with IDs of 1, 2, 3, and 4
I could structure a table with two PostcodeID columns containing data like this ...
1,2
1,3
1,4
2,1
2,3
2,4
3,1
3,2
3,4
4,1
4,2
4,3
Or like this
1,2
1,3
1,4
2,3
2,4
3,4
In the first situation my Where clause would be ...
Where Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID
In the second situation my Where clause would be ..
Where (Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID)
OR (Postcode1ID = @Postcode2ID and Postcode2ID = @Postcode1ID)
The second situation means about half as many rows in the table but the Where clause has two conditions to match.
Which do you think would be faster? In both cases each pair of PostcodeIDs is unique so can be a primary key
April 9, 2010 at 4:01 am
sku370870 (4/9/2010)
In the first situation my Where clause would be ...Where Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID
In the second situation my Where clause would be ..
Where (Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID)
OR (Postcode1ID = @Postcode2ID and Postcode2ID = @Postcode1ID)
The second situation means about half as many rows in the table but the Where clause has two conditions to match.
Which do you think would be faster? In both cases each pair of PostcodeIDs is unique so can be a primary key
Anytime you start throwing an "OR" into a where clause, your query is likely to start producing scans instead of seeks.
You can work around this by using a "UNION"...
select * from xyz
where Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID
UNION
select * from xyz
where Postcode1ID = @Postcode2ID and Postcode2ID = @Postcode1ID
In this way, each individual select will do a seek against the PK.
So, considering all of this, I'd go with your first option - twice the data, but it makes the select simpler.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
April 9, 2010 at 5:18 am
I agree completely with Wayne.
/SG
April 9, 2010 at 5:46 am
sku370870 (4/9/2010)
The second situation means about half as many rows in the table but the Where clause has two conditions to match.
Surely you can get the best of both worlds? Store the data in the compact format, ensuring that PostCode1 always sorts lower than PostCode2. You can then always use a single AND search, so long as you arrange your two variables in the same order.
By the way, I would almost certainly have used the geography type here.
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/sql-server-2008-proximity-search-with-th
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply