How is this done ... or how would you do this?

Question

How is this done ... or how would you do this?

webskater

SSCarpal Tunnel

Points: 4214
More actions
April 8, 2010 at 6:28 am

#231481

I have a table with about 2800 rows in it that contains the outbound part of UK postcodes - and the longitude and latitude of (I guess) the centre of that postcode.
I have a formula that will calculate the distance between two postcodes.
Here's the requirement. I'm building an application for a manufacturer that has hundreds of franchised dealers around the country. They all have an ever changing list of used products for sale.
On their web site they want people to be able to enter their postcode, and show them the 5 nearest (to them geographically) products that match their requirements.
So, the logic involved is (I think) ...
find the nearest dealer to the entered postcode and see if they have any of that product in stock
if they have less than 5 in stock, find the next nearest dealer and see if thave any of that product in stock
Repeat until you have 5 products to show.
Would you have a look up table that already has all the calculated distances in between every postcode? I could create a table by looping through the list of postcodes and calculating all the distances - but this table would end up with over 8 million rows. However the data would be well indexed as it would just be two Ints to hold the PostcodeIDs and another field to hold the distance.
So, running a query that says 'Put the dealers in order of distance between the entered postcode and their postcode' would be fast?
An alternative would be to select all the dealers postcodes into a cursor, looping through this building a temporary table containing the distances from the entered postcode (doing the distance calculation on the fly), sorting the temporary table by distance and then looping through that checking if they have product and building another table to contain the product data that will be returned.
It all sounds a bit clunky and heavy-handed - yet loads of websites let you enter your postcode and show you a list of things nearby, in distance order. How is it done?
I'd appreciate your take on how you would do this.
Thanks for any input.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Shawn Melton SSC-Insane Points: 24685 More actions · Answer 1

I had a website that I maintained (after it was written) that included postal code look-up for nearest Liquor Store (no jokes now:-P). He used Google Maps API to do the calculations. Although I would imagine with SQL Server 2008 R2 you will be able to do this within SQL. It will have some new data types for dealing with mapping data.

Shawn Melton
Twitter: @wsmelton
Blog: wsmelton.github.com
Github: wsmelton

webskater SSCarpal Tunnel Points: 4214 More actions · Answer 2

Hi and thanks for your reply.

I did look at Google Maps API recently for another thing - but it hurt my head - getting too old to learn new stuff. Also, it's unsupported so if they decide to change things in the future ... or start charging ...

I'd rather do this all within SQL Server so I can control what's going on.

Cheers

Lowell SSC Guru Points: 323518 More actions · Answer 3

there is an article "zip code Radius search" that can help get you started:

http://www.sqlservercentral.com/scripts/Miscellaneous/30878/

it basically requires a table of postalcode/latitude/longitude, and finds all the records that match within the inputed distance;

there is another thread, can't find it at the moment, that makes a smarter calculation, basically by finding the "current" latitude, adding +/- ten miles(for example), and finding all the lattitudes between the two calculated values.

then it repeats for longitude;

with an index on either column, that makes it pretty quick with index seeks i believe.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

webskater SSCarpal Tunnel Points: 4214 More actions · Answer 4

webskater

SSCarpal Tunnel

Points: 4214

April 8, 2010 at 9:13 am

#1148378

That's great. Thanks very much.

Stefan_G SSCertifiable Points: 6609 More actions · Answer 5

I would use a precomputed table of distances between all zipcodes and all stores.

Since you only have a few hundred stores this would give a much smaller table than the full zipcode to zipcode table - only about 500.000 rows.

Queries against this table will be extremely fast.

You could also use the new 2008 geography datatype with a spatial index. This is also a good solution, but probably not quite as fast in this simple case as the precomputed alternative.

A spatial index really shines if you have millions of elements in your database, like the position of every house in the UK.

/SG

webskater SSCarpal Tunnel Points: 4214 More actions · Answer 6

Thanks for your reply.

If I wanted to use the pre-computed table for other things - do you think the 8 million row version would be manageable?

Stefan_G SSCertifiable Points: 6609 More actions · Answer 7

Sure, 8 million rows should be no problem. Queries against such a table with an appropriate index will still be blindingly fast.

/SG

webskater SSCarpal Tunnel Points: 4214 More actions · Answer 8

Thanks again to everyone for your help.

One last question. Say I had four postcodes with IDs of 1, 2, 3, and 4

I could structure a table with two PostcodeID columns containing data like this ...

1,2

1,3

1,4

2,1

2,3

2,4

3,1

3,2

3,4

4,1

4,2

4,3

Or like this

1,2

1,3

1,4

2,3

2,4

3,4

In the first situation my Where clause would be ...

Where Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID

In the second situation my Where clause would be ..

Where (Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID)

OR (Postcode1ID = @Postcode2ID and Postcode2ID = @Postcode1ID)

The second situation means about half as many rows in the table but the Where clause has two conditions to match.

Which do you think would be faster? In both cases each pair of PostcodeIDs is unique so can be a primary key

WayneS SSC Guru Points: 95460 More actions · Answer 9

sku370870 (4/9/2010)
In the first situation my Where clause would be ...
Where Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID
In the second situation my Where clause would be ..
Where (Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID)
OR (Postcode1ID = @Postcode2ID and Postcode2ID = @Postcode1ID)
The second situation means about half as many rows in the table but the Where clause has two conditions to match.
Which do you think would be faster? In both cases each pair of PostcodeIDs is unique so can be a primary key

Anytime you start throwing an "OR" into a where clause, your query is likely to start producing scans instead of seeks.

You can work around this by using a "UNION"...

select * from xyz

where Postcode1ID = @Postcode1ID and Postcode2ID = @Postcode2ID

UNION

select * from xyz

where Postcode1ID = @Postcode2ID and Postcode2ID = @Postcode1ID

In this way, each individual select will do a seek against the PK.

So, considering all of this, I'd go with your first option - twice the data, but it makes the select simpler.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Stefan_G SSCertifiable Points: 6609 More actions · Answer 10

Stefan_G

SSCertifiable

Points: 6609

April 9, 2010 at 5:18 am

#1148962

I agree completely with Wayne.

/SG

Paul White SSC Guru Points: 150467 More actions · Answer 11

sku370870 (4/9/2010)
The second situation means about half as many rows in the table but the Where clause has two conditions to match.

Surely you can get the best of both worlds? Store the data in the compact format, ensuring that PostCode1 always sorts lower than PostCode2. You can then always use a single AND search, so long as you arrange your two variables in the same order.

By the way, I would almost certainly have used the geography type here.

http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/sql-server-2008-proximity-search-with-th

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi