This article shows how you can "geocode" street address information natively in T-SQL with MS SQL 2005, and as a byproduct, how you can validate, normalize, and parse address information.
Note that this approach will work in SQL 2008 as well, though SQL 2008 provides geospatial types and functions that could be used in lieu of the numeric data type used here.
Geocoding is the process of converting a geographical address into latitude and longitude (GPS coordinates). Geocoding address information has a number of benefits:
- Means to interoperate with web-based mapping services
- Means to interoperate with GPS devices
- Makes it possible to calculate distance between addresses
- Makes geospatial analysis possible (i.e. identifying geographically similar addresses, etc.
- Compact way to represent a geographical location
- And more...
Besides these core benefits, there are a number of valuable byproducts of geocoding, including:
- Address validation
- Address normalization
- Street number identification
- Zip Code lookup
- County lookup
I'll gloss over coordinate and GPS theory. The bottom line is that in modern systems latitude and longitude are most often represented as a decimal number in the range -180.000000 to 180.000000. (These values are really degrees, from 0 to 360, but positive / negative is used to separate hemispheres.) Consequently, in SQL Server 2005, I'd suggest using a data type of numeric(9,6) for the coordinate data.
The actual work of geocoding is heavy lifting that needs to be performed by a remote service that has a database of address information. We simply want to pass the address information to such a remote service, and retrieve the XML results. There are several suitable services we could use. I have selected Google's service for this example.
So how should we call the Google geocoding service from SQL Server? We need to make an HTTP request, so we need to use one of these:
- CLR procedure
- External stored procedure
- ActiveX call
All of these have pros and cons. I realize that many would recommend CLR -- but I want to write only native T-SQL code. This means that we'll be using sp_OACreate to use an ActiveX object in this example. First we need to make sure that ActiveX support is enabled in the database:
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'Ole Automation Procedures', 1
RECONFIGURE
The basic theory of operation is that we will use the ServerXMLHttp object that is already installed on the server. We will instantiate that object, set the URL property, call the send method, and then retrieve the results.
The core of the procedure is this:
EXEC @Result = sp_OACreate 'MSXML2.ServerXMLHttp', @Obj OUT
EXEC @Result = sp_OAMethod @Obj, 'open', NULL, 'GET', @URL, false
EXEC @Result = sp_OAMethod @Obj, 'setRequestHeader', NULL, 'Content-Type', 'application/x-www-form-urlencoded'
EXEC @Result = sp_OAMethod @Obj, send, NULL, ''
EXEC @Result = sp_OAGetProperty @Obj, 'status', @HTTPStatus OUT
EXEC @Result = sp_OAGetProperty @Obj, 'responseXML.xml', @Response OUT
EXEC @Result = sp_OADestroy @Obj
SET @XML = CAST(@Response AS XML)
SET @GPSLatitude = @XML.value('(/GeocodeResponse/result/geometry/location/lat) [1]', 'numeric(9,6)')
SET @GPSLongitude = @XML.value('(/GeocodeResponse/result/geometry/location/lng) [1]', 'numeric(9,6)')
Please see the attached script for the complete declaration of the spGeocode stored procedure.
Once the spGeocode stored procedure has been created by executing the attached script, we can geocode an address like this:
EXEC spGeocode
@Address = '1234 N. Main Street',
@City = 'Santa Ana',
@State = 'CA'
GPSLatitude | GPSLongitude | City | State | PostalCode | Address |
33.755327 | -117.867622 | Santa Ana | CA | 92701 | 1234 N Main St |
Neat! We get the Latitude and Longitude returned just as we had hoped. As an added bonus, we also have the Postal Code.
But as useful, if not more so, is the validation and scrubbing of the address. For example, suppose we didn't have separate fields for City, State and Zip. We could call the procedure as follows:
EXEC spGeocode '1234 N. Main Street, Santa Ana, CA'
GPSLatitude | GPSLongitude | City | State | PostalCode | Address |
33.755327 | -117.867622 | Santa Ana | CA | 92701 | 1234 N Main St |
How about that! We get the same results back, with City, State and Postal Code split out into separate columns.
Or what if we have imprecise information, such as this?
EXEC spGeocode '123 ELm STreeT, CArlsbAd, CA'
The results are:
GPSLatitude | GPSLongitude | City | State | PostalCode | Address |
33.188985 | -117.378193 | Oceanside | CA | 92054 | 123 Elm St |
Notice that the city is actually neighboring "Oceanside", and not "Carlsbad".
Finally, if you actually run this sample code, you will see three additional columns returned:
- County
- MapURL
- XMLResults
MapURL is useful but not fancy: I just concatenated together a static URL to open Google Maps to the GPS coordinates that were returned to us.
MapURL |
http://maps.google.com/maps?f=q&hl=en&q=33.188985+-117.378193 |
XMLResults contains the raw results that were returned by the geocoding service. Here is an excerpt:
<GeocodeResponse>
<status>OK</status>
<result>
<type>street_address</type>
<formatted_address>123 Elm St, Oceanside, CA 92054, USA</formatted_address>
<address_component>
<long_name>123</long_name>
<short_name>123</short_name>
<type>street_number</type>
</address_component>
<address_component>
<long_name>Elm St</long_name>
<short_name>Elm St</short_name>
<type>route</type>
</address_component>
<address_component>
<long_name>Oceanside</long_name>
<short_name>Oceanside</short_name>
<type>locality</type>
<type>political</type>
</address_component>
...
<geometry>
<location>
<lat>33.1889849</lat>
<lng>-117.3781934</lng>
</location>
<location_type>RANGE_INTERPOLATED</location_type>
...
</geometry>
<partial_match>true</partial_match>
</result>
</GeocodeResponse>
A few notes on this sample code:
- See http://code.google.com/apis/maps/documentation/geocoding for documentation on Google's geocoding service
- Even though ServerXMLHttp can return a native XML property, ActiveX / ADO does not currently support parameters of the XML, varchar(MAX) or nvarchar(MAX) types--so we are stuck using a varchar(8000) for our results, and then casting this to the XML type.
- I realize that the Microsoft purists will point out that a CLR procedure is at least a more modern (if not a better) way to make the HTTP call, but that seems to me to open up a whole new facet of complexity. Personally, I favor using sp_OACreate in this case so that we don't need to mess with CLR for such a simple task.
- I declared the stored procedure parameters as OUTPUT parameters to make it easy to store results in variables if needed, such as when looping through a table to update coordinate and postal code data for existing rows.
This example shows several useful things, including:
- Obtaining GPS coordinates for an address
- Validating an address
- Normalizing address information into appropriate fields
- Looking up postal code and county information
- Obtaining a URL to a map of an address
This example does things in a way that is very easy for a T-SQL programmer to use and understand. This example also shows some simple real-world uses for SQL 2005 XQuery (to parse out data from the XML results), and for the sp_OAxxxxx procedures.
Think of all the fun you can have with this!
Happy geocoding!