May 11, 2006 at 1:24 am
Okay so I am new to SQL and data, but my thought is the best way to learn is by applying myself to a problem.
So here is mine,
The data Set is a table for GEO Coding so we have the normal column headings such as
Province, Municipality, Town, Suburb, Street, ERF/PLOT number, X and Y coordinates, Street Number
I have the table populated with about 3.5mil rows for now, and I want to structure the data a little better by creating a parent child relationship structure. So as there is no real unique ID in the data as yet because the source of the day does not supply it and row number is not a real option as relational data may change.
The geographical structure natrually lends itself to a data structure like such;
Country,
Province,
Municipality (Region),
Town,
Suburb,
Street,
Street number
ERF/PLOT number
With the above in mind I have decided that I will create two new tables, one to contain the actual values with a unique ID as they are created, the other will contain the structure by virtue of ID to ID relationship with a relationship type value. As each record in a row is unique by virtue of its relationship to its parent we can allow duplicate values as, for example, street name and street number is guaranteed to be duplicated at some point within a Coutry. It is important to note that in South Africa that all names are subject to change.
So I have started by creating a nested cursor structure to extract this relationship, the example below only returns to the suburb level for now just to test the logic of the data extraction, and it ain't workin too well. Does anyone have any ideas?
use data_transfer
go
SET NOCOUNT ON
DECLARE @Province varchar(128), @Munic varchar(128), @LOC_ID Int, @town varchar (128)
SET @LOC_ID=0
---begin the Province lookup----------
DECLARE Province_lookup CURSOR FOR
SELECT distinct province FROM dbo.GEO_Code_Full ORDER BY province
OPEN Province_lookup
FETCH NEXT FROM Province_lookup
INTO @province
WHILE @@FETCH_STATUS = 0
BEGIN
SET @LOC_ID=@LOC_ID+1-----set the location ID to increment------
DECLARE Munic_Search CURSOR FOR
SELECT Munic FROM dbo.GEO_Code_Full WHERE province like @Province-- Variable value from the outer cursor
OPEN Munic_Search
FETCH NEXT FROM Munic_Search INTO @Munic
----begin town Lookup-----
DECLARE town_Search CURSOR FOR
SELECT town FROM dbo.GEO_Code_Full WHERE munic like @MUNIC-- Variable value from the outer cursor
OPEN town_Search
FETCH NEXT FROM town_Search INTO @town
WHILE @@FETCH_STATUS = 0
BEGIN
SET @LOC_ID=@LOC_ID+1
FETCH NEXT FROM town_Search INTO @town
--select @loc_id as 'id', @town as 'town'
END
CLOSE town_Search
DEALLOCATE town_Search
WHILE @@FETCH_STATUS = 0
BEGIN
SET @LOC_ID=@LOC_ID+1
FETCH NEXT FROM Munic_Search INTO @Munic
--select @loc_id as 'id', @munic as 'Municipality'
END
CLOSE Munic_Search
DEALLOCATE Munic_Search
-- Get the next Province.
FETCH NEXT FROM Province_lookup
INTO @Province
--select @LOC_ID as 'ID', @Province as 'Province'---
Print 'For province'+@Province
Print @LOC_ID
END
CLOSE Province_lookup
DEALLOCATE Province_lookup
GO
May 11, 2006 at 2:55 am
Cursors, especially nested cursors are not going to work very well, ever. I suggest a set-based solution.
This isn't a complete solution, but should give you a good idea where to start.
Create the tables and set the ID column as IDENTITY, then...
INSERT INTO Provinces (ProvinceName)
SELECT distinct province FROM dbo.GEO_Code_Full
INSERT INTO Municipality (MunicipalityName, ProvinceID
SELECT DISTINCT Municipality, ProvinceID FROM dbo.GEO_Code_Full GeoFull INNER JOIN Provinces ON GeoFull.province = Provinces.ProvinceName
etc...
I also would sugest stopping the hierachy around street and having in the bottom table street, street number and plot number. Too many lookups will slow the system down a lot. I know they like renaming streets, but even our government doesn't do it every day.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 11, 2006 at 7:58 am
Thanks,
I have made a quick test as you have shown above and looks very promising. On the subject of the nested cursors, the use was for an initial restructure, thereafter they will not be needed as I will have to create a routine to determine changes from the source, which is easy out of production as I can take it in chunks and process only changes back into the live.
With regards to the comment around hierarchy,
Well what I have done is, as you suggested, created both a GUID as well as a sequential auto number column, I now use the sequential number to build up the hierarchy in a separate table, so the lookup will be against the 'object_id' when looking at the hierarchy in a graphical view.
The UI will make the call once a tree is expanded and retain it in cache until either a refresh from the client is requested or a push request based on a change.
I have a status change table to support this.
Anyways, I will post what works with some sample tables to demonstrate, unless there are others with ideas...
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply