January 29, 2010 at 10:28 am
Hi can someone please help with the following problem...
I have a table - MapItems
Structure... MapItemID int
ObjectCoords varchar(max)
This table has thousands of records.
The table contains a list of polygon records. The coordinates of each polygon are stored in metres in the ObjectCoords field in the form x1,y1,x2,y2,x3,y3,x4,y4,x5,y5...
Example: Record 1: "57001,23004,57023,23123,57104,23201,57001,23004,"
Each record may have an almost unlimited set of coordinates due to the particular polygons complexity but most would have no more than around 20 sets of X and Y.
I need to create a Stored Proc which will pass in a pair of coordinates plus a distance in metres (search area) then return all records where one or more SETS of that record's coordinates are within the search area.
As far as I can tell, what I need to do is get the data into the following format...
From :-
Record 1 "57001,23004,57023,23123,57104,23201,57001,23004,"
Record 2 "...,...,...,...,...,...,...,...,
Record 3 "...,...,...,...,...,...,...,...,...,...,
To :-
ID X Y
1 57001 23004
1 57023 23123
1 57104 23201
1 57001 23004
2 ... ... -Next Record start
2 ... ...
2 ... ...
3 ... ... - Next Record start
From this I can then do a :-
select distinct ID
from table
Where X >= @PassedInX-Distance and X <= @PassedInX+Distance
and Y >= @PassedInY-Distance and Y <= @PassedInY+Distance
Can someone help me get the data into the correct format for my select statement.
I had though about using a numbers table and some sort of in-line UDF using the ObjectCoords as a parameter. So far I've got to the following from a google search...
--Create Numbers Table --
CREATE TABLE dbo.Nbrs(n INT NOT NULL IDENTITY) ;
GO
SET NOCOUNT ON ;
INSERT dbo.Nbrs DEFAULT VALUES ;
WHILE SCOPE_IDENTITY() < 500
INSERT dbo.Nbrs DEFAULT VALUES ;
DECLARE @p VARCHAR(50)
SET @p = '100,200,300,400,500,600,700,800 --Test data for simulating coordinates --
SELECT SUBSTRING( ',' + @p + ',', n + 1,
CHARINDEX( ',', ',' + @p + ',', n + 1 ) - n - 1 ) AS "value",
ROW_NUMBER() OVER ( ORDER BY n ) AS "pos"
FROM Nbrs
WHERE SUBSTRING( ',' + @p + ',', n, 1 ) = ','
AND n < LEN( ',' + @p + ',' )
It Returns...
1001
2002
3003
4004
5005
6006
7007
8008
I need it to Return...
100200 1
300400 2
500600 3
700800 4
Any halp most greatfully received.
Thanks
CCB
January 29, 2010 at 12:09 pm
I would approach this with a CTE that provides a list of coordinates, then use RowNumber() partitioning while going through the CTE results to calculate distance between each successive pair of points. The CTE can parse the varchar list of coordinates by joining on a tally table (if you don't have one or understand that concept, see this wonderful article by Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/62867/), and using the current tally number to keep track of which pair of coordinates you're dealing with in your parsing functions with charindex().
I hope this makes sense, but don't have a lot of time now to code an example. Please take a look at Jeff's article and let us know how it's going.
-----------
Edit: For some reason, I didn't scroll down and see that you'd already considered a Tally Table. My appologies. I'll leave the post as I wrote it, but please forgive my blindness to what you'd already tried. Jeff's article may still be helpful.
January 29, 2010 at 2:36 pm
John, thanks for your reply - that article was most useful.
I have now set up the following test...
--===== Create and populate the Tally table on the fly
SELECT TOP 10000
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Let the public use it
GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC
--===== Create a sample table with a data column for testing --
CREATE TABLE #MyHead
(PK INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
CsvColumn VARCHAR(500))
INSERT INTO #MyHead
SELECT '123,5010,3,7232,81111,20822'
UNION ALL SELECT '701,222,31,72223,1,2,277'
UNION ALL SELECT '487,732,5'
UNION ALL SELECT '112'
UNION ALL SELECT '53'
UNION ALL SELECT '24,6555'
UNION ALL SELECT '132,2,31212,422,552,62'
SELECT * FROM #MyHead
--===== Split or "Normalize" the whole table at once
SELECT mh.PK,
SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1) AS Value1,
FROM TempDB.dbo.Tally t CROSS JOIN #MyHead mh
WHERE N < LEN(','+mh.CsvColumn+',')
AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','
The above code gives the values split into seperate rows...
1123
15010
13
17232
181111
120822
2701
2222
231
272223
21
22
2277
3487
3732
35
4112
553
624
66555
7132
72
731212
7422
7552
762
What I need to do now is move the even value rows up 1 row as follows then I have it figured from there - e.g.
1123 5010
13 7232
181111 20822
2701 222
231 72223
21 2
...
Even if I could get the value moved up with the even row left behind...
1123 5010
15010
13 7232
17232
181111 20822
120822
2701 222
2222
231 72223
272223
21 2
22
I could sove the problem.
Can anyone help me with this?
Thanks
Charlotte
January 29, 2010 at 5:00 pm
Charlotte,
First, I want to thank you for providing DDL/DML to show what your working with.
Now, this just expands on what you've started.
--===== Split or "Normalize" the whole table at once
;WITH CTE AS
-- First, add the row number to the mix
(
SELECT mh.PK,
SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1) AS Value1,
RowNum = Row_Number() OVER (ORDER BY (SELECT 1))
FROM TempDB.dbo.Tally t CROSS JOIN #MyHead mh
WHERE N < LEN(','+mh.CsvColumn+',')
AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','
), CTE2 AS
(
-- RowNum2 is the RowNum if it's an odd number.
-- it will be the previous RowNum if it's an even number.
SELECT PK,
Value1,
RowNum,
RowNum2 = CASE WHEN RowNum % 2 = 0 THEN RowNum - 1 ELSE RowNum END
FROM CTE
)
-- use FOR XML PATH('') to build a space separated string of the values for the PK/RowNum2 combination
-- use STUFF to remove the first space
-- only get the values for the ODD RowNum
SELECT DISTINCT
PK, RowNum, Value1 = stuff((select ' ' + Value1 from CTE2 c2 WHERE c2.PK = c.PK and c2.RowNum2 = c.RowNum2 ORDER BY RowNum FOR XML PATH('')),1,1,'')
FROM CTE2 c
WHERE RowNum % 2 = 1
ORDER BY PK, RowNum
This returns:
PKRowNumValue1
11123 5010
133 7232
1581111 20822
27701 222
2931 72223
2111 2
213277
315732 5
417112
61924 6555
721132 2
72331212 422
725552 62
I think this is what you're looking for.
Edit: changed ROW_NUMBER ORDER BY
Edit2: changed ROW_NUMBER ORDER BY back...
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
January 29, 2010 at 5:47 pm
Hi. I just noticed that Wayne's given some ideas, but I haven't gone through them yet. Meanwhile, I worked up a simple case showing how a CTE and Row_Number() could be used to do this. Note the use of the modulo (%) operator to work through the tally table by twos. You should be in good shape now to apply the method to your case.
DECLARE @P VARCHAR(8000)
SET @p = '155400,200445,300,400,500,600,70666660,85500' --Test data for simulating coordinates --
;With CTE (Inx, Coord) as
(SELECT
inx=ROW_NUMBER() over (order by N)
,Coord = SUBSTRING(','+@P+',',N+1,CHARINDEX(',',','+@P+',',N+1)-N-1)
FROM dbo.Tally
WHERE N <= LEN(','+@P+',')
AND SUBSTRING(','+@P+',',N,1) = ','
)
Select X=x.CoOrd
,Y=y.CoOrd
,XY_Distance = Power(power(x.CoOrd,2) + power(y.CoOrd,2),.5)
from CTE X
join CTE Y on Y.inx = x.inx+1
where y.inx%2 = 0
January 29, 2010 at 7:13 pm
Hi Charlotte,
I do have a question about the data you were kind enough to provide... how can a polygon have a single "X" coordinate and no "Y" coordinate? Why wouldn't every row have at least 1 full coordinate pair (ie; a point on a plane)? And why would any row have an uneven number of elements... for every "X" coordinate, shouldn't there also be a "Y" coordinate?
I'm asking because I want to show a slightly different method than the good folks that have posted already have but I'm confused about your requirements.
Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 30, 2010 at 2:39 am
Guys, thanks for your replies - I will have a look at these today.
Jeff regarding the X and missing Y, you are correct - it is a lapse in my test data (however, I though it was useful like that should the situation ever arise where somehow a Y was missing - although I not sure how this could arise) so ignore it.
The other thing wrong with the sample data is each line should be terminated with a comma (eg a comma after the last y coordinate).
Finally, these values in a real world scenario may have decimals in the string eg "123.098,5010,3,7232.9,81111.9098,20822.87,"
Updated test data as follows...
INSERT INTO #MyHead
SELECT 123.098,5010,3,7232.9,81111.9098,20822.87,'
UNION ALL SELECT '701,222,31,72223,1,2,277,3,'
UNION ALL SELECT '487,732,5,6789,'
UNION ALL SELECT '112,1234,'
UNION ALL SELECT '53,87,'
UNION ALL SELECT '24,6555'
UNION ALL SELECT '132,2,31212,422,552,62,'
I must say I'm most impressed with your article on the tally table but to be honest, when using your code to get me to my current state - it is a bit beyond me - I think it's really clever what it can do but I don't understand it how it works.
Thanks
Charlotte.
January 30, 2010 at 3:21 am
Wayne, I've tried running your code but it doesn't seem to give the correct splits in values - It may be me doing something wrong though.
January 30, 2010 at 3:29 am
John, I've tried your code and it works perfectly. I added a comma to the end of your sample data string to simulate how my data is then I changed the line
WHERE N <= LEN(','+@P+',') to:
WHERE N < LEN(','+@P+',')
and it works perfectly.
However, I can't figure out how to hook this into my sample table (see response to Jeff for updated accurate sample data).
I've updated it to include the key field plus cross joined to the test data table below...
;With CTE (Inx, PK, Coord) as
(SELECT
inx=ROW_NUMBER() over (order by N),
mh.PK,
Coord = SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1)
FROM dbo.Tally CROSS JOIN #MyHead mh
WHERE N < LEN(','+mh.CsvColumn+',')
AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','
)
Select X.PK,
X=x.CoOrd,
Y=y.CoOrd,
XY_Distance = Power(power(x.CoOrd,2) + power(y.CoOrd,2),.5)
from CTE X
join CTE Y on Y.inx = x.inx+1
where y.inx%2 = 0
But it goes all wrong - I suspect it's something to do with the Row_Order coming in to play but I'm not sure.
Please excuse my ignorance on this.
Thanks
Charlotte.
January 30, 2010 at 10:15 am
Since this is going to be a stored procedure, my recommendation would be to peel just on potato at a time. Let's split the coordinate pairs into a more friendly table first using the technique explained in the following article...
http://www.sqlservercentral.com/articles/T-SQL/63003/
--===== Create the demo table. This is not part of the solution.
CREATE TABLE #MapItems
(MapItemID INT IDENTITY(1,1) PRIMARY KEY, ObjectCoords VARCHAR(8000))
INSERT INTO #MapItems
(ObjectCoords)
SELECT '123.098,5010,3,7232.9,81111.9098,20822.87,' UNION ALL
SELECT '701,222,31,72223,1,2,277,3,' UNION ALL
SELECT '487,732,5,6789,' UNION ALL
SELECT '112,1234,' UNION ALL
SELECT '53,87,' UNION ALL
SELECT '24,6555,' UNION ALL
SELECT '132,2,31212,422,552,62,'
;
--===== Create a numbered coordinate pair result set in a temp table from the test data
WITH
cteSplit AS
( --=== Split each coordinate value out into a numbered vertical result set
SELECT (ROW_NUMBER() OVER (PARTITION BY m.MapItemID ORDER BY m.MapItemID,t.N)-1) AS ElementNumber,
m.MapItemID,
SUBSTRING(','+m.ObjectCoords,t.N+1,CHARINDEX(',',m.ObjectCoords,t.N)-t.N) AS Coordinate
FROM dbo.Tally t
CROSS JOIN #MapItems m
WHERE t.N > 0 AND t.N < LEN(m.ObjectCoords)+1
AND SUBSTRING(','+m.ObjectCoords,t.N,1) = ','
),
cteArray AS
( --=== Add row and column numbers to the individual coordinate parts
SELECT MapItemID,
ElementNumber/2 AS RowNum,
ElementNumber%2 AS ColNum,
Coordinate
FROM cteSplit
) --=== Reassemble the coordinate parts into coordinate pairs
SELECT MapItemID,
RowNum+1 AS CoordinateNumber,
MAX(CASE WHEN ColNum = 0 THEN Coordinate END) AS X,
MAX(CASE WHEN ColNum = 1 THEN Coordinate END) AS Y
INTO #CoordinatePairs
FROM cteArray
GROUP BY MapItemID, RowNum
;
SELECT * FROM #CoordinatePairs
;
From there, it should be a simple matter of comparing these coordinates to a coordinate pair and distance in another table.
I do have to warn... if this is for geographical analysis of any type and "X" and "Y" are actually Lat/Lon, you will have to take into account the curvature of the Earth and the fact that lines of longitude are not parallel. A Google search will reveal how to do that.
Also bear in mind that if you have a huge number of original coordinate pairs to check against, just about any code (even simple distance calcs) will benefit strongly from some of the optimizations to narrow down the search which a Google search would also show. For example, you can very simply narrow down the range of Latitudes to search by determining the min and max Latitudes to search using a given coordinate pair and its related distance.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 30, 2010 at 2:44 pm
Jeff, I can't thank you enough for what you have given me - it's exactly what I needed - you are a star and your efforts (and those of John and Wayne who also responded) are very much appreciated.
Thanks for the advice about Lat Long - however, the coordinate pairs in my real world DB are all in Metres (UK National Grid system). I'll do a Google search just to make sure there are no perculiarities of using this system when comparing distances and a search area.
Once again, many thanks!
January 30, 2010 at 6:35 pm
Thanks, Charlotte. Heh... not a star... just a guy who's had to do the same thing before. We have such a grid coordinate system for the US and Canada although it's not as resolute as meters (each whole value is the squareroot of 10 miles). It's known to some as the Telcordia V&H Coordinate System and to others it's known as the "Donald Elliptic Projection". A couple of it's more common uses are to calculate mileage for phone calls, between airports, and for shipping. Since it has the effect of flattening the Earth, distance calculations no longer need the complexity of "curvature of the Earth" functions... works just like the UK grid system.
As a side bar, since you are using a grid system, optimizations are easy. You can quickly set limits for both X and Y in a manner similar to that which I suggested for setting limits on Latitude. It's very well worth the extra bit of coding.
Anyway, glad it worked out and thanks again for the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2010 at 5:14 am
I would just add that if you are doing a lot of work with this type of thing, you might want to take a look at the geometry data type built in to 2008 (as and when you decide to upgrade). This provides all sorts of handy built-in methods for calculating distances and such. There is also the decided advantage of being able to create spatial indexes...these can speed up queries by an amazing amount.
See Getting Started With The Geometry Data Type for further information.
January 31, 2010 at 12:57 pm
Hi Paul, Thanks for the info. Unfortunatly most of the people who use our software areon 2005 and it will probably be several years before they all go to 2008 so that I can then make use of the extra functionality. However, I am now thinking about looking into the Spacial Extensions for 2005.
Regards
Charlotte
January 31, 2010 at 7:48 pm
Charlottecb (1/30/2010)
Wayne, I've tried running your code but it doesn't seem to give the correct splits in values - It may be me doing something wrong though.
Charlotte,
It wasn't you... the edit I made to the ROW_NUMBER ORDER BY broke it. I changed it back, and it works now.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply