Extract X and Y coordinates out of a string and check if within range

Question

Extract X and Y coordinates out of a string and check if within range

Charlottecb

Hall of Fame

Points: 3932
More actions
January 29, 2010 at 10:28 am

#216217

Hi can someone please help with the following problem...
I have a table - MapItems
Structure... MapItemID int
ObjectCoords varchar(max)
This table has thousands of records.
The table contains a list of polygon records. The coordinates of each polygon are stored in metres in the ObjectCoords field in the form x1,y1,x2,y2,x3,y3,x4,y4,x5,y5...
Example: Record 1: "57001,23004,57023,23123,57104,23201,57001,23004,"
Each record may have an almost unlimited set of coordinates due to the particular polygons complexity but most would have no more than around 20 sets of X and Y.
I need to create a Stored Proc which will pass in a pair of coordinates plus a distance in metres (search area) then return all records where one or more SETS of that record's coordinates are within the search area.
As far as I can tell, what I need to do is get the data into the following format...
From :-
Record 1 "57001,23004,57023,23123,57104,23201,57001,23004,"
Record 2 "...,...,...,...,...,...,...,...,
Record 3 "...,...,...,...,...,...,...,...,...,...,
To :-
ID X Y
1 57001 23004
1 57023 23123
1 57104 23201
1 57001 23004
2 ... ... -Next Record start
2 ... ...
2 ... ...
3 ... ... - Next Record start
From this I can then do a :-
select distinct ID
from table
Where X >= @PassedInX-Distance and X <= @PassedInX+Distance
and Y >= @PassedInY-Distance and Y <= @PassedInY+Distance
Can someone help me get the data into the correct format for my select statement.
I had though about using a numbers table and some sort of in-line UDF using the ObjectCoords as a parameter. So far I've got to the following from a google search...
--Create Numbers Table --
CREATE TABLE dbo.Nbrs(n INT NOT NULL IDENTITY) ;
GO
SET NOCOUNT ON ;
INSERT dbo.Nbrs DEFAULT VALUES ;
WHILE SCOPE_IDENTITY() < 500
INSERT dbo.Nbrs DEFAULT VALUES ;
DECLARE @p VARCHAR(50)
SET @p = '100,200,300,400,500,600,700,800 --Test data for simulating coordinates --
SELECT SUBSTRING( ',' + @p + ',', n + 1,
CHARINDEX( ',', ',' + @p + ',', n + 1 ) - n - 1 ) AS "value",
ROW_NUMBER() OVER ( ORDER BY n ) AS "pos"
FROM Nbrs
WHERE SUBSTRING( ',' + @p + ',', n, 1 ) = ','
AND n < LEN( ',' + @p + ',' )
It Returns...
1001
2002
3003
4004
5005
6006
7007
8008
I need it to Return...
100200 1
300400 2
500600 3
700800 4
Any halp most greatfully received.
Thanks
CCB

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

john.arnott SSChampion Points: 11882 More actions · Answer 1

I would approach this with a CTE that provides a list of coordinates, then use RowNumber() partitioning while going through the CTE results to calculate distance between each successive pair of points. The CTE can parse the varchar list of coordinates by joining on a tally table (if you don't have one or understand that concept, see this wonderful article by Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/62867/), and using the current tally number to keep track of which pair of coordinates you're dealing with in your parsing functions with charindex().

I hope this makes sense, but don't have a lot of time now to code an example. Please take a look at Jeff's article and let us know how it's going.

-----------

Edit: For some reason, I didn't scroll down and see that you'd already considered a Tally Table. My appologies. I'll leave the post as I wrote it, but please forgive my blindness to what you'd already tried. Jeff's article may still be helpful.

Charlottecb Hall of Fame Points: 3932 More actions · Answer 2

John, thanks for your reply - that article was most useful.

I have now set up the following test...

--===== Create and populate the Tally table on the fly

SELECT TOP 10000

IDENTITY(INT,1,1) AS N

INTO dbo.Tally

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance

ALTER TABLE dbo.Tally

ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

--===== Let the public use it

GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC

--===== Create a sample table with a data column for testing --

CREATE TABLE #MyHead

(PK INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

CsvColumn VARCHAR(500))

INSERT INTO #MyHead

SELECT '123,5010,3,7232,81111,20822'

UNION ALL SELECT '701,222,31,72223,1,2,277'

UNION ALL SELECT '487,732,5'

UNION ALL SELECT '112'

UNION ALL SELECT '53'

UNION ALL SELECT '24,6555'

UNION ALL SELECT '132,2,31212,422,552,62'

SELECT * FROM #MyHead

--===== Split or "Normalize" the whole table at once

SELECT mh.PK,

SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1) AS Value1,

FROM TempDB.dbo.Tally t CROSS JOIN #MyHead mh

WHERE N < LEN(','+mh.CsvColumn+',')

AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','

The above code gives the values split into seperate rows...

1123

15010

13

17232

181111

120822

2701

2222

231

272223

21

22

2277

3487

3732

35

4112

553

624

66555

7132

72

731212

7422

7552

762

What I need to do now is move the even value rows up 1 row as follows then I have it figured from there - e.g.

1123 5010

13 7232

181111 20822

2701 222

231 72223

21 2

...

Even if I could get the value moved up with the even row left behind...

1123 5010

15010

13 7232

17232

181111 20822

120822

2701 222

2222

231 72223

272223

21 2

22

I could sove the problem.

Can anyone help me with this?

Thanks

Charlotte

WayneS SSC Guru Points: 95460 More actions · Answer 3

Charlotte,

First, I want to thank you for providing DDL/DML to show what your working with.

Now, this just expands on what you've started.

--===== Split or "Normalize" the whole table at once

;WITH CTE AS

-- First, add the row number to the mix

(

SELECT mh.PK,

SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1) AS Value1,

RowNum = Row_Number() OVER (ORDER BY (SELECT 1))

FROM TempDB.dbo.Tally t CROSS JOIN #MyHead mh

WHERE N < LEN(','+mh.CsvColumn+',')

AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','

), CTE2 AS

(

-- RowNum2 is the RowNum if it's an odd number.

-- it will be the previous RowNum if it's an even number.

SELECT PK,

Value1,

RowNum,

RowNum2 = CASE WHEN RowNum % 2 = 0 THEN RowNum - 1 ELSE RowNum END

FROM CTE

)

-- use FOR XML PATH('') to build a space separated string of the values for the PK/RowNum2 combination

-- use STUFF to remove the first space

-- only get the values for the ODD RowNum

SELECT DISTINCT

PK, RowNum, Value1 = stuff((select ' ' + Value1 from CTE2 c2 WHERE c2.PK = c.PK and c2.RowNum2 = c.RowNum2 ORDER BY RowNum FOR XML PATH('')),1,1,'')

FROM CTE2 c

WHERE RowNum % 2 = 1

ORDER BY PK, RowNum

This returns:

PKRowNumValue1
11123 5010
133 7232
1581111 20822
27701 222
2931 72223
2111 2
213277
315732 5
417112
61924 6555
721132 2
72331212 422
725552 62

I think this is what you're looking for.

Edit: changed ROW_NUMBER ORDER BY

Edit2: changed ROW_NUMBER ORDER BY back...

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

john.arnott SSChampion Points: 11882 More actions · Answer 4

Hi. I just noticed that Wayne's given some ideas, but I haven't gone through them yet. Meanwhile, I worked up a simple case showing how a CTE and Row_Number() could be used to do this. Note the use of the modulo (%) operator to work through the tally table by twos. You should be in good shape now to apply the method to your case.

DECLARE @P VARCHAR(8000)

SET @p = '155400,200445,300,400,500,600,70666660,85500' --Test data for simulating coordinates --

;With CTE (Inx, Coord) as

(SELECT

inx=ROW_NUMBER() over (order by N)

,Coord = SUBSTRING(','+@P+',',N+1,CHARINDEX(',',','+@P+',',N+1)-N-1)

FROM dbo.Tally

WHERE N <= LEN(','+@P+',')

AND SUBSTRING(','+@P+',',N,1) = ','

)

Select X=x.CoOrd

,Y=y.CoOrd

,XY_Distance = Power(power(x.CoOrd,2) + power(y.CoOrd,2),.5)

from CTE X

join CTE Y on Y.inx = x.inx+1

where y.inx%2 = 0

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 5

Hi Charlotte,

I do have a question about the data you were kind enough to provide... how can a polygon have a single "X" coordinate and no "Y" coordinate? Why wouldn't every row have at least 1 full coordinate pair (ie; a point on a plane)? And why would any row have an uneven number of elements... for every "X" coordinate, shouldn't there also be a "Y" coordinate?

I'm asking because I want to show a slightly different method than the good folks that have posted already have but I'm confused about your requirements.

Thanks.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Charlottecb Hall of Fame Points: 3932 More actions · Answer 6

Guys, thanks for your replies - I will have a look at these today.

Jeff regarding the X and missing Y, you are correct - it is a lapse in my test data (however, I though it was useful like that should the situation ever arise where somehow a Y was missing - although I not sure how this could arise) so ignore it.

The other thing wrong with the sample data is each line should be terminated with a comma (eg a comma after the last y coordinate).

Finally, these values in a real world scenario may have decimals in the string eg "123.098,5010,3,7232.9,81111.9098,20822.87,"

Updated test data as follows...

INSERT INTO #MyHead

SELECT 123.098,5010,3,7232.9,81111.9098,20822.87,'

UNION ALL SELECT '701,222,31,72223,1,2,277,3,'

UNION ALL SELECT '487,732,5,6789,'

UNION ALL SELECT '112,1234,'

UNION ALL SELECT '53,87,'

UNION ALL SELECT '24,6555'

UNION ALL SELECT '132,2,31212,422,552,62,'

I must say I'm most impressed with your article on the tally table but to be honest, when using your code to get me to my current state - it is a bit beyond me - I think it's really clever what it can do but I don't understand it how it works.

Thanks

Charlotte.

Charlottecb Hall of Fame Points: 3932 More actions · Answer 7

Wayne, I've tried running your code but it doesn't seem to give the correct splits in values - It may be me doing something wrong though.

Charlottecb Hall of Fame Points: 3932 More actions · Answer 8

John, I've tried your code and it works perfectly. I added a comma to the end of your sample data string to simulate how my data is then I changed the line

WHERE N <= LEN(','+@P+',') to:

WHERE N < LEN(','+@P+',')

and it works perfectly.

However, I can't figure out how to hook this into my sample table (see response to Jeff for updated accurate sample data).

I've updated it to include the key field plus cross joined to the test data table below...

;With CTE (Inx, PK, Coord) as

(SELECT

inx=ROW_NUMBER() over (order by N),

mh.PK,

Coord = SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1)

FROM dbo.Tally CROSS JOIN #MyHead mh

WHERE N < LEN(','+mh.CsvColumn+',')

AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','

)

Select X.PK,

X=x.CoOrd,

Y=y.CoOrd,

XY_Distance = Power(power(x.CoOrd,2) + power(y.CoOrd,2),.5)

from CTE X

join CTE Y on Y.inx = x.inx+1

where y.inx%2 = 0

But it goes all wrong - I suspect it's something to do with the Row_Order coming in to play but I'm not sure.

Please excuse my ignorance on this.

Thanks

Charlotte.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 9

Since this is going to be a stored procedure, my recommendation would be to peel just on potato at a time. Let's split the coordinate pairs into a more friendly table first using the technique explained in the following article...

http://www.sqlservercentral.com/articles/T-SQL/63003/

--===== Create the demo table. This is not part of the solution.

CREATE TABLE #MapItems

(MapItemID INT IDENTITY(1,1) PRIMARY KEY, ObjectCoords VARCHAR(8000))

INSERT INTO #MapItems

(ObjectCoords)

SELECT '123.098,5010,3,7232.9,81111.9098,20822.87,' UNION ALL

SELECT '701,222,31,72223,1,2,277,3,' UNION ALL

SELECT '487,732,5,6789,' UNION ALL

SELECT '112,1234,' UNION ALL

SELECT '53,87,' UNION ALL

SELECT '24,6555,' UNION ALL

SELECT '132,2,31212,422,552,62,'

;

--===== Create a numbered coordinate pair result set in a temp table from the test data

WITH

cteSplit AS

( --=== Split each coordinate value out into a numbered vertical result set

SELECT (ROW_NUMBER() OVER (PARTITION BY m.MapItemID ORDER BY m.MapItemID,t.N)-1) AS ElementNumber,

m.MapItemID,

SUBSTRING(','+m.ObjectCoords,t.N+1,CHARINDEX(',',m.ObjectCoords,t.N)-t.N) AS Coordinate

FROM dbo.Tally t

CROSS JOIN #MapItems m

WHERE t.N > 0 AND t.N < LEN(m.ObjectCoords)+1

AND SUBSTRING(','+m.ObjectCoords,t.N,1) = ','

),

cteArray AS

( --=== Add row and column numbers to the individual coordinate parts

SELECT MapItemID,

ElementNumber/2 AS RowNum,

ElementNumber%2 AS ColNum,

Coordinate

FROM cteSplit

) --=== Reassemble the coordinate parts into coordinate pairs

SELECT MapItemID,

RowNum+1 AS CoordinateNumber,

MAX(CASE WHEN ColNum = 0 THEN Coordinate END) AS X,

MAX(CASE WHEN ColNum = 1 THEN Coordinate END) AS Y

INTO #CoordinatePairs

FROM cteArray

GROUP BY MapItemID, RowNum

;

SELECT * FROM #CoordinatePairs

;

From there, it should be a simple matter of comparing these coordinates to a coordinate pair and distance in another table.

I do have to warn... if this is for geographical analysis of any type and "X" and "Y" are actually Lat/Lon, you will have to take into account the curvature of the Earth and the fact that lines of longitude are not parallel. A Google search will reveal how to do that.

Also bear in mind that if you have a huge number of original coordinate pairs to check against, just about any code (even simple distance calcs) will benefit strongly from some of the optimizations to narrow down the search which a Google search would also show. For example, you can very simply narrow down the range of Latitudes to search by determining the min and max Latitudes to search using a given coordinate pair and its related distance.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Charlottecb Hall of Fame Points: 3932 More actions · Answer 10

Jeff, I can't thank you enough for what you have given me - it's exactly what I needed - you are a star and your efforts (and those of John and Wayne who also responded) are very much appreciated.

Thanks for the advice about Lat Long - however, the coordinate pairs in my real world DB are all in Metres (UK National Grid system). I'll do a Google search just to make sure there are no perculiarities of using this system when comparing distances and a search area.

Once again, many thanks!

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 11

Thanks, Charlotte. Heh... not a star... just a guy who's had to do the same thing before. We have such a grid coordinate system for the US and Canada although it's not as resolute as meters (each whole value is the squareroot of 10 miles). It's known to some as the Telcordia V&H Coordinate System and to others it's known as the "Donald Elliptic Projection". A couple of it's more common uses are to calculate mileage for phone calls, between airports, and for shipping. Since it has the effect of flattening the Earth, distance calculations no longer need the complexity of "curvature of the Earth" functions... works just like the UK grid system.

As a side bar, since you are using a grid system, optimizations are easy. You can quickly set limits for both X and Y in a manner similar to that which I suggested for setting limits on Latitude. It's very well worth the extra bit of coding.

Anyway, glad it worked out and thanks again for the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Paul White SSC Guru Points: 150467 More actions · Answer 12

I would just add that if you are doing a lot of work with this type of thing, you might want to take a look at the geometry data type built in to 2008 (as and when you decide to upgrade). This provides all sorts of handy built-in methods for calculating distances and such. There is also the decided advantage of being able to create spatial indexes...these can speed up queries by an amazing amount.

See Getting Started With The Geometry Data Type for further information.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Charlottecb Hall of Fame Points: 3932 More actions · Answer 13

Hi Paul, Thanks for the info. Unfortunatly most of the people who use our software areon 2005 and it will probably be several years before they all go to 2008 so that I can then make use of the extra functionality. However, I am now thinking about looking into the Spacial Extensions for 2005.

Regards

Charlotte

WayneS SSC Guru Points: 95460 More actions · Answer 14

Charlottecb (1/30/2010)
Wayne, I've tried running your code but it doesn't seem to give the correct splits in values - It may be me doing something wrong though.

Charlotte,

It wasn't you... the edit I made to the ROW_NUMBER ORDER BY broke it. I changed it back, and it works now.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2