Extract X and Y coordinates out of a string and check if within range

  • Hi can someone please help with the following problem...

    I have a table - MapItems

    Structure... MapItemID int

    ObjectCoords varchar(max)

    This table has thousands of records.

    The table contains a list of polygon records. The coordinates of each polygon are stored in metres in the ObjectCoords field in the form x1,y1,x2,y2,x3,y3,x4,y4,x5,y5...

    Example: Record 1: "57001,23004,57023,23123,57104,23201,57001,23004,"

    Each record may have an almost unlimited set of coordinates due to the particular polygons complexity but most would have no more than around 20 sets of X and Y.

    I need to create a Stored Proc which will pass in a pair of coordinates plus a distance in metres (search area) then return all records where one or more SETS of that record's coordinates are within the search area.

    As far as I can tell, what I need to do is get the data into the following format...

    From :-

    Record 1 "57001,23004,57023,23123,57104,23201,57001,23004,"

    Record 2 "...,...,...,...,...,...,...,...,

    Record 3 "...,...,...,...,...,...,...,...,...,...,

    To :-

    ID X Y

    1 57001 23004

    1 57023 23123

    1 57104 23201

    1 57001 23004

    2 ... ... -Next Record start

    2 ... ...

    2 ... ...

    3 ... ... - Next Record start

    From this I can then do a :-

    select distinct ID

    from table

    Where X >= @PassedInX-Distance and X <= @PassedInX+Distance

    and Y >= @PassedInY-Distance and Y <= @PassedInY+Distance

    Can someone help me get the data into the correct format for my select statement.

    I had though about using a numbers table and some sort of in-line UDF using the ObjectCoords as a parameter. So far I've got to the following from a google search...

    --Create Numbers Table --

    CREATE TABLE dbo.Nbrs(n INT NOT NULL IDENTITY) ;

    GO

    SET NOCOUNT ON ;

    INSERT dbo.Nbrs DEFAULT VALUES ;

    WHILE SCOPE_IDENTITY() < 500

    INSERT dbo.Nbrs DEFAULT VALUES ;

    DECLARE @p VARCHAR(50)

    SET @p = '100,200,300,400,500,600,700,800 --Test data for simulating coordinates --

    SELECT SUBSTRING( ',' + @p + ',', n + 1,

    CHARINDEX( ',', ',' + @p + ',', n + 1 ) - n - 1 ) AS "value",

    ROW_NUMBER() OVER ( ORDER BY n ) AS "pos"

    FROM Nbrs

    WHERE SUBSTRING( ',' + @p + ',', n, 1 ) = ','

    AND n < LEN( ',' + @p + ',' )

    It Returns...

    1001

    2002

    3003

    4004

    5005

    6006

    7007

    8008

    I need it to Return...

    100200 1

    300400 2

    500600 3

    700800 4

    Any halp most greatfully received.

    Thanks

    CCB

  • I would approach this with a CTE that provides a list of coordinates, then use RowNumber() partitioning while going through the CTE results to calculate distance between each successive pair of points. The CTE can parse the varchar list of coordinates by joining on a tally table (if you don't have one or understand that concept, see this wonderful article by Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/62867/), and using the current tally number to keep track of which pair of coordinates you're dealing with in your parsing functions with charindex().

    I hope this makes sense, but don't have a lot of time now to code an example. Please take a look at Jeff's article and let us know how it's going.

    -----------

    Edit: For some reason, I didn't scroll down and see that you'd already considered a Tally Table. My appologies. I'll leave the post as I wrote it, but please forgive my blindness to what you'd already tried. Jeff's article may still be helpful.

  • John, thanks for your reply - that article was most useful.

    I have now set up the following test...

    --===== Create and populate the Tally table on the fly

    SELECT TOP 10000

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Let the public use it

    GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC

    --===== Create a sample table with a data column for testing --

    CREATE TABLE #MyHead

    (PK INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

    CsvColumn VARCHAR(500))

    INSERT INTO #MyHead

    SELECT '123,5010,3,7232,81111,20822'

    UNION ALL SELECT '701,222,31,72223,1,2,277'

    UNION ALL SELECT '487,732,5'

    UNION ALL SELECT '112'

    UNION ALL SELECT '53'

    UNION ALL SELECT '24,6555'

    UNION ALL SELECT '132,2,31212,422,552,62'

    SELECT * FROM #MyHead

    --===== Split or "Normalize" the whole table at once

    SELECT mh.PK,

    SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1) AS Value1,

    FROM TempDB.dbo.Tally t CROSS JOIN #MyHead mh

    WHERE N < LEN(','+mh.CsvColumn+',')

    AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','

    The above code gives the values split into seperate rows...

    1123

    15010

    13

    17232

    181111

    120822

    2701

    2222

    231

    272223

    21

    22

    2277

    3487

    3732

    35

    4112

    553

    624

    66555

    7132

    72

    731212

    7422

    7552

    762

    What I need to do now is move the even value rows up 1 row as follows then I have it figured from there - e.g.

    1123 5010

    13 7232

    181111 20822

    2701 222

    231 72223

    21 2

    ...

    Even if I could get the value moved up with the even row left behind...

    1123 5010

    15010

    13 7232

    17232

    181111 20822

    120822

    2701 222

    2222

    231 72223

    272223

    21 2

    22

    I could sove the problem.

    Can anyone help me with this?

    Thanks

    Charlotte

  • Charlotte,

    First, I want to thank you for providing DDL/DML to show what your working with.

    Now, this just expands on what you've started.

    --===== Split or "Normalize" the whole table at once

    ;WITH CTE AS

    -- First, add the row number to the mix

    (

    SELECT mh.PK,

    SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1) AS Value1,

    RowNum = Row_Number() OVER (ORDER BY (SELECT 1))

    FROM TempDB.dbo.Tally t CROSS JOIN #MyHead mh

    WHERE N < LEN(','+mh.CsvColumn+',')

    AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','

    ), CTE2 AS

    (

    -- RowNum2 is the RowNum if it's an odd number.

    -- it will be the previous RowNum if it's an even number.

    SELECT PK,

    Value1,

    RowNum,

    RowNum2 = CASE WHEN RowNum % 2 = 0 THEN RowNum - 1 ELSE RowNum END

    FROM CTE

    )

    -- use FOR XML PATH('') to build a space separated string of the values for the PK/RowNum2 combination

    -- use STUFF to remove the first space

    -- only get the values for the ODD RowNum

    SELECT DISTINCT

    PK, RowNum, Value1 = stuff((select ' ' + Value1 from CTE2 c2 WHERE c2.PK = c.PK and c2.RowNum2 = c.RowNum2 ORDER BY RowNum FOR XML PATH('')),1,1,'')

    FROM CTE2 c

    WHERE RowNum % 2 = 1

    ORDER BY PK, RowNum

    This returns:

    PKRowNumValue1

    11123 5010

    133 7232

    1581111 20822

    27701 222

    2931 72223

    2111 2

    213277

    315732 5

    417112

    61924 6555

    721132 2

    72331212 422

    725552 62

    I think this is what you're looking for.

    Edit: changed ROW_NUMBER ORDER BY

    Edit2: changed ROW_NUMBER ORDER BY back...

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi. I just noticed that Wayne's given some ideas, but I haven't gone through them yet. Meanwhile, I worked up a simple case showing how a CTE and Row_Number() could be used to do this. Note the use of the modulo (%) operator to work through the tally table by twos. You should be in good shape now to apply the method to your case.

    DECLARE @P VARCHAR(8000)

    SET @p = '155400,200445,300,400,500,600,70666660,85500' --Test data for simulating coordinates --

    ;With CTE (Inx, Coord) as

    (SELECT

    inx=ROW_NUMBER() over (order by N)

    ,Coord = SUBSTRING(','+@P+',',N+1,CHARINDEX(',',','+@P+',',N+1)-N-1)

    FROM dbo.Tally

    WHERE N <= LEN(','+@P+',')

    AND SUBSTRING(','+@P+',',N,1) = ','

    )

    Select X=x.CoOrd

    ,Y=y.CoOrd

    ,XY_Distance = Power(power(x.CoOrd,2) + power(y.CoOrd,2),.5)

    from CTE X

    join CTE Y on Y.inx = x.inx+1

    where y.inx%2 = 0

  • Hi Charlotte,

    I do have a question about the data you were kind enough to provide... how can a polygon have a single "X" coordinate and no "Y" coordinate? Why wouldn't every row have at least 1 full coordinate pair (ie; a point on a plane)? And why would any row have an uneven number of elements... for every "X" coordinate, shouldn't there also be a "Y" coordinate?

    I'm asking because I want to show a slightly different method than the good folks that have posted already have but I'm confused about your requirements.

    Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Guys, thanks for your replies - I will have a look at these today.

    Jeff regarding the X and missing Y, you are correct - it is a lapse in my test data (however, I though it was useful like that should the situation ever arise where somehow a Y was missing - although I not sure how this could arise) so ignore it.

    The other thing wrong with the sample data is each line should be terminated with a comma (eg a comma after the last y coordinate).

    Finally, these values in a real world scenario may have decimals in the string eg "123.098,5010,3,7232.9,81111.9098,20822.87,"

    Updated test data as follows...

    INSERT INTO #MyHead

    SELECT 123.098,5010,3,7232.9,81111.9098,20822.87,'

    UNION ALL SELECT '701,222,31,72223,1,2,277,3,'

    UNION ALL SELECT '487,732,5,6789,'

    UNION ALL SELECT '112,1234,'

    UNION ALL SELECT '53,87,'

    UNION ALL SELECT '24,6555'

    UNION ALL SELECT '132,2,31212,422,552,62,'

    I must say I'm most impressed with your article on the tally table but to be honest, when using your code to get me to my current state - it is a bit beyond me - I think it's really clever what it can do but I don't understand it how it works.

    Thanks

    Charlotte.

  • Wayne, I've tried running your code but it doesn't seem to give the correct splits in values - It may be me doing something wrong though.

  • John, I've tried your code and it works perfectly. I added a comma to the end of your sample data string to simulate how my data is then I changed the line

    WHERE N <= LEN(','+@P+',') to:

    WHERE N < LEN(','+@P+',')

    and it works perfectly.

    However, I can't figure out how to hook this into my sample table (see response to Jeff for updated accurate sample data).

    I've updated it to include the key field plus cross joined to the test data table below...

    ;With CTE (Inx, PK, Coord) as

    (SELECT

    inx=ROW_NUMBER() over (order by N),

    mh.PK,

    Coord = SUBSTRING(','+mh.CsvColumn+',',N+1,CHARINDEX(',',','+mh.CsvColumn+',',N+1)-N-1)

    FROM dbo.Tally CROSS JOIN #MyHead mh

    WHERE N < LEN(','+mh.CsvColumn+',')

    AND SUBSTRING(','+mh.CsvColumn+',',N,1) = ','

    )

    Select X.PK,

    X=x.CoOrd,

    Y=y.CoOrd,

    XY_Distance = Power(power(x.CoOrd,2) + power(y.CoOrd,2),.5)

    from CTE X

    join CTE Y on Y.inx = x.inx+1

    where y.inx%2 = 0

    But it goes all wrong - I suspect it's something to do with the Row_Order coming in to play but I'm not sure.

    Please excuse my ignorance on this.

    Thanks

    Charlotte.

  • Since this is going to be a stored procedure, my recommendation would be to peel just on potato at a time. Let's split the coordinate pairs into a more friendly table first using the technique explained in the following article...

    http://www.sqlservercentral.com/articles/T-SQL/63003/

    --===== Create the demo table. This is not part of the solution.

    CREATE TABLE #MapItems

    (MapItemID INT IDENTITY(1,1) PRIMARY KEY, ObjectCoords VARCHAR(8000))

    INSERT INTO #MapItems

    (ObjectCoords)

    SELECT '123.098,5010,3,7232.9,81111.9098,20822.87,' UNION ALL

    SELECT '701,222,31,72223,1,2,277,3,' UNION ALL

    SELECT '487,732,5,6789,' UNION ALL

    SELECT '112,1234,' UNION ALL

    SELECT '53,87,' UNION ALL

    SELECT '24,6555,' UNION ALL

    SELECT '132,2,31212,422,552,62,'

    ;

    --===== Create a numbered coordinate pair result set in a temp table from the test data

    WITH

    cteSplit AS

    ( --=== Split each coordinate value out into a numbered vertical result set

    SELECT (ROW_NUMBER() OVER (PARTITION BY m.MapItemID ORDER BY m.MapItemID,t.N)-1) AS ElementNumber,

    m.MapItemID,

    SUBSTRING(','+m.ObjectCoords,t.N+1,CHARINDEX(',',m.ObjectCoords,t.N)-t.N) AS Coordinate

    FROM dbo.Tally t

    CROSS JOIN #MapItems m

    WHERE t.N > 0 AND t.N < LEN(m.ObjectCoords)+1

    AND SUBSTRING(','+m.ObjectCoords,t.N,1) = ','

    ),

    cteArray AS

    ( --=== Add row and column numbers to the individual coordinate parts

    SELECT MapItemID,

    ElementNumber/2 AS RowNum,

    ElementNumber%2 AS ColNum,

    Coordinate

    FROM cteSplit

    ) --=== Reassemble the coordinate parts into coordinate pairs

    SELECT MapItemID,

    RowNum+1 AS CoordinateNumber,

    MAX(CASE WHEN ColNum = 0 THEN Coordinate END) AS X,

    MAX(CASE WHEN ColNum = 1 THEN Coordinate END) AS Y

    INTO #CoordinatePairs

    FROM cteArray

    GROUP BY MapItemID, RowNum

    ;

    SELECT * FROM #CoordinatePairs

    ;

    From there, it should be a simple matter of comparing these coordinates to a coordinate pair and distance in another table.

    I do have to warn... if this is for geographical analysis of any type and "X" and "Y" are actually Lat/Lon, you will have to take into account the curvature of the Earth and the fact that lines of longitude are not parallel. A Google search will reveal how to do that.

    Also bear in mind that if you have a huge number of original coordinate pairs to check against, just about any code (even simple distance calcs) will benefit strongly from some of the optimizations to narrow down the search which a Google search would also show. For example, you can very simply narrow down the range of Latitudes to search by determining the min and max Latitudes to search using a given coordinate pair and its related distance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, I can't thank you enough for what you have given me - it's exactly what I needed - you are a star and your efforts (and those of John and Wayne who also responded) are very much appreciated.

    Thanks for the advice about Lat Long - however, the coordinate pairs in my real world DB are all in Metres (UK National Grid system). I'll do a Google search just to make sure there are no perculiarities of using this system when comparing distances and a search area.

    Once again, many thanks!

  • Thanks, Charlotte. Heh... not a star... just a guy who's had to do the same thing before. We have such a grid coordinate system for the US and Canada although it's not as resolute as meters (each whole value is the squareroot of 10 miles). It's known to some as the Telcordia V&H Coordinate System and to others it's known as the "Donald Elliptic Projection". A couple of it's more common uses are to calculate mileage for phone calls, between airports, and for shipping. Since it has the effect of flattening the Earth, distance calculations no longer need the complexity of "curvature of the Earth" functions... works just like the UK grid system.

    As a side bar, since you are using a grid system, optimizations are easy. You can quickly set limits for both X and Y in a manner similar to that which I suggested for setting limits on Latitude. It's very well worth the extra bit of coding.

    Anyway, glad it worked out and thanks again for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I would just add that if you are doing a lot of work with this type of thing, you might want to take a look at the geometry data type built in to 2008 (as and when you decide to upgrade). This provides all sorts of handy built-in methods for calculating distances and such. There is also the decided advantage of being able to create spatial indexes...these can speed up queries by an amazing amount.

    See Getting Started With The Geometry Data Type for further information.

  • Hi Paul, Thanks for the info. Unfortunatly most of the people who use our software areon 2005 and it will probably be several years before they all go to 2008 so that I can then make use of the extra functionality. However, I am now thinking about looking into the Spacial Extensions for 2005.

    Regards

    Charlotte

  • Charlottecb (1/30/2010)


    Wayne, I've tried running your code but it doesn't seem to give the correct splits in values - It may be me doing something wrong though.

    Charlotte,

    It wasn't you... the edit I made to the ROW_NUMBER ORDER BY broke it. I changed it back, and it works now.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply