Geometric Median / Weiszfeld's algorithm

Question

Geometric Median / Weiszfeld's algorithm

busraker

Ten Centuries

Points: 1116
More actions
September 22, 2014 at 11:37 am

#296589

Does anyone have a function that determines the geometric function from a series of spatial points (geometric or geographic.)?
If not I am ready to build one ... I found a clean C++ implementation that I can translate to T-SQL.
Russ

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 1

Haven't done Weiszfeld's algorithm in TSQL, would love giving it a go as it is a good example of iterative approach which potentially could be converted into a set based solution. Could you possibly share the C++ code?

😎

mickyT SSChampion Points: 10360 More actions · Answer 2

I've had a quick look at this and I suspect while it could be done in T-SQL (painfully and slowly :-D), a CLR function built around your C++ would probably be a better option.

Here's a link to a python version Eirikur

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 3

mickyT (9/22/2014)
I've had a quick look at this and I suspect while it could be done in T-SQL (painfully and slowly :-D), a CLR function built around your C++ would probably be a better option.
Here's a link to a python version Eirikur

Thanks!

😎

mickyT SSChampion Points: 10360 More actions · Answer 4

Looking at it further, it may not be so bad in t-sql.

Here's my crack at it. It currently uses a while loop because I'm having a few issues turning it into a recursive CTE. Might be able to sort something out later.

I also included a formula for a Geometric Mean.

-- Setup some sample data

--------------------------

SELECT * INTO sample

FROM (VALUES

(Geometry::Point(54,37,0))

,(Geometry::Point(45,75,0))

,(Geometry::Point(34,45,0))

,(Geometry::Point(87,124,0))

,(Geometry::Point(15,68,0))

,(Geometry::Point(32,84,0))

,(Geometry::Point(87,48,0))

,(Geometry::Point(01,78,0))

,(Geometry::Point(78,35,0))

,(Geometry::Point(125,78,0))

,(Geometry::Point(57,14,0))

) p(point);

-- Do that actual work

----------------------

DECLARE @tempX float = 0.0;

DECLARE @tempY float = 0.0;

DECLARE @iterations int = 50;

DECLARE @denomsum float;

SELECT @tempX = EXP(SUM(LOG(point.STX))/COUNT(*)),

@tempY = EXP(SUM(LOG(point.STY))/COUNT(*))

FROM sample;

WHILE @iterations > 0

BEGIN

SET @iterations -= 1;

SELECT @denomsum = SUM(1 / SQRT(POWER(@tempX - point.STX,2) + POWER(@tempY - point.STY,2)))

FROM sample;

SELECT @tempX = SUM((point.STX * (1 / SQRT(POWER(@tempX - point.STX,2) + POWER(@tempY - point.STY,2)))) / @denomsum),

@tempy = SUM((point.STY * (1 / SQRT(POWER(@tempX - point.STX,2) + POWER(@tempY - point.STY,2)))) / @denomsum)

FROM sample;

END;

-- Show the results

-------------------

SELECT point.STBuffer(2) from sample

UNION ALL

SELECT Geometry::Point(EXP(SUM(LOG(point.STX))/COUNT(*)), EXP(SUM(LOG(point.STY))/COUNT(*)), 0).STBuffer(4) -- GeometricMean

FROM sample

UNION ALL

SELECT Geometry::Point(@tempx, @tempy,0).STBuffer(6) -- GeometricMedian

;

busraker Ten Centuries Points: 1116 More actions · Answer 5

I pulled it off of another forum (what is the etiquette relating to that?). I'm better off converting it to T-SQL for my purpose as it will be used to populate a report in a third party app.

// header files for IO functions and math

#include <cstdio>

#include <cmath>

// the maximul value n can take

const int maxn = 50001;

// given a point (x, y) on a grid, we can find its left/right/up/down neighbors

// by using these constants: (x + dx[0], y + dy[0]) = upper neighbor etc.

const int dx[] = {-1, 0, 1, 0};

const int dy[] = {0, 1, 0, -1};

// controls the precision - this should give you an answer accurate to 3 decimals

const double eps = 0.001;

// input and output files

FILE *in = fopen("adapost2.in","r"), *out = fopen("adapost2.out","w");

// stores a point in 2d space

struct punct

{

double x, y;

};

// how many points are in the input file

int n;

// stores the points in the input file

punct a[maxn];

// stores the answer to the question

double x, y;

// finds the sum of (euclidean) distances from each input point to (x, y)

double dist(double x, double y)

{

double ret = 0;

for ( int i = 1; i <= n; ++i )

{

double dx = a.x - x;

double dy = a.y - y;

ret += sqrt(dx*dx + dy*dy); // classical distance formula

}

return ret;

}

// reads the input

void read()

{

fscanf(in, "%d", &n); // read n from the first

// read n points next, one on each line

for ( int i = 1; i <= n; ++i )

fscanf(in, "%lf %lf", &a.x, &a.y), // reads a point

x += a.x,

y += a.y; // we add the x and y at first, because we will start by approximating the answer as the center of gravity

// divide by the number of points (n) to get the center of gravity

x /= n;

y /= n;

}

// implements the solving algorithm

void go()

{

// start by finding the sum of distances to the center of gravity

double d = dist(x, y);

// our step value, chosen by experimentation

double step = 100.0;

// done is used to keep track of updates: if none of the neighbors of the current

// point that are *step* steps away improve the solution, then *step* is too big

// and we need to look closer to the current point, so we must half *step*.

int done = 0;

// while we still need a more precise answer

while ( step > eps )

{

done = 0;

for ( int i = 0; i < 4; ++i )

{

// check the neighbors in all 4 directions.

double nx = (double)x + step*dx;

double ny = (double)y + step*dy;

// find the sum of distances to each neighbor

double t = dist(nx, ny);

// if a neighbor offers a better sum of distances

if ( t < d )

{

update the current minimum

d = t;

x = nx;

y = ny;

// an improvement has been made, so

// don't half step in the next iteration, because we might need

// to jump the same amount again

done = 1;

break;

}

// half the step size, because no update has been made, so we might have

// jumped too much, and now we need to head back some.

if ( !done )

step /= 2;

}

int main()

{

read();

go();

// print the answer with 4 decimal points

fprintf(out, "%.4lf %.4lf", x, y);

return 0;

}

Eirikur Eiriksson (9/22/2014)
Haven't done Weiszfeld's algorithm in TSQL, would love giving it a go as it is a good example of iterative approach which potentially could be converted into a set based solution. Could you possibly share the C++ code?
😎

busraker Ten Centuries Points: 1116 More actions · Answer 6

Nice, I'm going to check it out tomorrow morning. I was using Geometric Mean and comparing it to Absolute distance from a separate point (to measure efficiency). If I run this against my data (which will be around 5,000 samples) to make sure the geographic median always has closer sums of distances than the mean centroids and my facility location.

Russ

mickyT (9/22/2014)
Looking at it further, it may not be so bad in t-sql.
Here's my crack at it. It currently uses a while loop because I'm having a few issues turning it into a recursive CTE. Might be able to sort something out later.
I also included a formula for a Geometric Mean.

mickyT SSChampion Points: 10360 More actions · Answer 7

While I'm on a bit of a roll. Had a bit of success with a rCTE, but it does require a iTVF to be created.

Removed: Incorrect results

busraker Ten Centuries Points: 1116 More actions · Answer 8

Thanks a bunch mickyT!!! I've added a user-defined table type to make this re-usable from my GIS database

CREATE TYPE udttPDXGeom AS TABLE (geompt GEOMETRY)

CREATE TYPE udttPDXGeog AS TABLE (geogpt GEOGRAPHY)

And added a function that does your math (allowing for iterations to be a parameter so it is flexible upon usage):

CREATE FUNCTION udfPDXGeomMedian (@geomPts udttPDXGeom READONLY, @iIterations NUMERIC)

RETURNS GEOMETRY AS

BEGIN

DECLARE @tempXFLOAT = 0.0;

DECLARE @tempYFLOAT = 0.0;

DECLARE @denomsumFLOAT;

SELECT@tempx = EXP(SUM(LOG(geompt.STX))/COUNT(*)),

@tempy = EXP(SUM(LOG(geompt.STY))/COUNT(*)) FROM @geomPts

WHILE @iIterations > 0

BEGIN

SET @iIterations -= 1;

SELECT @denomsum = SUM(1 / SQRT(POWER(@tempX - geompt.STX,2) + POWER(@tempY - geompt.STY,2)))

FROM @geomPts

SELECT @tempX = SUM((geompt.STX * (1 / SQRT(POWER(@tempX - geompt.STX,2) + POWER(@tempY - geompt.STY,2)))) / @denomsum),

@tempy = SUM((geompt.STY * (1 / SQRT(POWER(@tempX - geompt.STX,2) + POWER(@tempY - geompt.STY,2)))) / @denomsum)

FROM @geomPts;

END

RETURN GEOMETRY::Point(@tempx, @tempy,0)

END

So the usage is down to this for me ...

DECLARE @sample AS udttPDXGeom

INSERT INTO @sample

VALUES

(Geometry::Point(54,37,0))

,(Geometry::Point(45,75,0))

,(Geometry::Point(34,45,0))

,(Geometry::Point(87,124,0))

,(Geometry::Point(15,68,0))

,(Geometry::Point(32,84,0))

,(Geometry::Point(87,48,0))

,(Geometry::Point(01,78,0))

,(Geometry::Point(78,35,0))

,(Geometry::Point(125,78,0))

,(Geometry::Point(57,14,0))

SELECT geompt.STBuffer(2) FROM @sample UNION ALL

SELECT dbo.udfPDXGeomMedian(@sample, 50).STBuffer(6)

Do you think this would translate will to GEOGRAPHY data types? I have all of my locations in both the spatial "4326" SRID as well as the "2601" Stateplane planar coordinates ... but the google maps API only supports Lat/Long so I would have to translate in my code which is expensive.

Russ

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 9

Quick question, how accurate does this have to be? The reason for asking is that one can do a good approximation using the spatial data type methods and functions, happy to demonstrate if that's good enough.

I have looked at the python and the C++ samples, think mikyT has pretty much nailed that approach, would like to prove otherwise if I have time:-D (good job mikyT)

😎

busraker Ten Centuries Points: 1116 More actions · Answer 10

Well that didn't go so well ....

CREATE FUNCTION udfPDXGeogMedian (@geogPts udttPDXGeog READONLY, @iIterations NUMERIC)

RETURNS GEOGRAPHY AS

BEGIN

DECLARE @tempXFLOAT = 0.0;

DECLARE @tempYFLOAT = 0.0;

DECLARE @denomsumFLOAT;

SELECT@tempx = EXP(SUM(LOG(geogpt.Long))/COUNT(*)),

@tempy = EXP(SUM(LOG(geogpt.Lat))/COUNT(*)) FROM @geogPts

WHILE @iIterations > 0

BEGIN

SET @iIterations -= 1;

SELECT @denomsum = SUM(1 / SQRT(POWER(@tempX - geogpt.Long,2) + POWER(@tempY - geogpt.Lat,2)))

FROM @geogPts

SELECT @tempX = SUM((geogpt.Lat * (1 / SQRT(POWER(@tempX - geogpt.Long,2) + POWER(@tempY - geogpt.Lat,2)))) / @denomsum),

@tempy = SUM((geogpt.Lat * (1 / SQRT(POWER(@tempX - geogpt.Long,2) + POWER(@tempY - geogpt.Lat,2)))) / @denomsum)

FROM @geogPts;

END

RETURN GEOGRAPHY::Point(@tempx, @tempy,0)

END

Sample data:

DECLARE @sampleGeog AS udttPDXGeog

INSERT INTO @sampleGeog

VALUES

(GEOGRAPHY::Point(44.478311,-122.62196,4326))

,(GEOGRAPHY::Point(44.4600192,-122.6435081,4326))

,(GEOGRAPHY::Point(44.5137321,-122.6458555,4326))

,(GEOGRAPHY::Point(44.505355,-122.58344,4326))

,(GEOGRAPHY::Point(44.518456,-122.6453,4326))

,(GEOGRAPHY::Point(44.4941156,-122.5793172,4326))

SELECT geogpt.STBuffer(100) FROM @sampleGeog

UNION ALL SELECT dbo.udfPDXGeogMedian(@sampleGeog, 500).STBuffer(300)

Yields:

(6 row(s) affected)

Msg 3623, Level 16, State 1, Line 13

An invalid floating point operation occurred.

busraker Ten Centuries Points: 1116 More actions · Answer 11

The more accurate it is, the more money I will save :-). So for an area of 30 miles by 30 miles, I would like to get the Geographic Median within maybe a tenth of a mile. I'm dealing with ten of thousands of trips a year so every bit helps.

Russ

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 12

busraker (9/24/2014)
I'm dealing with ten of thousands of trips a year so every bit helps.
Russ

Sounds a little like travelling salesman here:-)

This isn't too high of a volume, cannot say much but these are kind of hourly type volumes where I'm coming from. I'll piece together an example and post it shortly when I have time.

😎

busraker Ten Centuries Points: 1116 More actions · Answer 13

Eirikur Eiriksson (9/24/2014)
Sounds a little like travelling salesman here:-)
This isn't too high of a volume, cannot say much but these are kind of hourly type volumes where I'm coming from. I'll piece together an example and post it shortly when I have time.
😎

The only miles I log are on the CPU bus 🙂

I'm thinking I will simply cast the log and lat as Geometries instead of Geographies and convert back to geography after getting the geometric median. It will be close enough I think, and may better then the Geometric Mean

mickyT SSChampion Points: 10360 More actions · Answer 14

busraker (9/24/2014)
Well that didn't go so well ....
...
SELECT@tempx = EXP(SUM(LOG(geogpt.Long))/COUNT(*)),
@tempy = EXP(SUM(LOG(geogpt.Lat))/COUNT(*)) FROM @geogPts
...
END
Yields:
(6 row(s) affected)
Msg 3623, Level 16, State 1, Line 13
An invalid floating point operation occurred.

The floating point error will be due to the LOG getting a negative I think.

I suspect the geography version will not be accurate enough for you. It would depend on the north south spread of your data.

If you aren't adverse to using CLRs, there is a few libraries out there that can help with reprojections.

http://sqlspatialtools.codeplex.com/wikipage?title=Current%20Contents&referringTitle=Home

http://www.doogal.co.uk/dotnetcoords.php