Geometric Median / Weiszfeld's algorithm

  • Thinking about this a bit more. If you replace the distance calculation in the current version with the haversine formula, you should get a reasonably accurate median.

    So replace

    denomsum = SUM(1 / SQRT(POWER(@X - point.STX,2) + POWER(@Y - point.STY,2))),

    X = SUM((point.STX * (1 / SQRT(POWER(@X - point.STX,2) + POWER(@Y - point.STY,2)))) / @denomsum),

    Y = SUM((point.STY * (1 / SQRT(POWER(@X - point.STX,2) + POWER(@Y - point.STY,2)))) / @denomsum)

    with

    denomsum = SUM(1 / (ACOS( SIN(@Y)*SIN(point.STY) + COS(@Y)*COS(point.STY)*COS(@Y-@X) ) * 6371)),

    X = SUM((point.STX * (1 / (ACOS( SIN(@Y)*SIN(point.STY) + COS(@Y)*COS(point.STY)*COS(@Y-@X) ) * 6371))) / @denomsum),

    Y = SUM((point.STY * (1 / (ACOS( SIN(@Y)*SIN(point.STY) + COS(@Y)*COS(point.STY)*COS(@Y-@X) ) * 6371))) / @denomsum)

    I haven't tested this yet, so I hope I got it right.

    Edit: Here's a handy site http://www.movable-type.co.uk/scripts/latlong.html

  • There is a big difference in the results when doing the calculation on geometry or geography, look at this examples and results

    😎

    Geometry

    USE tempdb;

    GO

    DECLARE @GEOM TABLE

    (

    GNAME VARCHAR(10) NOT NULL

    ,GPoint GEOMETRY NOT NULL

    );

    INSERT INTO @GEOM(GNAME,GPoint)

    VALUES

    ('3,1' ,GEOMETRY::Point (3,1,0))

    ,('1,25' ,GEOMETRY::Point (1,25,0))

    ,('18,30' ,GEOMETRY::Point (18,30,0))

    ,('27,16' ,GEOMETRY::Point (27,6,0))

    ,('32,6' ,GEOMETRY::Point (32,16,0))

    ,('15,0.5',GEOMETRY::Point (15,0.5,0))

    ,('5,15' ,GEOMETRY::Point (5,15,0));

    ;WITH BASE_AGGREGATION AS

    (

    SELECT

    geometry::ConvexHullAggregate(GPoint) AS AGPoints

    FROM @GEOM G

    )

    SELECT

    'Centre Point' AS P_LABEL

    ,BA.AGPoints.STCentroid ().STBuffer(0.5)

    FROM BASE_AGGREGATION BA

    UNION ALL

    SELECT

    'Convex'

    ,BA.AGPoints

    FROM BASE_AGGREGATION BA

    UNION ALL

    SELECT

    G.GNAME

    ,G.GPoint.STBuffer(0.3)

    FROM @GEOM G;

    Geography

    USE tempdb;

    GO

    DECLARE @GEOM TABLE

    (

    GNAME VARCHAR(10) NOT NULL

    ,GPoint GEOGRAPHY NOT NULL

    );

    INSERT INTO @GEOM(GNAME,GPoint)

    VALUES

    ('3,1' ,GEOGRAPHY::Point (3,1,4326))

    ,('1,25' ,GEOGRAPHY::Point (1,25,4326))

    ,('18,30' ,GEOGRAPHY::Point (18,30,4326))

    ,('27,16' ,GEOGRAPHY::Point (27,6,4326))

    ,('32,6' ,GEOGRAPHY::Point (32,16,4326))

    ,('15,0.5',GEOGRAPHY::Point (15,0.5,4326))

    ,('5,15' ,GEOGRAPHY::Point (5,15,4326));

    ;WITH BASE_AGGREGATION AS

    (

    SELECT

    GEOGRAPHY::ConvexHullAggregate(GPoint) AS AGPoints

    FROM @GEOM G

    )

    SELECT

    'Centre Point' AS P_LABEL

    ,BA.AGPoints.EnvelopeCenter ().STBuffer(50000)

    FROM BASE_AGGREGATION BA

    UNION ALL

    SELECT

    'Convex'

    ,BA.AGPoints

    FROM BASE_AGGREGATION BA

    UNION ALL

    SELECT

    G.GNAME

    ,G.GPoint.STBuffer(50000)

    FROM @GEOM G;

    Geometry results

    Geography (Mercator projection) results

    Edit: Added (Mercator projection)

  • I would expect that given the large variation of lats/longs. All of my data points will be within 1 degree latitude and longitude so I think i can get away with it.

  • busraker (9/24/2014)


    I would expect that given the large variation of lats/longs. All of my data points will be within 1 degree latitude and longitude so I think i can get away with it.

    Guess that would be fine, the example is quite exaggerated

    😎

  • The function doesn't work anyway, even for straight up geometric medians of non-geographic multi-points. I calculated the geometric median for a few thousand sample test multi-points from my database, and summed the distance versus a random point to make sure the random point was never a better solution. Many failed the test. For example with this test data set:

    DECLARE @sampleGeom AS udttPDXGeom

    INSERT INTO @sampleGeom

    VALUES

    (GEOMETRY::Point(122.688,45.515,0))

    ,(GEOMETRY::Point(122.685,45.517,0))

    ,(GEOMETRY::Point(122.48,45.527,0))

    ,(GEOMETRY::Point(122.558,45.549,0))

    ,(GEOMETRY::Point(122.683,45.522,0));

    Against a Random Point:

    WITH pts AS (

    SELECT

    geompt AS geomPoint,

    dbo.udfPDXGeomMedian(@sampleGeom, 100) AS geomMedian,

    GEOMETRY::Point(122.6719651,45.5218728,0) AS geomTest

    FROM

    @sampleGeom)

    ,dist AS (

    SELECT

    geomPoint.STDistance(geomMedian) DistanceToMedian,

    geomPoint.STDistance(geomTest) DistanceToTest,

    pts.*

    FROM

    pts)

    SELECT

    COUNT(*) iPoints,

    SUM(DistanceToMedian) MedianTotal,

    SUM(DistanceToTest) TestTotal

    FROM

    dist

    The results favored the test point significantly despite how many iterations I chose:

    iPointsMedianTotalTestTotal

    50.4182644767160040.351580012335012

    For reference this is my UDTT and Function:

    CREATE TYPE [dbo].[udttPDXGeom] AS TABLE(

    [geompt] [geometry] NULL

    )

    GO

    And final Function:

    CREATE FUNCTION [dbo].[udfPDXGeomMedian] (@geomPts udttPDXGeom READONLY, @iIterations NUMERIC)

    RETURNS GEOMETRY AS

    BEGIN

    DECLARE @tempXFLOAT = 0.0;

    DECLARE @tempYFLOAT = 0.0;

    DECLARE @denomsumFLOAT;

    SELECT@tempx = EXP(SUM(LOG(geompt.STX))/COUNT(*)),

    @tempy = EXP(SUM(LOG(geompt.STY))/COUNT(*)) FROM @geomPts

    WHILE @iIterations > 0

    BEGIN

    SET @iIterations -= 1;

    SELECT @denomsum = SUM(1 / (ACOS( SIN(@tempY)*SIN(geompt.STY) + COS(@tempY)*COS(geompt.STY)*COS(@tempY-@tempX) ) * 6371))

    FROM @geomPts

    SELECT @tempX = SUM((geompt.STX * (1 / (ACOS( SIN(@tempY)*SIN(geompt.STY) + COS(@tempY)*COS(geompt.STY)*COS(@tempY-@tempX) ) * 6371)))

    / @denomsum),

    @tempY = SUM((geompt.STY * (1 / (ACOS( SIN(@tempY)*SIN(geompt.STY) + COS(@tempY)*COS(geompt.STY)*COS(@tempY-@tempX) ) * 6371))) / @denomsum)

    FROM @geomPts

    END

    RETURN GEOMETRY::Point(@tempx, @tempy,0)

    END

  • **Insert bad word here**, I've either translated the python wrong or it doesn't work.

    I've just tried it with some of my real data rather than the mocked up points I did before and it ends up getting worse.

    I'm investigating at the moment and will get back.

  • I've removed the rcte version as that appeared to be behaving badly. The looped version appears to work OK.

    To avoid issues where distances in the procedure get calculated badly I replaced them with the built in function. This does make it go slower.

    I tested the following by creating a circle of points (132) 100 km around a point.

    Origin 2000000, 5000000

    For Geometry the X was bang on. The Y was bang on.

    For Geography the Long was 99m out and the Lat 780m out.

    Geometry Version:

    DECLARE @tempX float = 0.0;

    DECLARE @tempY float = 0.0;

    DECLARE @iterations int = 50;

    DECLARE @denomsum float;

    SELECT @tempX = EXP(SUM(LOG(geom.STX))/COUNT(*)),

    @tempY = EXP(SUM(LOG(geom.STY))/COUNT(*))

    FROM testmedian;

    WHILE @iterations > 0

    BEGIN

    SET @iterations -= 1;

    SELECT @denomsum = SUM(1 / geom.STDistance(Geometry::Point(@tempx, @tempy,geom.STSrid)))

    FROM testmedian;

    SELECT @tempX = SUM((geom.STX * (1 / geom.STDistance(Geometry::Point(@tempx, @tempy,geom.STSrid)))) / @denomsum),

    @tempy = SUM((geom.STY * (1 / geom.STDistance(Geometry::Point(@tempx, @tempy,geom.STSrid)))) / @denomsum)

    FROM testmedian;

    END;

    select @tempx, @tempy

    Geography Version:DECLARE @tempX float = 0.0;

    DECLARE @tempY float = 0.0;

    DECLARE @iterations int = 50;

    DECLARE @denomsum float;

    SELECT @tempX = 0,

    @tempY = 0

    WHILE @iterations > 0

    BEGIN

    SET @iterations -= 1;

    SELECT @denomsum = SUM(1 / Geog.STDistance(Geography::Point(@tempy,@tempx,geog.STSrid)))

    FROM testmedian;

    SELECT @tempX = SUM((geog.Long * (1 / Geog.STDistance(Geography::Point(@tempy,@tempx,geog.STSrid)))) / @denomsum),

    @tempy = SUM((geog.Lat * (1 / Geog.STDistance(Geography::Point(@tempy,@tempx,geog.STSrid)))) / @denomsum)

    FROM testmedian;

    END;

    select @tempx, @tempy

  • I got the geometry version working beautifully after getting rid of the division by zeroes. I also had to ABS the longitude and then flip it back after:

    CREATE TYPE [dbo].[udttPDXGeom] AS TABLE(

    [geompt] [geometry] NULL

    )

    CREATE FUNCTION [dbo].[udfPDXGeomMedian] (@geomPts udttPDXGeom READONLY, @iIterations NUMERIC)

    RETURNS GEOMETRY AS

    BEGIN

    DECLARE @tempXFLOAT = 0.0;

    DECLARE @tempYFLOAT = 0.0;

    DECLARE @denomsumFLOAT;

    DECLARE @G2 udttPDXGeom

    INSERT INTO @G2 SELECT * FROM @geomPts

    SELECT@tempx = EXP(SUM(LOG(geompt.STX))/COUNT(*)),

    @tempy = EXP(SUM(LOG(geompt.STY))/COUNT(*)) FROM @geomPts

    WHILE @iIterations > 0

    BEGIN

    SET @iIterations -= 1;

    SELECT @denomsum = SUM(1 / geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)))

    FROM @G2 WHERE geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)) <> 0;

    SELECT @tempX = SUM((geompt.STX * (1 / geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)))) / @denomsum),

    @tempy = SUM((geompt.STY * (1 / geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)))) / @denomsum)

    FROM @G2 WHERE geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)) <> 0;

    END

    RETURN GEOMETRY::Point(@tempx, @tempy,0)

    END

    I was able to process 1489 sets with 8641 total locations in 4:19 which is totally fine. I'll see if I can get the geography version going as well.

  • That's excellent

  • Add the Geography Version:

    CREATE TYPE [dbo].[udttPDXGeog] AS TABLE(

    [geogpt] [geography] NULL

    ALTER FUNCTION [dbo].[udfPDXGeogMedian] (@geogPts udttPDXGeog READONLY, @iIterations NUMERIC)

    RETURNS GEOGRAPHY AS

    BEGIN

    DECLARE @tempXFLOAT = 0.0;

    DECLARE @tempYFLOAT = 0.0;

    DECLARE @denomsumFLOAT;

    DECLARE @sridNUMERIC;

    DECLARE @G2 udttPDXGeog;

    INSERT INTO @G2 SELECT * FROM @geogPts

    SELECT@tempx = EXP(SUM(LOG(ABS(geogpt.Long)))/COUNT(*)) * SIGN(SUM(geogpt.Long)) ,

    @tempy = EXP(SUM(LOG(ABS(geogpt.Lat)))/COUNT(*)) * SIGN(SUM(geogpt.Lat)) ,

    @srid = geogpt.STSrid FROM @G2 GROUP BY geogpt.STSrid

    WHILE @iIterations > 0

    BEGIN

    SET @iIterations -= 1;

    SELECT @denomsum = SUM(1 / geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx, geogpt.STSrid)))

    FROM @G2 WHERE geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx, geogpt.STSrid)) <> 0;

    SELECT @tempX = SUM((geogpt.Long * (1 / geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx,geogpt.STSrid)))) / @denomsum),

    @tempy = SUM((geogpt.Lat * (1 / geogpt.STDistance(GEOGRAPHY::Point( @tempy, @tempx,geogpt.STSrid)))) / @denomsum)

    FROM @G2 WHERE geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx, geogpt.STSrid)) <> 0;

    END

    RETURN GEOGRAPHY::Point(@tempy, @tempx, @srid)

    END

    Thanks for all of your help!

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply