September 24, 2014 at 1:34 pm
Thinking about this a bit more. If you replace the distance calculation in the current version with the haversine formula, you should get a reasonably accurate median.
So replace
denomsum = SUM(1 / SQRT(POWER(@X - point.STX,2) + POWER(@Y - point.STY,2))),
X = SUM((point.STX * (1 / SQRT(POWER(@X - point.STX,2) + POWER(@Y - point.STY,2)))) / @denomsum),
Y = SUM((point.STY * (1 / SQRT(POWER(@X - point.STX,2) + POWER(@Y - point.STY,2)))) / @denomsum)
with
denomsum = SUM(1 / (ACOS( SIN(@Y)*SIN(point.STY) + COS(@Y)*COS(point.STY)*COS(@Y-@X) ) * 6371)),
X = SUM((point.STX * (1 / (ACOS( SIN(@Y)*SIN(point.STY) + COS(@Y)*COS(point.STY)*COS(@Y-@X) ) * 6371))) / @denomsum),
Y = SUM((point.STY * (1 / (ACOS( SIN(@Y)*SIN(point.STY) + COS(@Y)*COS(point.STY)*COS(@Y-@X) ) * 6371))) / @denomsum)
I haven't tested this yet, so I hope I got it right.
Edit: Here's a handy site http://www.movable-type.co.uk/scripts/latlong.html
September 24, 2014 at 2:31 pm
There is a big difference in the results when doing the calculation on geometry or geography, look at this examples and results
😎
Geometry
USE tempdb;
GO
DECLARE @GEOM TABLE
(
GNAME VARCHAR(10) NOT NULL
,GPoint GEOMETRY NOT NULL
);
INSERT INTO @GEOM(GNAME,GPoint)
VALUES
('3,1' ,GEOMETRY::Point (3,1,0))
,('1,25' ,GEOMETRY::Point (1,25,0))
,('18,30' ,GEOMETRY::Point (18,30,0))
,('27,16' ,GEOMETRY::Point (27,6,0))
,('32,6' ,GEOMETRY::Point (32,16,0))
,('15,0.5',GEOMETRY::Point (15,0.5,0))
,('5,15' ,GEOMETRY::Point (5,15,0));
;WITH BASE_AGGREGATION AS
(
SELECT
geometry::ConvexHullAggregate(GPoint) AS AGPoints
FROM @GEOM G
)
SELECT
'Centre Point' AS P_LABEL
,BA.AGPoints.STCentroid ().STBuffer(0.5)
FROM BASE_AGGREGATION BA
UNION ALL
SELECT
'Convex'
,BA.AGPoints
FROM BASE_AGGREGATION BA
UNION ALL
SELECT
G.GNAME
,G.GPoint.STBuffer(0.3)
FROM @GEOM G;
Geography
USE tempdb;
GO
DECLARE @GEOM TABLE
(
GNAME VARCHAR(10) NOT NULL
,GPoint GEOGRAPHY NOT NULL
);
INSERT INTO @GEOM(GNAME,GPoint)
VALUES
('3,1' ,GEOGRAPHY::Point (3,1,4326))
,('1,25' ,GEOGRAPHY::Point (1,25,4326))
,('18,30' ,GEOGRAPHY::Point (18,30,4326))
,('27,16' ,GEOGRAPHY::Point (27,6,4326))
,('32,6' ,GEOGRAPHY::Point (32,16,4326))
,('15,0.5',GEOGRAPHY::Point (15,0.5,4326))
,('5,15' ,GEOGRAPHY::Point (5,15,4326));
;WITH BASE_AGGREGATION AS
(
SELECT
GEOGRAPHY::ConvexHullAggregate(GPoint) AS AGPoints
FROM @GEOM G
)
SELECT
'Centre Point' AS P_LABEL
,BA.AGPoints.EnvelopeCenter ().STBuffer(50000)
FROM BASE_AGGREGATION BA
UNION ALL
SELECT
'Convex'
,BA.AGPoints
FROM BASE_AGGREGATION BA
UNION ALL
SELECT
G.GNAME
,G.GPoint.STBuffer(50000)
FROM @GEOM G;
Geometry results
Geography (Mercator projection) results
Edit: Added (Mercator projection)
September 24, 2014 at 2:54 pm
I would expect that given the large variation of lats/longs. All of my data points will be within 1 degree latitude and longitude so I think i can get away with it.
September 24, 2014 at 3:18 pm
busraker (9/24/2014)
I would expect that given the large variation of lats/longs. All of my data points will be within 1 degree latitude and longitude so I think i can get away with it.
Guess that would be fine, the example is quite exaggerated
😎
September 24, 2014 at 6:02 pm
The function doesn't work anyway, even for straight up geometric medians of non-geographic multi-points. I calculated the geometric median for a few thousand sample test multi-points from my database, and summed the distance versus a random point to make sure the random point was never a better solution. Many failed the test. For example with this test data set:
DECLARE @sampleGeom AS udttPDXGeom
INSERT INTO @sampleGeom
VALUES
(GEOMETRY::Point(122.688,45.515,0))
,(GEOMETRY::Point(122.685,45.517,0))
,(GEOMETRY::Point(122.48,45.527,0))
,(GEOMETRY::Point(122.558,45.549,0))
,(GEOMETRY::Point(122.683,45.522,0));
Against a Random Point:
WITH pts AS (
SELECT
geompt AS geomPoint,
dbo.udfPDXGeomMedian(@sampleGeom, 100) AS geomMedian,
GEOMETRY::Point(122.6719651,45.5218728,0) AS geomTest
FROM
@sampleGeom)
,dist AS (
SELECT
geomPoint.STDistance(geomMedian) DistanceToMedian,
geomPoint.STDistance(geomTest) DistanceToTest,
pts.*
FROM
pts)
SELECT
COUNT(*) iPoints,
SUM(DistanceToMedian) MedianTotal,
SUM(DistanceToTest) TestTotal
FROM
dist
The results favored the test point significantly despite how many iterations I chose:
iPointsMedianTotalTestTotal
50.4182644767160040.351580012335012
For reference this is my UDTT and Function:
CREATE TYPE [dbo].[udttPDXGeom] AS TABLE(
[geompt] [geometry] NULL
)
GO
And final Function:
CREATE FUNCTION [dbo].[udfPDXGeomMedian] (@geomPts udttPDXGeom READONLY, @iIterations NUMERIC)
RETURNS GEOMETRY AS
BEGIN
DECLARE @tempXFLOAT = 0.0;
DECLARE @tempYFLOAT = 0.0;
DECLARE @denomsumFLOAT;
SELECT@tempx = EXP(SUM(LOG(geompt.STX))/COUNT(*)),
@tempy = EXP(SUM(LOG(geompt.STY))/COUNT(*)) FROM @geomPts
WHILE @iIterations > 0
BEGIN
SET @iIterations -= 1;
SELECT @denomsum = SUM(1 / (ACOS( SIN(@tempY)*SIN(geompt.STY) + COS(@tempY)*COS(geompt.STY)*COS(@tempY-@tempX) ) * 6371))
FROM @geomPts
SELECT @tempX = SUM((geompt.STX * (1 / (ACOS( SIN(@tempY)*SIN(geompt.STY) + COS(@tempY)*COS(geompt.STY)*COS(@tempY-@tempX) ) * 6371)))
/ @denomsum),
@tempY = SUM((geompt.STY * (1 / (ACOS( SIN(@tempY)*SIN(geompt.STY) + COS(@tempY)*COS(geompt.STY)*COS(@tempY-@tempX) ) * 6371))) / @denomsum)
FROM @geomPts
END
RETURN GEOMETRY::Point(@tempx, @tempy,0)
END
September 24, 2014 at 7:40 pm
**Insert bad word here**, I've either translated the python wrong or it doesn't work.
I've just tried it with some of my real data rather than the mocked up points I did before and it ends up getting worse.
I'm investigating at the moment and will get back.
September 24, 2014 at 8:38 pm
I've removed the rcte version as that appeared to be behaving badly. The looped version appears to work OK.
To avoid issues where distances in the procedure get calculated badly I replaced them with the built in function. This does make it go slower.
I tested the following by creating a circle of points (132) 100 km around a point.
Origin 2000000, 5000000
For Geometry the X was bang on. The Y was bang on.
For Geography the Long was 99m out and the Lat 780m out.
Geometry Version:
DECLARE @tempX float = 0.0;
DECLARE @tempY float = 0.0;
DECLARE @iterations int = 50;
DECLARE @denomsum float;
SELECT @tempX = EXP(SUM(LOG(geom.STX))/COUNT(*)),
@tempY = EXP(SUM(LOG(geom.STY))/COUNT(*))
FROM testmedian;
WHILE @iterations > 0
BEGIN
SET @iterations -= 1;
SELECT @denomsum = SUM(1 / geom.STDistance(Geometry::Point(@tempx, @tempy,geom.STSrid)))
FROM testmedian;
SELECT @tempX = SUM((geom.STX * (1 / geom.STDistance(Geometry::Point(@tempx, @tempy,geom.STSrid)))) / @denomsum),
@tempy = SUM((geom.STY * (1 / geom.STDistance(Geometry::Point(@tempx, @tempy,geom.STSrid)))) / @denomsum)
FROM testmedian;
END;
select @tempx, @tempy
Geography Version:DECLARE @tempX float = 0.0;
DECLARE @tempY float = 0.0;
DECLARE @iterations int = 50;
DECLARE @denomsum float;
SELECT @tempX = 0,
@tempY = 0
WHILE @iterations > 0
BEGIN
SET @iterations -= 1;
SELECT @denomsum = SUM(1 / Geog.STDistance(Geography::Point(@tempy,@tempx,geog.STSrid)))
FROM testmedian;
SELECT @tempX = SUM((geog.Long * (1 / Geog.STDistance(Geography::Point(@tempy,@tempx,geog.STSrid)))) / @denomsum),
@tempy = SUM((geog.Lat * (1 / Geog.STDistance(Geography::Point(@tempy,@tempx,geog.STSrid)))) / @denomsum)
FROM testmedian;
END;
select @tempx, @tempy
September 25, 2014 at 5:03 pm
I got the geometry version working beautifully after getting rid of the division by zeroes. I also had to ABS the longitude and then flip it back after:
CREATE TYPE [dbo].[udttPDXGeom] AS TABLE(
[geompt] [geometry] NULL
)
CREATE FUNCTION [dbo].[udfPDXGeomMedian] (@geomPts udttPDXGeom READONLY, @iIterations NUMERIC)
RETURNS GEOMETRY AS
BEGIN
DECLARE @tempXFLOAT = 0.0;
DECLARE @tempYFLOAT = 0.0;
DECLARE @denomsumFLOAT;
DECLARE @G2 udttPDXGeom
INSERT INTO @G2 SELECT * FROM @geomPts
SELECT@tempx = EXP(SUM(LOG(geompt.STX))/COUNT(*)),
@tempy = EXP(SUM(LOG(geompt.STY))/COUNT(*)) FROM @geomPts
WHILE @iIterations > 0
BEGIN
SET @iIterations -= 1;
SELECT @denomsum = SUM(1 / geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)))
FROM @G2 WHERE geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)) <> 0;
SELECT @tempX = SUM((geompt.STX * (1 / geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)))) / @denomsum),
@tempy = SUM((geompt.STY * (1 / geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)))) / @denomsum)
FROM @G2 WHERE geompt.STDistance(Geometry::Point(@tempx, @tempy,geompt.STSrid)) <> 0;
END
RETURN GEOMETRY::Point(@tempx, @tempy,0)
END
I was able to process 1489 sets with 8641 total locations in 4:19 which is totally fine. I'll see if I can get the geography version going as well.
September 25, 2014 at 5:11 pm
That's excellent
September 25, 2014 at 5:50 pm
Add the Geography Version:
CREATE TYPE [dbo].[udttPDXGeog] AS TABLE(
[geogpt] [geography] NULL
ALTER FUNCTION [dbo].[udfPDXGeogMedian] (@geogPts udttPDXGeog READONLY, @iIterations NUMERIC)
RETURNS GEOGRAPHY AS
BEGIN
DECLARE @tempXFLOAT = 0.0;
DECLARE @tempYFLOAT = 0.0;
DECLARE @denomsumFLOAT;
DECLARE @sridNUMERIC;
DECLARE @G2 udttPDXGeog;
INSERT INTO @G2 SELECT * FROM @geogPts
SELECT@tempx = EXP(SUM(LOG(ABS(geogpt.Long)))/COUNT(*)) * SIGN(SUM(geogpt.Long)) ,
@tempy = EXP(SUM(LOG(ABS(geogpt.Lat)))/COUNT(*)) * SIGN(SUM(geogpt.Lat)) ,
@srid = geogpt.STSrid FROM @G2 GROUP BY geogpt.STSrid
WHILE @iIterations > 0
BEGIN
SET @iIterations -= 1;
SELECT @denomsum = SUM(1 / geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx, geogpt.STSrid)))
FROM @G2 WHERE geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx, geogpt.STSrid)) <> 0;
SELECT @tempX = SUM((geogpt.Long * (1 / geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx,geogpt.STSrid)))) / @denomsum),
@tempy = SUM((geogpt.Lat * (1 / geogpt.STDistance(GEOGRAPHY::Point( @tempy, @tempx,geogpt.STSrid)))) / @denomsum)
FROM @G2 WHERE geogpt.STDistance(GEOGRAPHY::Point(@tempy, @tempx, geogpt.STSrid)) <> 0;
END
RETURN GEOGRAPHY::Point(@tempy, @tempx, @srid)
END
Thanks for all of your help!
Viewing 10 posts - 16 through 24 (of 24 total)
You must be logged in to reply to this topic. Login to reply