February 24, 2010 at 3:55 pm
I've just finished implemented a simple univariate curve-fitting algorithm in T-SQL. This allows you to take N simple X/Y points and then extrapolate a curve that matches all of these points. This uses the principle that for any such list of N points, there is exactly one curve of maximum power N-1 that satisfies these points, for example, 3 points will have a curve of the form:
y = a + bx + cx^2 ... (and so on up through to zx^(n-1))
To use this code, you'll need SQL Server 2008, and the rights to create two table types and two functions. Let's start by declaring some useful table types:
-- Point type
CREATE TYPE Point AS TABLE (XCoordinate FLOAT, YCoordinate FLOAT);
GO
-- Curve fitting coefficient type
CREATE TYPE Coefficient AS TABLE (Multiplier FLOAT, PowerOfX INT);
GO
Next up, the actual Gauss-Jordan curve fitting function, with appropriate commentry:
/**
* Function: fn_CurveFitPoints
* Author: Steven James Gray [ steve@cobaltsoftware.net ]
* Date: 24th February, 2010
* Version: 1.0
* Description:
* Takes a series of points in a table variant in POINT (X float, Y float) format and
* computes the n-1 power series curve fit using the Gauss-Jordan elimination process.
* Return value is a series of Multiplier, Power elements that can be used for producing
* an estimated Y value for any X value.
*
* Please refer to fn_ExtrapolateYValue(@xvalue, @coefficients) for a simple implementation
* of how to use the output of this function.
**/
ALTER FUNCTION dbo.fn_CurveFitPoints
(
@PointsToCurveFit POINT readonly
)
RETURNS @Result TABLE (Multiplier FLOAT, PowerOfX INT)
AS
BEGIN
-- ==========================================================================================
-- Stage 1 - Convert @PointsToFit into a matrix
-- ==========================================================================================
DECLARE @Matrix TABLE (MatrixRow INT, MatrixColumn INT, MatrixValue FLOAT);
DECLARE @TotalPoints INT = (SELECT COUNT(1) FROM @PointsToCurveFit);
WITH NumberProjectionCTE(CurrentNumber)
AS
(
SELECT 1
UNION ALL
SELECT 1+CurrentNumber FROM NumberProjectionCTE WHERE CurrentNumber < @TotalPoints
) INSERT INTO @Matrix
SELECT
Sequence-1, -- Each point gets it's own row
PWR.CurrentNumber-1, -- Column per power of X
CASE
WHEN PWR.CurrentNumber = 1 -- 1st column is X^0 = 1 Always
THEN 1
ELSE POWER(XCoordinate,PWR.CurrentNumber-1) -- Raise nth column to power n-1.
END
FROM
NumberProjectionCTE PWR, -- Cross join numeric point data and column indexes
(SELECT
ROW_NUMBER() OVER (ORDER BY XCoordinate, YCoordinate) AS Sequence,
XCoordinate,
YCoordinate
FROM
@PointsToCurveFit
) ValueData;
/* Append Y values as nth column */
INSERT INTO @Matrix
SELECT
ROW_NUMBER() OVER (ORDER BY XCoordinate, YCoordinate) - 1 AS Sequence,
@TotalPoints,
YCoordinate
FROM
@PointsToCurveFit;
-- ==========================================================================================
-- Stage 2 - Compute row echelon form of matrix
-- ==========================================================================================
DECLARE @lead INT = 0, @index INT = 0, @current FLOAT;
DECLARE @Rows INT = (SELECT MAX(MatrixRow) FROM @Matrix);
DECLARE @Columns INT = (SELECT MAX(MatrixColumn) FROM @Matrix);
DECLARE @Solved INT -- 0=Unsolvable, 1 = Solved
DECLARE @r INT = 0
WHILE @r <= @Rows
BEGIN
IF @Columns <= @lead
BEGIN
-- Cannot solve this one
SET @Solved = 0;
BREAK;
END;
-- Determine if any row swaps are needed.
WHILE (SELECT MatrixValue FROM @Matrix WHERE MatrixRow = @index AND MatrixColumn = @lead) = 0
BEGIN
IF @Rows = @index
BEGIN
SET @lead = @lead + 1;
IF @Columns = @lead
BEGIN
-- Cannot solve
SET @Solved = 0;
BREAK;
END;
END;
END;
-- Move this row to the correct position if needed.
BEGIN
-- Swap rows
UPDATE @Matrix
SET MatrixRow = CASE MatrixRow
END
WHERE MatrixRow IN (@index, @r);
END;
-- Divide this row by it's lead column value, so that this row's lead is 1 (this will actually multiply/increase the value if lead <0)
DECLARE @Divisor FLOAT = (SELECT MatrixValue FROM @Matrix WHERE MatrixRow = @r AND MatrixColumn = @lead);
If @Divisor <> 1
BEGIN
UPDATE @Matrix SET MatrixValue = MatrixValue / @Divisor WHERE MatrixRow = @r;
END;
-- Update other rows and divide them by the appropriate multiple of this row in order to zero the current lead column.
UPDATE I
SET
MatrixValue = I.MatrixValue - (M.MatrixValue * R.MatrixValue)
FROM
@Matrix I
INNER JOIN @Matrix M ON M.MatrixRow = I.MatrixRow AND M.MatrixColumn = @lead
INNER JOIN @Matrix R ON R.MatrixColumn = I.MatrixColumn AND R.MatrixRow = @r AND R.MatrixRow <> I.MatrixRow
SET @lead = @lead + 1;
-- Move to next
END;
-- If we didn't bomb out, we're solved.
IF @Solved IS NULL
BEGIN
SET @Solved = 1
END;
-- ==========================================================================================
-- Stage 3 - Produce coefficients list (The final colum when in REF)
-- ==========================================================================================
IF @Solved = 1
BEGIN
INSERT INTO @Result (Multiplier, PowerOfX)
SELECT
MatrixValue,
MatrixRow
FROM @Matrix
WHERE MatrixColumn = @Columns;
END;
RETURN;
END;
GO
To make use of this functions output, you'll need
CREATE FUNCTION dbo.fn_ExtrapolateYValue
(
@XValue FLOAT,
@Coefficients Coefficient readonly
)
RETURNS FLOAT
AS
BEGIN
RETURN (SELECT SUM(Multiplier * POWER(@XValue, PowerOfX)) FROM @Coefficients);
END
A simple example, deducing Y=11 + 6x + x^2:
DECLARE @PointsToCurveFit Point
-- A few simple X/Y values
INSERT INTO @PointsToCurveFit SELECT 1,6
INSERT INTO @PointsToCurveFit SELECT 2,3
INSERT INTO @PointsToCurveFit SELECT 3,2
-- Calculate the curve fitting coefficients
DECLARE @Coefficients Coefficient
INSERT INTO @Coefficients SELECT * FROM dbo.fn_CurveFitPoints(@PointsToCurveFit);
-- Shows that y= 11x^0 + 6x + x^2
SELECT * FROM @Coefficients;
-- Show the values for X=-5 to 5
WITH NumberCTE(Number)
AS
(
SELECT -5
UNION ALL
SELECT 1 + Number FROM NumberCTE WHERE Number < 5
) SELECT
Number AS XValue,
dbo.fn_ExtrapolateYValue(Number, @Coefficients) AS YValue
FROM NumberCTE;
The results are:
XY
-566
-451
-338
-227
-118
011
16
23
32
43
56
Share and enjoy.
February 25, 2010 at 6:24 am
Isn't T-SQL the wrong tool for this sort of job?
For the curve-fitting, pass a number of built-in Point spatial data types (with geometry and geography variants) to a user-defined aggregate to return the equation.
For the extrapolation, pass the equation and some other parameters to a streaming CLR table-valued function to return points on that curve.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 25, 2010 at 9:29 am
I'd originally written this as a pair of CLR functions and used Geometry/Geography, but found little/no benefit in terms of performance/readability - it actually ended up being a lot more code, if anything. Once you take away formatting/spacing there's ~10 lines of actual elimination code left in the SQL. The T-SQL approach also allows for the use of set-operations fairly efficiently on large volumes of data (so for example, the last UPDATE statement would actually translate into dozens and dozens of lines of C#/VB .NET code to deal with the matrix).
SQL functions also don't allow for dynamic column-sets, so a 3x3 matrix would need to come out in the same X, Y, Value sequence as I've used here. This means any CLR implementation would need a matrix class, a block of code to map the X,Y,V triplets into matrix cells, plus code to map it back out, not counting any code to actually do the actually.
GJ elimination is effectively an O(N^3) algorithm, and one that doesn't lend itself naturally to parallelism, meaning you cant use SQL's inbuilt SMP and custom aggregates to speed it up either.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply