February 20, 2017 at 4:12 am
Good morning,
I hope I'm on the right place to post my question ...
I work on a project for research where I need to perform a linear regression on some big data and then test the results on a database according to the factors given by the linear regression. This would not be cumbersome if I had to do it 2-3 times only, but as I need to do it a lot of times, I'm trying to figure out how I could automate this.
Currently, I determine the data as input to the LR and the output; then I copy it to Excel and launch the linear regression; finally I modify the resulting factors in a stored procedure that will check the database and give me a final result that I can compare with previous ones. I hope this makes sense.
What I'd like to do is automatically:
1. select the input columns and the output one
2. call "something" to perform the regression tests
3. retrieve the resulting factors in a SP
4. run this latter and insert the result in a DB
Whilst the steps 1 and 4 are straightforward, I very much look forward to reading your suggestions for step 2 and 3.
(I currently use SQL Server 2012-Express, but could move to SQL Server 2016 standard edition if needed)
Thanks a lot!
February 21, 2017 at 2:11 am
It has been a while when I created the script below together with my co-worker. We used it to forecast growth of disk usage of which the measured values were stored in a table. Together with a tally table and a table to hold calculated values, we created a resultset with calculated forecasted values for the upcoming six months (with measured values of the last 12 months).
Try to modify the code to fit to your situation. Hope it helps...
-- create supporting tables
declare @datum datetime
declare @servername sysname
set @servername = 'hostserver'
set @datum = convert(nvarchar(8), getdate(), 120) + '01' -- first day of current month
IF OBJECT_ID('Tempdb..#Calculate') IS NOT NULL
DROP TABLE #Calculate;
-- Create table to hold calculated values
CREATE TABLE #Calculate
(Servername sysname
, Instance sysname
, drive char(1)
, mountpoint varchar(50)
, Capacity_Gb decimal (18,2)
, [-12 mnd] decimal (18,2)
, [-11 mnd] decimal (18,2)
, [-10 mnd] decimal (18,2)
, [-9 mnd] decimal (18,2)
, [-8 mnd] decimal (18,2)
, [-7 mnd] decimal (18,2)
, [-6 mnd] decimal (18,2)
, [-5 mnd] decimal (18,2)
, [-4 mnd] decimal (18,2)
, [-3 mnd] decimal (18,2)
, [-2 mnd] decimal (18,2)
, [-1 mnd] decimal (18,2)
, [0 mnd] decimal (18,2)
, avg_y decimal (18,2)
, a decimal (18,2)
, b decimal (18,2)
, [+1 mnd] decimal (18,2)
, [+2 mnd] decimal (18,2)
, [+3 mnd] decimal (18,2)
, [+4 mnd] decimal (18,2)
, [+5 mnd] decimal (18,2)
, [+6 mnd] decimal (18,2)
)
IF OBJECT_ID('Tempdb..#Tally_Date') IS NOT NULL
DROP TABLE #Tally_Date;
-- Create tally table
CREATE TABLE #Tally_Date (Datum Datetime)
-- fill it with required dates
INSERT INTO #Tally_Date
SELECT dateadd(month, -12, @datum)
UNION ALL
SELECT dateadd(month, -11, @datum)
UNION ALL
SELECT dateadd(month, -10, @datum)
UNION ALL
SELECT dateadd(month, -9, @datum)
UNION ALL
SELECT dateadd(month, -8, @datum)
UNION ALL
SELECT dateadd(month, -7, @datum)
UNION ALL
SELECT dateadd(month, -6, @datum)
UNION ALL
SELECT dateadd(month, -5, @datum)
UNION ALL
SELECT dateadd(month, -4, @datum)
UNION ALL
SELECT dateadd(month, -3, @datum)
UNION ALL
SELECT dateadd(month, -2, @datum)
UNION ALL
SELECT dateadd(month, -1, @datum)
UNION ALL
SELECT @datum
UNION ALL
SELECT dateadd(month, 1, @datum)
UNION ALL
SELECT dateadd(month, 2, @datum)
UNION ALL
SELECT dateadd(month, 3, @datum)
UNION ALL
SELECT dateadd(month, 4, @datum)
UNION ALL
SELECT dateadd(month, 5, @datum)
UNION ALL
SELECT dateadd(month, 6, @datum)
insert into #Calculate
(Servername
, Instance
, Drive
, Mountpoint
, Capacity_Gb
, [-12 mnd]
, [-11 mnd]
, [-10 mnd]
, [-9 mnd]
, [-8 mnd]
, [-7 mnd]
, [-6 mnd]
, [-5 mnd]
, [-4 mnd]
, [-3 mnd]
, [-2 mnd]
, [-1 mnd]
, [0 mnd]
)
select
Servername
, Instance
, Drive
, Mountpoint
, Capacity_Gb
, COALESCE([-12], 0) as '-12 mnd'
, COALESCE([-11], 0) as '-11 mnd'
, COALESCE([-10], 0) as '-10 mnd'
, COALESCE([-9], 0) as '-9 mnd'
, COALESCE([-8], 0) as '-8 mnd'
, COALESCE([-7], 0) as '-7 mnd'
, COALESCE([-6], 0) as '-6 mnd'
, COALESCE([-5], 0) as '-5 mnd'
, COALESCE([-4], 0) as '-4 mnd'
, COALESCE([-3], 0) as '-3 mnd'
, COALESCE([-2], 0) as '-2 mnd'
, COALESCE([-1], 0) as '-1 mnd'
, COALESCE([0], 0) as '0 mnd'
from
(select
Servername
, Instance
, Drive
, MountPoint
, datediff(mm, @datum, #Tally_Date.Datum) as Maand
, MAX(Capacity) AS Capacity_Gb
, MAX(capacity-free) AS MaximumUsedSpace
from #Tally_Date
left outer join dbo.mon_server_disks
-- join on time range
on #Tally_Date.Datum >= startdatum and #Tally_Date.Datum < COALESCE(einddatum, GETDATE())
where
drive <> 'q'
and servername like @servername
and instance IS NOT NULL
group by
servername
, instance
, Drive
, MountPoint
, #Tally_Date.Datum
, startdatum
)sel_alias
PIVOT
(MAX(MaximumUsedSpace)
FOR Maand
IN ([-12]
, [-11]
, [-10]
, [-9]
, [-8]
, [-7]
, [-6]
, [-5]
, [-4]
, [-3]
, [-2]
, [-1]
, [0]
)
)
as piv_alias
order by
servername
, Instance
, Drive
, MountPoint
--x = 1,2,3,4,5,6,7,8,9,10,11,12,13
--avg_x = (1+2+3+4+5+6+7+8+9+10+11+12+13) / 13 = 7
--divider for a = sum( (x - avg_x)2 ) = 182
update #Calculate
set avg_y = ([-12 mnd]+[-11 mnd]+[-10 mnd]+[-9 mnd]+[-8 mnd]+[-7 mnd]+[-6 mnd]+[-5 mnd]+[-4 mnd]+[-3 mnd]+[-2 mnd]+[-1 mnd]+[0 mnd])/13
update #Calculate
set a=((-6*([-12 mnd]-avg_y))+(-5*([-11 mnd]-avg_y))+(-4*([-10 mnd]-avg_y))+(-3*([-9 mnd]-avg_y))+(-2*([-8 mnd]-avg_y))+(-1*([-7 mnd]-avg_y))+(0*([-6 mnd]-avg_y))+(1*([-5 mnd]-avg_y))+(2*([-4 mnd]-avg_y))+(3*([-3 mnd]-avg_y))+(4*([-2 mnd]-avg_y))+(5*([-1 mnd]-avg_y))+(6*([0 mnd]-avg_y)))/182
update #Calculate
set b=avg_y-a*7
update #Calculate
set [+1 mnd]=a*14+b
, [+2 mnd]=a*15+b
, [+3 mnd]=a*16+b
, [+4 mnd]=a*17+b
, [+5 mnd]=a*18+b
, [+6 mnd]=a*19+b
select
Servername
, Instance
, drive
, mountpoint
, Capacity_Gb
, [-12 mnd]
, [-11 mnd]
, [-10 mnd]
, [-9 mnd]
, [-8 mnd]
, [-7 mnd]
, [-6 mnd]
, [-5 mnd]
, [-4 mnd]
, [-3 mnd]
, [-2 mnd]
, [-1 mnd]
, [0 mnd]
, [+1 mnd]
, [+2 mnd]
, [+3 mnd]
, [+4 mnd]
, [+5 mnd]
, [+6 mnd]
, case
when [+1 mnd]>Capacity_Gb then 'extreme'
when [+2 mnd]>Capacity_Gb then 'critical'
when [+3 mnd]>Capacity_Gb then 'warning'
when [+4 mnd]>Capacity_Gb then 'order'
when [+5 mnd]>Capacity_Gb then 'monitor'
when [+6 mnd]>Capacity_Gb then 'none'
else ''
end as Alert
from #Calculate
-- clean-up
IF OBJECT_ID('Tempdb..#Tally_Date') IS NOT NULL
DROP TABLE #Tally_Date
IF OBJECT_ID('Tempdb..#Calculate') IS NOT NULL
DROP TABLE #Calculate;
February 27, 2017 at 1:33 am
Vielen Dank HanShi for sharing your script.
At first sight I can't see any matrix inversion in it (that's in fact the function I need most in matrix calculations 😉 ), but I'll look into it ...
If anyone else has a got a clue to perform (efficient) matrix calculations in SQL server, I'm still buyer.
Have a nice day ...
🙂
February 27, 2017 at 2:30 am
We use R to do similar processing. As we don't have SQL 2016, we use the standalone version of R called by a SSIS package which reads and writes to the SQL Server database.
In SQL 2016 R services are available straight from the database.
This seems to be a good starting place
http://www.sqlservercentral.com/articles/RevolutionR/134077/
February 27, 2017 at 6:29 am
thanks alastair.beveridge ... I'll have a look at this ... it will not be as easy to implement as I initially thought of, but the power behind can be very helpful and it might be worth spending some extra time for training me into R ... 🙂
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply