January 13, 2010 at 12:02 pm
Hello!
I'm fairly new to SQL Server, and trying to work out the best way to approach some data analysis.
I have a medium-sized data set (100,000+ rows) of date-sequential financial data: historical prices over a date range.
I intend to process these row by row in C++, but want to pre-process the data by adding extra columns, my assumption being that doing this work up front will speed up the sequential processing.
I'm interested in adding columns containing averages, minimums and maximums over a set period (e.g. row x would have an average column containing the average of the previous 350 prices, a maximum column containing the maximum of the previous 50 prices, and so on).
I'm using the following code to produce the averages, but it takes a long time (roughly 10 minutes) to select the full data set. I have indexed the table on RowNumber. I'm also a little unsure how to combine this with an UPDATE to populate the appropriate column (see my guess after the query)...
SELECT t1.RowNumber, t1.[close], AVG(t2.[close]) AS ma_slow
FROM #temp t1, #temp t2
WHERE t1.RowNumber BETWEEN t2.RowNumber AND t2.RowNumber + 349
  AND t1.RowNumber BETWEEN 350 AND 100000
GROUP BY t1.RowNumber, t1.[close]
ORDER BY t1.RowNumber
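For the UPDATE, my guess is something along these lines (a sketch only, assuming I've already added an empty ma_slow column to #temp with ALTER TABLE), but I'm not sure it's right:
-- Assumes #temp already has a nullable ma_slow column to receive the averages.
UPDATE t
SET t.ma_slow = s.ma_slow
FROM #temp t
JOIN (SELECT t1.RowNumber, AVG(t2.[close]) AS ma_slow
      FROM #temp t1, #temp t2
      WHERE t1.RowNumber BETWEEN t2.RowNumber AND t2.RowNumber + 349
        AND t1.RowNumber BETWEEN 350 AND 100000
      GROUP BY t1.RowNumber) s
  ON s.RowNumber = t.RowNumber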
Any ideas on speeding this process up, or am I better off extracting the data into the C++ layer and doing the calculations there?
Interestingly, if I limit RowNumber to 10,000, it processes in a few seconds... I'm unsure why having 10 times the data means 1,000 times the processing time!
Excel takes about 2 seconds to run the calculation on 100,000+ rows(!), but I don't want to do it by hand, and I may end up with more data than I can open in Excel...
Thanks,
David.
January 13, 2010 at 2:09 pm
Given that RowNumber is an integer, why couldn't you simply do it this way:
SELECT RowNumber/350 AS ID, AVG([close]) AS average, MAX([close]) AS maximum
FROM your_table
WHERE RowNumber BETWEEN 350 AND 100000
GROUP BY RowNumber/350
ORDER BY RowNumber/350
P.S. If you do need joins, use the explicit JOIN syntax, because it is predictable. The old-style comma join syntax is also deprecated.
The probability of survival is inversely proportional to the angle of arrival.
January 13, 2010 at 2:15 pm
Thanks for the idea, but that gives one average per fixed block of 350 rows, not a moving average of the previous 350 values for each row... or am I missing something?
January 13, 2010 at 2:21 pm
Actually, you are probably right... my brain was working in a different direction. However, I think you could use the modulo-350 idea to generate a virtual table of row ranges to JOIN against, using RowNumber BETWEEN X AND Y.
Something like this:
SELECT Z.rownum, MAX(A.[close]) AS maximum, AVG(A.[close]) AS average
FROM maintable A
JOIN (SELECT X.J AS rownum, MIN(X.K) AS start, MAX(X.K) AS [end]
      FROM (SELECT CONVERT(int, (RowNumber/350)*350) AS J, RowNumber AS K
            FROM maintable) X
      GROUP BY X.J) Z
  ON A.RowNumber BETWEEN Z.start AND Z.[end]
GROUP BY Z.rownum
The probability of survival is inversely proportional to the angle of arrival.
January 13, 2010 at 2:46 pm
Again, I'm not sure that would work, as I need every single row to have its own window...
row 351 has an average of rows 1 to 350
row 352 has an average of rows 2 to 351
If sets of 350 rows share the same ID, I don't see how I can achieve this...
January 13, 2010 at 3:00 pm
It's just the genesis of an idea... to generate a table with startRowNum and endRowNum columns based on modulo-350.
It may be better to create a temp table and use IDENTITY() to create a unique row ID from 1 to 100000/350, with start and end columns (350 apart). The table Z might look like this:
ID   start   end
1    0       349
2    350     699
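Building it might look something like this (just a sketch; I've called it #Z and guessed at your table name):
-- Bucket table: one row per 350-row block, numbered by IDENTITY.
CREATE TABLE #Z (ID int IDENTITY(1,1) PRIMARY KEY, start int, [end] int)

INSERT INTO #Z (start, [end])
SELECT DISTINCT (RowNumber/350)*350, (RowNumber/350)*350 + 349
FROM maintable
ORDER BY 1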
then:
SELECT Z.ID, AVG(A.[close]) AS average, MAX(A.[close]) AS maximum
FROM maintable A
JOIN Z ON A.RowNumber BETWEEN Z.start AND Z.[end]
GROUP BY Z.ID
The probability of survival is inversely proportional to the angle of arrival.
January 13, 2010 at 3:02 pm
I may have found something interesting here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=93911
but I've yet to test it...
January 15, 2010 at 2:08 am
mctaff (1/13/2010)
row 351 has an average of rows 1 to 350; row 352 an average of rows 2 to 351
As soon as SQL Server implements the full windowing clause (including sliding windows) in its ranking and analytic functions, as specified by the SQL standard and implemented, for example, in Oracle 😉, this will be very easy 🙂
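For illustration, with the standard sliding-window syntax (which Oracle supports today, but SQL Server does not yet), the whole problem collapses into a single windowed aggregate (a sketch using the column names from this thread):
-- Average of the current row and the 349 rows before it, per the SQL standard.
SELECT RowNumber, close,
       AVG(close) OVER (ORDER BY RowNumber
                        ROWS BETWEEN 349 PRECEDING AND CURRENT ROW) AS ma_slow
FROM maintable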
Gints Plivna
http://www.gplivna.eu
January 15, 2010 at 11:27 am
hmmm... doesn't help me much though!
I've adapted the code from the link I posted... 100,000 rows processed in 20 seconds for a 350-row moving average...
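Roughly, the idea is to seek the previous 350 rows per output row via the RowNumber index. A sketch of that shape (not necessarily identical to what I actually ran, so treat it as an outline):
-- For each row, APPLY a seek over the 350 rows ending at that row.
SELECT t1.RowNumber, t1.[close], x.ma_slow
FROM #temp t1
CROSS APPLY (
    SELECT AVG(w.[close]) AS ma_slow
    FROM (SELECT TOP 350 t2.[close]
          FROM #temp t2
          WHERE t2.RowNumber <= t1.RowNumber
          ORDER BY t2.RowNumber DESC) w
) x
WHERE t1.RowNumber BETWEEN 350 AND 100000
ORDER BY t1.RowNumber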