March 8, 2006 at 5:41 am
Does anyone know how I might store an array of floats in a field in a database?
The array would need to be variable in length, but I would like to be able to extract the value at a particular index from within a regular query, and to use the AVG and STDEV statistical functions on a given index over a selected rowset.
Any ideas would be much appreciated.
Matt
March 8, 2006 at 5:53 am
I can't think of any reason why one would want to do this. Why don't you store the values in another table?
_/_/_/ paramind _/_/_/
March 8, 2006 at 7:35 am
We have an application that logs data from a scientific instrument, and all of the values in the array are taken at the same time. Different runs of the application might store more or fewer readings in the array, depending on what's being monitored. Whilst we could have a table that effectively just holds each of the floats from the array, each row would also need a significant amount of accompanying data to indicate where the reading was taken, at what time, etc. That data would be duplicated on every row, whereas with the array approach it would be in the database only once.
We know we can store our data in a binary form like this, but I am looking for possible ways to manage it better in queries/stored procedures, rather than querying for the raw data and then manipulating it in procedural code.
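For example, just getting the Nth value back out of a varbinary means something along these lines (a sketch only, assuming 8-byte floats packed end to end):
--sketch: extract the Nth float from a packed varbinary
declare @data varbinary(8000), @i int
--pack three example values end to end
set @data = cast(cast(1.5 as float) as binary(8))
          + cast(cast(2.5 as float) as binary(8))
          + cast(cast(3.5 as float) as binary(8))
set @i = 2 --1-based index of the value we want
select cast(substring(@data, (@i - 1) * 8 + 1, 8) as float) --returns 2.5
And that's before trying to aggregate across rows, which is exactly the kind of thing I'd rather not do in procedural code for every query.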
I know that PostgreSQL supports an array datatype; I'm just looking to achieve similar things in SQL Server.
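In PostgreSQL the sort of thing I mean is (a sketch; table and column names are just examples):
--PostgreSQL: a variable-length float array in one column
create table reading (id serial primary key, vals float8[]);
--value at index 3 (PostgreSQL arrays are 1-based)
select vals[3] from reading;
--statistics for index 3 over a set of rows
select avg(vals[3]), stddev(vals[3]) from reading;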
Matt
March 8, 2006 at 9:07 am
>Whilst we could have a table that effectively just holds each of the floats from the array, each row would also need a significant amount of accompanying data to indicate where the reading was taken, at what time, etc. That data would be duplicated on every row, whereas with the array approach it would be in the database only once.<
You were talking of floats and their respective indexes only in the original post, not of [time taken][location taken]; that still won't change things, though.
In case you take many [index]-[float-value] pairs at one time and in one location, two tables are all that's required: one for the run (time, location, etc.) and one for the individual values. There's no reason to repeat location and time per indexed float value.
And, in effect, this will probably be less costly even in terms of space (see BOL on the actual storage of varbinary), let alone in terms of speed when performing math operations (AVG etc.) on that data.
As for space: if your varbinaries may grow large and differ significantly in length from record to record, inline storage (in the regular table space) is not an advisable option. For each such value, a 16-byte pointer to the actual data is stored in the table space, whereas the data itself resides in an extent. This happens automatically and hidden, and will not be obvious unless you simply know it or check the database's overall size.
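If you want to see where the space actually goes, sp_spaceused reports it (the table name below is just an example):
exec sp_spaceused 'YourTable' --rows, reserved, data and index space for one table
exec sp_spaceused --without arguments: size of the current database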
What you need to repeat is nothing but the primary key of your main record (I assume that's where your [time taken][location taken] is placed). If you have no PK, add a surrogate key, say an integer.
Regarding the 16-byte pointer: measured against that overhead, the first four record IDs in the child table come for free. Moreover, the indexes (I do not mean physical indexes in database terms) are free too, since there's no need to store them at all if you perform your inserts in the presumed order. Index markers aren't necessary either, which saves at least 2 more free bytes per float stored (index markers: you need delimiters, e.g. "(...)", to tell indexes and values apart when you store your non-fixed-length values in a varbinary).
Summary: as long as you don't trade 100 GB of memory and 10 CPUs for one GB of hard disk, you're better off with an extra table. And in case you do, I'll offer you a deal.
_/_/_/ paramind _/_/_/
March 8, 2006 at 9:20 am
OK, thanks. I hear what you're saying. I guess I've just got to take the plunge: fill a test database with a load of data to judge the performance/space requirements, and take it from there.
Cheers
March 8, 2006 at 10:31 am
Paramind is suggesting using a table structure like this (I've added a position column so a particular index can be pulled out):
create table Reading
(
ReadingID bigint identity(1,1) primary key,
TimeTaken datetime,
LocationID int
--add other qualifying data
)
create table ReadingData
(
ReadingID bigint references Reading(ReadingID),
ValueIndex int, --position of the value within the original array
Value float,
primary key (ReadingID, ValueIndex)
)
--then the following is very simple
select r.ReadingID, avg(d.Value)
from Reading r
inner join ReadingData d
on r.ReadingID = d.ReadingID
group by r.ReadingID
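And with the position column, the per-index statistics from the original post are just as easy (the literal 3 is only an example index):
--AVG and STDEV for one array position over a selected set of readings
select avg(d.Value), stdev(d.Value)
from Reading r
inner join ReadingData d
on r.ReadingID = d.ReadingID
where d.ValueIndex = 3
--add any row filters on r here, e.g. a TimeTaken range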
March 9, 2006 at 3:27 am
Or there are database systems that allow multi-valued fields.
We still use Universe (now owned by IBM), which allows sub-, sub-, sub-values in a field (the first delimiter is ctl+}) and can present interesting problems in queries and programming (a variation of Pick Basic).