November 23, 2008 at 11:56 am
I have an old data structure, currently held in files,
In C its something like this
struct block {
int blockNumber;
double blockData[1020]; // C double will be SQL float
};
each file has 256 blocks of data and there are around 50 files.
The filename also forms part of the indexing system, and are of the form FILE
To use the data the existing application passes fileNumber and the blockNumber, and gets a block of 1020 values returned.
The program that reads the data is in Fortran (gulp! :unsure: ) - and we want to replace this somehow - I'd like to put the data into a Database. My goal would be something like a stored procedure recieving the fileNumber and blockNumber - and passing back the array of 1020 float values
My question is how to represent the the array of float values in an SQL record structure?
November 23, 2008 at 6:14 pm
I'm pretty sure that there is... attach one of the files and a record layout and let's see what we can do...
--Jeff Moden
Change is inevitable... Change for the better is not.
November 24, 2008 at 2:27 am
Thanks Jeff,
the file is ascii, record structure is
line 1 [Block Number] [No Values in Block]
lines 2...341 [value] [value] [value]
e.g. the first 4 lines are
1 1018
0.251726450000000000E+07 0.251729650000000000E+07 0.149568940363419000E+08
0.129347882411886640E+08 -0.194328028922087426E+06 -0.294282022326385668E+05
-0.383682118344389664E+03 -0.145789684500091479E+02 0.583642027748376771E-01
There follows a further 337 lines containing 3 float values each, giving a total of 1020 float values per block, however the initial line of the block gives 1018 as the No Values in Block - the last two values in the block are zero and would be ignored.
A sample is attached. I have cut down this file so there are only 10 blocks of data, the original file has around 250 blocks and is over 6Mb
My first idea has been rejected - it was simply to store each block of data in a TEXT field. - The user rejected this because they might want to query individual values or search for value ranges at some point in the future.
Then I mused on the idea of Pivoting the data so each row represents a position in the block and each column represents an entire block - I assume SQL 2005 no longer has the 8060 record size limit - so I can have 256(blocks) times 50(files) worth of float values in a row. It would still be awkward to reference the data as column names would be File1Block1 File1Block2 ... FileNBlockN.
I Look forward to anyone elses suggestions.
November 24, 2008 at 6:18 am
I'm getting ready to go to work, so I can't solve it all right this instant.
I do have a question, though... are all of these values for a single column even though the data brings them in 3 at a time? In other words, would it be convenient to bring them into a table that looks like this?
BlockNum Data
And, if so, what should the column names actually be?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 24, 2008 at 6:32 am
Heh... never mind... not enough coffee... this is a matrix they want and they want to be able to address the data by the individual elements using j,k coorinates. Can do...
... can you wait until tonight for me to do this?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 24, 2008 at 8:12 am
I think the only significance of 3 values per line is so it fits on an old 80-character green screen.
I have been investigating the code further and it seems the 1018 values can be broken up into 13 unequal groups, each group can be named GROUP0 ... GROUP12, as GROUP is another input value, however it was hidden from my view of the application, as the data retrieval part just gets the whole block, and leaves it up to the application to drill down into the group.
The following explanation of the structure is for information only. I only need to return an array of 1018 float values to the user application - but if it helps devise a better SQL storage method, then here goes: -
Of the 13 unequal groups, each group contains between 1 and 14 sub-groups of values and a sub-group can be between 2 and or 24 values
The first data line of each block (2nd actual line) contains some checking data, so actual data starts on data line 2 (actual line 3)
Example Group0 contains 14 sets of 12 values = 168 numbers
Data starts at position 4, and ends at position 172,
Then Group1 contains 10 sets of 6 values = 60 values
Data starts at position 173 and ends at position 232
I have worked out the start and end positions of each data group.
The number of values in a sub-group are pre-solved values of some mathematical transform formula on empiracle data (as far as I can tell this whole data table is like old-fashioned log tables -you just look up the values you want) the user calls it pcd or pcv ('positional calibration data/vector') I think I would need a degree in higher mathematics to fully understand what the data actually means.
So the full query is: query the positional calibration co-ordinates/vectors (x,y,z, vx,vy,vz) or (x,y, vx,vy) for the Nth transform of Group A, Block B, File C
However as I said earlier, I the application itself takes care of the co-ordiantes and the transform and the Group - I can get at the required Group if I want, but I still have to return an array of 1018 float values - all the non-required group values could be set to zero, only populating the required group.
November 24, 2008 at 8:26 am
Hi Jeff
I really appreciate you looking at this. Posting this makes me think more about what I'm doing, and your replies meant I had to ask the user some more meaninful questions. I've had the problem for about 3 weeks - the user is in no rush as the old file system works - but they have no Fortran support, so they want it done eventually.
Now that I've got the base level of data out of the user, I can start devising a structure
something like
CREATE TABLE Transform(
pKey int identity,
FK_SubGroupID int
FK_GroupID int,
FK_BlockID int
FK_FileID int
x float,
y float,
z float,
vx float,
vy float
vz float )
Then Tables to link to the SubGroups, Groups, Blocks and Files
... it probably needs a bit more work - and some indexes.
November 24, 2008 at 12:32 pm
I've gone for simplicity, and abandoned any attempt at making sense of the data's internal structure. Its just a Block of float values this way
CREATE TABLE BlockRef (
pKey int identity(1,1) PRIMARY KEY,
FileNo int not null,
BlockNo int not null,
itemCount int not null )
CREATE TABLE Block (
pKey int identity(1,1),
fk_BlockRef int not null FOREIGN KEY REFERENCES BlockRef(pKey),
sequence int not null -- sequence of the float value in the original file block,
fValue float )
-- Then to select an array
CREATE PROC GetDataBlock
@FileNo int,
@BlockNo int
AS
SELECT fValue
FROM Block
WHERE fk_BlockRef = (
SELECT pKey FROM BlockRef
WHERE FileNo = @FileNo
AND BlockNo = @BlockNo )
ORDER BY sequence
I think this will work and I've started on the C program to populate the data structure
I'll put some indexes on it too.
November 24, 2008 at 10:33 pm
Ok... I've got the split done. I just need to "smear" the Group number, SubGroup number, and a couple of other things throughout the "matrix". Since this isn't a rush and I actually have to get up in 5 hours, I'll continue on this bad boy tomorrow.
Write the C code if you want... I'm still gonna do it in SQL because it's all just data and it's fun, besides. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
November 25, 2008 at 8:29 am
This sounds like an opportunity for CLR to me.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 25, 2008 at 6:59 pm
It sure does... let's have a race when we're done... CLR vs T-SQL.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 26, 2008 at 9:21 am
Jeff Moden (11/25/2008)
It sure does... let's have a race when we're done... CLR vs T-SQL.
Unfortunately I can't be the one to code it. 🙁 I have lots of obligations already for the holiday weekend and then will be out of the country for 2 weeks on a rather intense client visit.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 26, 2008 at 3:05 pm
I've got a bit of spare time these days - I'm doing this Fortran replacement thing between the itty bitty niggley things that crop up.
I've not looked at the CLR yet - I have some C code to write the data, can't be too hard to make it C#
maybe this my big opportunity 😉
lets see ... C# probably stores a double in 8 bytes, assume its the same IEEE format as C (and probably the same as SQL float - or is that too much to ask)
so a 1020 array will be 8160 bytes that can be my CLR custom data type
SQL equivalent can just hold a bit bucket varbinary(8160) per data block.
I'm just thinking aloud here :ermm: - I need to go read something about the CLR :doze:
November 26, 2008 at 3:46 pm
1) I recommend Professional SQL Server 2005 CLR Programming by WROX
2) SQL 2008 will allow for large CLR types if needed. I didn't review the needs carefully, but I wonder if a structure is best here or a simple array of some flavor.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 26, 2008 at 3:59 pm
:w00t::w00t::w00t:
The one by Derek Comingore and Douglas Hinson!
By amazingly stupendous coincidence I have that book - I've had it (unopened :blush: ) for over a year now. Thanks for the suggestion - never thought of looking on my bookshelf.
Being a bit of a hack and slay kind of programmer (especially at this time of night) - I'm just skimming it looking for something similar to what I want.
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply