Storing arrays in SQL.

  • Hi,

    I've been set the task of converting an app that uses DOS B-Tree style databases to SQL. One thing I'm finding is that the existing databases make heavy use of arrays.

    i.e. 12 individual records of a customers monthly spend are not stored in 12 seperate fields but are just blatted into one field. The existing code is effectively streaming a structure to the database and back again when reading it.

    So my dilema is - is there a field type in SQL that may help my conversion? Or should I resign myself to having 12 seperate fields.

    There's a nightmare database to do, which contains an array of 1024 IDs. Now that I do no want in a table!!!!!!

    Thanks for any help.

    Martin

  • Martin,

    There's no array type structure in SQL. It sort of violates relational concepts.

    If I were you I'd resign myself to to the slightly more difficult task of creating child tables for the repeating data. However, in the long run you'll find you'll be able to process that data (updating or querying) quite easily.

    Cheers,

    - Mark


    Cheers,
    - Mark

  • Thanks for that. I was mulling over the idea of including a text field into the record that contains the data in a comma delimited format.

    My class can then stream the data in and out, and cope with variable lengths of arrays.

    I'm hoping (because I'm no expert), that this will have less of a performance hit than referencing a child table.

    What do you think?

  • Stop. Desist. No! please, please don't.

    I've seen too many similar designs (including those by me) that have been created with the same good intentions. It just doesn't work well within a relational DBMS.

    Cheers,

    - Mark


    Cheers,
    - Mark

  • Hi Martin,

    quote:


    I've been set the task of converting an app that uses DOS B-Tree style databases to SQL. One thing I'm finding is that the existing databases make heavy use of arrays.

    i.e. 12 individual records of a customers monthly spend are not stored in 12 seperate fields but are just blatted into one field. The existing code is effectively streaming a structure to the database and back again when reading it.

    So my dilema is - is there a field type in SQL that may help my conversion? Or should I resign myself to having 12 seperate fields.

    There's a nightmare database to do, which contains an array of 1024 IDs. Now that I do no want in a table!!!!!!


    as mccork has written, data in a relational dbms is atomic, meaning one field can only one value, not more.

    What might be an alternative are post-relational, object-relational or object-oriented dbms (please, don't ask what is what ) like http://www.postgressql.org

    Cheers,

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Unfortunately like most of us, I'm having to work with the tools provided. And in this case it's SQL server.

    Having mulled it over, I think I will keep to streaming the data into a variable length field. The length of the arrays are so variable that I would need upto 20 separate child tables to store this information.

    And I have to look at this from a maintenance point of view. So unless there's some real hardcore objections or sound advice not to, I'm going to keep with streaming into text fields.

    I'm restricted by the legacy code on this matter. Personally anyone storing arrays of data in a database like this should be shot...............I'll fetch my gun....

  • quote:


    The length of the arrays are so variable that I would need upto 20 separate child tables to store this information.


    I can't imagine a situation that you are describing that would require 20 separate tables. Could you describe an example of how you'd need to put the array into the tables? For instance, this array contains these fields, which would be stored in these tables...

  • Well there are existing databases with simple arrays of integers for example. The length of these arrays vary, some are 4 elements, others are 12, 32 or the extreme 1024.

    The more complex ones, are arrays of structures, so we have a whole range of different types that need to be stored into the table. i.e:

    struct

    {

    char AdPriceBrand[ 4 ] ;

    char AdSource ;

    int AdLimit ;

    long ScratchNo ;

    } AdStock[ 12 ]

    Now some of these more complex structs have been rightly turned into their own tables. But when I come to a struct containing just two members, and there's plenty of them, I really don't want to muck around generating yet another table.

    There's a finite amount of time to do the conversion (isn't there always), and generating and managing 200+ tables really doesn't sound ideal to me.

  • Well, I certainly see your point. Time is always an issue, and I don't envy your position. I think your best solution might just be the serialize/unserialize solution you have now. Good luck!

  • One more point, though. If you have the need to do searching on your database for any of these "array fields", your task just got incredibly difficult...may be something to consider.

  • Mmmm, I think that I've pulled out all the data that I need into searchable fields. Worst comes to the worst, I can always create these child tables for any arrays that do need searching on.

    Legacy systems, you've gotta love 'em.

  • Actually, a thought has just occurred, and I'll have to rely on your saged advice again.

    Is there any way of turning these arrays into XML to store into SQL, which could perhaps then in turn be searched on????

  • You can use a C XML Parser to convert the array to XML, but I don't see any value in storing the XML in SQL Server. You'll just add bloat to the fields, and there isn't any gain in searching ability (AFAIK) by having the data represented in XML. If you already have the mechanisms built in C to do searching across the STRUCTs, I'd stick with that. If you don't, you could think about using a C XML Parser's seek functionality as your search/seek engine...just some thoughts.

  • Something that no one has mentioned is that you could use a single table that looks something like:

    create table legacy_struct (

    struct_name varchar(255) , -- the class name of the struct

    struct_instance int , -- an identifier for the instace of the above class

    field_idx int , -- identifies the field number in the struct (array index)

    n_val numeric(10,18) , -- stores numeric values

    s_val nvarchar(1000) , -- stores string/date values

    data_type_id int , -- fk to type child table

    s_len int -- length of string data (if needed)

    )

    The data_type_id field would point to a child table that tells you which of the two data fields, n_val or s_val, the data is stored in, and also what it was meant to be cast into, such as: int, money, float, etc for the n_val field; varchar, datetime, etc. for the s_val field. You could probably dispense with the n_val field on use just the s_val, but I prefer to store numeric data in a real numeric type when possible.

    Now, it can be a beast to get things out of this, and some queries have to be written somewhat cleverly to avoid conversion errors (espcially when storing dates). The upside though, is that you can store anything in there, so long as it doesn't over run the maximum field size available, which I wouldn't anticipate being a problem anyway.

    I have personally used this technique to store responses to a set of questions that could not be predicted when the database was architected. Rather than require somewhat more dangerous schema updates to add new types of questions, I implemented this with a couple of supporting tables so that new questions could be specified purely by adding new records to existing tables. It's probably a bit hard to show someone new how to use it, but once you get the hang of it, it's not too hard to deal with and it's extremely flexible. This complexity could be mitigated by writing a small application to handle the actual creation of the needed rows so that someone new wouldn't have to know too much about how the system worked; there's not much you could do to make querying it easier though.

    Matthew Galbraith

  • Oh, and there's no problem with searching it. You just create some indexes that cover the struct_name, field_idx, and n_val or s_val fields. Searching is another reason I like to have a separate field for numeric data; that way I don't have to put the data field inside a convert function so SQL can still use an index to search it. It might also be a good idea to create a separate field for dates if you are going to do a lot of searches on date data as well.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply