Splitting data at 8000 characters

  • Hi All,

    I have what has become a pretty nasty database project. Basically, I have about 50 locations generating up to 5 reports each every hour. The reports are sent to my server and I have an SSIS chunking through these, writing a record for each report.

    The contents of the report is currently saved to one field which had to be set to VARCHAR(MAX). There are now about 240,000 records which push the database size to over a gig. And this is only about 5 weeks of data...so this is becoming a monster and I'm trying to slim things down. Out of these records, only 3300 are over 8000 characters in length. I'm wondering if I could create two VARCHAR(8000) columns, saving most records to the first column and anything over 8000 in length to the second column. My understanding is that varchar under 8000 will use only as much space as is needed to save the actual characters while VARCHAR(MAX) is more similar to TEXT and uses the same amount of space no matter much actual data is being saved.

    Do you think I could save space this way or am I just trading one nightmare for another? Any insight would be much appreciated.

    Thanks,

    Steve

  • Hi Steve

    SQL Server does not allocate the maximal possible storage for VARCHAR(MAX) nor for TEXT (this would be 2GB per field 😉 ).

    I would even advice to use a TEXT or IMAGE column. VARCHAR(MAX) should be used for data over 8000 characters which need all features of VARCHAR like indexing, quick search, ... . If you primarily want to store data you should use TEXT or IMAGE.

    If you think your database becomes to big (some gigs are usually no problem for SQL Server - depending on the hardware...) you should think about archiving some of the data.

    Greets

    Flo

  • Stay away from TEXT, NTEXT, and IMAGE data types. They have been depreciated in SQL Server 2005 and may not be supported in future versions. For varchar use varchar(max), nvarchar use nvarchar(max), for image and other blobs use varbinary(max).

  • Lynn Pettis (3/16/2009)


    Stay away from TEXT, NTEXT, and IMAGE data types. They have been depreciated in SQL Server 2005 and may not be supported in future versions. For varchar use varchar(max), nvarchar use nvarchar(max), for image and other blobs use varbinary(max).

    Depreciated??? Really? Why? I'd never heared that. I thought they are a good solution for data which should be stored outside of the usual data structure. Especially in combination of full-text indexes for TEXT.

    Thank you very much for teaching! 🙂

    Greets

    Flo

  • Lynn,

    That's exactly what I heard regarding the TEXT datatype - that it would be deprecated and it was better to use VARCHAR(MAX) instead. So really, what I was asking was if there was a way to shrink my records by dividing up the varchar column or some other method where the datatype might not consume as much space.

    We have some databases that are 7-10 gigs right now and our server can handle it, but I would guess that this db will grow 12 gigs or more per year.

    Thanks,

    Steve

  • Hello Steve

    First sorry for my wrong answer! I really didn't know...

    To your storage problem. What about compression? Almost every client programming language supports compression methods (like GZIP). So you can compress the data before you write it into the database and decompress to use it.

    Greets

    Flo

  • First sorry for my wrong answer! I really didn't know...

    No problem...we're all learning something

    To your storage problem. What about compression?

    I've considered that...The process is being done within an SSIS project and I could zip the data before saving. However, I need the ability to search the data for reporting purposes, which is going to hamper that. The powers that be want to be able to go back months at a time, compare days, etc. This is also my reason for wanting to keep the database lean.

    Thanks,

    Steve

  • Hi,

    Maybe reading this article will comfort you:

    http://msdn.microsoft.com/en-us/library/ms189087(SQL.90).aspx

    "Similarly, unless the large value types out of row option is set to ON, varchar(max), nvarchar(max), varbinary(max), and xml columns are stored, if it is possible, inside the data row. If this is the case, the Database Engine tries to fit the specific value if it can, and will push it off-row otherwise. If large value types out of row is set to ON, the values are stored off-row and only a 16-byte text pointer is stored in the record."

    So if I understood your question right you have nothing to worry about as these datatypes allocates dynamically.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply