How to Handle "Wide" Tables?...

Question

How to Handle "Wide" Tables?...

Steve Rosenbach

SSCrazy

Points: 2181
More actions
August 16, 2004 at 4:39 pm

#360683
We have a database with a number of "wide" tables. That is, tables with one or more very wide varchar columns. For example, one table includes a Description column and a Notes column, both varchar(4000).
Besides the danger of exceding the maximum record length (which the application should, but doesn't currently, guard against,) there is the issue of tables like this leading to excessive I/O, since not that many records can fit on an 8K data page.
Some of the things I'm considering include:
1. Change the BIG varchar() columns to text (most of the time, there would be very little updating of these large fields.)
2. Split the big varchar() columns out to separate tables
For the most part, the application follows a paradigm of (1) show a list, then (2) pick a single record from the list.
Data displayed on a list doesn't include any data from the big varchar() fields. It's just when you dive down to a single record that you need to retrieve and display "description", "note", etc information.
I would appreciate your thoughts and experience on how to approach this part of the database redesign.
Thanks!
SteveR
Steve Rosenbach
Arnold, MD

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 1

I'll probably get a huge number of arguments from this but you're really opening yourself up to a world of hurt if you move into the TEXT datatype. They're difficult to update and most STRING functions don't work with them at all. Splitting the big descriptions into a seperate table may be an answer... that way you could use up to 8000 characters without the disadvantages of TEXT datatype fields. It would also keep the server from having to wade through them until you found the "right ID".

If you want to really see a nightmare, lookup about how to reference and update TEXT fields... you'll never want to go there again. Don't get me wrong, they do have their uses... but I avoid them like I avoid large reptiles.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Julian Kuiters SSCertifiable Points: 6585 More actions · Answer 2

Jeff is right TEXT is a whole different ball game. It operates in a similar way to the BINARY and IMAGE types, tagging bunches of separate data pages together to make a blob. Hard to use.

Maybe make your table more relational and put the data displayed in the list in one table, with an ID. and have another table with either/both the Notes and Description columns, referenced back to the parent table.

But really, are you worried about the data exceeding the data page length, or the excessive i/o? If you are worried about the I/O, make sure your data list is only selecting the columns and rows it needs from the database.

Julian Kuiters
juliankuiters.id.au

Steve Rosenbach SSCrazy Points: 2181 More actions · Answer 3

Jeff and Julian - thank you for your insight!

>>If you are worried about the I/O, make sure your data list is only selecting the columns and rows it needs from the database.<<

But if the average record length is, say 1000 bytes, there will only by seven or eight records on a page. Sometimes the searches that our users do bring back a hundred or more rows. With so few records per page, won't there be a lot of i/o even if I limit the columns for the search results list to only the ones I need?

Thanks again,

SteveR

Paul Mu SSCertifiable Points: 5547 More actions · Answer 4

Assuming that (i) the average length of the data within the two wide fields are relatively large (ie occupying most of the 4000 varchar field and is a common occurrence for that table), and (ii) there are many records, then it would be wise to create a separate table for the two columns and join back to the parent table.

Since the parent table is the one that is searched on mostly, this action will reduce the IO for queries against the parent table.

If you find that the average length of the data is realtively small, then the varchar data type will prevent the concerns that you may have regarding the creation of too many pages due to the wide fields.

Paul

Corie Curcillo SSC-Addicted Points: 418 More actions · Answer 5

I know this may be an oversimplified suggestion, but are they really using all 4000 CHARS? I experienced a similar problem recently. After doing a SELECT LEN(), I found that of the 10 mil plus rows only twice(2) did the actual field length exceed 520 chars. And in those cases, the application was being misused to support a business process that should have been handled elsewhere.

Corie Curcillo
MCT, MCDBA, MCSD

Don Tanner SSC Enthusiast Points: 104 More actions · Answer 6

If you are using sql2000 you might want to read up on text fields. I use them all the time. If the text field is less than 8K, you can use them just as you would varchar's in inserts, selects and updates. If you only manage them using either ADO, OLE DB or ODBC, you can handle up to 2GB of data in them with no problem. The hard stuff is handled in the background.

Quoting BOL:

"If an ntext, text, and image data value is no longer than a Unicode, character, or binary string (4,000 characters, 8,000 characters, 8,000 bytes respectively), the value can be referenced in SELECT, UPDATE, and INSERT statements much the same way as the smaller data types."

My application never has a text field longer than about 4K to 6k and I have never had a problem.

Don

Steve Rosenbach SSCrazy Points: 2181 More actions · Answer 7

Hi Corie,

You're absolutely right - looking at a specific table with two of these varchar(4000) columns, if found that the average length for one column was 383 and for the other only 75.

BTW, I also looked at the average total record length, both with and without the two varchar(4000) columns and found

490 bytes WITH the 2 varchar(4000) columns - about 16 records per data page

91 bytes WITHOUT the 2 varchar(4000) columns - about 87 records per data page - 5.25 more

Steve Rosenbach SSCrazy Points: 2181 More actions · Answer 8

Thanks for the info, Dan -- it's good to know that text columns haven't given you any problems.

Combining your input with what Jeff said, it sound that as long as I don't need to manipulate the text data, that is by using LIKE, right, left, etc (which I don't), then text columns may work well in my situation and have the positive effect of making the table considerably narrower.

Regards,

SteveR

Coach James Ten Centuries Points: 1272 More actions · Answer 9

OK now that you know TEXT data is not an option!

Performance of TEXT can't compare with varchar, look at this way. Varchar is a set of pointers and text is a grouping of pointers. No matter how large a varchar is, it still only stores two references.

Let me help you out, on your redesign: These varchar(4000) fields are only used rarely, right? So, setup a table field expansion. What is this you say?

Here’s how it works:

1. Create scripts to find which varchars are being used for existing data (once per each table and field).

2. Create a table called “varchar_table” with the fields identity integer, tbl_name varchar(100), fld_name varchar(100), tbl_record integer, fld_data varchar(4000). Or better yet, create type tables to store the tbl_name within tbl_type, and fld_name within fld_type; which stores an integer representation, which is better (and retrieve by joining the two), so varchar_table is created as identity int, tbl_id int, fld_id int, tbl_record int, fld_data varchar(4000). Create unique index varchar_table_01 on varchar_table(tbl_id, fld_id, tbl_record);

3. Duplicate your varchar(4000) where found within the varchar_table. This field can be dropped later, once the new table, and all new logic is completed and verified. NOTE: Now only one record is being made when data is actually used and a link is created within the varchar_table using tbl_record pointer. By including the table and field, this one table can be used for large varchar across all other tables only when needed. Another table can be created for text only, image only, etc… Performance wise, this makes a huge impact on the majority of all your queries.

4. Now verify your data, by comparing an old query to a new query selection: By using an embedded select like this within your existing SQL Select (to get the varchar(4000) data, only if found, and associated by the table, field, and record identities):

“(select ISNULL(VT.fld_data,’’) from varchar_table VT(nolock), fld_type FT(nolock), tbl_type TT(nolock)

where TT.tbl_name = ‘TableName’ /* gets an integer for tablename */

and VT.tbl_id = TT.tbl_id

and FT.fld_name = ‘FieldName’ /* gets an integer for fieldname */

and VT.fld_id = FT.fld_id

and VT.tbl_record = ‘identityfield’) FieldName,“ /* this is the pointer back to the old table */

NOTE: It is important to use ISNULL for replacing NOT FOUND rows with blank data. Also insert, update, and delete, need new logic for maintaining varchar data located within another table. New structure is now master detail relationship, and row inserts requires that the identity from the parent table be known, prior to inserting a row with its corresponding varchar(4000) equivalent. Updates and deletes are easy, since updates are simple replacements and a delete is empty, null, or blank. Enjoy!

Regards,

Coach James