storing CSV in a column vs. table with records

Question

storing CSV in a column vs. table with records

stephen99999

Ten Centuries

Points: 1197
More actions
January 18, 2012 at 1:19 pm

#255193

So simple question on which is better: storing CSVs in a column vs. table with records.
**Doing some fine tuning of an old system, an have never ran across a db where CSVs were stored in a column**
**The CSVs are from questionnaires with like something like, "How much pain do you have? 1-10, 1 being 'none' and 10 'hurts bad'"**
**Each value in the CSV is a different question
having a table like so:
UserId CSValues
23 1,2,3,4,5
24 2,3,4,5,6
25 8,7,6,5,4
or using a one to many relationship (creating necessary tables):
User_Id Question_Id Answer
1 1 6
1 2 5
1 3 8
2 1 3
2 2 3
From my past experiences, I have always learned to break such a situation up into one-to-many and many-to-many relationships and creating the necessary tables. Just trying to figure out why a past dev would have done this, so that I better understand, and maybe learn something new 🙂

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 1

The one-to-many is a more normalized view of the data and would be more 'correct'. But , you could argue that the CSV route could be right for your system , if you are treating it as an atomic unit of data within the database.

Clear Sky SQL
My Blog[/url]

stephen99999 Ten Centuries Points: 1197 More actions · Answer 2

Dave Ballantyne (1/18/2012)
The one-to-many is a more normalized view of the data and would be more 'correct'. But , you could argue that the CSV route could be right for your system , if you are treating it as an atomic unit of data within the database.

Currently the system does not do anything really with these questionnares (which are completed over and over again) results, i.e. no trending analysis, etc.

But, I would think that when that day comes, it would be more efficient (code-wise and maybe processing time) to have it in a more normalized solution (above). What do you think?

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 3

Yes, the one-to-many is what you should be aiming for. Its always better to over-engineer than under-engineer.

Clear Sky SQL
My Blog[/url]

Brandie Tarvin SSC Guru Points: 173083 More actions · Answer 4

stephen99999 (1/18/2012)
Just trying to figure out why a past dev would have done this, so that I better understand, and maybe learn something new 🙂

There are a couple of possibilities.

1) he was lazy or didn't know any better.

2) he was overwhelmed with emergencies and didn't have time to code it properly. Then he forgot to fix it.

3) he let the Business Users bully him into doing it their way.

Being polite, I'd go with the "didn't know any better" option.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

sturner SSC-Insane Points: 22411 More actions · Answer 5

I have seen way too much of this delimited data in on varchar() column crap. It is typically implemented by a programmer who knows about RBAR processing with VB or ASP instead of set based relational processing.

The one exception would be if the specification clearly stated that the data would always be an atomic entity and never used otherwise. Unfortunately the inevitable happens and the requirement to query this data (or even worse, update it) comes up. Then you have hideous performance & fragmentation issues and/or ugly code to deal with it.

The probability of survival is inversely proportional to the angle of arrival.

Brandie Tarvin SSC Guru Points: 173083 More actions · Answer 6

CELKO (1/20/2012)
Anyone who is this bad will also have left you with really bad code in other places; you need to clean out the whole damn thing. This is my experiences over the last three decades of doing SQL.

I would have to agree with this sentiment. This can't be the only horrible legacy you have from the previous coder, so keep your eyes peeled for other monsters lurking in the code shadows.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 7

Yet, ironically, people let other people store XML in their database. Go figure. 😉

If you change the structure of the table now, be advised that you'll also have to change everything that touches it. Although I agree that the original design is terrible and leaving it as it is constitutes an "SQL sin", until you know what the cost of changing everything else is, leave it be. Although it's certainly not the most effecient thing in the world, it can be temporarily normalized as a working table using the high speed T-SQL splitter at the following URL or the CLR splitter found at the same URL.

http://www.sqlservercentral.com/articles/Tally+Table/72993/

And, no... I'm not advocating CSV columns at all. I'm simply saying that you don't know what the cost of changing it to be correct is and, considering the workaround, wait to change it until you know all of the ramifications.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)