Splitting and aggregating a field with commas

  • I have a table with a field that contains comma-delimited data.  Here is a sample of two rows and the values of that field in each:

    cat, dog, mouse

    mouse, dog, fox, hen

    I need to get a list of each value (animal) and a count for each as well.

     

    Desired result

    cat 1

    dog 2

    mouse 2

    fox 1

    hen 1

     

    The table will contain at most 500 rows, and there are at most 10 possible values.

    Thanks!

    Paully

     

  • I guess this might not be the answer you want, but you should really change the design. That is just not proper design, and you will spend much more time writing queries like the one you are requesting than it would take to fix the design once.
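
    For illustration only (the table and column names here are assumptions, not from the original post), the normalized design would store one row per record per animal instead of one comma-delimited string:

    -- Hypothetical normalized design: one row per (record, animal) pair.
    CREATE TABLE RecordAnimal( RecordID integer NOT NULL,   -- points back to the original row
                               Animal varchar(50) NOT NULL)

    -- The two sample rows from the question would then be stored as:
    INSERT INTO RecordAnimal( RecordID, Animal) VALUES( 1, 'cat')
    INSERT INTO RecordAnimal( RecordID, Animal) VALUES( 1, 'dog')
    INSERT INTO RecordAnimal( RecordID, Animal) VALUES( 1, 'mouse')
    INSERT INTO RecordAnimal( RecordID, Animal) VALUES( 2, 'mouse')
    INSERT INTO RecordAnimal( RecordID, Animal) VALUES( 2, 'dog')
    INSERT INTO RecordAnimal( RecordID, Animal) VALUES( 2, 'fox')
    INSERT INTO RecordAnimal( RecordID, Animal) VALUES( 2, 'hen')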

  • I'd do it like this:

    1. Read your original SQL table into a DataSet, then into a Table object (ADO.NET).

    2. Use a RegEx object to match all occurrences of your targeted strings ('dog', etc.) in the Table object (VB.NET); create a Match collection (an array).

    3. Output the array as a table back to SQL Server; add an identity field to the table (ADO.NET).

    4. Group all the matches in the table to see how many of each there are (SQL Server). A rough sketch of steps 3 and 4 follows this list.
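
    Steps 1 and 2 would live in VB.NET and are not shown here; the following is only a minimal sketch of what steps 3 and 4 might look like on the SQL Server side, assuming the regex matches have already been written to a hypothetical staging table called #Matches:

    -- Hypothetical staging table: one row per regex match, with an identity field (step 3).
    CREATE TABLE #Matches( MatchID integer IDENTITY(1,1),
                           Word varchar(50) NOT NULL)

    -- Step 4: group the matches to see how many of each there are.
    SELECT Word, COUNT( *) AS WordCount
    FROM #Matches
    GROUP BY Word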

  • Why not just correct the design on the server one time and then do a simple aggregate query?
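
    Against a normalized layout like the RecordAnimal sketch above (hypothetical names), that aggregate query is a one-liner:

    -- Simple aggregate against the hypothetical RecordAnimal table.
    SELECT Animal, COUNT( *) AS AnimalCount
    FROM RecordAnimal
    GROUP BY Animal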

  • Here is one attempt:  [Note, I have inserted the values you provided - you will want to insert your own values from your table....] 

    [Remi is correct.  You may want to use this code to correct the design and put the values in a physical table rather than the #TempTables I create...]

    SET NOCOUNT ON

    DECLARE @ListID integer,
            @PipeList varchar(255),
            @Delimiter char(1),
            @LeftDelimiter smallint,
            @Word varchar(50),
            @List varchar(255)

    -- one row per original comma-delimited list
    CREATE TABLE #Parse( ListID integer IDENTITY(1,1),
                         List varchar(255))

    -- you would actually insert all of your values from your table here...
    --INSERT INTO #Parse SELECT * FROM #Table
    INSERT INTO #Parse( List) VALUES( 'cat, dog, mouse')
    INSERT INTO #Parse( List) VALUES( 'mouse, dog, fox, hen')

    -- one row per parsed word
    CREATE TABLE #List( Word varchar(255) NOT NULL)

    -- walk each row of #Parse
    SELECT @ListID = (SELECT MIN( ListID) FROM #Parse)
    WHILE @ListID <= (SELECT MAX( ListID) FROM #Parse)
         BEGIN
              SELECT @PipeList = (SELECT List FROM #Parse WHERE ListID = @ListID)
              -- wrap the list in delimiters so every word sits between two commas
              SELECT @LeftDelimiter = 1,
                     @Delimiter = ',',
                     @List = @Delimiter + @PipeList + @Delimiter
              -- walk each delimited word within the row
              WHILE CHARINDEX( @Delimiter, @List, @LeftDelimiter + 1) > 0
                   BEGIN
                        SELECT @Word = SUBSTRING( @List, @LeftDelimiter + 1, CHARINDEX( @Delimiter, @List, @LeftDelimiter + 1) - ( @LeftDelimiter + 1))
                        IF LTRIM( RTRIM( @Word)) <> ''
                             BEGIN
                                  INSERT INTO #List( Word) VALUES( @Word)
                             END
                        SELECT @LeftDelimiter = CHARINDEX( @Delimiter, @List, @LeftDelimiter + 1)
                   END
              SELECT @ListID = (SELECT MIN( ListID) FROM #Parse WHERE ListID > @ListID)
         END

    -- count each word
    SELECT Word, COUNT( Word)
    FROM #List
    GROUP BY Word

    DROP TABLE #Parse
    DROP TABLE #List

    I wasn't born stupid - I had to study.

  • Q to Farrell:

    Please explain what will happen if the delimiter pattern changes from ', ' to, say, ' , ' in some parts of the text. Will your code still work OK?

    Thanks

  • I did find a problem.  The insert line should read: 

    INSERT INTO #List( Word) VALUES( RTRIM( LTRIM( @Word)))

    You do need to trim leading and trailing spaces, as you correctly point out...
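
    As a quick illustration (not from the original post), the trim also covers the case above where the delimiter pattern drifts to ' , ':

    -- LTRIM/RTRIM strips the stray spaces a ' , ' delimiter leaves behind,
    -- so ' fox ' and 'fox' both end up stored as 'fox'.
    SELECT RTRIM( LTRIM( ' fox ')) AS Word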

    I wasn't born stupid - I had to study.

  • No, he needs to correct the design... this is just the beginning of his problems.

    And yes, Farrell, this is one problem where no set-based solution exists.

  • Thanks!  I do agree, this should be redesigned.  But my code may help do that by inserting into a physical table rather than a #TempTable.  The rest is, as you said, a simple select, count, and group...

    I wasn't born stupid - I had to study.

  • Okay...I thought of the redesign too, but the situation warrants this design. Why is it that everyone always assumes there is no place for this design?

    I got the solution: I select a list of possible values and then select the count for each of those values, all in ADO.NET.  It is a procedure that will only be executed once a day for about a week.

    The reason the design is required is complicated, but it is part of a survey generating/tabulating application.  The possible question types are radio button, textbox, dropdown list, and checkboxes.  The checkboxes cause the disdain, but as I said, it really isn't a high-production application, so this solution is fine.

    Thanks anyway,

    Paul
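
    A rough sketch of that per-value count, expressed here in T-SQL rather than ADO.NET and using hypothetical names (a Responses table with a comma-delimited Answer column); one query like this per possible value, and there are at most 10 of them:

    -- Hypothetical table/column names.  Wrapping both sides in commas (and stripping
    -- spaces) avoids partial matches, e.g. 'hen' matching 'henhouse'.
    SELECT COUNT( *) AS DogCount
    FROM Responses
    WHERE ',' + REPLACE( Answer, ' ', '') + ',' LIKE '%,dog,%'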

  • ADO.NET rocks

  • Because these problems arise more often than you would think.  Once something like this gets in place, it can creep in and become part of the final design.  Then you have to jump through all these hoops to get simple questions answered.

    I have a number of legacy systems that have these kinds of problems.  It is a real booger to change something that other systems depend upon.  And when someone new comes in to try and read and understand the code - phewey!! 

    I wasn't born stupid - I had to study.

  • Let me put it this way: if this had been designed correctly in the first place, you wouldn't even have had to come here for help, and you'd have been done a few hours ago. Even if this is not critical to your company now, it may well become so someday, and then you're screwed.

    Denormalization is a no-no unless you're using a reporting perm table that holds data for a few minutes/hours at a time. Other than that, you're better off banishing that design style from your repertoire.

  • What about a situation where the data is coming from outside (say, as some lousy Excel file) and you just need to run some quick stats on it (which seems to be the case here)?  Does that warrant down-to-bare-bones normalization and redesign of the data, if all you need is 50-something lines of code to process it?

    Thanks,

    Sergei

  • The problem is that the data is coming from all directions and it's never of the same structure. I'm talking about text processing.

