August 22, 2005 at 11:55 am
I have a table with a field that contains comma-delimited data. Here is a sample of the values in that field for two rows:
cat, dog, mouse
mouse, dog, fox, hen
I need to get a list of each distinct value (animal) and a count for each as well.
Desired result:
cat 1
dog 2
mouse 2
fox 1
hen 1
The table will contain at most 500 rows, and there are at most 10 possible values.
Thanks!
Paully
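(For illustration: the tally Paully asks for is easy to sketch outside SQL. A minimal Python sketch, assuming the rows are available as plain strings like the two samples above, splitting each comma-delimited value and counting with `collections.Counter`:)

```python
from collections import Counter

# The two sample rows from the question
rows = ["cat, dog, mouse", "mouse, dog, fox, hen"]

# Split each row on commas, trim stray whitespace, and tally the words
counts = Counter(
    word.strip()
    for row in rows
    for word in row.split(",")
    if word.strip()
)

for word, n in counts.items():
    print(word, n)
```

This prints each animal with its count (dog 2, mouse 2, cat/fox/hen 1), matching the desired result.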
August 22, 2005 at 12:57 pm
I guess this might not be the answer you want, but you should really change the design. That is just not proper design, and you will spend much more time creating queries like the one you request than it would take to fix the design once.
August 22, 2005 at 12:58 pm
I'd do it like this:
1. Read your original SQL table into a DataSet, then into a Table object (ADO.NET).
2. Use a RegEx object to match all occurrences of your targeted strings ('dog', etc.) in the Table object (VB.NET); collect the results in a MatchCollection (array).
3. Output the array as a table back to SQL Server; add an identity field to the table (ADO.NET).
4. Group all the matches in the table to see how many of each there are (SQL Server).
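(The regex-matching step above can be sketched like this - in Python rather than VB.NET for brevity, with the two sample rows standing in for the Table object and the target strings taken from the question:)

```python
import re
from collections import Counter

# Stand-in for the Table object read from SQL Server
rows = ["cat, dog, mouse", "mouse, dog, fox, hen"]

# Step 2: match all occurrences of the targeted strings
targets = re.compile(r"\b(cat|dog|mouse|fox|hen)\b")
matches = [m for row in rows for m in targets.findall(row)]

# Step 4's GROUP BY is just a tally over the match collection
counts = Counter(matches)
```

Note this only finds values you already know to look for; unknown values in the field would be silently missed.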
August 22, 2005 at 12:59 pm
Why not just correct the design on the server one time and then do a simple aggregate query?
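(For illustration, the corrected design and the "simple aggregate query" might look like this - sketched with Python's built-in sqlite3 rather than SQL Server, and with made-up table and column names:)

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One row per (row, animal) pair instead of one comma-delimited field
con.execute("CREATE TABLE RowAnimal (RowID INTEGER, Animal TEXT)")
con.executemany(
    "INSERT INTO RowAnimal VALUES (?, ?)",
    [(1, "cat"), (1, "dog"), (1, "mouse"),
     (2, "mouse"), (2, "dog"), (2, "fox"), (2, "hen")],
)

# The query the comma-delimited design made hard is now a one-liner
counts = dict(con.execute(
    "SELECT Animal, COUNT(*) FROM RowAnimal GROUP BY Animal"
))
```

With the normalized table, no string parsing is needed at all.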
August 22, 2005 at 12:59 pm
Here is one attempt: [Note, I have inserted the values you provided - you will want to insert your own values from your table....]
[Remi is correct. You may want to use this code to correct the design and put the values in a physical table rather than the #TempTables I create...]
SET NOCOUNT ON

DECLARE @ListID        integer,
        @PipeList      varchar(255),
        @Delimiter     char(1),
        @LeftDelimiter smallint,
        @Word          varchar(50),
        @List          varchar(255)

CREATE TABLE #Parse( ListID integer IDENTITY(1,1),
                     List   varchar(255))

-- you would actually insert all of your values from your table here...
--INSERT INTO #Parse SELECT * FROM #Table
INSERT INTO #Parse( List) VALUES( 'cat, dog, mouse')
INSERT INTO #Parse( List) VALUES( 'mouse, dog, fox, hen')

CREATE TABLE #List( Word varchar(255) NOT NULL)

SELECT @ListID = (SELECT MIN( ListID) FROM #Parse)
WHILE @ListID <= (SELECT MAX( ListID) FROM #Parse)
BEGIN
    SELECT @PipeList = (SELECT List FROM #Parse WHERE ListID = @ListID)
    SELECT @LeftDelimiter = 1,
           @Delimiter = ',',
           @List = @Delimiter + @PipeList + @Delimiter
    -- walk the string, pulling out the word between each pair of delimiters
    WHILE CHARINDEX( @Delimiter, @List, @LeftDelimiter + 1) > 0
    BEGIN
        SELECT @Word = SUBSTRING( @List, @LeftDelimiter + 1,
                                  CHARINDEX( @Delimiter, @List, @LeftDelimiter + 1) - ( @LeftDelimiter + 1))
        IF LTRIM( RTRIM( @Word)) <> ''
        BEGIN
            INSERT INTO #List( Word) VALUES( @Word)
        END
        SELECT @LeftDelimiter = CHARINDEX( @Delimiter, @List, @LeftDelimiter + 1)
    END
    SELECT @ListID = (SELECT MIN( ListID) FROM #Parse WHERE ListID > @ListID)
END

SELECT Word, COUNT( Word)
FROM #List
GROUP BY Word

DROP TABLE #Parse
DROP TABLE #List
I wasn't born stupid - I had to study.
August 22, 2005 at 1:07 pm
Question for Farrell:
Please explain what happens if the delimiter pattern changes from ', ' to, say, ' , ' in some parts of the text. Will your code still work correctly?
Thanks
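(One way to make the split tolerant of varying whitespace around the delimiter - ', ', ' , ', ',', and so on - is to split on the bare comma and trim each piece afterwards. A hedged Python sketch with made-up sample data showing inconsistent spacing:)

```python
# Rows with inconsistent spacing around the commas
rows = ["cat , dog,mouse", "mouse ,  dog , fox, hen"]

words = [
    w.strip()                 # strip() absorbs any amount of surrounding whitespace
    for row in rows
    for w in row.split(",")   # split on the bare comma only
    if w.strip()              # skip empty pieces left by stray commas
]
```

The same idea applies to the T-SQL version: split on ',' alone and apply LTRIM/RTRIM to each extracted word before inserting it.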
August 22, 2005 at 1:08 pm
I did find a problem. The INSERT line should read:
INSERT INTO #List( Word) VALUES( RTRIM( LTRIM( @Word)))
You do need to trim leading and trailing spaces, as you correctly point out...
August 22, 2005 at 1:12 pm
No, he needs to correct the design... this is just the beginning of his problems.
And yes, Farrell, this is one problem where no set-based solution exists.
August 22, 2005 at 1:17 pm
Thanks! I do agree, this should be redesigned. But my code may help do that, by inserting into a physical table rather than a #TempTable. The rest is, as you said, a simple SELECT, COUNT, and GROUP BY...
August 22, 2005 at 1:28 pm
Okay... I thought of the redesign too, but the situation warrants this design. Why does everyone always assume there is no place for this design?
I got the solution: I select a list of possible values and then select the count for each of those values, all in ADO.NET. It is a procedure that will only be executed once a day for about a week.
The reason the design is required is complicated, but it is part of a survey-generating/tabulating application. The possible question types are radio button, textbox, dropdown list, and checkboxes. The checkboxes cause the disdain, but as I said, it really isn't a high-production application, so this solution is fine.
Thanks anyway,
Paul
August 22, 2005 at 1:32 pm
ADO.NET rocks
August 22, 2005 at 1:38 pm
Because these problems arise more often than you would think. Once something like this gets in place, it can creep in and become part of the final design. Then you have to jump through all these hoops to get simple questions answered.
I have a number of legacy systems with these kinds of problems. It is a real booger to change something that other systems depend upon. And when someone new comes in to try to read and understand the code - phewey!!
August 22, 2005 at 1:43 pm
Let me put it this way: if this had been designed correctly in the first place, you wouldn't even have had to come here for help, and you'd have been done a few hours ago. Even if this is not critical to your company now, it may well become so someday, and then you're screwed.
Denormalization is a no-no unless you're using a reporting permanent table that holds data for a few minutes or hours at a time. Other than that, you're better off banishing that design style from your repertoire.
August 22, 2005 at 2:00 pm
What about a situation where the data is coming from outside (say, as some lousy Excel file) and you just need to run quick stats on it (which seems to be the case here)? Does that warrant down-to-bare-bones normalization and redesign of the data, if all you need is 50-some lines of code to process it?
Thanks,
Sergei
August 22, 2005 at 2:02 pm
The problem is that the data is coming from all directions, and it's never of the same structure. I'm talking about text processing.