July 14, 2009 at 1:34 am
I have some requirements to normalise data in a table for the purpose of reporting (I'm not allowed to modify the original table). At the moment, the databases are a mixture of SQL 7 and 2000 but will all be 2000 in next month or so, so a 2000-only solution would be fine.
We have a variable number of surveys, each with a variable number of questions, each with a variable number of predefined multiple-choice answers. Unfortunately, the answers are stored in a semicolon-delimited field, in which case the first item is answer #1, second is answer #2, the nth is answer #n (the survey responses are stored with the answer number).
/*
If tally table doesn't exist, create it with minimum rows needed
for this test
*/
SELECT TOP 260 IDENTITY(INT, 1, 1) n
INTO Tally
FROM syscolumns sc1, syscolumns sc2
SELECT * FROM [Tally]
/*
Table and sample data (these aren't actual survey answers - just made up
for the purpose of this post!)
*/
CREATE TABLE SurveyAnswer
(
SurveyCode INT,
QuestionCode INT,
Answers VARCHAR(255)
)
INSERT INTO [SurveyAnswer]
SELECT 1, 1, '0 - 10; 11 - 20; 21 - 30; 30+'
UNION ALL
SELECT 1, 2, 'Yes; No; Unknown'
UNION ALL
SELECT 1, 3, 'Male; Female'
UNION ALL
SELECT 1, 4, 'Football; Rugby'
UNION ALL
SELECT 2, 1, 'Under 18; Over 18'
UNION ALL
SELECT 2, 2, 'Tea; Coffee'
UNION ALL
SELECT 2, 3, 'MS SQL; Oracle'
UNION ALL
SELECT 3, 1, 'Smoker; Non-smoker'
UNION ALL
SELECT 3, 2, 'Lager; Wine; Spirits; Other'
UNION ALL
SELECT 3, 3, 'Development; QA; Production'
UNION ALL
SELECT 3, 4, 'Simpsons; Family Guy; Futurama'
UNION ALL
SELECT 3, 5, 'Sunday; Monday; Tuesday; Wednesday; Thursday; Friday; Saturday'
I want to split the lists into individual rows, and add the answer number value to each row. I can split the lists easily enough, but I find I am having to loop and use the IDENTITY function to populate the answer numbers.
DECLARE @MaxSurveyCode INT
SELECT @MaxSurveyCode = MAX(SurveyCode) FROM [SurveyAnswer]
DECLARE @MaxQuestionCode INT
SELECT @MaxQuestionCode = MAX(QuestionCode) FROM [SurveyAnswer]
DECLARE @SurveyCode INT
SET @SurveyCode = 1
CREATE TABLE SurveyAnswer2
(
SurveyCode INT,
QuestionCode INT,
AnswerCode INT,
AnswerString VARCHAR(255)
)
WHILE (@SurveyCode <= @MaxSurveyCode)
BEGIN
DECLARE @QuestionCode INT
SET @QuestionCode = 1
WHILE (@QuestionCode <= @MaxQuestionCode)
BEGIN
SELECT
SurveyCode,
QuestionCode,
IDENTITY(INT, 1, 1) AnswerCode,
LTRIM(RTRIM(SUBSTRING(';' + a.[Answers] + ';', n + 1, CHARINDEX(';', ';' + a.[Answers] + ';', n + 1) - n - 1))) AnswerString
INTO
#SurveyAnswerStaging
FROM
[SurveyAnswer] a,
[Tally]
WHERE
LEN(';' + a.[Answers] + ';') > n
AND SUBSTRING(';' + a.[Answers] + ';' , n, 1) = ';'
AND [SurveyCode] = @SurveyCode
AND [QuestionCode] = @QuestionCode
INSERT INTO [SurveyAnswer2]
SELECT * FROM [#SurveyAnswerStaging]
DROP TABLE #SurveyAnswerStaging
SET @QuestionCode = @QuestionCode + 1
END
SET @SurveyCode = @SurveyCode + 1
END
SELECT * FROM [SurveyAnswer2]
DROP TABLE [SurveyAnswer]
DROP TABLE [SurveyAnswer2]
Can anyone give me any tips to improve this?
Cheers,
Simon 🙂
July 14, 2009 at 9:06 am
This returns the same results as your loop code:
/*
If tally table doesn't exist, create it with minimum rows needed
for this test
*/
SELECT TOP 260 IDENTITY(INT, 1, 1) n
INTO Tally
FROM syscolumns sc1, syscolumns sc2
/*
Table and sample data (these aren't actual survey answers - just made up
for the purpose of this post!)
*/
CREATE TABLE SurveyAnswer
(
SurveyCode INT,
QuestionCode INT,
Answers VARCHAR(255)
)
INSERT INTO [SurveyAnswer]
SELECT 1, 1, '0 - 10; 11 - 20; 21 - 30; 30+'
UNION ALL
SELECT 1, 2, 'Yes; No; Unknown'
UNION ALL
SELECT 1, 3, 'Male; Female'
UNION ALL
SELECT 1, 4, 'Football; Rugby'
UNION ALL
SELECT 2, 1, 'Under 18; Over 18'
UNION ALL
SELECT 2, 2, 'Tea; Coffee'
UNION ALL
SELECT 2, 3, 'MS SQL; Oracle'
UNION ALL
SELECT 3, 1, 'Smoker; Non-smoker'
UNION ALL
SELECT 3, 2, 'Lager; Wine; Spirits; Other'
UNION ALL
SELECT 3, 3, 'Development; QA; Production'
UNION ALL
SELECT 3, 4, 'Simpsons; Family Guy; Futurama'
UNION ALL
SELECT 3, 5, 'Sunday; Monday; Tuesday; Wednesday; Thursday; Friday; Saturday'
DECLARE @MaxSurveyCode INT
SELECT @MaxSurveyCode = MAX(SurveyCode) FROM [SurveyAnswer]
DECLARE @MaxQuestionCode INT
SELECT @MaxQuestionCode = MAX(QuestionCode) FROM [SurveyAnswer]
DECLARE @SurveyCode INT
SET @SurveyCode = 1
CREATE TABLE SurveyAnswer2
(
SurveyCode INT,
QuestionCode INT,
AnswerCode INT,
AnswerString VARCHAR(255)
)
INSERT INTO [SurveyAnswer2]
SELECT
SurveyCode,
QuestionCode,
n,--IDENTITY(INT, 1, 1) AnswerCode,
LTRIM(RTRIM(SUBSTRING(';' + a.[Answers] + ';', n + 1, CHARINDEX(';', ';' + a.[Answers] + ';', n + 1) - n - 1))) AnswerString
/* INTO
#SurveyAnswerStaging
*/FROM
[SurveyAnswer] a,
[Tally] T
WHERE
LEN(';' + a.[Answers] + ';') > n
AND SUBSTRING(';' + a.[Answers] + ';' , n, 1) = ';'
ORDER BY QuestionCode
SELECT * FROM [SurveyAnswer2] ORDER BY SurveyCode, QuestionCode
DROP TABLE [SurveyAnswer]
DROP TABLE [SurveyAnswer2]
DROP TABLE [tally]
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 15, 2009 at 3:53 am
Hi Jack,
I'm getting different result sets when I run your query and my query.
Just to clarify, for each question of a survey, I want to split the answers out into separate rows, numbered by their position in the list (not their position in the string). So the answer list '0 - 10; 11 - 20; 21 - 30; 30+' would become:
AnswerCode AnswerString
----------- -------------
1 0 - 10
2 11 - 20
3 21 - 30
4 30+
With your query, I am getting (I think) the position in the string of each ';'.
Cheers,
Simon
July 15, 2009 at 8:05 am
Oops, you are correct, I only verified the rows returned and the 1st 2 columns assuming I had the others correct. Boy, I wish 2000 had row_number().
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
July 19, 2009 at 10:33 pm
Using the test data provided, the following does the trick on the split and the answer number. You can take it from there...
--===== Declare a variable to hold the delimiter
DECLARE @Delim CHAR(1)
SET @Delim = ';'
--===== Do the split with a count
SELECT SurveyCode,
QuestionCode,
AnswerNum = t.N-LEN(REPLACE(LEFT(@Delim+sa.Answers,t.N), @Delim, '')),
Answer = LTRIM(RTRIM(SUBSTRING(@Delim+sa.Answers, t.N+1,
CHARINDEX(@Delim, sa.Answers+@Delim, t.N)-t.N)))
FROM dbo.Tally t
RIGHT OUTER JOIN --Necessary in case Answers is NULL
dbo.SurveyAnswer sa
ON SUBSTRING(@Delim+sa.Answers, t.N, 1) = @Delim
AND t.N < LEN(@Delim+sa.Answers)
--Jeff Moden
Change is inevitable... Change for the better is not.
July 21, 2009 at 1:44 am
Thankyou Jeff, I was having real trouble trying to think of a way to work the question number off the number of preceding semi-colons in the string - hadn't thought about essentially removing them and comparing the length of before and after strings. That's yet another tip of yours I've now made note of to solve similar problems in future. 🙂
July 21, 2009 at 8:38 am
Thanks, Simon. I can't take the credit for the original idea, though.
As a side-bar, you did make my day. So many people take a bit of code and run, never to be seen again. I not only appreciate the fact that you came back but I'm tickled that you took the time to figure out what the code is actually doing and to share your understanding of it. Very well done, Sir. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply