August 11, 2014 at 2:27 am
Hi,
we have a text response system that stores ratings received from the public and stores the string value in a field in a database where it asks the public to rate the assistance received.
I have been tasked to retrieve the scores and create a report.
my problem is the responses received are not in a specific format.
i have written many case statements etc etc to cater for this but the responses are so different that i am having issues.
the text question is similar to: please rate the service received out of 10
a few examples of responses are as follows:
2 -> easy to extract value
4 -> easy to extract value
10/10 Excellent. **** guided me well through the process. -> returns a 1
BROOKS NO SPC14060519047 GIVES **** **** 9 OUT OF 10 . -> returns a 1
-5 (minus 5) -> done and returs -5
9 NINE -> done returns 9
five -> done returns 5
as you can see the responses vary and i am struggling to accomplish the task which is due today.
Does anyone have any ideas for me or have you by chance encountered this before and can assist?
Thank you,
Nevarda
August 11, 2014 at 2:39 am
Nevarda
Have your text response system only accept whole numbers between 0 and 10. If customers want to provide comments and feedback as well, put that in a separate field. If that isn't possible, I think your only alternative is to have someone go through the responses one by one and manually extract the scores .
John
August 11, 2014 at 3:14 am
Hi John,
Thanks for your response, however, the table unfortunately cannot be changed and we cant have someone sift through tens of thousands of responses a week.
the problem i seem to be facing now is that as i tackle one scenario and cater for it, that solution breaks another condition.
I'll keep on trying to cater for as many different scenarios as i can but if someone does have any other ideas i'd greatly appreciate them.
Thanks again,
Nevarda
August 11, 2014 at 3:31 am
nevarda (8/11/2014)
the text question is similar to: please rate the service received out of 10
Well then the answer should be a list of option buttons where the user should only be able to select 1 of the pre-defined answers. That way you wouldn't be having to clean the data up.
As this is not the case, you're going to have some issues I think. I suspect that the best bet is going to be to feed this through an SSIS package or CLR procedure, with some regular expressions. I think that trying to clean this in T-SQL, whilst possible, will be probably end with some truly unmanageable code that when it breaks will be terrible to debug.
August 11, 2014 at 3:35 am
There's really no way out of this. You have a GIGO issue.
You're going to have to build data strippers. First, fix the front end before this continues to happen. Next, start building filters. Find everything that falls out, and look for a new pattern. Include that into the filter, and rinse/repeat. Hopefully you can get down to a few hundred that can be hand cleansed.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
August 11, 2014 at 7:35 am
Thanks for the advice guys.
I ended up putting together a table valued function that splits up every word stores it
I then had another function that looks for the lowest value with a few checks in it.
for reference, here are the 2 functions:
Splitwords:
GO
/****** Object: UserDefinedFunction [dbo].[udf_SplitWords] Script Date: 08/11/2014 15:21:10 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[udf_SplitWords](@text varchar(8000))
RETURNS @words TABLE (
pos smallint primary key,
value varchar(8000)
)
AS
BEGIN
DECLARE
@pos smallint,
@i smallint,
@j-2 smallint,
@s-2 varchar(8000)
SET @pos = 1
WHILE @pos <= LEN(@text)
BEGIN
SET @i = CHARINDEX(' ', @text, @pos)
SET @j-2 = CHARINDEX(',', @text, @pos)
IF @i > 0 OR @j-2 > 0
BEGIN
IF @i = 0 OR (@j > 0 AND @j-2 < @i)
SET @i = @j-2
IF @i > @pos
BEGIN
-- @i now holds the earliest delimiter in the string
SET @s-2 = SUBSTRING(@text, @pos, @i - @pos)
INSERT INTO @words
VALUES (@pos, @s-2)
END
SET @pos = @i + 1
WHILE @pos < LEN(@text)
AND SUBSTRING(@text, @pos, 1) IN (' ', ',')
SET @pos = @pos + 1
END
ELSE
BEGIN
INSERT INTO @words
VALUES (@pos, SUBSTRING(@text, @pos, LEN(@text) - @pos + 1))
SET @pos = LEN(@text) + 1
END
END
RETURN
END
which is used in this function:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:Nevarda
-- Create date: 2014-08-11
-- Description:Extract the rating out of an SMS reply
-- =============================================
ALTER FUNCTION [dbo].[Udf_getlowestnumsms] (@string VARCHAR(max))
RETURNS INT
AS
BEGIN
DECLARE @Rate INT
DECLARE @temp TABLE
(
id INT IDENTITY(1, 1),
word VARCHAR(8000)
)
INSERT INTO @temp
(word)
SELECT value
FROM [dbo].Udf_splitwords (Replace(Replace(Replace(@string, '/', ' '), '.', ''), ')', ''))
DECLARE @output TABLE
(
value INT
)
DECLARE @word VARCHAR(200)
DECLARE @i INT = 0
DECLARE @n INT = (SELECT Count(1)
FROM @temp)
--select * from @temp
WHILE @i < @n
BEGIN
SET @i = @i + 1
SET @word = (SELECT word
FROM @temp
WHERE id = @i)
IF Isnumeric(@word) = 1
BEGIN
IF Len(@word) <= 2
BEGIN
INSERT INTO @output
VALUES (@word)
END
END
IF Isnumeric(@word) = 0
BEGIN
SET @word = CASE Lower(@word)
WHEN 'one' THEN '1'
WHEN 'two' THEN '2'
WHEN 'three' THEN '3'
WHEN 'four' THEN '4'
WHEN 'five' THEN '5'
WHEN 'six' THEN '6'
WHEN 'seven' THEN '7'
WHEN 'eight' THEN '8'
WHEN 'nine' THEN '9'
WHEN 'ten' THEN '10'
ELSE NULL
END
INSERT INTO @output
VALUES (@word)
END
END
SELECT @Rate = (SELECT Min(value) Rating
FROM @output)
RETURN @rate
END
to further filter and get rid of a few errors i simply used another function that, if this returns a null, uses a standard get numeric function to get a value
a simple example of how to use it is:
CaseWHEN dbo.udf_GetLowestNumSMS(T2.ReceivedData) IS NULL
THEN [dbo].[ufn_GetNumeric1](T2.ReceivedData)
ELSE dbo.udf_GetLowestNumSMS(T2.ReceivedData)
END Rating
I know it ends up slowing the entire process down drastically calling multiple functions or a function multiple times but at the end of the day, this is the closest solution myself and my colleague could come up with which seems to cater for almost every scenario we have so far
I hope that, in case someone else encounters this issue, the above will assist them
thanks again,
Nevarda
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply