Extracting Numerical values from a survey response

  • Hi,

    we have a text response system that stores ratings received from the public and stores the string value in a field in a database where it asks the public to rate the assistance received.

    I have been tasked to retrieve the scores and create a report.

    my problem is the responses received are not in a specific format.

    i have written many case statements etc etc to cater for this but the responses are so different that i am having issues.

    the text question is similar to: please rate the service received out of 10

    a few examples of responses are as follows:

    2 -> easy to extract value

    4 -> easy to extract value

    10/10 Excellent. **** guided me well through the process. -> returns a 1

    BROOKS NO SPC14060519047 GIVES **** **** 9 OUT OF 10 . -> returns a 1

    -5 (minus 5) -> done and returs -5

    9 NINE -> done returns 9

    five -> done returns 5

    as you can see the responses vary and i am struggling to accomplish the task which is due today.

    Does anyone have any ideas for me or have you by chance encountered this before and can assist?

    Thank you,

    Nevarda

  • Nevarda

    Have your text response system only accept whole numbers between 0 and 10. If customers want to provide comments and feedback as well, put that in a separate field. If that isn't possible, I think your only alternative is to have someone go through the responses one by one and manually extract the scores .

    John

  • Hi John,

    Thanks for your response, however, the table unfortunately cannot be changed and we cant have someone sift through tens of thousands of responses a week.

    the problem i seem to be facing now is that as i tackle one scenario and cater for it, that solution breaks another condition.

    I'll keep on trying to cater for as many different scenarios as i can but if someone does have any other ideas i'd greatly appreciate them.

    Thanks again,

    Nevarda

  • nevarda (8/11/2014)


    the text question is similar to: please rate the service received out of 10

    Well then the answer should be a list of option buttons where the user should only be able to select 1 of the pre-defined answers. That way you wouldn't be having to clean the data up.

    As this is not the case, you're going to have some issues I think. I suspect that the best bet is going to be to feed this through an SSIS package or CLR procedure, with some regular expressions. I think that trying to clean this in T-SQL, whilst possible, will be probably end with some truly unmanageable code that when it breaks will be terrible to debug.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • There's really no way out of this. You have a GIGO issue.

    You're going to have to build data strippers. First, fix the front end before this continues to happen. Next, start building filters. Find everything that falls out, and look for a new pattern. Include that into the filter, and rinse/repeat. Hopefully you can get down to a few hundred that can be hand cleansed.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks for the advice guys.

    I ended up putting together a table valued function that splits up every word stores it

    I then had another function that looks for the lowest value with a few checks in it.

    for reference, here are the 2 functions:

    Splitwords:

    GO

    /****** Object: UserDefinedFunction [dbo].[udf_SplitWords] Script Date: 08/11/2014 15:21:10 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ALTER FUNCTION [dbo].[udf_SplitWords](@text varchar(8000))

    RETURNS @words TABLE (

    pos smallint primary key,

    value varchar(8000)

    )

    AS

    BEGIN

    DECLARE

    @pos smallint,

    @i smallint,

    @j-2 smallint,

    @s-2 varchar(8000)

    SET @pos = 1

    WHILE @pos <= LEN(@text)

    BEGIN

    SET @i = CHARINDEX(' ', @text, @pos)

    SET @j-2 = CHARINDEX(',', @text, @pos)

    IF @i > 0 OR @j-2 > 0

    BEGIN

    IF @i = 0 OR (@j > 0 AND @j-2 < @i)

    SET @i = @j-2

    IF @i > @pos

    BEGIN

    -- @i now holds the earliest delimiter in the string

    SET @s-2 = SUBSTRING(@text, @pos, @i - @pos)

    INSERT INTO @words

    VALUES (@pos, @s-2)

    END

    SET @pos = @i + 1

    WHILE @pos < LEN(@text)

    AND SUBSTRING(@text, @pos, 1) IN (' ', ',')

    SET @pos = @pos + 1

    END

    ELSE

    BEGIN

    INSERT INTO @words

    VALUES (@pos, SUBSTRING(@text, @pos, LEN(@text) - @pos + 1))

    SET @pos = LEN(@text) + 1

    END

    END

    RETURN

    END

    which is used in this function:

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    -- =============================================

    -- Author:Nevarda

    -- Create date: 2014-08-11

    -- Description:Extract the rating out of an SMS reply

    -- =============================================

    ALTER FUNCTION [dbo].[Udf_getlowestnumsms] (@string VARCHAR(max))

    RETURNS INT

    AS

    BEGIN

    DECLARE @Rate INT

    DECLARE @temp TABLE

    (

    id INT IDENTITY(1, 1),

    word VARCHAR(8000)

    )

    INSERT INTO @temp

    (word)

    SELECT value

    FROM [dbo].Udf_splitwords (Replace(Replace(Replace(@string, '/', ' '), '.', ''), ')', ''))

    DECLARE @output TABLE

    (

    value INT

    )

    DECLARE @word VARCHAR(200)

    DECLARE @i INT = 0

    DECLARE @n INT = (SELECT Count(1)

    FROM @temp)

    --select * from @temp

    WHILE @i < @n

    BEGIN

    SET @i = @i + 1

    SET @word = (SELECT word

    FROM @temp

    WHERE id = @i)

    IF Isnumeric(@word) = 1

    BEGIN

    IF Len(@word) <= 2

    BEGIN

    INSERT INTO @output

    VALUES (@word)

    END

    END

    IF Isnumeric(@word) = 0

    BEGIN

    SET @word = CASE Lower(@word)

    WHEN 'one' THEN '1'

    WHEN 'two' THEN '2'

    WHEN 'three' THEN '3'

    WHEN 'four' THEN '4'

    WHEN 'five' THEN '5'

    WHEN 'six' THEN '6'

    WHEN 'seven' THEN '7'

    WHEN 'eight' THEN '8'

    WHEN 'nine' THEN '9'

    WHEN 'ten' THEN '10'

    ELSE NULL

    END

    INSERT INTO @output

    VALUES (@word)

    END

    END

    SELECT @Rate = (SELECT Min(value) Rating

    FROM @output)

    RETURN @rate

    END

    to further filter and get rid of a few errors i simply used another function that, if this returns a null, uses a standard get numeric function to get a value

    a simple example of how to use it is:

    CaseWHEN dbo.udf_GetLowestNumSMS(T2.ReceivedData) IS NULL

    THEN [dbo].[ufn_GetNumeric1](T2.ReceivedData)

    ELSE dbo.udf_GetLowestNumSMS(T2.ReceivedData)

    END Rating

    I know it ends up slowing the entire process down drastically calling multiple functions or a function multiple times but at the end of the day, this is the closest solution myself and my colleague could come up with which seems to cater for almost every scenario we have so far

    I hope that, in case someone else encounters this issue, the above will assist them

    thanks again,

    Nevarda

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply