finding non alphanumeric characters

  • Anyone have a good way of finding non-alphanumeric characters in text/character fields without having to do a query for every possible one a user could input?

    It has to be handled at ETL time since the CMS is not capable of preventing them.

    Any thoughts, tips or tricks most welcome.  Thanks Gang!

    Skål - jh

  • SELECT * FROM dbo.Table WHERE ColName NOT LIKE '%[^a-z0-9A-Z]%')

    This returns all VALID data.  Space and punctuation signs are considered invalid in this search, you can simply add them at the end.

    You can remove the 2nd A-Z if your data is not in case sensitive collation.

  • Thank you Ninja, but I think I'm still missing your point.

    that query does not return anything but

    SELECT * FROM dbo.Table WHERE ColName LIKE '%#%'

    does, so there are still forbidden characters in there I'm not catching.

    thanks gang!

    Skål - jh

  • This is a pattern match using double negativity :

    NOT LIKE '%[^a-z0-9A-Z]%')

    =

    NOT (any character outside a alphabetical letter or number returns true)

     

    So even a space in that pattern match will FILTER OUT the ROW.

  • Yes thank you, understand the NOT part, so it should return names with "#" in them; but still does not seem to work right, it appears to be trying to match the sting literally, not operating as a regex.

    Does that ) at the end mean your actually using a function?  Have seen mention of integrating the CLR into a TSLQ function, that may be the way to go.

    I appreciate the help!

    Skål - jh

  • IF OBJECT_ID ('tempdb.dbo.#Demo') > 0

     DROP TABLE #Demo

    GO

    CREATE TABLE #Demo (ID INT NOT NULL IDENTITY(1,1), Name VARCHAR(50) NOT NULL)

    INSERT INTO #Demo (Name)

    SELECT '0123456789'

    UNION ALL

    SELECT 'abcdefghijklmnopqrstuvwxyz'

    UNION ALL

    SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

    UNION ALL

    SELECT '0.a'

    UNION ALL

    SELECT '0,a'

    UNION ALL

    SELECT '0éa'

    UNION ALL

    SELECT '0!a'

    UNION ALL

    SELECT '0?a'

    UNION ALL

    SELECT '0''a'

    UNION ALL

    SELECT '0#a'

    UNION ALL

    SELECT '0+a'

    SELECT 'pass' AS Result, * FROM #Demo WHERE Name NOT LIKE '%[^0-9A-Z]%'

    SELECT 'fail' AS Result, * FROM #Demo WHERE Name LIKE '%[^0-9A-Z]%'

    GO

    IF OBJECT_ID ('tempdb.dbo.#Demo') > 0

     DROP TABLE #Demo

    GO

  • Thank you.

    Skål - jh

  • You might want to check it out more closely, it appears that many more characters are allowed than I first taught.

    Build a ascii table with all characters and see for yourself how many more characters go through that filter.

  • You could also do...

    SELECT Column FROM Table WHERE ISNUMERIC(Column) = 1


  • Hmmm..

    ISNUMERIC('$1') = 1

    ISNUMERIC('1E5') = 1

    ISNUMERIC('1D0') = 1

    ISNUMERIC('$1.8') = 1

     


    N 56°04'39.16"
    E 12°55'05.25"

  • I have created a function to replace unacceptable characters as below.

    This can probably be modified to meet your needs...

    IF EXISTS (SELECT 1 FROM sysobjects WHERE type = 'FN' AND name = 'fn_replace_invalidchars')

    DROP FUNCTION dbo.fn_replace_invalidchars

    GO

    -- ===================================================================================

    -- Function Name: fn_replace_invalidchars

    -- Description: Replaces all characters other than those in validcharacters list

    -- Arguments: string - String to be replaced

    -- validchars - List of valid characters

    -- Default: '- a-zA-Z0-9'

    -- replacechar - Character that you want invalid replaced with

    -- Default: ''

    -- Return: String invalid characted removed/replaced

    -- Author: Biju Nair

    -- Create date: 06/29/2007

    -- Description:Function to return invalid characters replaced

    -- ===================================================================================

    CREATE FUNCTION dbo.fn_replace_invalidchars

    ( @string VARCHAR(8000),

    @validchars VARCHAR(100),

    @replacechar VARCHAR(1))

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @return VARCHAR(4000),

    @invalidchars VARCHAR(100)

    SELECT @return = @string

    SELECT @replacechar = ISNULL(@replacechar, '')

    SELECT @invalidchars = '%[^' + ISNULL(@validchars,'- a-zA-Z0-9') + @replacechar + ']%'

    WHILE PATINDEX(@invalidchars, @return) != 0

    BEGIN

    SELECT @return = REPLACE(@return, SUBSTRING(@return, PATINDEX(@invalidchars, @return), 1), '.')

    SELECT @return = REPLACE(@return,'.',@replacechar)

    END

    RETURN LTRIM(RTRIM(@return))

    END

    GO

  • Try this...

    Select * from dbo.YourTable

    where SomeCol like '%[^ -~]%' collate Latin1_General_BIN

    the key here is to specify a binary collation type so that the like-mask works as you would expect.

    the like mask is is composed of: NOT space thru tilda

    Fiddle with things to tailor it to your environment...



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • thanks everyone, good stuff!

    Skål - jh

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply