July 31, 2007 at 10:11 am
Anyone have a good way of finding non-alphanumeric characters in text/character fields without having to do a query for every possible one a user could input?
It has to be handled at ETL time since the CMS is not capable of preventing them.
Any thoughts, tips or tricks most welcome. Thanks Gang!
Skål - jh
July 31, 2007 at 10:57 am
SELECT * FROM dbo.Table WHERE ColName NOT LIKE '%[^a-z0-9A-Z]%')
This returns all VALID data. Space and punctuation signs are considered invalid in this search, you can simply add them at the end.
You can remove the 2nd A-Z if your data is not in case sensitive collation.
July 31, 2007 at 11:06 am
Thank you Ninja, but I think I'm still missing your point.
that query does not return anything but
SELECT * FROM dbo.Table WHERE ColName LIKE '%#%'
does, so there are still forbidden characters in there I'm not catching.
thanks gang!
Skål - jh
July 31, 2007 at 11:28 am
This is a pattern match using double negativity :
NOT LIKE '%[^a-z0-9A-Z]%')
=
NOT (any character outside a alphabetical letter or number returns true)
So even a space in that pattern match will FILTER OUT the ROW.
July 31, 2007 at 11:52 am
Yes thank you, understand the NOT part, so it should return names with "#" in them; but still does not seem to work right, it appears to be trying to match the sting literally, not operating as a regex.
Does that ) at the end mean your actually using a function? Have seen mention of integrating the CLR into a TSLQ function, that may be the way to go.
I appreciate the help!
Skål - jh
July 31, 2007 at 12:03 pm
IF OBJECT_ID ('tempdb.dbo.#Demo') > 0
DROP TABLE #Demo
GO
CREATE TABLE #Demo (ID INT NOT NULL IDENTITY(1,1), Name VARCHAR(50) NOT NULL)
INSERT INTO #Demo (Name)
SELECT '0123456789'
UNION ALL
SELECT 'abcdefghijklmnopqrstuvwxyz'
UNION ALL
SELECT 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
UNION ALL
SELECT '0.a'
UNION ALL
SELECT '0,a'
UNION ALL
SELECT '0éa'
UNION ALL
SELECT '0!a'
UNION ALL
SELECT '0?a'
UNION ALL
SELECT '0''a'
UNION ALL
SELECT '0#a'
UNION ALL
SELECT '0+a'
SELECT 'pass' AS Result, * FROM #Demo WHERE Name NOT LIKE '%[^0-9A-Z]%'
SELECT 'fail' AS Result, * FROM #Demo WHERE Name LIKE '%[^0-9A-Z]%'
GO
IF OBJECT_ID ('tempdb.dbo.#Demo') > 0
DROP TABLE #Demo
GO
July 31, 2007 at 12:51 pm
Thank you.
Skål - jh
July 31, 2007 at 12:59 pm
You might want to check it out more closely, it appears that many more characters are allowed than I first taught.
Build a ascii table with all characters and see for yourself how many more characters go through that filter.
August 1, 2007 at 6:29 am
August 1, 2007 at 6:58 am
Hmmm..
ISNUMERIC('$1') = 1
ISNUMERIC('1E5') = 1
ISNUMERIC('1D0') = 1
ISNUMERIC('$1.8') = 1
N 56°04'39.16"
E 12°55'05.25"
August 2, 2007 at 3:31 pm
I have created a function to replace unacceptable characters as below.
This can probably be modified to meet your needs...
IF EXISTS (SELECT 1 FROM sysobjects WHERE type = 'FN' AND name = 'fn_replace_invalidchars')
DROP FUNCTION dbo.fn_replace_invalidchars
GO
-- ===================================================================================
-- Function Name: fn_replace_invalidchars
-- Description: Replaces all characters other than those in validcharacters list
-- Arguments: string - String to be replaced
-- validchars - List of valid characters
-- Default: '- a-zA-Z0-9'
-- replacechar - Character that you want invalid replaced with
-- Default: ''
-- Return: String invalid characted removed/replaced
-- Author: Biju Nair
-- Create date: 06/29/2007
-- Description:Function to return invalid characters replaced
-- ===================================================================================
CREATE FUNCTION dbo.fn_replace_invalidchars
( @string VARCHAR(8000),
@validchars VARCHAR(100),
@replacechar VARCHAR(1))
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @return VARCHAR(4000),
@invalidchars VARCHAR(100)
SELECT @return = @string
SELECT @replacechar = ISNULL(@replacechar, '')
SELECT @invalidchars = '%[^' + ISNULL(@validchars,'- a-zA-Z0-9') + @replacechar + ']%'
WHILE PATINDEX(@invalidchars, @return) != 0
BEGIN
SELECT @return = REPLACE(@return, SUBSTRING(@return, PATINDEX(@invalidchars, @return), 1), '.')
SELECT @return = REPLACE(@return,'.',@replacechar)
END
RETURN LTRIM(RTRIM(@return))
END
GO
August 3, 2007 at 8:26 am
Try this...
Select * from dbo.YourTable
where SomeCol like '%[^ -~]%' collate Latin1_General_BIN
the key here is to specify a binary collation type so that the like-mask works as you would expect.
the like mask is is composed of: NOT space thru tilda
Fiddle with things to tailor it to your environment...
August 6, 2007 at 11:44 am
thanks everyone, good stuff!
Skål - jh
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply