April 12, 2011 at 2:38 am
All
We have to implement some validation on data sent to us to ensure that we only allow letters (English alphabet) and integers. Some fields also allow other characters. The field is of variable length, so we used the REPLICATE function in the following code to build a string for use by the LIKE operator:
SELECTL6
FROM[tablename]
WHEREL6 NOT LIKE REPLICATE('[0-9A-Z]',LEN(L6))
We thought that this was working fine, but have now found out that this code has allowed through some characters which should not be valid. All of the characters below are deemed by SQL Server 2005 to be between 0 and 9 or A and Z, but should not be allowed by our system:
ƒŠŒ™šœŸª²³¹º¼½¾ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
SQL Server even thinks that the "squared" "²" is the same as "2":
select case when '²'='2' then 'y' else 'n' end
returns "y"
The proposed solution is to build a table containing the ASCII values of the acceptable characters for each field (or "All" where characters are universally accepted), then use lots of SUBSTRING and ASCII functions, similar to this:
SELECTL6
FROM[tablename]
WHERE(ASCII(LEFT(L6,1)) NOT IN (SELECT ASCIICode FROM ValidCharsPerField WHERE FieldNumber IN ('L6','All')))
OR (LEN(L6)>1 AND ASCII(SUBSTRING(L6,2,1)) NOT IN (SELECT ASCIICode FROM ValidCharsPerField WHERE FieldNumber IN ('L6','All')))
OR (LEN(L6)>2 AND ASCII(SUBSTRING(L6,3,1)) NOT IN (SELECT ASCIICode FROM ValidCharsPerField WHERE FieldNumber IN ('L6','All')))
This sample code only checks the first 3 characters; we have up to 80 characters to check, depending on the field, so the WHERE clause is going to be quite long.
Have any of you come across this before?
Is there some way that we can get SQL Server to classify these non-standard characters as not being between 0 to 9 or A to Z, perhaps by amending the collation on the table that the data is imported to?
The code below should be useful if anyone wants to test some code:
create table testdata (L6 varchar(80))
insert into testdata
select '12345' as data union all --should pass
select 'test4' union all --should pass
select 'kj&98' union all --should fail
select 'tËst' union all --should fail
select 'ad9½A' union all --should fail
select '5m²' --should fail
CREATE TABLE ValidCharsPerField
(FieldNumber VARCHAR(5),
PrintableCharacter CHAR(1),
ASCIICode int,
CharacterType VARCHAR(6))
declare @loopcounter int
--insert numbers
set @loopcounter=48
while @loopcounter<58
begin
INSERT INTO ValidCharsPerField SELECT 'All',char(@loopcounter),@loopcounter,'Number'
set @loopcounter=@loopcounter+1
end
--insert letters
set @loopcounter=65
while @loopcounter<91
begin
INSERT INTO ValidCharsPerField SELECT 'All',char(@loopcounter),@loopcounter,'Letter'
set @loopcounter=@loopcounter+1
end
--insert more letters
set @loopcounter=97
while @loopcounter<123
begin
INSERT INTO ValidCharsPerField SELECT 'All',char(@loopcounter),@loopcounter,'Letter'
set @loopcounter=@loopcounter+1
end
Thanks all
Alun
April 12, 2011 at 2:40 am
Sorry, I should have said that we are currently using Latin1_General_CI_AS collation.
Thanks
April 12, 2011 at 3:40 am
Would a binary collation work?
SELECT L6
FROM testdata
WHERE L6 LIKE '%[^0-9A-Za-z]%' COLLATE Latin1_General_BIN
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537April 12, 2011 at 6:30 am
Hi.
Yes, that works perfectly. I think I'll need to read up about binary collations. I'd never heard of them!
Thanks very much for your help
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply