Identifying ASCII characters in NVARCHAR columns

Question

Identifying ASCII characters in NVARCHAR columns

chrismalone1

SSChasing Mays

Points: 642
More actions
May 25, 2010 at 8:25 am

#235539

Hello,
I have an issue where I am storing various international characters in nvarchar columns, but need to branch the data at one point of processing so that ASCII characters are run through an additional cleansing process and all non-ASCII characters are set aside.
Is there a way to identify which nvarchar values are within the ASCII range and can be converted to varchar without corruption? Also, the strings may contain a mix of english and international character sets, so the entire string must be checked and not just the first character.
i.e.
Pass:
'Hello', 'abc123'
Fail:
'太平市', 'abc太123'
Any help would be appreciated and let me know if examples would make this more clear.
Regards,
Chris

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Steve Jones - SSC Editor SSC Guru Points: 736494 More actions · Answer 1

I know CAST will allow you to change it to a VARCHAR, and you'll end up with a variety of ? in the items for non-ASCII characters.

Perhaps you can somehow search for those placeholders and if they aren't there (CHARINDEX=0), you can do your processing?

chrismalone1 SSChasing Mays Points: 642 More actions · Answer 2

Thanks Steve, this may be an option if we cannot figure out a better way around this. I just need to confirm that we are not allowing ? characters to exist in any of the strings I am cleansing and, if so, create some type of internal process for escaping these out so they do not trigger the 'international' flag.

Appreciate the help.

Lowell SSC Guru Points: 323518 More actions · Answer 3

proof of concept for Steve's idea;i thought that was kind of neat:

--results:

Status cnv id val id val

------ ------------------------------ ----------- ------- ----------- -------

Pass Hello 1 Hello 1 Hello

Pass abc123 2 abc123 2 abc123

Fail ??? 3 ??? 3 ???

Fail abc?123 4 abc?123 4 abc?123

the test code:

SELECT 1 as id,N'Hello' As val

INTo #TMP UNION ALL

SELECT 2 as id,N'abc123' UNION ALL

SELECT 3 as id,N'???' UNION ALL

SELECT 4 as id,N'abc?123'

SELECT

CASE

WHEN t1.val = CONVERT(varchar,t2.val)

THEN 'Pass'

ELSE 'Fail'

END As Status,

CONVERT(varchar,t2.val) As cnv,*

FROM #TMP t1

LEFT OUTER JOIN #TMP t2

ON t1.Id = t2.ID

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

chrismalone1 SSChasing Mays Points: 642 More actions · Answer 4

Thanks for the code Lowell, I have been playing around with this myself and it looks like it is going to work out well for my problem.

Thanks again both of you for the help!

debrbender SSC Enthusiast Points: 107 More actions · Answer 5

Also, once failures are found, you could distinguish actual '?' values from Unicode values and get the percentage of Unicode characters with ...

SELECT t.val

, UnicodeStringLen = len(t.val)

, UnicodeNonQmarkChars = len(replace(t.val,'?',''))

, NonQmarkChars = len(replace(convert(varchar, t.val),'?',''))

, UnicodeChars = len(replace(t.val,'?',''))

- len(replace(convert(varchar, t.val),'?',''))

, PercentUnicodeChars =

(LEN(replace(t.val,'?','')) - len(replace(convert(varchar, t.val),'?','')))

* 100.0 / len(t.val)

FROM #TMP t

WHERE t.val <> CONVERT(varchar,t.val) --FAILURES ONLY