May 25, 2010 at 8:25 am
Hello,
I have an issue where I am storing various international characters in nvarchar columns, but need to branch the data at one point of processing so that ASCII characters are run through an additional cleansing process and all non-ASCII characters are set aside.
Is there a way to identify which nvarchar values are within the ASCII range and can be converted to varchar without corruption? Also, the strings may contain a mix of english and international character sets, so the entire string must be checked and not just the first character.
i.e.
Pass:
'Hello', 'abc123'
Fail:
'太平市', 'abc太123'
Any help would be appreciated and let me know if examples would make this more clear.
Regards,
Chris
May 25, 2010 at 8:50 am
I know CAST will allow you to change it to a VARCHAR, and you'll end up with a variety of ? in the items for non-ASCII characters.
Perhaps you can somehow search for those placeholders and if they aren't there (CHARINDEX=0), you can do your processing?
May 25, 2010 at 9:19 am
Thanks Steve, this may be an option if we cannot figure out a better way around this. I just need to confirm that we are not allowing ? characters to exist in any of the strings I am cleansing and, if so, create some type of internal process for escaping these out so they do not trigger the 'international' flag.
Appreciate the help.
May 25, 2010 at 9:33 am
proof of concept for Steve's idea;i thought that was kind of neat:
--results:
Status cnv id val id val
------ ------------------------------ ----------- ------- ----------- -------
Pass Hello 1 Hello 1 Hello
Pass abc123 2 abc123 2 abc123
Fail ??? 3 ??? 3 ???
Fail abc?123 4 abc?123 4 abc?123
the test code:
SELECT 1 as id,N'Hello' As val
INTo #TMP UNION ALL
SELECT 2 as id,N'abc123' UNION ALL
SELECT 3 as id,N'???' UNION ALL
SELECT 4 as id,N'abc?123'
SELECT
CASE
WHEN t1.val = CONVERT(varchar,t2.val)
THEN 'Pass'
ELSE 'Fail'
END As Status,
CONVERT(varchar,t2.val) As cnv,*
FROM #TMP t1
LEFT OUTER JOIN #TMP t2
ON t1.Id = t2.ID
Lowell
May 25, 2010 at 10:00 am
Thanks for the code Lowell, I have been playing around with this myself and it looks like it is going to work out well for my problem.
Thanks again both of you for the help!
May 27, 2015 at 9:03 am
Also, once failures are found, you could distinguish actual '?' values from Unicode values and get the percentage of Unicode characters with ...
SELECT t.val
, UnicodeStringLen = len(t.val)
, UnicodeNonQmarkChars = len(replace(t.val,'?',''))
, NonQmarkChars = len(replace(convert(varchar, t.val),'?',''))
, UnicodeChars = len(replace(t.val,'?',''))
- len(replace(convert(varchar, t.val),'?',''))
, PercentUnicodeChars =
(LEN(replace(t.val,'?','')) - len(replace(convert(varchar, t.val),'?','')))
* 100.0 / len(t.val)
FROM #TMP t
WHERE t.val <> CONVERT(varchar,t.val) --FAILURES ONLY
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply