September 17, 2009 at 9:37 am
Is there a way to identify if a unicode column, such as Forename (nvarchar), contains any non basic latin characters?
For example I want to be able to find forenames like Öla, Åke or Jørgen, but not John, Jane or Stan.
September 17, 2009 at 10:55 am
sample data below, i was trying LIKE '%[^A-Z,a-z] %' but that's not working for me yet; still playing
create table names(
namesid int identity(1,1) not null primary key,
TheName nvarchar(60) )
insert into names(TheName)
SELECT N'Öla' UNION ALL
SELECT N'Åke' UNION ALL
SELECT N'Jørgen' UNION ALL
SELECT N'John' UNION ALL
SELECT N'Jane' UNION ALL
SELECT N'Stan'
Lowell
September 17, 2009 at 12:23 pm
i fiddled with COLLATE but couldn't find anything that works, so i reverted to a tally table to test true/false for High Ascii characers.
someone might have a better solution, but this works:
--results namesid TheName HasHighAscii ----------- ------------------------------------------------------------ ------------ 1 Öla 1 2 Åke 1 3 Jørgen 1 4 John 0 5 Jane 0 6 Stan 0
CREATE FUNCTION IsHighAscii(@OriginalText VARCHAR(max))
RETURNS int
BEGIN
DECLARE @HighCharFound int
SET @HighCharFound = 0
;WITH tally (N) as
(SELECT TOP 1000000 row_number() OVER (ORDER BY sc1.id)
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2)
SELECT @HighCharFound = @HighCharFound +
CASE
--ascii numbers are 48(for '0') thru 57 (for '9')
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 1 AND 126
THEN 0
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 127 AND 255
THEN 1
ELSE 0
END
FROM Tally
WHERE Tally.N 0
SET @HighCharFound= 1
RETURN @HighCharFound
END
GO
create table names(
namesid int identity(1,1) not null primary key,
TheName nvarchar(60) )
insert into names(TheName)
SELECT N'Öla' UNION ALL
SELECT N'Åke' UNION ALL
SELECT N'Jørgen' UNION ALL
SELECT N'John' UNION ALL
SELECT N'Jane' UNION ALL
SELECT N'Stan'
select *,dbo.IsHighAscii(TheName) As HasHighAscii from names
Lowell
September 18, 2009 at 9:07 am
Thanks! That will do nicely to just identify people.
The LIKE '%[^A-Z,a-z]%' works on non-latin based languages so is kind of useful for something else
🙂
January 6, 2010 at 11:38 am
So if I want to find only Names with non0latin characters - I should use this SQL?
select *, dbo.IsHighAscii(TheName) As HasHighAscii from names where HasHighAscii = 1
January 6, 2010 at 11:46 am
i believe you are correct, you can use it to find non latin characters. I'd need to test it against more varied data; is it working for you as expected? does your name table return any non latin results?
Lowell
January 6, 2010 at 3:42 pm
Compiled this function and tried to run it against AdventureWorks database.
Got error:
Msg 208, Level 16, State 1, Line 1
Invalid object name 'Tally'.
:crying:
January 6, 2010 at 4:24 pm
i copied and pasted the example, and it works fine; it might be the case-sensitive collation of your database...i noticed i spelled it "tally" in one spot, and "Tally" in the rest....change that and it should work fine.
since it is using a CTE, it might also be the database compatibility level, but even when my local database was set to 80, it still worked.
Lowell
January 6, 2010 at 4:55 pm
select *
from names
where TheName like '%[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]%'
Why mess about with functions and stuff?
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply