January 21, 2016 at 8:32 am
I need to find all foreign alphabet characters in a column. It might Russian, Chinese, ect.
January 21, 2016 at 8:42 am
TJT (1/21/2016)
I need to find all foreign alphabet characters in a column. It might Russian, Chinese, ect.
This isn't very clear what you want here. Are you wanting to write a query that will return only certain characters in a string or just trying to determine if the string has certain characters? What would be the rule for determining if it is a "foreign" language? The ASCII value perhaps? If so, what is the threshold where it would be considered one of the characters you are looking for?
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
January 21, 2016 at 8:42 am
TJT (1/21/2016)
I need to find all foreign alphabet characters in a column. It might Russian, Chinese, ect.
What's your question?
As we do not know where you are from, we do not know what 'foreign' means to you.
Please provide some sample data along with your desired results.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
January 21, 2016 at 8:45 am
Do you want each character? Or the rows that contain the foreign characters?
January 21, 2016 at 8:46 am
I want to find any non latin chartacters: cyriliic, chineese, Japanese, ect
January 21, 2016 at 8:49 am
TJT (1/21/2016)
I want to find any non latin chartacters: cyriliic, chineese, Japanese, ect
Identify them or simply remove them?
😎
January 21, 2016 at 8:50 am
Just identify them
January 21, 2016 at 9:04 am
TJT (1/21/2016)
Just identify them
Meaning you want them returned? Do you need any other information? This would be a LOT simpler if you would tell us what you want instead of us trying to extract it from you.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
January 21, 2016 at 9:09 am
When you say "foreign", I assume you mean identifying any characters not belonging to the Latin General language. Maybe I'm way off base, but as a first stab, experiment to see if comparing the column against a version of itself explicitly casted to SQL_Latin1_General_CP1_CI_AS will at least result in identifying those values containing non Latin General characters.
select LastName
from Employee
where LastName != (cast (LastName as nvarchar(100))
collate SQL_Latin1_General_CP1_CI_AS);
https://msdn.microsoft.com/en-us/library/ms143726(v=sql.110).aspx
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
January 21, 2016 at 9:40 am
Yes I mean identifying any characters not belonging to the Latin General language.
Your query picks up some, but still misses others.
January 21, 2016 at 10:09 am
TJT (1/21/2016)
Yes I mean identifying any characters not belonging to the Latin General language.Your query picks up some, but still misses others.
This is my last attempt at trying. Can you please provide us some sample data and expected output? Unless you provide the details to us we are all guessing.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
January 21, 2016 at 10:15 am
TJT (1/21/2016)
Yes I mean identifying any characters not belonging to the Latin General language.Your query picks up some, but still misses others.
Try to imagine yourself reading this thread for the first time.
Would you consider 'still misses others' to be sufficiently detailed for people to know what to look at? Provide examples.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
January 21, 2016 at 11:36 am
TJT (1/21/2016)
Just identify them
Here is how
😎
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @TEST_STRING NVARCHAR(4000) = N'ABCDEFG?Ô?1234567Ò???HIJKLM]? ?NOP??60987654321';
;WITH T(N) AS (SELECT N FROM ( VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))
, NUMS(N) AS (SELECT TOP(LEN(@TEST_STRING)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
SELECT
NM.N AS POS
,SUBSTRING(@TEST_STRING,NM.N,1) AS EXTENDED_CHAR
FROM NUMS NM
WHERE UNICODE(SUBSTRING(@TEST_STRING,NM.N,1)) > 255;
Output
POS EXTENDED_CHAR
---- -------------
8 ?
10 ?
19 ?
21 ?
29 ?
31 ?
36 ?
January 21, 2016 at 1:24 pm
Expanding on what Eirikur put together, here's how you could do it against a table (note my comments).
-- Sample data
DECLARE @table TABLE(stringID int identity primary key, stringTxt NVARCHAR(4000));
INSERT @table(stringTxt) VALUES(N'abc'),(N'123?'),('xyz'),(N'xxx??');
-- to get the string with the location of the bad character
WITH T(N) AS (SELECT N FROM ( VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))
SELECT StringID, Location = N, BadChar = SUBSTRING(t.stringTxt,CAST(NM.N AS int),1)
FROM @table t
CROSS APPLY
(
SELECT TOP(CAST(LEN(t.stringTxt) AS bigint)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM T T1,T T2,T T3,T T4
) NM(N)
WHERE UNICODE(SUBSTRING(t.stringTxt,CAST(NM.N AS int),1)) > 255;
-- to simply get the a list of strings that contain the bad character
WITH T(N) AS (SELECT N FROM ( VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))
SELECT StringID
FROM @table t
CROSS APPLY
(
SELECT TOP(CAST(LEN(t.stringTxt) AS bigint)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM T T1,T T2,T T3,T T4
) NM(N)
WHERE UNICODE(SUBSTRING(t.stringTxt,CAST(NM.N AS int),1)) > 255
GROUP BY stringID;
-- Itzik Ben-Gan 2001
January 21, 2016 at 2:19 pm
I tried
SELECT DISTINCT PGNAME
FROM INSTALLS
WHERE PGNAME != (CAST (PGNAME as NVARCHAR(100)) collate SQL_Latin1_General_CP1_CI_AS)
This returned rows like:
???????? ????? ??? ????? ?????????? ?????? ???????
Thermo Fisher Scientific Inc.
But it failed to find rows like:
???????
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply