Simple LIKE with wildcard

Question

Simple LIKE with wildcard

honza.mf

SSCertifiable

Points: 5519
More actions
May 24, 2010 at 10:19 pm

#221446

Comments posted to this topic are about the item Simple LIKE with wildcard

See, understand, learn, try, use efficient
© Dr.Plch

Viewing 15 posts - 1 through 15 (of 55 total)

You must be logged in to reply to this topic. Login to reply

malleswarareddy_m SSCertifiable Points: 5847 More actions · Answer 1

hi ,

like operator will search for _ only.and return any characters. then it does not return any rows.because i worked many times using like operator.but if we use '%_%' or '%_' or '_%'. then i think it will display the number of characters.but directly with like '_' it will not return the rows.and also i extecuted the above query but it will not return any rows.

Malleswarareddy
I.T.Analyst
MCITP(70-451)

tommyh SSCertifiable Points: 6302 More actions · Answer 2

Nice question.

Taught me once again that Microsoft has some serious dope available. Doing some semiunicode shit into a normal varchar... damn.

Even more funny. Doing a Len on this gives 2. So SQL still considers this to be 2 chars. But treats it as 1... hrmm.

honza.mf SSCertifiable Points: 5519 More actions · Answer 3

tommyh (5/25/2010)
Nice question.
Taught me once again that Microsoft has some serious dope available. Doing some semiunicode shit into a normal varchar... damn.
Even more funny. Doing a Len on this gives 2. So SQL still considers this to be 2 chars. But treats it as 1... hrmm.

I think it'not semiunicode, it's a funny behviour. It's two characters, but like comparison treat it as one.

When I've seen it, I was struck.

Another funny thing is, that this behaviour can be seen only with "Windows" collation. If I have a server with old unsupported SQL collation, it works normally.

See, understand, learn, try, use efficient
© Dr.Plch

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 4

Great question!

I did consider collations before replying, but only thinking about the possibility of a collation where the space character would fall somewhere between 'A' and 'Z'. And after discarding that option, I picked the most popular answer: 0.

Point lost, nothing learned - I was already aware of this collation-specific behaviour, but failed to consider it before replying.

Thanks, Honza!

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 5

malleswarareddy_m (5/24/2010)
hi ,
like operator will search for _ only.and return any characters. then it does not return any rows.because i worked many times using like operator.but if we use '%_%' or '%_' or '_%'. then i think it will display the number of characters.but directly with like '_' it will not return the rows.and also i extecuted the above query but it will not return any rows.

Did you try setting the collation to CZECH_CI_AS, as Honza indicated in the explanation?

Certain Nordic collations will also consider some two-character combinations as a single character, and I expect German collations to do that as well (e.g. ss = ß, ue = ü, etc)

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

honza.mf SSCertifiable Points: 5519 More actions · Answer 6

stewartc-708166 (5/25/2010)
Tricky
Requires some knowledge of collations

Yes, the question was in Collations category 😛

See, understand, learn, try, use efficient
© Dr.Plch

honza.mf SSCertifiable Points: 5519 More actions · Answer 7

Hugo Kornelis (5/25/2010)
malleswarareddy_m (5/24/2010)
hi ,
like operator will search for _ only.and return any characters. then it does not return any rows.because i worked many times using like operator.but if we use '%_%' or '%_' or '_%'. then i think it will display the number of characters.but directly with like '_' it will not return the rows.and also i extecuted the above query but it will not return any rows.
Did you try setting the collation to CZECH_CI_AS, as Honza indicated in the explanation?
Certain Nordic collations will also consider some two-character combinations as a single character, and I expect German collations to do that as well (e.g. ss = ß, ue = ü, etc)

If you like to set a collation for this script, you can use create table #t (string varchar (2) COLLATE XYZ) (XYZ can be Czech_CI_AS or some other) in the first line and change the insert statement to insert into #t select (@a1 + @a2) COLLATE XYZ to avoid collation confilct.

It is possible to surround full script by cursor that will enumerate all collations available, but I must say I was to lazy to write it.

I had an idea of German collations to work similar way, but not. Those, I have tried, returned no rows.

If someone discover another collation in another language with similar behaviour, I will be very glad to have such information.

See, understand, learn, try, use efficient
© Dr.Plch

tommyh SSCertifiable Points: 6302 More actions · Answer 8

Out of curiousity. If one would like to get all strings containing a C from the below code... how would you do that in that collation?

create table #t (string varchar (5))

insert into #t (string) values ('ACH')

insert into #t (string) values ('ACY')

insert into #t (string) values ('AHHC')

select string from #t

where string like '%C%' -- will miss ACH

drop table #t

ZeroFusion Old Hand Points: 326 More actions · Answer 9

Great question, I'll keep that in mind as I also work with Czech_CI_AS collations.

tommyh (5/25/2010)
Out of curiousity. If one would like to get all strings containing a C from the below code... how would you do that in that collation?
create table #t (string varchar (5))
insert into #t (string) values ('ACH')
insert into #t (string) values ('ACY')
insert into #t (string) values ('AHHC')
select string from #t
where string like '%C%' -- will miss ACH
drop table #t

Suppose you have a Czech_CI_AS collation set up as your DB / table collation. Then you do this:

create table #t (string varchar (5))

insert into #t (string) values ('ACH')

insert into #t (string) values ('ACY')

insert into #t (string) values ('AHHC')

select string from #t

where string COLLATE Czech_BIN like '%C%' -- won't miss ACH

drop table #t

honza.mf SSCertifiable Points: 5519 More actions · Answer 10

tommyh (5/25/2010)
Out of curiousity. If one would like to get all strings containing a C from the below code... how would you do that in that collation?
create table #t (string varchar (5))
insert into #t (string) values ('ACH')
insert into #t (string) values ('ACY')
insert into #t (string) values ('AHHC')
select string from #t
where string like '%C%' -- will miss ACH
drop table #t

It's very hard.

And if you are playing with our very special language, you can continue with case in case insensitive collation :w00t: : there is a difference if you are using 'ch', 'CH', 'Ch' and 'cH'.

See, understand, learn, try, use efficient
© Dr.Plch

vk-kirov SSCertifiable Points: 7686 More actions · Answer 11

Very nice question, I learned something new.

honza.mf
On SQL Server with CZECH_CI_AS collation returns 3 rows ('CH', 'Ch', and 'ch') - this collation is case insensitive.

Note that CZECH_CS_AS (case sensitive collation) returns the same 3 rows.

...congratulations on winning the world hockey championship 😉

honza.mf SSCertifiable Points: 5519 More actions · Answer 12

ZeroFusion (5/25/2010)
Suppose you have a Czech_CI_AS collation set up as your DB / table collation. Then you do this:
create table #t (string varchar (5))
insert into #t (string) values ('ACH')
insert into #t (string) values ('ACY')
insert into #t (string) values ('AHHC')
select string from #t
where string COLLATE Czech_BIN like '%C%' -- will miss ACH
drop table #t

You are right, it works.

Are you really doing it? I must say, sometimes I change collation from case-insensitive to case-sensitive, but I have never used BIN type collation yet.

See, understand, learn, try, use efficient
© Dr.Plch

vk-kirov SSCertifiable Points: 7686 More actions · Answer 13

honza.mf (5/25/2010)
If someone discover another collation in another language with similar behaviour, I will be very glad to have such information.

Here is a script which finds that collations (you should have the table #t from the QOTD):

SET NOCOUNT ON;

DECLARE @name SYSNAME;

DECLARE cur CURSOR LOCAL STATIC FOR

SELECT name

FROM ::fn_helpcollations();

CREATE TABLE #result (collation SYSNAME, string VARCHAR(2));

OPEN cur;

FETCH NEXT FROM cur INTO @name;

WHILE @@FETCH_STATUS = 0

BEGIN

EXEC('ALTER TABLE #t ALTER COLUMN string VARCHAR(2) COLLATE ' + @name);

IF @@ERROR = 0

INSERT #result (collation, string)

SELECT @name, string

FROM #t

WHERE string LIKE '_';

FETCH NEXT FROM cur INTO @name;

END;

CLOSE cur;

DEALLOCATE cur;

SELECT DISTINCT collation FROM #result;

SELECT DISTINCT collation, UPPER(string) FROM #result;

DROP TABLE #result;

It returns Croatian, Czech, Danish_Norwegian, Hungarian, Slovak, Traditional_Spanish, and Vietnamese on my machine.

ZeroFusion Old Hand Points: 326 More actions · Answer 14

honza.mf (5/25/2010)
You are right, it works.
Are you really doing it? I must say, sometimes I change collation from case-insensitive to case-sensitive, but I have never used BIN type collation yet.

Well, as I said, I just learned how LIKE behaves with Czech_CI_AS collation, so no, I haven't used it yet. However, I use Czech_BIN when I need to order Czech strings (with Czech collation) in order of English alphabet:

CREATE TABLE #t ([string] VARCHAR (5));

INSERT INTO #t ([string])

VALUES ('ACH');

INSERT INTO #t ([string])

VALUES ('ACY');

INSERT INTO #t ([string])

VALUES ('AHHC');

-- Output: 1. ACY, 2. AHHC, 3. ACH

SELECT

Row_Number() OVER (ORDER BY [string]), [string]

FROM

#t;

-- Output: 1. ACH, 2. ACY, 3. AHHC

SELECT

Row_Number() OVER (ORDER BY [string] COLLATE Czech_BIN), [string]

FROM

#t;

DROP TABLE #t;

A note for people not familiar with Czech language: in the Czech alphabet, a letter "CH" is just between letters "H" and "I".