Simple LIKE with wildcard

Question

Simple LIKE with wildcard

Viewing 15 posts - 31 through 45 (of 55 total)

You must be logged in to reply to this topic. Login to reply

Michael Poppers SSCrazy Points: 2151 More actions · Answer 1

malleswarareddy_m (5/25/2010)
...
select string from #t where string COLLATE CZECH_CI_AS like '_'
the above you have given returns gives
CH
Ch
ch
¢h
four rows.
...

I was wondering why my SQL Query Analyzer output apparently listed "ch" twice -- thanks for listing the accurate output (in which the fourth row is not "ch"). Now try SELECT DISTINCTing the column 😉 -- the #rows should tell us precisely how many LIKE '_' characters actually exist between 'A' and 'Z' :-).

honza.mf SSCertifiable Points: 5519 More actions · Answer 2

Michael Poppers (5/25/2010)
malleswarareddy_m (5/25/2010)
...
select string from #t where string COLLATE CZECH_CI_AS like '_'
the above you have given returns gives
CH
Ch
ch
¢h
four rows.
...
I was wondering why my SQL Query Analyzer output apparently listed "ch" twice -- thanks for listing the accurate output (in which the fourth row is not "ch"). Now try SELECT DISTINCTing the column 😉 -- the #rows should tell us precisely how many LIKE '_' characters actually exist between 'A' and 'Z' :-).

Your idea with DISTINCT is great. It makes the things little bit more funny with case-insensitive collations.

See, understand, learn, try, use efficient
© Dr.Plch

Vladan SSC-Insane Points: 22072 More actions · Answer 3

Nice one, and you got me with that question. I absolutely forgot about "ch" and went with the popular "0" answer :crazy:

So-called "letter" ch can create some confusion because of the special alphabetical order it has. Anyone expects words beginning with "ch..." under the letter C in a dictionary/encyclopedia - but if it is a Czech one, you'll find them under a separate heading "CH", not even close to "C" (it is G, H, Ch, I...). IMHO this should be addressed in some future release of Czech orthography and we should declare, that "CH" is in fact two letters and standard ordering rules would apply from that time (in fact, it is long overdue... but I'm afraid no one will bother with it and we will be stuck with this "letter" forever).

honza.mf SSCertifiable Points: 5519 More actions · Answer 4

Vladan (5/25/2010)
Nice one, and you got me with that question. I absolutely forgot about "ch" and went with the popular "0" answer :crazy:
So-called "letter" ch can create some confusion because of the special alphabetical order it has. Anyone expects words beginning with "ch..." under the letter C in a dictionary/encyclopedia - but if it is a Czech one, you'll find them under a separate heading "CH", not even close to "C" (it is G, H, Ch, I...). IMHO this should be addressed in some future release of Czech orthography and we should declare, that "CH" is in fact two letters and standard ordering rules would apply from that time (in fact, it is long overdue... but I'm afraid no one will bother with it and we will be stuck with this "letter" forever).

I don't know our linguists (ÚPJC) but this will be very hard to do. Orthographic rules are very rigid and cannot be changed so easy. I remember last great changes in czech orthography or several years ago changes Germans made. It was always painful.

See, understand, learn, try, use efficient
© Dr.Plch

Trey Staker SSCarpal Tunnel Points: 4796 More actions · Answer 5

Thanks for the question. I bombed this one, I should of looked at the category or went with my gut, everything always "depends".

---------------------------------------------------------------------
Use Full Links:
KB Article from Microsoft on how to ask a question on a Forum

SQLRNNR SSC Guru Points: 281334 More actions · Answer 6

Great question

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Mark D Powell SSCarpal Tunnel Points: 4775 More actions · Answer 7

I think of instead of 'It Depends' the last choice should have been collation dependent. By itself 'It depends' is not much of an answer.

-- Mark D Powell --

jlennartz SSCommitted Points: 1574 More actions · Answer 8

honza.mf (5/25/2010)
Michael Poppers (5/25/2010)
malleswarareddy_m (5/25/2010)
...
select string from #t where string COLLATE CZECH_CI_AS like '_'
the above you have given returns gives
CH
Ch
ch
¢h
four rows.
...
I was wondering why my SQL Query Analyzer output apparently listed "ch" twice -- thanks for listing the accurate output (in which the fourth row is not "ch"). Now try SELECT DISTINCTing the column 😉 -- the #rows should tell us precisely how many LIKE '_' characters actually exist between 'A' and 'Z' :-).
Your idea with DISTINCT is great. It makes the things little bit more funny with case-insensitive collations.

If all characters were inserted into DB as upper case how come query is returning lower case?

honza.mf SSCertifiable Points: 5519 More actions · Answer 9

jlennartz (5/25/2010)
If all characters were inserted into DB as upper case how come query is returning lower case?

It's another aspect of this question not mentioned yet: All characters inserted into are not guaranted to be uppercase!

You feel they are uppercase due to "while @a2 <= 'Z'" condition. You must evaluate this condition in appropriate collation too. Most of us use case-insensitive collations and in such collations lowercase letters also fit this condition. And it si true for the special characters between 'Z' and 'a' in ASCII table like '[', '\', ']' - it's not obvious but for me it's clear. There can be even more characters that fit, it's collation dependant.

If someone uses case sensitive collation, he will have all characters in uppercase.

See, understand, learn, try, use efficient
© Dr.Plch

Michael Poppers SSCrazy Points: 2151 More actions · Answer 10

honza.mf (5/25/2010)
If someone uses case sensitive collation, he will have all characters in uppercase.

Sorry, ignore this reply and see my next reply :-). Thanks.

Michael Poppers SSCrazy Points: 2151 More actions · Answer 11

honza.mf (5/25/2010)
If someone uses case sensitive collation, he will have all characters in uppercase.

Sounds reasonable, but use COLLATE Czech_CS_AS (instead of the case-insensitive collation) -- SELECT DISTINCT will return three rows (instead of one), and one of the result rows appears (to my non-Czech eyes) to not be what in the Roman alphabet would be called "uppercase"!

jlennartz SSCommitted Points: 1574 More actions · Answer 12

honza.mf (5/25/2010)
jlennartz (5/25/2010)
If all characters were inserted into DB as upper case how come query is returning lower case?
It's another aspect of this question not mentioned yet: All characters inserted into are not guaranted to be uppercase!
You feel they are uppercase due to "while @a2 <= 'Z'" condition. You must evaluate this condition in appropriate collation too. Most of us use case-insensitive collations and in such collations lowercase letters also fit this condition. And it si true for the special characters between 'Z' and 'a' in ASCII table like '[', '\', ']' - it's not obvious but for me it's clear. There can be even more characters that fit, it's collation dependant.
If someone uses case sensitive collation, he will have all characters in uppercase.

Thank you for the answer it make sense, now. You won't get A-Z but whatever the Check equivqlent of their ascii number is.

Jerry

LutzM SSC Guru Points: 107049 More actions · Answer 13

honza.mf (5/25/2010)
...
I don't know our linguists (ÚPJC) but this will be very hard to do. Orthographic rules are very rigid and cannot be changed so easy. I remember last great changes in czech orthography or several years ago changes Germans made. It was always painful.

First of all: very nice question!! I just "lost" one point, but learned something new. I was aware of the collation issue but I thought 'A' to 'Z' didn't make room for it.

Regarding our orthography changes (I'm German): at least it allows me to write the way I'd like to: most probably there was/is/will_be a time and place where the spelling I use was/is/will_be valid 🙂

But it definitely makes it really hard to help your kids through (school) homework... Ooops, getting off topic. Sorry. Again: Nice QOTD!

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

honza.mf SSCertifiable Points: 5519 More actions · Answer 14

Michael Poppers (5/25/2010)
honza.mf (5/25/2010)
If someone uses case sensitive collation, he will have all characters in uppercase.
Sounds reasonable, but use COLLATE Czech_CS_AS (instead of the case-insensitive collation) -- SELECT DISTINCT will return three rows (instead of one), and one of the result rows appears (to my non-Czech eyes) to not be what in the Roman alphabet would be called "uppercase"!

Sorry, it was my mistake as I haven't checked it. Case sensitivity of a collation is used probably in equality comparisons only. Loops in example are controled by less or equal operator, and it says 'a' <= 'Z'

Tom Garth SSCertifiable Points: 6193 More actions · Answer 15

honza.mf (5/25/2010)
jlennartz (5/25/2010)
If all characters were inserted into DB as upper case how come query is returning lower case?
It's another aspect of this question not mentioned yet: All characters inserted into are not guaranted to be uppercase!
You feel they are uppercase due to "while @a2 <= 'Z'" condition. You must evaluate this condition in appropriate collation too. Most of us use case-insensitive collations and in such collations lowercase letters also fit this condition. And it si true for the special characters between 'Z' and 'a' in ASCII table like '[', '\', ']' - it's not obvious but for me it's clear. There can be even more characters that fit, it's collation dependant.
If someone uses case sensitive collation, he will have all characters in uppercase.

Quite correct. The actual number of rows in #t with a default installation is 20449. To get the 26 by 26 result requireswhile ASCII(@a1) <= ASCII('Z') begin, etc.

This QotD turned out to be much more interesting than I initially thought it would.

Thanks for the lesson!

Tom Garth
Vertical Solutions[/url]

"There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers