Comparing ss and ß using LIKE

  • In the Latin1_General collation, ss is treated like ß (German sharp s) in comparisons:

    'ss' = 'ß' --> true

    Note that this is also true with LIKE:

    'ss' LIKE 'ß' --> true

    But if a '%' is part of the search pattern, things become complicated:

    'ss' LIKE 'ß%' --> true

    'ß' LIKE 'ss%' --> false (!!!)

    Notably, the following are all true:

    'ß' LIKE '%ss' --> true

    'ß' LIKE '%ss%' --> true

    'ß' LIKE 'ss%%' --> true

    'ßa' LIKE 'ss%' --> true

    Take a look at the tests with two ß:

    'ßß' LIKE 'ssss%' --> false (guessed that)

    'ßßa' LIKE 'ssss%' --> false (now that's a surprise)

    'ßßaa' LIKE 'ssss%' --> true

    Alright, there's a pattern in this behavior:

    'ßßßaa' LIKE 'ssssss%' --> false

    'ßßßaaa' LIKE 'ssssss%' --> true

    My guess is that LIKE is optimized for the special case of patterns that end with '%' and contain no other wildcards. It seems that the comparison always fails if the searched text is shorter than the non-wildcard part of the pattern.

    My questions are:

    1. Is this behavior a bug or a feature?

    2. Are there other subtleties in the behavior of LIKE?

    3. Is the precise algorithm for LIKE comparison documented somewhere?

    I need to rebuild the comparison routine in C as accurately as possible. I've tried SQL Server 2000 and 2005.

    I appreciate any help on this.

  • Hi there...

    I found an answer to your problem

    Convert them to Nvarchar.

    Ok ok, I knowusing CAST or CONVERT would add some more code... but you can do something like this

    IF N'ßßa' LIKE N'ssss%'

    SELECT 'TRUE'

    or you could just put the N'' to only one of the two...

    if you start a string with an N automatically converts it into an NVARCHAR Format

    Correct me if I'm wrong... but I think its because of the difference in the bytes used... uhmmm like ASCII??? actually I'm not really sure of the reason 😛

    :cool:well atleast it worked

    _____________________________________________
    [font="Comic Sans MS"]Quatrei Quorizawa[/font]
    :):D:P;):w00t::cool::hehe:
    MABUHAY PHILIPPINES!

    "Press any key...
    Where the heck is the any key?
    hmmm... Let's see... there's ESC, CTRL, Page Up...
    but no any key"
    - Homer Simpson
  • Kirill Müller (6/21/2007)


    In the Latin1_General collation, ss is treated like ß (German sharp s) in comparisons:

    Apart from your specific problems regarding the LIKE operator, I'm very surprised that ss is treated like ß. Thus in is the sequence:

    create table t(c varchar(10))

    insert into t values('ß')

    insert into t values('ss')

    select *

    from t

    where c='ss'

    select *

    from t

    where c='ß'

    both records always get selected.

    I was under the impression that ß can 'optionally' be written as 'ss' but this is not true the other way around. In any case, I just don't see the ability to make this distinction built into a collation sequence.

  • I was under the impression that ß can 'optionally' be written as 'ss' but this is not true the other way around.

    If I remember correctly, under the German spelling reform in the 1990s, all occurrences of ß were changed to ss. This spelling reform had many problems (mostly a lack of popularity), and was officially rejected/changed a while later. So many words were changed back to being officially spelt with ß. Hard work for teachers. I do agree with the quote above, though.

  • No, actually, the ß is used after "long" vowels (like in "Fuß" which is pronounced like "fooss") and after diphthongs (like "außen"). All other instances of ß have been replaced by ss, the most popular word that has been changed is "dass" (translated: "so that").

    Thanks for your replies!

  • Hi Kirill,

    Thanks for the explanation; I am trying to learn German, but I am finding it hard work. Bring back Esperanto 🙂

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply