LIKE pattern is not Case Sensitive on CS/AS unicode-Column

  • Hi there,

    I just had a wiered phenomenon:

    SQL Server 2008 EE + SQL Server 2008 R2 EE behave in the same way.

    Database and table have Collate Latin1_General_100_CS_AS.

    name column is nvarchar(255)

    SELECT name

    FROM table

    WHERE name LIKE N'[A-Z]%'

    The resultset includes columns with small first letter

    This should not happend or am I wrong?

    Versions of the sql-Servers:

    Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)

    Microsoft SQL Server 2008 (SP2) - 10.0.4064.0 (X64)

  • Not entirely sure if it's a bug or not, but I think it's to do with the expansion of [A-Z]. If you change it to [ABCDEFGHIJKLMNOPQRSTUVWXYZ] it works correctly.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • sorry, I did not get the notification of your reply.

    Mark-101232 (7/20/2011)


    Not entirely sure if it's a bug or not, but I think it's to do with the expansion of [A-Z]. If you change it to [ABCDEFGHIJKLMNOPQRSTUVWXYZ] it works correctly.

    If I change that to your suggestion it works, however to write [A-Z] is what should work. M$ is describing similar patterns. But usually example-DBs are CI.

    Do you found any hint that this Like patterns work just CI?

    I tried to Hint with a Collate "CS" in the where-clause, but it didn't help.

  • According to Books Online LIKE is implemented per the ISO standards when it comes to Unicode data:

    LIKE (Transact-SQL)

    It also says Collations can affect results. I think I found a workaround for you. Here is my PoC for testing in your database where the default collation is Latin1_General_100_CS_AS:

    CREATE TABLE dbo.a ( name NVARCHAR(255) COLLATE Latin1_General_100_CS_AS ) ;

    INSERT INTO dbo.a

    ( name )

    VALUES ( N'harold' ),

    ( N'Harold' ) ;

    GO

    SELECT name

    FROM dbo.a

    WHERE name COLLATE Latin1_General_100_BIN LIKE N'[A-Z]%' ;

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • WHERE name COLLATE Latin1_General_100_BIN LIKE N'[A-Z]%' ;

    Hi,

    COLLATE to BIN works fine.

    I guess it is not quite a performance-booster to change the collation while using LIKE.

    So is it a bug or feature? and what does it mean for choosing a collation?

    Should I switch to BIN?

  • mitch.fh (7/25/2011)


    WHERE name COLLATE Latin1_General_100_BIN LIKE N'[A-Z]%' ;

    Hi,

    COLLATE to BIN works fine.

    I guess it is not quite a performance-booster to change the collation while using LIKE.

    Are you saying you took a performance hit by overriding the collation? Can you post the actual execution plans for both queries, the one that overrides the collation and the original one that returns incorrect results?

    See if saying the query with the collation applied to the constant string in the LIKE changes anything:

    SELECT name

    FROM dbo.a

    WHERE name LIKE N'[A-Z]%' COLLATE Latin1_General_100_BIN ;

    So is it a bug or feature? and what does it mean for choosing a collation?

    Should I switch to BIN?

    I did find this bug report that talks about issues with LIKE and a specific Collation that looks very similar to your issue:

    http://connect.microsoft.com/SQLServer/feedback/details/481198/bug-in-sql-server-2005-sp3-like-operator-chinese-prc

    I am not convinced if you're seeing a bug with LIKE or a bug with the Collation. Note there is a workaround listed in the bug report that you might try.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi,

    Performance was just a presumption. The execution plans don't differ in tasks, however the estimation is 48% to 52% if you estimate them together.

    Moving the collation doesn't make difference

  • Can you post the actual execution plans to the thread?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • sorry it took me so long.

    Here you have the execution plan with all there query-versions

  • I was hoping to see the plans when run against your live data 🙂

    I went ahead and mocked up 1MM rows of test data and your query 1 performs worse than the other 2 simply because it ends up returning more rows due to LIKE not working as CS as you would like.

    From a comparison standpoint query 2 and 3 perform the same, although they go about their business differently. Query 2, because the COLLATE is on the column, does an explicit convert on that data:

    <ScalarOperator ScalarString="CONVERT(nvarchar(100),[tempdb].[dbo].[a].[name],0) like N'[A-Z]%'">

    Query 3 does an implicit convert on the column data probably because the COLLATE on the literal string in the LIKE takes precedence so it has to convert the column:

    <ScalarOperator ScalarString="CONVERT_IMPLICIT(nvarchar(100),[tempdb].[dbo].[a].[name],0) like N'[A-Z]%'">

    All of that said, the actual execution plan has query 2 and 3 at 50% cost each so what is the performance hit you were talking about?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I can post the plans that run against my live data tomorrow as well 🙂

    so what is the performance hit you were talking about?

    As posted earlier I >presumed< it would be a performance issue but the actual testing didn't show, that it would be a problem 😉

  • Ahhh...got it. Sorry, I was looking at your latest "is not quite a performance-booster" comment as a claim of loss of performance or something.

    <still_learning>I know collations as a set of rules for comparing characters so [theoretically] forcing the use of a specific collation should not really affect performance. I have heard about issues when mixing collations, but not when everything is using the same one.</still_learning>

    Let us know how you fare with your live data 🙂

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • finally the execution plan on my live database

  • Thanks, but these are Estimated plans. We need the Actual plans 🙂

    Ctrl+M to turn on Actual, then execute the queries. Pipe the results into a temp table using SELECT...INTO to prevent you from retrieving all the results to your machine.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply