April 23, 2020 at 12:00 am
Comments posted to this topic are about the item Finding Lowercase Characters
April 23, 2020 at 7:30 am
Hi there. Interesting question. Here are some notes:
Latin1_General_100_CI_AS_CS
collation. Those last two letters are transposed. The actual collation name is: Latin1_General_100_CI_AS_SC
.This is a very common misconception, but binary collations are in fact not case-sensitive due to not allowing for combining characters, full width characters, and a few other scenarios. For full details, please see: No, Binary Collations are not Case-Sensitive
In cases where a binary collation does make sense to use for comparisons, then it's entirely irrelevant which binary collation is used as they're all simply comparing bytes (even the BIN2 collations that compare code units are still doing the same binary comparison across all BIN2 collations).
WHERE LEFT(FirstName, 1) = LOWER(LEFT(FirstName, 1)) COLLATE Latin1_General_100_CS_AS_SC;
Take care,
Solomon....
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
April 23, 2020 at 8:43 am
Hi Solomon,
Your solution does also find those FirstNames that does not really have a lowercase version, i.e. where FirstName starts with a digit or a dash or a dot...
I would probably try something like:
where left(su.name,1) = lower(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS
and left(su.name,1) <> upper(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS
My database (Azure SQL database) is using SQL_Latin1_General_CP1_CI_AS, so should I be using SQL_Latin1_General_CP1_CS_AS?
Best regards, Henrik
April 23, 2020 at 12:08 pm
Before seeing the answers your's is what I was going for
April 23, 2020 at 12:24 pm
Interesting question.
You beat me to it, Solomon. Your explanation sums it up nicely...
____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
“libera tute vulgaris ex”
April 23, 2020 at 12:34 pm
Hi there. Interesting question. Here are some notes:
<li style="list-style-type: none;">
- At the start of the question it mentions that the database is using the
Latin1_General_100_CI_AS_CS
collation. Those last two letters are transposed. The actual collation name is:Latin1_General_100_CI_AS_SC
.
<li style="list-style-type: none;">
- Regarding the following statement in the explanation: "The reason Latin1_General_BIN was used in this scenario is due to the fact that binary collations are case-sensitive and enforce a binary sort order."
This is a very common misconception, but binary collations are in fact not case-sensitive due to not allowing for combining characters, full width characters, and a few other scenarios. For full details, please see: No, Binary Collations are not Case-Sensitive
<li style="list-style-type: none;">
- Regarding the following statement in the explanation: "The following collations would also be viable:..."
In cases where a binary collation does make sense to use for comparisons, then it's entirely irrelevant which binary collation is used as they're all simply comparing bytes (even the BIN2 collations that compare code units are still doing the same binary comparison across all BIN2 collations).
<li style="list-style-type: none;">
- In this particular scenario, using a binary collation is not ideal as it really only works with the US English alphabet. The preferred solution for finding lower-case letters is the following:
WHERE LEFT(FirstName, 1) = LOWER(LEFT(FirstName, 1)) COLLATE Latin1_General_100_CS_AS_SC;
Take care,
Solomon....
Hey Solomon,
Thank you for the feedback! That very appropriately named article sheds some light on that misconception. Definitely saving this for future reference.
I'll see what I can do about getting that typo fixed and the explanation updated in the interim.
April 23, 2020 at 12:59 pm
Just curious how many of us use case insensitive fields?
412-977-3526 call/text
April 23, 2020 at 2:32 pm
Just curious how many of us use case insensitive fields?
I generally have no need for case sensitive data. However, if I did, I'd likely NOT make the whole server or even a given database case sensitive... just the columns that needed it. Yes, I know that has some other ramifications especially when it comes to TempDB but I'd rather put up with that little nuance than have the whole server or even database be case sensitive.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 23, 2020 at 4:13 pm
Your solution does also find those FirstNames that does not really have a lowercase version, i.e. where FirstName starts with a digit or a dash or a dot...
I would probably try something like:
where left(su.name,1) = lower(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS
and left(su.name,1) <> upper(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS
Hello Henrik. Excellent point, thanks for catching that. Yes, I think that I would also approach it that way (as long as it's ok to exclude characters from alphabets that do not have the concept of upper-case/lower-case).
The specific collation to use would be the case-sensitive version of the collation already being used by that column. In this case, @btylerwhite did use the correct collation of Latin1_General_100_CS_AS_SC
.
Also keep in mind that when working with non-Latin alphabets, it helps to use the most recent version of the desired collation (i.e. a 100 version instead of a 90 or non-versioned) as the newer versions have more upper-case to lower-case mappings which are used by the UPPER
and LOWER
functions. In some cases, using the Japanese_XJIS_140_*
series is best as those are the most recent and have the most mappings (e.g. Japanese_XJIS_140_CS_AS
). Even if you aren't working with Japanese characters, the upper-case/lower-case mappings are correct for all cultures except for Turkish and one other that uses the same dotted upper-case "I" and dotless lower-case "i".
My database (Azure SQL database) is using SQL_Latin1_General_CP1_CI_AS, so should I be using SQL_Latin1_General_CP1_CS_AS?
Whether or not to use a case-sensitive, or even binary, collation for the instance and/or database levels depends on you and your application. There is no correct answer. Case-insensitive collations are easier to interact with as you can use PascalCase for your object names (i.e. the correct approach 😉 ) yet not worry about the casing when writing ad hoc queries / debugging. However, that also allows for some sloppiness if the casing used in the code does not match the actual casing of the objects. Binary and case-sensitive collations enforce consistent casing between code and objects but are less forgiving when you might be in a hurry (i.e. debugging) and allow for a different form of sloppiness: objects having similar names that only differ in casing (e.g. Employee
vs employee
).
The bigger issue, for me, would be the use of the older SQL Server collations (i.e. names starting with SQL_
) as they have fewer mappings (not important for US English, but certainly can be for other languages), fewer characters with defined sort weights, and they allow for performance degradation when VARCHAR
columns are indexed but then compared to NVARCHAR
literals or variables (for full details on that, please see: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types ). Hence, when working on SQL Server 2012 or newer, I would recommend using the Latin1_General_100_
series, preferably one ending in _SC
when not using a _BIN*
collation.
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
April 24, 2020 at 7:32 am
hi Solomon,
I'm using Azure SQL Database, and it's default collation is SQL_Latin1_General_CP1_CI_AS. It is not possible to choose another collation for the database. That also goes for TempDB, and hence I have to be very careful about having other collations in my varchar columns, as your blog post point out.
Thank you for the nice words,
Henrik
April 24, 2020 at 7:54 am
Henrik - you can change its collation - just not as easy as doing it on Prem
see https://www.mssqltips.com/sqlservertip/5530/how-to-change-an-azure-sql-database-collation/
April 24, 2020 at 8:41 am
As the database is using a case-sensitive collation, I don't see why the 2nd option wouldn't work.
April 24, 2020 at 11:39 am
Hi Frederico Fonseca,
Yes, that might change the collation of my database, but not that of TEMPDB. So any query using a #temp table would need to be checked (and possibly revised) for any collation conflicts.
I once had a test server with 3 different collations in Master, Msdb and mydatabase. That was "fun". It ended up with a re-install of SQL Server.
Best regards, Henrik
April 24, 2020 at 1:18 pm
Hi Frederico Fonseca,
Yes, that might change the collation of my database, but not that of TEMPDB. So any query using a #temp table would need to be checked (and possibly revised) for any collation conflicts.
I once had a test server with 3 different collations in Master, Msdb and mydatabase. That was "fun". It ended up with a re-install of SQL Server.
Best regards, Henrik
yes I am aware of those issues with On Prem - in Azure I don't know if it works the same as I don't have any setup - if you do maybe you could test it.
regardless if any code using temp tables is created using collate default database it will work fine.
in any case I was just pointing out that you can indeed change the user database collation which you said could not be done.
April 24, 2020 at 2:11 pm
hi Frederico,
<<I was just pointing out that you can indeed change the user database collation
Yes, Agree. Sorry, English is not my first language.
Best regards, Henrik
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply