Finding Lowercase Characters

Question

Post reply

Finding Lowercase Characters

BTylerWhite

SSCommitted

Points: 1569
More actions
April 23, 2020 at 12:00 am

#3741918

Comments posted to this topic are about the item Finding Lowercase Characters

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Solomon Rutzky SSCoach Points: 16430 More actions · Answer 1

Hi there. Interesting question. Here are some notes:

At the start of the question it mentions that the database is using the Latin1_General_100_CI_AS_CS collation. Those last two letters are transposed. The actual collation name is: Latin1_General_100_CI_AS_SC.
Regarding the following statement in the explanation: "The reason Latin1_General_BIN was used in this scenario is due to the fact that binary collations are case-sensitive and enforce a binary sort order."
This is a very common misconception, but binary collations are in fact not case-sensitive due to not allowing for combining characters, full width characters, and a few other scenarios. For full details, please see: No, Binary Collations are not Case-Sensitive
Regarding the following statement in the explanation: "The following collations would also be viable:..."
In cases where a binary collation does make sense to use for comparisons, then it's entirely irrelevant which binary collation is used as they're all simply comparing bytes (even the BIN2 collations that compare code units are still doing the same binary comparison across all BIN2 collations).
In this particular scenario, using a binary collation is not ideal as it really only works with the US English alphabet. The preferred solution for finding lower-case letters is the following:
```
WHERE LEFT(FirstName, 1) = LOWER(LEFT(FirstName, 1)) COLLATE Latin1_General_100_CS_AS_SC;
```

Take care,

Solomon....

This reply was modified 4 years, 8 months ago by Solomon Rutzky.

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

Henrik Staun Poulsen SSCertifiable Points: 6583 More actions · Answer 2

Hi Solomon,

Your solution does also find those FirstNames that does not really have a lowercase version, i.e. where FirstName starts with a digit or a dash or a dot...

I would probably try something like:

where left(su.name,1) = lower(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS
and left(su.name,1) <> upper(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS

My database (Azure SQL database) is using SQL_Latin1_General_CP1_CI_AS, so should I be using SQL_Latin1_General_CP1_CS_AS?

Best regards, Henrik

This reply was modified 4 years, 8 months ago by Henrik Staun Poulsen.

dyawlak SSC Enthusiast Points: 156 More actions · Answer 3

Before seeing the answers your's is what I was going for

Stewart "Arturius" Campbell SSC Guru Points: 72634 More actions · Answer 4

Interesting question.

You beat me to it, Solomon. Your explanation sums it up nicely...

____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
“libera tute vulgaris ex”

BTylerWhite SSCommitted Points: 1569 More actions · Answer 5

Solomon Rutzky wrote:

Hi there. Interesting question. Here are some notes:

<li style="list-style-type: none;">
At the start of the question it mentions that the database is using the Latin1_General_100_CI_AS_CS collation. Those last two letters are transposed. The actual collation name is: Latin1_General_100_CI_AS_SC.

<li style="list-style-type: none;">
Regarding the following statement in the explanation: "The reason Latin1_General_BIN was used in this scenario is due to the fact that binary collations are case-sensitive and enforce a binary sort order."
This is a very common misconception, but binary collations are in fact not case-sensitive due to not allowing for combining characters, full width characters, and a few other scenarios. For full details, please see: No, Binary Collations are not Case-Sensitive

<li style="list-style-type: none;">
Regarding the following statement in the explanation: "The following collations would also be viable:..."
In cases where a binary collation does make sense to use for comparisons, then it's entirely irrelevant which binary collation is used as they're all simply comparing bytes (even the BIN2 collations that compare code units are still doing the same binary comparison across all BIN2 collations).

<li style="list-style-type: none;">
In this particular scenario, using a binary collation is not ideal as it really only works with the US English alphabet. The preferred solution for finding lower-case letters is the following:
WHERE LEFT(FirstName, 1) = LOWER(LEFT(FirstName, 1)) COLLATE Latin1_General_100_CS_AS_SC;
Take care,
Solomon....

Hey Solomon,

Thank you for the feedback! That very appropriately named article sheds some light on that misconception. Definitely saving this for future reference.

I'll see what I can do about getting that typo fixed and the explanation updated in the interim.

Robert Sterbal SSChampion Points: 11053 More actions · Answer 6

Just curious how many of us use case insensitive fields?

412-977-3526 call/text

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 7

Robert Sterbal wrote:

Just curious how many of us use case insensitive fields?

I generally have no need for case sensitive data. However, if I did, I'd likely NOT make the whole server or even a given database case sensitive... just the columns that needed it. Yes, I know that has some other ramifications especially when it comes to TempDB but I'd rather put up with that little nuance than have the whole server or even database be case sensitive.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Solomon Rutzky SSCoach Points: 16430 More actions · Answer 8

Henrik Staun Poulsen wrote:

Your solution does also find those FirstNames that does not really have a lowercase version, i.e. where FirstName starts with a digit or a dash or a dot...
I would probably try something like:
where left(su.name,1) = lower(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS
and left(su.name,1) <> upper(left(su.name,1)) collate SQL_Latin1_General_CP1_CS_AS

Hello Henrik. Excellent point, thanks for catching that. Yes, I think that I would also approach it that way (as long as it's ok to exclude characters from alphabets that do not have the concept of upper-case/lower-case).

The specific collation to use would be the case-sensitive version of the collation already being used by that column. In this case, @btylerwhite did use the correct collation of Latin1_General_100_CS_AS_SC.

Also keep in mind that when working with non-Latin alphabets, it helps to use the most recent version of the desired collation (i.e. a 100 version instead of a 90 or non-versioned) as the newer versions have more upper-case to lower-case mappings which are used by the UPPER and LOWER functions. In some cases, using the Japanese_XJIS_140_* series is best as those are the most recent and have the most mappings (e.g. Japanese_XJIS_140_CS_AS ). Even if you aren't working with Japanese characters, the upper-case/lower-case mappings are correct for all cultures except for Turkish and one other that uses the same dotted upper-case "I" and dotless lower-case "i".

Henrik Staun Poulsen wrote:

My database (Azure SQL database) is using SQL_Latin1_General_CP1_CI_AS, so should I be using SQL_Latin1_General_CP1_CS_AS?

Whether or not to use a case-sensitive, or even binary, collation for the instance and/or database levels depends on you and your application. There is no correct answer. Case-insensitive collations are easier to interact with as you can use PascalCase for your object names (i.e. the correct approach 😉 ) yet not worry about the casing when writing ad hoc queries / debugging. However, that also allows for some sloppiness if the casing used in the code does not match the actual casing of the objects. Binary and case-sensitive collations enforce consistent casing between code and objects but are less forgiving when you might be in a hurry (i.e. debugging) and allow for a different form of sloppiness: objects having similar names that only differ in casing (e.g. Employee vs employee ).

The bigger issue, for me, would be the use of the older SQL Server collations (i.e. names starting with SQL_ ) as they have fewer mappings (not important for US English, but certainly can be for other languages), fewer characters with defined sort weights, and they allow for performance degradation when VARCHAR columns are indexed but then compared to NVARCHAR literals or variables (for full details on that, please see: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types ). Hence, when working on SQL Server 2012 or newer, I would recommend using the Latin1_General_100_ series, preferably one ending in _SC when not using a _BIN* collation.

Take care,

Solomon...

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

Henrik Staun Poulsen SSCertifiable Points: 6583 More actions · Answer 9

hi Solomon,

I'm using Azure SQL Database, and it's default collation is SQL_Latin1_General_CP1_CI_AS. It is not possible to choose another collation for the database. That also goes for TempDB, and hence I have to be very careful about having other collations in my varchar columns, as your blog post point out.

Thank you for the nice words,

Henrik

This reply was modified 4 years, 8 months ago by Henrik Staun Poulsen.

frederico_fonseca SSCoach Points: 16167 More actions · Answer 10

Henrik - you can change its collation - just not as easy as doing it on Prem

see https://www.mssqltips.com/sqlservertip/5530/how-to-change-an-azure-sql-database-collation/

Chris Wooding SSCarpal Tunnel Points: 4558 More actions · Answer 11

As the database is using a case-sensitive collation, I don't see why the 2nd option wouldn't work.

Henrik Staun Poulsen SSCertifiable Points: 6583 More actions · Answer 12

Hi Frederico Fonseca,

Yes, that might change the collation of my database, but not that of TEMPDB. So any query using a #temp table would need to be checked (and possibly revised) for any collation conflicts.

I once had a test server with 3 different collations in Master, Msdb and mydatabase. That was "fun". It ended up with a re-install of SQL Server.

Best regards, Henrik

frederico_fonseca SSCoach Points: 16167 More actions · Answer 13

Henrik Staun Poulsen wrote:

Hi Frederico Fonseca,
Yes, that might change the collation of my database, but not that of TEMPDB. So any query using a #temp table would need to be checked (and possibly revised) for any collation conflicts.
I once had a test server with 3 different collations in Master, Msdb and mydatabase. That was "fun". It ended up with a re-install of SQL Server.
Best regards, Henrik

yes I am aware of those issues with On Prem - in Azure I don't know if it works the same as I don't have any setup - if you do maybe you could test it.

regardless if any code using temp tables is created using collate default database it will work fine.

in any case I was just pointing out that you can indeed change the user database collation which you said could not be done.

Henrik Staun Poulsen SSCertifiable Points: 6583 More actions · Answer 14

hi Frederico,

<<I was just pointing out that you can indeed change the user database collation

Yes, Agree. Sorry, English is not my first language.

Best regards, Henrik