I thought my database was collation agnostic

  • Yes, this tortured me during a project; we constantly had to specify COLLATE DATABASE_DEFAULT. We lost days over this, seriously.

    This is a real defect (not a bug, since it's intentional) in MSSQL, something I've never had to worry about in Oracle. I was actually quite stunned at first that this was even an issue.

  • Might be worth noting that the default collation on a modern SQL Server 2014 install appears to differ depending on your OS locale settings... so some servers will install with a different collation than, say, developer desktops.
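
    A quick way to see what a given server or database ended up with is a metadata query along these lines (a small sketch; these are standard built-in properties):

    SELECT SERVERPROPERTY('Collation')                AS ServerCollation,
           DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS CurrentDbCollation,
           DATABASEPROPERTYEX('tempdb', 'Collation')  AS TempdbCollation;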

  • The solution we use is actually to never create #temp tables that way. Ever.

    In fact, there are multiple reasons not to.

    1. The collation issue as explained in the article

    2. Changing data types over time, or even per installation. E.g. if a client ever needs to change the account_code field from length 20 to 25, then you'd have to chase around all the code that creates a temporary table with field(s) that hold the account-code information. You're bound to forget one rarely used one, and it will bite you when you least expect it ("String or binary data would be truncated").

    3. Pretty much in line with 2, we use user-defined data-types (UDDTs) for ALL our tables. This makes things a lot easier, as we can then re-use d_account_code (a varchar of whatever length) for the table definition, for the @parameters of the stored procedures (and reports) that run on the table, and for all @variables that hold account-code information (e.g. when used in a cursor).

    However, since one can't use UDDTs on temporary tables (the UDDT sits in the current db, not in tempdb), there is no way to do

    CREATE TABLE #test (account_code d_account_code)

    To get around this we use

    SELECT account_code INTO #test FROM t_account_code WHERE 1 = 2

    The added benefit is that the field in the temp-table follows the same NULLability behaviour as the originating table did.

    If you don't want that, you can use a placeholder variable like this:

    DECLARE @plc_account_code d_account_code;

    SELECT account_code = @plc_account_code INTO #test WHERE 1 = 2

    In which case the field in the #temp-table will always be NULLable.
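
    For clarity, the whole pattern end-to-end looks roughly like this (a sketch; the type and table names are just the examples used above, and the varchar(20) length is made up):

    -- one-time setup in the user database
    CREATE TYPE d_account_code FROM varchar(20) NOT NULL;
    CREATE TABLE t_account_code (account_code d_account_code);

    -- the temp-table column gets the type's base definition (varchar(20) here)
    -- and the same NULLability as the source column
    SELECT account_code INTO #test FROM t_account_code WHERE 1 = 2;

    -- alternative via a placeholder variable: the column is always NULLable
    DECLARE @plc_account_code d_account_code;
    SELECT account_code = @plc_account_code INTO #test2 WHERE 1 = 2;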

    (edit: confusing typo)

  • deroby (1/18/2016)


    However, since one can't use UDDT's on temporary tables (the UDDT sits in the current db, not in the tempdb) there is no way to do

    CREATE TABLE #test (account_code d_account_code)

    Hi there. Yes, you can use UDDTs in temp tables. You just need to create the type(s) in tempdb first. And since it is not feasible to do that each time the instance starts, you actually create the type(s) in the model database, since that database is used as the template to create new databases, including tempdb.
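
    A rough sketch of that setup (reusing the example type name from above; the base type and length are just illustrative):

    -- in model, so every tempdb re-created at instance startup has the type
    USE model;
    GO
    CREATE TYPE d_account_code FROM varchar(20) NOT NULL;
    GO
    -- and in the current tempdb, so it is usable right away without a restart
    USE tempdb;
    GO
    CREATE TYPE d_account_code FROM varchar(20) NOT NULL;
    GO
    -- now this works:
    -- CREATE TABLE #test (account_code d_account_code);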

    Take care,

    Solomon...

    SQL# - https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
    Sql Quantum Lift - https://SqlQuantumLift.com/ (company)
    Sql Quantum Leap - https://SqlQuantumLeap.com/ (blog)
    Info sites: Collations • Module Signing • SQLCLR

  • Solomon Rutzky (1/19/2016)


    deroby (1/18/2016)


    However, since one can't use UDDT's on temporary tables (the UDDT sits in the current db, not in the tempdb) there is no way to do

    CREATE TABLE #test (account_code d_account_code)

    Hi there. Yes, you can use UDDT's in temp tables. You just need to create the type(s) in tempdb first. And since it is not feasible to do that each time the instance starts, you actually create the type(s) in the model database since that database is used as the template to create new databases, including tempdb.

    Take care,

    Solomon...

    Well, assuming you have the entire server to yourself, then yes, you could indeed do that. In practice this might turn out to be problematic though.

    It also means you'll have to make sure to keep things in sync again; not impossible, but knowing Murphy...

    Hmm... it might work, but I honestly can't see that as the better solution.

  • I would have used the COLLATE option to force the collation in the stored proc's WHERE clause, e.g.

    SELECT Result = 'This fails'
    WHERE 'abc' COLLATE SQL_Latin1_General_CP1_CI_AS
        = 'abc' COLLATE Latin1_General_CI_AS
    GO

    SELECT Result = 'This works'
    WHERE 'abc' COLLATE SQL_Latin1_General_CP1_CI_AS
        = 'abc' COLLATE SQL_Latin1_General_CP1_CI_AS


    Regards,

    Sean

  • I agree - I prefer not to use temp tables if I can.

    Tom Romeo
    Practical Compliance Solutions Corp.
    One Orient Way - Suite F156.
    Rutherford, NJ 07070-2524
    --------------------------------
    www.complianceabc.com
    email: tromeo@complianceabc.com
    Phone: 201-728-8809

  • We use COLLATE DATABASE_DEFAULT in all our SPs and #temptable declarations (a sketch of that declaration style follows after the script below). It is trivial to remember and is described in our developer guidance document that all new team members must read before starting.
    For those saying you shouldn't declare #temptables this way because of things like changing data types: for those scenarios we have this handy script:

    set nocount on

    declare @SearchFor varchar (100)

    set @SearchFor = 'SomeTextToSearchFor'

    select distinct
     object_schema_name(SO.Id) as [Schema],
     SO.[Name],
     case rtrim (XType)
      when 'TF' then 'Function'
      when 'FN' then 'Function'
      when 'P' then 'Stored procedure'
      when 'TR' then 'Trigger'
      when 'V' then 'View'
      else rtrim (XType)
     end as [Type]
    from
     sysobjects SO
     inner join syscomments SC on SO.[ID] = SC.[ID]
    where
     SC.text like '%' + @SearchFor + '%'
    order by
     [Schema],
     [Type],
     [Name]

    If ever we change the schema in any way, we can use the above script to find any database object that mentions it. All our database objects (SPs, views, functions, etc.) are stored in source control as separate SQL files, named the same as the object, in sub-folders by type; each file contains drop/create statements, so it is simple to use the results of the above script to update the relevant files. Our build process automatically generates an upgrade script that combines all the changed individual database object scripts into one file.
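
    And here is a small sketch of the #temptable declaration style mentioned above (table and column names are just examples, as is the dbo.Customer table in the commented-out join): every character column gets COLLATE DATABASE_DEFAULT so that later joins or comparisons against permanent tables don't hit a collation conflict, even when tempdb's collation differs from the user database's.

    CREATE TABLE #Customer
    (
        CustomerId   int           NOT NULL,
        AccountCode  varchar(20)   COLLATE DATABASE_DEFAULT NOT NULL,
        CustomerName nvarchar(100) COLLATE DATABASE_DEFAULT NULL
    );

    -- e.g. this join works regardless of tempdb's collation:
    -- SELECT c.CustomerName
    -- FROM   dbo.Customer AS c
    -- INNER JOIN #Customer AS t ON t.AccountCode = c.AccountCode;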

  • Hey, really nice article. I missed it 2 years ago, but came here now and was reminded of a blog post I wrote 5 years ago. Talk about memories!

    In that post I quickly discuss the differences between the old "SQL_" collations and the newer ones. The important point is that the newer ones have more predictable behavior with Unicode data, and you avoid the already-cited index seek vs. convert situation.

    Having all temp tables created this way is actually interesting, because adding the COLLATE clause to the WHERE clause instead could lead to unneeded conversions that may hurt performance.

    https://thelonelydba.wordpress.com/2013/05/15/sql-difference-between-collations/

    I hope you don't mind that I added a reference to your article! Thanks for sharing the knowledge!

  • Solomon Rutzky - Thursday, January 14, 2016 9:35 AM

    Hi Thomas. Good job raising the visibility of the DATABASE_DEFAULT option for the COLLATE clause. Just a few notes about things mentioned in the article:

    2) "Latin1 makes the server treat strings using charset Latin 1, basically ASCII": the "Latin1" specifies the Language / Locale used for NVARCHAR/NCHAR (i.e. Unicode) operations and implies the Code Page for the non-SQL_ Collations for VARCHAR/CHAR operations. The term "ASCII" technically refers to just the first 128 characters (decimal values 0 - 127). In a more general sense, that term is often used to mean non-Unicode and to cover all 256 characters (values 0 - 255). In this latter case, the 2nd set of 128 characters (values 128 - 255) is determined by the Code Page (in this case: 1252, which is the Latin1 character set) and is technically referred to as "Extended ASCII".

    Hello again, Thomas. Two things that I failed to clarify the first time around:

    1. While still correct that "Latin1_General" is the locale / culture (not directly the Code Page), it is actually used for both NVARCHAR (Unicode) and VARCHAR (non-Unicode) data. For VARCHAR (non-Unicode) data it indicates the "sort order" to use for SQL Server Collations (i.e. names starting with "SQL_") or the language-specific rules to use for Windows Collations (i.e. names not starting with "SQL_"). "Latin1_General" is essentially the "US English" locale (for all but 2 of the SQL Server Collations that are Turkish / Code Page 1254).
    2. Regarding "Latin1", charset, and ASCII: the "Latin1_General" part of the Collation name does not directly indicate the charset / character set, because the Latin1_General Collations aren't the only ones to use Code Page 1252 (Latin1 / ANSI); several others do as well: French, Finnish_Swedish, German, Spanish, Norwegian, etc.

      And, it should be noted that across all Collations, and for both VARCHAR and NVARCHAR (non-Unicode and Unicode) data, characters with underlying values of 0 - 127 are identical. Collation does not affect which characters are represented by those values (0 - 127 / 0x00 - 0x7F), only their sort order and comparison rules. However, the Collation can affect which characters are represented by values 128 - 255, since those are controlled by the Code Page. But again, there is no difference in the characters at those values between French and German and Norwegian, since they all use the same Code Page. The only difference is the sorting and comparison rules applied to that same character set.
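
      A quick way to see the code-page effect (the specific collations and the 0xE9 byte are just illustrative): converting a VARCHAR value to NVARCHAR uses the code page of the value's collation, so the same byte should come back as "é" under Code Page 1252 and as a Greek iota under Code Page 1253, while any byte in the 0x00 - 0x7F range comes back identical under both.

      SELECT CAST(CAST(0xE9 AS varchar(1)) COLLATE Latin1_General_CI_AS AS nvarchar(1)) AS CodePage1252,
             CAST(CAST(0xE9 AS varchar(1)) COLLATE Greek_CI_AS          AS nvarchar(1)) AS CodePage1253;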

    3) "One would think that 'SQL_Latin1_General_CP1_CI_AS' and 'Latin1_General_CP1_CI_AS', should be identical with the exception that one is a SQL collation and the other is a Windows collation.": For NVARCHAR/NCHAR data they are mostly identical, but for VARCHAR/CHAR data there are some slight differences.

    Two more things that I failed to clarify the first time around:

    1. There are no "Latin1_General_CP1" Collations. In fact, none of the Windows Collations (names not starting with "SQL_") include the Code Page number in their names. The Windows Collations can, however, have a version number in their name: Latin1_General_CI_AS vs Latin1_General_100_CI_AS. The "_100_" version number refers to the version of SQL Server that the Collation started in. "100" refers to "10.0", and SQL Server 10.0 is SQL Server 2008.
    2. One major difference between SQL Server and Windows Collations is that with Windows Collations, VARCHAR data (i.e. non-Unicode) still uses Unicode sorting and comparison rules. This is why there is no mention of sort order numbers for non-Unicode data for Windows Collations. While sorting and comparison operations are a little slower with the Unicode rules as compared to the simplistic sort orders, the benefits include consistent behavior between non-Unicode and Unicode data, and avoiding the performance hit of comparing an indexed VARCHAR column to an NVARCHAR variable or string literal (see: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types).

    For more info on Collations / encodings / Unicode / ASCII / etc (especially as those things relate to SQL Server), please visit: Collations.Info

    Take care, Solomon..


  • jim.riedemann - Thursday, January 14, 2016 7:56 AM

    Adding COLLATE DATABASE_DEFAULT to temp tables and table vars was easy, and maintaining/remembering it is a piece of cake.

    Hi there. Just thought I would mention that there is no need to add COLLATE DATABASE_DEFAULT to table variables since they already use the current Database's default Collation. This is only an issue with temporary tables.
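
    A quick way to verify that, run from a database whose collation differs from tempdb's (SQL_VARIANT_PROPERTY's 'Collation' property reports the collation of an expression; the table and column names are just examples):

    -- table variable: the column picks up the current database's default collation
    DECLARE @tv TABLE (val varchar(10) NOT NULL);
    INSERT INTO @tv (val) VALUES ('x');
    SELECT SQL_VARIANT_PROPERTY(val, 'Collation') AS TableVariableCollation FROM @tv;

    -- temp table: the column picks up tempdb's collation (unless COLLATE is specified)
    CREATE TABLE #tt (val varchar(10) NOT NULL);
    INSERT INTO #tt (val) VALUES ('x');
    SELECT SQL_VARIANT_PROPERTY(val, 'Collation') AS TempTableCollation FROM #tt;
    DROP TABLE #tt;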

    Take care,
    Solomon..


  • peter.row - Friday, June 8, 2018 2:44 AM

    For those saying you shouldn't declare #temptables in this way in case of things like changing data types, for those kinds of scenarios we have this handy script:

    set nocount on

    declare @SearchFor varchar (100)

    set @SearchFor = 'SomeTextToSearchFor'

    select distinct
     object_schema_name(SO.Id) as [Schema],
     SO.[Name],
     case rtrim (XType)
      when 'TF' then 'Function'
      when 'FN' then 'Function'
      when 'P' then 'Stored procedure'
      when 'TR' then 'Trigger'
      when 'V' then 'View'
      else rtrim (XType)
     end as [Type]
    from
     sysobjects SO
     inner join syscomments SC on SO.[ID] = SC.[ID]
    where
     SC.text like '%' + @SearchFor + '%'
    order by
     [Schema],
     [Type],
     [Name]


    Hi Peter. I just thought I would mention that sysobjects and syscomments were deprecated as of the release of SQL Server 2005. You should be using sys.objects, which has the added benefit of a "type_desc" column that is the translation from "P" to "SQL Stored Procedure" (so you can get rid of the CASE expression). But more importantly, you should get the object's definition from either the OBJECT_DEFINITION built-in T-SQL function or the [definition] column of sys.sql_modules. Both of those options return a single NVARCHAR(MAX) value containing the entire definition of the object. The main problem with syscomments is that definitions are broken up into multiple rows of NVARCHAR(4000), which means that the string you are searching for could very well be split between the end of one chunk and the beginning of the next, in which case you won't find it.
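
    For example, the search could look something like this (a sketch of that approach, using only the catalog views mentioned above):

    DECLARE @SearchFor nvarchar(100) = N'SomeTextToSearchFor';

    SELECT SCHEMA_NAME(o.[schema_id]) AS [Schema],
           o.[name]                   AS [Name],
           o.type_desc                AS [Type]
    FROM   sys.sql_modules AS m
           INNER JOIN sys.objects AS o ON o.[object_id] = m.[object_id]
    WHERE  m.[definition] LIKE N'%' + @SearchFor + N'%'
    ORDER BY [Schema], [Type], [Name];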

    Take care, Solomon..


  • Solomon Rutzky - Tuesday, June 19, 2018 12:37 PM

    peter.row - Friday, June 8, 2018 2:44 AM

    For those saying you shouldn't declare #temptables in this way in case of things like changing data types, for those kinds of scenarios we have this handy script:

    set nocount on

    declare @SearchFor varchar (100)

    set @SearchFor = 'SomeTextToSearchFor'

    select distinct
     object_schema_name(SO.Id) as [Schema],
     SO.[Name],
     case rtrim (XType)
      when 'TF' then 'Function'
      when 'FN' then 'Function'
      when 'P' then 'Stored procedure'
      when 'TR' then 'Trigger'
      when 'V' then 'View'
      else rtrim (XType)
     end as [Type]
    from
     sysobjects SO
     inner join syscomments SC on SO.[ID] = SC.[ID]
    where
     SC.text like '%' + @SearchFor + '%'
    order by
     [Schema],
     [Type],
     [Name]


    Hi Peter. I just thought I would mention that sysobjects and syscomments were deprecated as of the release of SQL Server 2005. You should be using sys.objects, which has the added benefit of a "type_desc" column that is the translation from "P" to "SQL Stored Procedure" (so you can get rid of the CASE expression). But more importantly, you should get the object's definition from either the OBJECT_DEFINITION built-in T-SQL function or the [definition] column of sys.sql_modules. Both of those options return a single NVARCHAR(MAX) value containing the entire definition of the object. The main problem with syscomments is that definitions are broken up into multiple rows of NVARCHAR(4000), which means that the string you are searching for could very well be split between the end of one chunk and the beginning of the next, in which case you won't find it.

    Take care, Solomon..

    Thanks for the update, Solomon. I have updated our script to use the new system views.
    🙂

