DB conversion to new collation; unexpected query results using collate

  • I have converted a database to a new codepage and did not get the results expected.

    I have a database that is defined as collation sql_latin1_general_cp850_ci_as on a 2000 SQL instance who's default collation is the same.  All char columns are defined as this collation also.

    I want to convert this database to collation sql_latin1_general_cp1_ci_as.  I created the new database by scripting the old database and then altering the script to replace all references to CP850 with CP1.  The metadata of the databases compare using Red-Gate SQL Compare.

    I exported the data (from olddb_CP850) using the following command for each table:

    bcp olddb_CP850.dbo.UMEASURE out UMEASURE.txt -k -T -E -Smyserver -N

    I then imported the data into the new database (newdb_CP1) using the following in query analyzer for each table. 

    bulk insert newdb_CP1.dbo.UMEASURE from 'e:\mssql\data\UMEASURE.txt'

    with (datafiletype='widenative',

    keepidentity,

    keepnulls)

    After doing this I would expect the following queries to be equivalent.  BUT, the second query returns differences (4 rows) that the first query does not.

    select CP850.unit_id, CP850.unit_name, CP850.unit_abbrev, cp1.UNIT_ID, CP1.UNIT_NAME, CP1.UNIT_ABBREV

    from olddb_CP850.dbo.umeasure CP850

    join newdb_CP1.dbo.umeasure cp1 

    on CP850.unit_id = cp1.unit_id

    where CP850.unit_name collate sql_latin1_general_cp1_ci_as <> cp1.unit_name

    select CP850.unit_id, CP850.unit_name, CP850.unit_abbrev, cp1.UNIT_ID, CP1.UNIT_NAME, CP1.UNIT_ABBREV

    from olddb_CP850.dbo.umeasure CP850

    join newdb_CP1.dbo.umeasure cp1 

    on CP850.unit_id = cp1.unit_id

    where CP850.unit_name  <> cp1.unit_name collate sql_latin1_general_cp850_ci_as

    What am I missing here?  Why are these queries not equivalent?  Reg-Gate data compare tools give me the same rows as different also.

    Have I exported and imported the properly to get the data converted?  Is there an easier way to do this?

  • I don't know how granular it is in 2000 but in 2005 they are not the same because CP1 is code page 1252 the generic western Europe code page while CP850 covers more specialized code pages like Scandinavian countries.  Try the link below for details.  Hope this helps.

    http://msdn2.microsoft.com/en-us/library/ms180175.aspx

    Kind regards,
    Gift Peddie

  • Thanks for the response.  

    I guess another just as appropriate question should have been, if the comparison si true and the differences are there becuase there is not an equivalent conversion,

    how can I tell what characters in the original database cannot be converted (1 for 1) during a code page conversion of a database?

  • That depends so try the link below and print out code page 1252 the TXT version and CP850 the TXT version to see the difference, and the short answer it is related to what is stored in the database before migration.  Hope this helps.

    http://czyborra.com/charsets/codepages.html

    Kind regards,
    Gift Peddie

  • The link is great.  I have looked for those.

    I think I know the problem.  I didn't expect there to be any data that really got lost in conversion.  We haven't ever stored any real data that the 1252 code page couldn't handle.  It was an old, old decision that left us with the CP850 code page.  Now some purchased software has decided to break after an upgrade and I'm left with a vendor saying to convert it.  I think they have a couple tables that are used to stored rows that have mutiple meanings.  So for some rows in a table a particular field will have alphanumeric data, but then some rows it contains what is really binary data and has no meaningfull character representation.  The data is defined as character in the database.  I think to effectively convert this I'm going to have to update the ones that are really binary data with the original data in binary form with no conversion.

     

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply