Character Encoding Conversion Issues?

  • I have a column thats defined as nvarchar(256). I wish i could post the exact data i'm having issues with but i can't because its sensitive information. Anyways when do the following:

    Select MyColumn from MyTable

    and then copy the output from the output window and do the following:

    Select Cast(N'MyCopiedOutPut' as VarBinary(50))

    and then compare it to:

    Select Cast(MyColumn as VarBinary(50)) from MyTable

    in many cases it matches but it sometimes it doesn't. When it doesn't match up though its VERY VERY close. I mean like one character off in the hex representation. I'm thought maybe it had to deal with maybe the output window using UTF8 so i outputed it to a file i chose unicode encoding. But same thing happened.

    Do you know whats going on and how to fix the issue?

  • Ncage (8/10/2016)


    I have a column thats defined as nvarchar(256). I wish i could post the exact data i'm having issues with but i can't because its sensitive information. Anyways when do the following:

    Select MyColumn from MyTable

    and then copy the output from the output window and do the following:

    Select Cast(N'MyCopiedOutPut' as VarBinary(50))

    and then compare it to:

    Select Cast(MyColumn as VarBinary(50)) from MyTable

    in many cases it matches but it sometimes it doesn't. When it doesn't match up though its VERY VERY close. I mean like one character off in the hex representation. I'm thought maybe it had to deal with maybe the output window using UTF8 so i outputed it to a file i chose unicode encoding. But same thing happened.

    Do you know whats going on and how to fix the issue?

    This is a tough one. Since the data is sensitive I understand you can't post it. However, without a way of reproducing the behavior you are seeing it is impossible to help here. How does the varbinary get generated? Could it be as simple as case sensitivity?

    Notice these strings have the same characters but the case is different. The varbinary is incredibly close.

    Select Cast(N'MyCopiedOutPut' as VarBinary(50)) union all

    Select Cast(N'MYCOPIEDOUTPUT' as VarBinary(50))

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I think this is really hard without specific data. Can you mock up an example of something that doesn't work that's exact?

    I'd think that the conversions from varchar to nvarchar to binary might result in something different, depending on order. However this is likely some corner case of the character sets.

    Don't forget, when you copy from the output window, you're also introducing a variable of properly selecting and moving the data through the clipboard.

  • Steve Jones - SSC Editor (8/10/2016)


    I think this is really hard without specific data. Can you mock up an example of something that doesn't work that's exact?

    I'd think that the conversions from varchar to nvarchar to binary might result in something different, depending on order. However this is likely some corner case of the character sets.

    Don't forget, when you copy from the output window, you're also introducing a variable of properly selecting and moving the data through the clipboard.

    Thanks Guys. Like you suggested i created some mock up data:

    CREATE TABLE temp2(

    hash NVARCHAR(256) NOT NULL,

    salt UNIQUEIDENTIFIER NOT NULL

    )

    DECLARE @counter INT = 0

    WHILE@counter < 500

    BEGIN

    DECLARE @hash NVARCHAR(256), @salt UNIQUEIDENTIFIER

    SET @salt = NEWID()

    SELECT @hash = HASHBYTES('MD5', CAST(@salt AS NVARCHAR(36)) +'Password' + CAST(@counter AS NVARCHAR(3)))

    INSERT INTO dbo.temp2

    ( hash, salt )

    VALUES ( @hash, -- hash - nvarchar(256)

    @salt -- salt - uniqueidentifier

    )

    SET @counter = @counter + 1

    END

    I randomly selected 20 of those entries and out of 20; 16 matched and 4 didn't. Attached is an example of one that didn't work and and example of one that did. The item with the following guid A46BD50C-44B1-4E8A-B82C-EECD772E5E59 did work and the other one didn't.

    The one that didn't work:

    0x4FE49ED94DA84C7A26632CFEABFAEF5E

    0x4FE4FDFF4DA84C7A26632CFEABFAEF5E

    They one that did:

    0x49649F08A2B2E9E2D8F5BA79E46FC099

    (1 row(s) affected)

    ----------------------------------

    0x49649F08A2B2E9E2D8F5BA79E46FC099

    (1 row(s) affected)

    Let me know if they is absolutely anything else i can provide....

    thanks

  • Ncage (8/10/2016)


    Steve Jones - SSC Editor (8/10/2016)


    I think this is really hard without specific data. Can you mock up an example of something that doesn't work that's exact?

    I'd think that the conversions from varchar to nvarchar to binary might result in something different, depending on order. However this is likely some corner case of the character sets.

    Don't forget, when you copy from the output window, you're also introducing a variable of properly selecting and moving the data through the clipboard.

    Thanks Guys. Like you suggested i created some mock up data:

    CREATE TABLE temp2(

    hash NVARCHAR(256) NOT NULL,

    salt UNIQUEIDENTIFIER NOT NULL

    )

    DECLARE @counter INT = 0

    WHILE@counter < 500

    BEGIN

    DECLARE @hash NVARCHAR(256), @salt UNIQUEIDENTIFIER

    SET @salt = NEWID()

    SELECT @hash = HASHBYTES('MD5', CAST(@salt AS NVARCHAR(36)) +'Password' + CAST(@counter AS NVARCHAR(3)))

    INSERT INTO dbo.temp2

    ( hash, salt )

    VALUES ( @hash, -- hash - nvarchar(256)

    @salt -- salt - uniqueidentifier

    )

    SET @counter = @counter + 1

    END

    I randomly selected 20 of those entries and out of 20; 16 matched and 4 didn't. Attached is an example of one that didn't work and and example of one that did. The item with the following guid A46BD50C-44B1-4E8A-B82C-EECD772E5E59 did work and the other one didn't.

    The one that didn't work:

    0x4FE49ED94DA84C7A26632CFEABFAEF5E

    0x4FE4FDFF4DA84C7A26632CFEABFAEF5E

    They one that did:

    0x49649F08A2B2E9E2D8F5BA79E46FC099

    (1 row(s) affected)

    ----------------------------------

    0x49649F08A2B2E9E2D8F5BA79E46FC099

    (1 row(s) affected)

    Let me know if they is absolutely anything else i can provide....

    thanks

    And how are you selecting the data here? I am missing something. I see a bunch of data is generated but what is that showing us? And what happened to the varbinary(50)?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (8/10/2016)


    Ncage (8/10/2016)


    Steve Jones - SSC Editor (8/10/2016)


    I think this is really hard without specific data. Can you mock up an example of something that doesn't work that's exact?

    I'd think that the conversions from varchar to nvarchar to binary might result in something different, depending on order. However this is likely some corner case of the character sets.

    Don't forget, when you copy from the output window, you're also introducing a variable of properly selecting and moving the data through the clipboard.

    Thanks Guys. Like you suggested i created some mock up data:

    CREATE TABLE temp2(

    hash NVARCHAR(256) NOT NULL,

    salt UNIQUEIDENTIFIER NOT NULL

    )

    DECLARE @counter INT = 0

    WHILE@counter < 500

    BEGIN

    DECLARE @hash NVARCHAR(256), @salt UNIQUEIDENTIFIER

    SET @salt = NEWID()

    SELECT @hash = HASHBYTES('MD5', CAST(@salt AS NVARCHAR(36)) +'Password' + CAST(@counter AS NVARCHAR(3)))

    INSERT INTO dbo.temp2

    ( hash, salt )

    VALUES ( @hash, -- hash - nvarchar(256)

    @salt -- salt - uniqueidentifier

    )

    SET @counter = @counter + 1

    END

    I randomly selected 20 of those entries and out of 20; 16 matched and 4 didn't. Attached is an example of one that didn't work and and example of one that did. The item with the following guid A46BD50C-44B1-4E8A-B82C-EECD772E5E59 did work and the other one didn't.

    The one that didn't work:

    0x4FE49ED94DA84C7A26632CFEABFAEF5E

    0x4FE4FDFF4DA84C7A26632CFEABFAEF5E

    They one that did:

    0x49649F08A2B2E9E2D8F5BA79E46FC099

    (1 row(s) affected)

    ----------------------------------

    0x49649F08A2B2E9E2D8F5BA79E46FC099

    (1 row(s) affected)

    Let me know if they is absolutely anything else i can provide....

    thanks

    And how are you selecting the data here? I am missing something. I see a bunch of data is generated but what is that showing us? And what happened to the varbinary(50)?

    How am i selecting the data? I'm just simply selecting the data from the output window using copy/paste:

    Select cast(hash as binary(16)) from temp2

    select cast(N'MyCopyPasteValueFromTheOutputWindow' as binary(16))

    and im comparing the binary output of the two above.

    In most cases they match but in some they don't. Why am i doing this? Essentially of course i'm not going to copy/paste every value in the table but i have to output a file which another program uses and the same thing is happening in the file i'm outputing. Hopefully that makes sense. I was just copy/pasting when i was testing why they weren't working.

  • Nope. Maybe I am being think but I don't get it. What are you comparing the store md5 results to? And not sure how that is demonstrating the issue you are seeing.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (8/10/2016)


    Nope. Maybe I am being think but I don't get it. What are you comparing the store md5 results to? And not sure how that is demonstrating the issue you are seeing.

    We are moving from using sql server to do our hashing to .net (which is the way it should have been initially). So i'm doing to following to create data for xUnit:

    SELECT TOP 20 N'[InlineData(' + N'"' + REPLACE(UPPER(master.sys.fn_varbintohexstr(CAST(hash AS BINARY(16)))),'0X','0x') + N'"' + N',' + '"' + hash+ N'"' + N')]'

    which creates the following for us to use in our unit tests:

    [InlineData("0xBE860EFA4B569E234BF56BD45D4D9FCB", "????????")]

    Whats the best way to test it. Use real data of course.

  • OK, so the issue here is you're moving hashing and trying to do comparisons. I think that helps.

    However, with your temp data mockup, we'll get different results with newid() than you do. I'm guessing you're taking some of the values from temp2 and doing something like this?

    SELECT CAST( N'????????' AS VARBINARY(50))

    , CAST( hash AS VARBINARY(50))

    FROM dbo.temp2

    WHERE CAST( N'????????' AS VARBINARY(50)) = CAST( hash AS VARBINARY(50))

    Is that right? Or something else?

    I think the flaw is in using a copy/paste from an output window as opposed to directly moving the nvarchar() data to perform the hashing. There might be issues related to this:

    http://weblogs.sqlteam.com/mladenp/archive/2009/04/28/Comparing-SQL-Server-HASHBYTES-function-and-.Net-hashing.aspx

    What I'd suggest you do is work with the data raw, never using this in a text format for display, and generate the hashes either completely with t-sql or completely with .NET, but not trying to mix the two.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply