Latin small letter u with diaeresis

Question

Latin small letter u with diaeresis

etl2016

Default port

Points: 1459
More actions
December 12, 2016 at 6:03 am

#332676

Hi,
I have a source table which is under SQL 2014 and I am doing ETL on it (no transformation logic - just one to one mapping). And this gets loaded to again on a SQL 2014.
Both source and target columns are varchar type.
However, after loading, when i query the table, using SSMS, I see ASCII code 252, which is Latin small letter u with diaeresis.
Do I need to change any visual settings under SSMS?
thanks

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99143 More actions · Answer 1

What exactly is your question here? Are you saying that the character, ü, is not the character it should be? What character should it be?

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

etl2016 Default port Points: 1459 More actions · Answer 2

Hi,

I am seeing that special character on my SSMS query, where as the actual data does not contain that character.

Basically, data type is varchar for both source and destination (not nvarchar), so there is no possibility of extended ASCII 252 code being held...thus, i see it as an SSMS visual settings issue...do you agree?

Thanks

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

Is the code page the same for source and destination? Is the ETL tool doing some kind of data conversion?

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Thom A SSC Guru Points: 99143 More actions · Answer 4

etl2016 (12/12/2016)
Hi,
I am seeing that special character on my SSMS query, where as the actual data does not contain that character.
Basically, data type is varchar for both source and destination (not nvarchar), so there is no possibility of extended ASCII 252 code being held...thus, i see it as an SSMS visual settings issue...do you agree?
Thanks

What shows instead in the initial table? What ETL tool are you using, and how is it moving the data across?

SSMS simply displays what's in the record, if it's showing a ü, there's a ü in there record. If it isn't in the source data, this implies your ETL process is doing it.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Sergiy SSC Guru Points: 110209 More actions · Answer 5

etl2016 (12/12/2016)
Hi,
I am seeing that special character on my SSMS query, where as the actual data does not contain that character.
Basically, data type is varchar for both source and destination (not nvarchar), so there is no possibility of extended ASCII 252 code being held...thus, i see it as an SSMS visual settings issue...do you agree?
Thanks

1 byte is 8 bits, which allows 256 characters.

Therefore, there is a perfect possibility to hold the character ASCII 252.

the actual data does not contain that character

Are you sure?

Try convert(varbinary(max), SourceData) - see what you can find at the place where you're getting 'ü' at the destination.

_____________
Code for TallyGenerator

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 6

As Sergiy has already indicated, the assumption that it can't be in the source data because it's CHAR/VARCHAR is incorrect.

That can be shown in multiple ways.

To pick a couple at random, the return type of the CHAR() function is CHAR(1), and it can display ü with no issues (just pass 252 to the function); CASTing that character or a string containing it as CHAR/VARCHAR will also work.

Cheers!

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 7

etl2016 (12/12/2016)
Hi,
I have a source table which is under SQL 2014 and I am doing ETL on it (no transformation logic - just one to one mapping). And this gets loaded to again on a SQL 2014.
Both source and target columns are varchar type.
However, after loading, when i query the table, using SSMS, I see ASCII code 252, which is Latin small letter u with diaeresis.
Do I need to change any visual settings under SSMS?
thanks

Check if there's a collation difference in the tables, run this query for the source table and destination table:

SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME

FROM INFORMATION_SCHEMA.COLUMNS

WHERE TABLE_SCHEMA = 'YourSchema'

AND TABLE_NAME = 'YourTable'

JustMarie SSCertifiable Points: 7771 More actions · Answer 8

We found, much to our surprise and dismay, that starting with 2012 (I believe) that some previously unicode characters were accepted by non-unicode fields.

This caused much head scratching and profanity when moving data to systems that weren't so tolerant.

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 9

JustMarie (12/14/2016)
starting with 2012 (I believe) that some previously unicode characters were accepted by non-unicode fields.

the original post said both the source table and destination table are on SQL Server 2014, so that may be a red herring, as both would have had that character right? If your SSIS package wasn't doing any transformations, then there's likely either a collation mismatch between the source table and the destination table, or the SSIS connections themselves had the wrong code page defined somewhere.

etl2016 Default port Points: 1459 More actions · Answer 10

Hi,

Latin1_General_CI_AS is the collation at both source and destination (both of which are SQL server databases)

At source, the character is – (hyphen) and at destination, it is coming as û

Between Source and Destination, there is SSIS ETL package, and it is doing a basic one to one mapping without any Transformations.

Source, middle-man ETL, Destination all are Microsoft products, so, a good compatibility while data propagates.

Under SSIS, the settings are Code page 1252, Data type string [DT_STR] Length 100 precision 0 scale 0

Source and target both have data types as [varchar](100) NULL

thanks

Sergiy SSC Guru Points: 110209 More actions · Answer 11

Check hex view of your inbound string.

Chances are that hyphen is not actually a normal hyphen but in fact some UTF8 hyphen.

_____________
Code for TallyGenerator

etl2016 Default port Points: 1459 More actions · Answer 12

Sergiy (12/15/2016)
Check hex view of your inbound string.
Chances are that hyphen is not actually a normal hyphen but in fact some UTF8 hyphen.

Hi,

true, actually as a next step, i was trying to explore options to see raw bytes, than rely on what is shown on screen.

if it were unix kind of system I would use OD -c command (octal dump), could you please suggest how would I use hex view ? Can I do it from SSMS or do I need to do it from command line using sqlcmd etc?

thanks for your suggestion....

etl2016 Default port Points: 1459 More actions · Answer 13

okay, I think I got the difference when i did the hex dump using convert(varbinary(max), sourcecolumn)

At source, the content looks like the character "En dash" with ASCII HEXcode of 96, which is DECIMAL 150 . Visually, on Screen using SSMS, it appears as hyphen.

Whereas, at destination, it is appearing as Latin small letter u with circumflex, which is û, with ASCII HEX code of FB, which is DECIMAL 251.

VARCHAR stores 1 Character per byte. So, maximum ASCII codes it can store are 256, so, both these characters can be comfortably stored without internal tampering, behind the scenes.

Then, why is characters getting deformed?

thanks

Sergiy SSC Guru Points: 110209 More actions · Answer 14

etl2016 (12/15/2016)
okay, I think I got the difference when i did the hex dump using convert(varbinary(max), sourcecolumn)
At source, the content looks like the character "En dash" with ASCII HEXcode of 96, which is DECIMAL 150 . Visually, on Screen using SSMS, it appears as hyphen.
Whereas, at destination, it is appearing as Latin small letter u with circumflex, which is û, with ASCII HEX code of FB, which is DECIMAL 251.
VARCHAR stores 1 Character per byte. So, maximum ASCII codes it can store are 256, so, both these characters can be comfortably stored without internal tampering, behind the scenes.
Then, why is characters getting deformed?
thanks

What is the character right before the dash, and right before û?

_____________
Code for TallyGenerator