December 12, 2016 at 6:03 am
Hi,
I have a source table which is under SQL 2014 and I am doing ETL on it (no transformation logic - just one to one mapping). And this gets loaded to again on a SQL 2014.
Both source and target columns are varchar type.
However, after loading, when i query the table, using SSMS, I see ASCII code 252, which is Latin small letter u with diaeresis.
Do I need to change any visual settings under SSMS?
thanks
December 12, 2016 at 6:45 am
Hi,
I am seeing that special character on my SSMS query, where as the actual data does not contain that character.
Basically, data type is varchar for both source and destination (not nvarchar), so there is no possibility of extended ASCII 252 code being held...thus, i see it as an SSMS visual settings issue...do you agree?
Thanks
December 12, 2016 at 7:01 am
Is the code page the same for source and destination? Is the ETL tool doing some kind of data conversion?
December 12, 2016 at 7:07 am
etl2016 (12/12/2016)
Hi,I am seeing that special character on my SSMS query, where as the actual data does not contain that character.
Basically, data type is varchar for both source and destination (not nvarchar), so there is no possibility of extended ASCII 252 code being held...thus, i see it as an SSMS visual settings issue...do you agree?
Thanks
What shows instead in the initial table? What ETL tool are you using, and how is it moving the data across?
SSMS simply displays what's in the record, if it's showing a ü, there's a ü in there record. If it isn't in the source data, this implies your ETL process is doing it.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
December 13, 2016 at 3:42 pm
etl2016 (12/12/2016)
Hi,I am seeing that special character on my SSMS query, where as the actual data does not contain that character.
Basically, data type is varchar for both source and destination (not nvarchar), so there is no possibility of extended ASCII 252 code being held...thus, i see it as an SSMS visual settings issue...do you agree?
Thanks
1 byte is 8 bits, which allows 256 characters.
Therefore, there is a perfect possibility to hold the character ASCII 252.
the actual data does not contain that character
Are you sure?
Try convert(varbinary(max), SourceData) - see what you can find at the place where you're getting 'ü' at the destination.
_____________
Code for TallyGenerator
December 13, 2016 at 4:15 pm
As Sergiy has already indicated, the assumption that it can't be in the source data because it's CHAR/VARCHAR is incorrect.
That can be shown in multiple ways.
To pick a couple at random, the return type of the CHAR() function is CHAR(1), and it can display ü with no issues (just pass 252 to the function); CASTing that character or a string containing it as CHAR/VARCHAR will also work.
Cheers!
December 14, 2016 at 6:49 am
etl2016 (12/12/2016)
Hi,I have a source table which is under SQL 2014 and I am doing ETL on it (no transformation logic - just one to one mapping). And this gets loaded to again on a SQL 2014.
Both source and target columns are varchar type.
However, after loading, when i query the table, using SSMS, I see ASCII code 252, which is Latin small letter u with diaeresis.
Do I need to change any visual settings under SSMS?
thanks
Check if there's a collation difference in the tables, run this query for the source table and destination table:
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'YourSchema'
AND TABLE_NAME = 'YourTable'
December 14, 2016 at 9:50 am
We found, much to our surprise and dismay, that starting with 2012 (I believe) that some previously unicode characters were accepted by non-unicode fields.
This caused much head scratching and profanity when moving data to systems that weren't so tolerant.
December 14, 2016 at 10:28 am
JustMarie (12/14/2016)
starting with 2012 (I believe) that some previously unicode characters were accepted by non-unicode fields.
the original post said both the source table and destination table are on SQL Server 2014, so that may be a red herring, as both would have had that character right? If your SSIS package wasn't doing any transformations, then there's likely either a collation mismatch between the source table and the destination table, or the SSIS connections themselves had the wrong code page defined somewhere.
December 15, 2016 at 2:31 am
Hi,
Latin1_General_CI_AS is the collation at both source and destination (both of which are SQL server databases)
At source, the character is – (hyphen) and at destination, it is coming as û
Between Source and Destination, there is SSIS ETL package, and it is doing a basic one to one mapping without any Transformations.
Source, middle-man ETL, Destination all are Microsoft products, so, a good compatibility while data propagates.
Under SSIS, the settings are Code page 1252, Data type string [DT_STR] Length 100 precision 0 scale 0
Source and target both have data types as [varchar](100) NULL
thanks
December 15, 2016 at 3:09 am
Check hex view of your inbound string.
Chances are that hyphen is not actually a normal hyphen but in fact some UTF8 hyphen.
_____________
Code for TallyGenerator
December 15, 2016 at 3:22 am
Sergiy (12/15/2016)
Check hex view of your inbound string.Chances are that hyphen is not actually a normal hyphen but in fact some UTF8 hyphen.
Hi,
true, actually as a next step, i was trying to explore options to see raw bytes, than rely on what is shown on screen.
if it were unix kind of system I would use OD -c command (octal dump), could you please suggest how would I use hex view ? Can I do it from SSMS or do I need to do it from command line using sqlcmd etc?
thanks for your suggestion....
December 15, 2016 at 4:07 am
okay, I think I got the difference when i did the hex dump using convert(varbinary(max), sourcecolumn)
At source, the content looks like the character "En dash" with ASCII HEXcode of 96, which is DECIMAL 150 . Visually, on Screen using SSMS, it appears as hyphen.
Whereas, at destination, it is appearing as Latin small letter u with circumflex, which is û, with ASCII HEX code of FB, which is DECIMAL 251.
VARCHAR stores 1 Character per byte. So, maximum ASCII codes it can store are 256, so, both these characters can be comfortably stored without internal tampering, behind the scenes.
Then, why is characters getting deformed?
thanks
December 15, 2016 at 4:36 am
etl2016 (12/15/2016)
okay, I think I got the difference when i did the hex dump using convert(varbinary(max), sourcecolumn)At source, the content looks like the character "En dash" with ASCII HEXcode of 96, which is DECIMAL 150 . Visually, on Screen using SSMS, it appears as hyphen.
Whereas, at destination, it is appearing as Latin small letter u with circumflex, which is û, with ASCII HEX code of FB, which is DECIMAL 251.
VARCHAR stores 1 Character per byte. So, maximum ASCII codes it can store are 256, so, both these characters can be comfortably stored without internal tampering, behind the scenes.
Then, why is characters getting deformed?
thanks
What is the character right before the dash, and right before û?
_____________
Code for TallyGenerator
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply