September 10, 2015 at 3:11 pm
Hello,
I have an SSIS package that loads data into a table with all VARCHAR(MAX) datatypes. Let's call this table RAW. Once the RAW table is loaded, I transform/convert the data and push it into a production table. After several grueling days of my package failing, I dug through the data in the RAW table and found this (�) character. Once I removed that row (manually), everything worked perfectly.
I tried using a REPLACE() function to remove this character, but it is not working. I did some research on Google and this is apparently an escape character used in PDF files.
Does anyone know how to remove characters from a SQL table using a script based on the ASCII value? My plan is to load the RAW table and then use an EXECUTE SQL Task to remove all instances of this (�). I am not sure how to do this in the SQL script.
Any help would be greatly appreciated.
STG
There are no problems, only solutions. --John Lennon
September 10, 2015 at 3:45 pm
Should be able to REPLACE the NCHAR(65533); that (�) glyph is the Unicode replacement character, U+FFFD.
Something like this:
DECLARE @UnicodeString NVARCHAR(100);
SET @UnicodeString = NCHAR(65533) + N'Hey, what''s that at the beginning?';
SELECT @UnicodeString; -- shows the (�) at the start of the string
SELECT REPLACE(@UnicodeString, NCHAR(65533), N''); -- strips it out
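If you want to see which rows are affected before updating anything, something like this should work (RAW and DS are hypothetical table/column names; substitute your own):
SELECT *
FROM RAW
WHERE DS LIKE '%' + NCHAR(65533) + '%'; -- a binary COLLATE may be needed, depending on the column's collation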
Cheers!
September 11, 2015 at 7:12 am
Awesome! Thank you very much for making my Friday :-D:-D
There are no problems, only solutions. --John Lennon
September 11, 2015 at 9:00 am
OK, so here is what I am trying to do, but it is not working. Any ideas?
UPDATE TABLE
SET DS = REPLACE(DS,NCHAR(65533),'');
There are no problems, only solutions. --John Lennon
September 11, 2015 at 9:13 am
SQLTougherGuy (9/11/2015)
Any ideas?
Perhaps NCHAR(65533) is not actually the offending character?
At the least I would suggest doing
UPDATE TABLE
SET DS = REPLACE(DS,NCHAR(65533),'')
WHERE DS LIKE '%' + NCHAR(65533) + '%'
to reduce the number of rows updated (and the time and log space used), and perhaps add
PRINT CONVERT(varchar(20), @@ROWCOUNT) + ' NCHAR(65533) rows fixed'
When we import via staging tables like this, we add two columns: HasError (bit) and ErrorMessage (varchar(MAX)).
We run updates to "flag" any rows that fail our validation rules, and we add new "rules" over time as new data-validation problems are found.
UPDATE TABLE
SET HasError = 1, ErrorMessage = IsNull(ErrorMessage+'; ', '') + 'Invalid character'
WHERE DS LIKE '%[^' + CHAR(32) + '-' + CHAR(126) + ']%' -- any character outside Space (32) to Tilde (126)
PRINT CONVERT(varchar(20), @@ROWCOUNT) + ' rows have invalid characters'
We would likely have other updates that check that text dates are valid, customer IDs can be found, and all those sorts of things.
Then we either import only the rows that don't have HasError = 1, or we abort the whole import if ANY rows have HasError set.
We report ErrorMessage (and the PKeys) to the end user for any rows with HasError = 1.
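For illustration, here is a minimal sketch of that staging-table shape; the table and column names are hypothetical:
CREATE TABLE dbo.Staging_RAW (
    RowID        int IDENTITY(1,1) PRIMARY KEY,
    DS           varchar(MAX) NULL,       -- raw imported value
    HasError     bit NOT NULL DEFAULT 0,  -- set by the validation updates
    ErrorMessage varchar(MAX) NULL        -- accumulated '; '-separated messages
);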
September 11, 2015 at 9:18 am
Just to confirm, it's running without errors but not actually replacing any characters?
Might be something like this: https://connect.microsoft.com/SQLServer/feedback/details/385082/replace-function-does-not-work-with-valid-unicode-characters-above-65500
Try using the explicit binary collation as in that workaround, i.e.:
UPDATE TABLE
SET DS = REPLACE(DS COLLATE Latin1_General_BIN,NCHAR(65533),'');
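If you want a self-contained way to see whether collation is the culprit, this sketch needs no tables at all:
DECLARE @s nvarchar(50) = NCHAR(65533) + N'abc';
-- Under some non-binary collations, REPLACE can fail to match this
-- character (see the Connect item above):
SELECT REPLACE(@s, NCHAR(65533), N'') AS default_collation;
-- A binary collation compares raw code points, so the match succeeds:
SELECT REPLACE(@s COLLATE Latin1_General_BIN, NCHAR(65533), N'') AS binary_collation;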
If that doesn't work, then offhand I'm not too sure about other possibilities. I'll have to see if I can get some time to tinker around and reproduce your situation.
Cheers!
September 11, 2015 at 9:20 am
You might have characters that are not NCHAR(65533).
You need to identify the characters that don't belong to your dataset.
CREATE TABLE #CharsTest(
nstring nvarchar(100));
INSERT INTO #CharsTest
VALUES
(N'This has a unicode char �'),
(N'This has a unicode char � as well'),
(N'This has ASCII-only chars');
SELECT nstring, CAST( nstring AS varchar(100)) AS varchar_cast, UNICODE( SUBSTRING( nstring, 25, 1)) AS code_point -- position 25 holds the odd character in these sample rows
FROM #CharsTest
WHERE nstring <> CAST( nstring AS varchar(100)) -- rows that change when forced into the code page
GO
DROP TABLE #CharsTest
September 11, 2015 at 9:25 am
This can help you find Unicode characters that you might not want. It returns the string and the offending characters; just swap the test table for your own table and the appropriate columns.
CREATE TABLE #CharsTest(
nstring nvarchar(100));
INSERT INTO #CharsTest
VALUES
(N'This has a unicode char �'),
(N'This also has a unicode char �'),
(N'This has ASCII-only chars');
--Query starts here:
WITH
E(n) AS(            -- 10 rows
SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
),
E2(n) AS(           -- 10 x 10 = 100 rows
SELECT a.n FROM E a, E b
),
E4(n) AS(           -- 100 x 100 = 10,000 rows
SELECT a.n FROM E2 a, E2 b
),
cteTally(n) AS(     -- numbers 1 through 10,000, one per character position
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
FROM E4
)
SELECT nstring, SUBSTRING( nstring, n, 1) AS bad_char, UNICODE( SUBSTRING( nstring, n, 1)) AS code_point
FROM #CharsTest
JOIN cteTally ON LEN( nstring) >= n
WHERE SUBSTRING( nstring, n, 1) <> CAST( SUBSTRING( nstring, n, 1) AS varchar(100)); -- keep positions where the round trip to varchar changes the character
GO
DROP TABLE #CharsTest
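A grouped variant of the same idea can be handy on a big staging table: one row per distinct offending character, with a count. This is a sketch; re-create #CharsTest (or point it at your own table and column) before running it:
WITH
E(n) AS(SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)),
E2(n) AS(SELECT a.n FROM E a, E b),
E4(n) AS(SELECT a.n FROM E2 a, E2 b),
cteTally(n) AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n FROM E4)
SELECT SUBSTRING( nstring, n, 1)           AS bad_char,
       UNICODE( SUBSTRING( nstring, n, 1)) AS code_point,
       COUNT(*)                            AS occurrences
FROM #CharsTest
JOIN cteTally ON LEN( nstring) >= n
WHERE SUBSTRING( nstring, n, 1) <> CAST( SUBSTRING( nstring, n, 1) AS varchar(100))
GROUP BY SUBSTRING( nstring, n, 1), UNICODE( SUBSTRING( nstring, n, 1))
ORDER BY occurrences DESC;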
September 11, 2015 at 9:41 am
Yup!!! Was missing the COLLATE.
UPDATE TABLE
SET DS = REPLACE(DS, NCHAR(65533) COLLATE Latin1_General_BIN2, '!'); -- '!' makes the replacements visible; use '' to remove the character outright
Thank you all very much! I love this site :-):-)
There are no problems, only solutions. --John Lennon
September 11, 2015 at 11:07 am
SQLTougherGuy (9/10/2015)
loads data into a table with all VARCHAR(MAX)
That may not sit very well with NCHAR(65533) ... I'm not sure, but it might be better to change your staging table to nvarchar(MAX). Perhaps the binary collation is fixing the comparison, but the lack of nvarchar(MAX) might be hiding some other problem.
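To see why the column type matters, here is a quick check that depends on none of your tables; the exact fallback character can vary with the database's code page:
-- NCHAR(65533) survives in nvarchar, but forcing it into varchar
-- silently converts it (typically to '?'):
SELECT NCHAR(65533)                      AS as_nvarchar,
       CAST(NCHAR(65533) AS varchar(10)) AS as_varchar;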