Remove weird character from one of my SQL tables

  • Hello,

I have an SSIS package that loads data into a table with all VARCHAR(MAX) datatypes. Let's call this table RAW. Once the RAW table is loaded, I transform/convert the data and push it into a production table. After several grueling days of the package failing, I dug through the data in the RAW table and found this (�) character. Once I manually removed that row, everything worked perfectly.

I tried using the REPLACE() function to remove this character, but it did not work. From some research on Google, this is apparently an escape character used in PDF files.

Does anyone know how to remove characters from a SQL table using a script, based on the character's code value? My plan is to load the RAW table and then use an Execute SQL Task to remove all instances of this (�). I am just not sure how to write that in the SQL script.

    Any help would be greatly appreciated.

    STG

There are no problems, only solutions. --John Lennon

You should be able to REPLACE it as NCHAR(65533), the Unicode replacement character (U+FFFD).

    Something like this:

DECLARE @UnicodeString NCHAR(100);

SET @UnicodeString = NCHAR(65533) + N'Hey, what''s that at the beginning?';

SELECT @UnicodeString;
SELECT REPLACE(@UnicodeString, NCHAR(65533), N'');

    Cheers!

  • Awesome! Thank you very much for making my Friday :-D:-D

There are no problems, only solutions. --John Lennon

Ok, so here is what I am trying to do, but it is not working. Any ideas?

UPDATE TABLE
SET DS = REPLACE(DS, NCHAR(65533), '');

There are no problems, only solutions. --John Lennon

SQLTougherGuy (9/11/2015)

Any ideas?

    Perhaps NCHAR(65533) is not actually the offending character?

    At the least I would suggest doing

UPDATE TABLE
SET DS = REPLACE(DS, NCHAR(65533), '')
WHERE DS LIKE '%' + NCHAR(65533) + '%';

    to reduce the number of rows updated / time / log space used, and perhaps add

PRINT CONVERT(varchar(20), @@ROWCOUNT) + ' NCHAR(65533) rows fixed';

When we import via staging tables like this, we add two columns: HasError (bit) and ErrorMessage (varchar(MAX)).

We run updates to "flag" any rows that fail our validation rules, adding new rules over time as new data-validation problems are found:

UPDATE TABLE
SET HasError = 1, ErrorMessage = ISNULL(ErrorMessage + '; ', '') + 'Invalid character'
WHERE DS LIKE '%[^' + CHAR(32) + '-' + CHAR(126) + ']%' -- any character outside Space to Tilde

PRINT CONVERT(varchar(20), @@ROWCOUNT) + ' rows have invalid characters';

We would likely have other updates that check that text dates are valid, customer IDs can be found, all those sorts of things.

Then we either import only the rows that don't have HasError = 1, or we abort the whole import if ANY row has HasError set.

We report ErrorMessage (and the PKeys) to the end user for any rows with HasError = 1. A minimal sketch of the whole pattern is below.
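Something along these lines; treat it as a sketch, since #Staging and Production are placeholder names, not the OP's schema:

CREATE TABLE #Staging
(
    Id           int IDENTITY(1,1) PRIMARY KEY,
    DS           nvarchar(MAX) NULL,
    HasError     bit NOT NULL DEFAULT 0,
    ErrorMessage varchar(MAX) NULL
);

-- Rule 1: flag rows containing anything outside Space..Tilde
UPDATE #Staging
SET HasError = 1,
    ErrorMessage = ISNULL(ErrorMessage + '; ', '') + 'Invalid character'
WHERE DS LIKE '%[^' + CHAR(32) + '-' + CHAR(126) + ']%' COLLATE Latin1_General_BIN;

PRINT CONVERT(varchar(20), @@ROWCOUNT) + ' rows have invalid characters';

-- Import only the clean rows (or abort the whole load if any row was flagged)
INSERT INTO Production (DS) -- "Production" is a placeholder name
SELECT DS
FROM #Staging
WHERE HasError = 0;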

  • Just to confirm, it's running without errors but not actually replacing any characters?

    Might be something like this: https://connect.microsoft.com/SQLServer/feedback/details/385082/replace-function-does-not-work-with-valid-unicode-characters-above-65500

    Try using the explicit binary collation as in that workaround, i.e.:

UPDATE TABLE
SET DS = REPLACE(DS COLLATE Latin1_General_BIN, NCHAR(65533), '');
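For what it's worth, here is a quick self-contained way to see the difference; nothing in it depends on your table, and the first SELECT may or may not strip the character depending on your database's default collation:

DECLARE @s nvarchar(100) = N'Bad' + NCHAR(65533) + N'data';

-- May not match under some non-binary collations (see the Connect item above)
SELECT REPLACE(@s, NCHAR(65533), N'');

-- A binary collation compares by code point, so the match is exact
SELECT REPLACE(@s COLLATE Latin1_General_BIN, NCHAR(65533), N'');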

    If that doesn't work, then offhand I'm not too sure about other possibilities. I'll have to see if I can get some time to tinker around and reproduce your situation.

    Cheers!

  • You might have characters that are not NCHAR(65533).

You need to identify the characters that don't belong in your dataset.

CREATE TABLE #CharsTest(
    nstring nvarchar(100));

INSERT INTO #CharsTest
VALUES
    (N'This has a unicode char ' + NCHAR(65533)), -- odd character lands at position 25
    (N'This has a unicode char ' + NCHAR(65533) + N' as well'),
    (N'This has ascii-only chars');

-- Casting to varchar mangles any non-ASCII character, so rows that change
-- under the cast are the rows holding unicode characters.
SELECT nstring, CAST(nstring AS varchar(100)), UNICODE(SUBSTRING(nstring, 25, 1))
FROM #CharsTest
WHERE nstring <> CAST(nstring AS varchar(100));
GO

DROP TABLE #CharsTest
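With the sample rows above, this returns the two unicode rows, with 65533 reported as the code point of the character at position 25.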

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
This can help you find unicode characters that you might not want. It returns the string and the offending characters; just change the test table to your table and the appropriate columns.

CREATE TABLE #CharsTest(
    nstring nvarchar(100));

INSERT INTO #CharsTest
VALUES
    (N'This has a unicode char ' + NCHAR(65533)),
    (N'This also has a unicode char ' + NCHAR(65533) + N' '),
    (N'This has ascii-only chars');

--Query starts here:
WITH
E(n) AS(
    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n) -- 10 rows
),
E2(n) AS(
    SELECT a.n FROM E a, E b -- 100 rows
),
E4(n) AS(
    SELECT a.n FROM E2 a, E2 b -- 10,000 rows
),
cteTally(n) AS(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
    FROM E4
)
SELECT nstring, SUBSTRING(nstring, n, 1), UNICODE(SUBSTRING(nstring, n, 1))
FROM #CharsTest
JOIN cteTally ON LEN(nstring) >= n -- one row per character position
WHERE SUBSTRING(nstring, n, 1) <> CAST(SUBSTRING(nstring, n, 1) AS varchar(100));
GO

DROP TABLE #CharsTest
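Once you know the actual code point(s), you can plug them into the earlier REPLACE, along these lines (an untested sketch; RAW and DS are the staging table and column named up-thread):

UPDATE RAW
SET DS = REPLACE(DS COLLATE Latin1_General_BIN, NCHAR(65533), '')
WHERE DS LIKE '%' + NCHAR(65533) + '%' COLLATE Latin1_General_BIN;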

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
Yup!!! I was missing the COLLATE.

UPDATE TABLE
SET DS = REPLACE(DS, NCHAR(65533) COLLATE Latin1_General_BIN2, '!');

    Thank you all very much! I love this site :-):-)

There are no problems, only solutions. --John Lennon

SQLTougherGuy (9/10/2015)

loads data into a table with all VARCHAR(MAX)

That may not sit very well with NCHAR(65533) ... I'm not sure, but it might be better to change your staging table to NVARCHAR(MAX). Perhaps the binary collation is fixing the comparison, but the lack of NVARCHAR(MAX) may be hiding some other problem. A quick illustration of what VARCHAR does to that character is below.
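A quick sketch (the exact result depends on the database's default code page):

-- NCHAR(65533) cannot survive in a varchar column or variable: the implicit
-- Unicode-to-ANSI conversion typically turns it into a plain '?'
DECLARE @v varchar(10) = NCHAR(65533);
SELECT @v, UNICODE(CAST(@v AS nvarchar(10))); -- usually '?' and 63, not 65533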
