Locating unrecognized characters

  • I would like to locate some corrupted string data in a SQL 2000 database, which contains unrecognized characters; caused by an error in which special characters were not encoded properly.

    A notes field in front-end allowed users to update free-text, and has been "corrupted" if it contained a special character, eg Test1%Test2

    If I view this text through table view in SQL-SEM it appears as, say

     Test1[littleblackbox]     [littleblackbox] + [garbles]

    (I can't post the little black box on this forum but you know what I mean).

    If I query the note in QA, the result appears as

     Test1

    How would I search text fields to locate instances of the little black box? Too much data to do it visually, and I can't get the offending character into Transact-SQL.

    Many thanks in advance for any assistance and suggestions,

    Regards,

    Aidan.

     


    Aidan Mooney

  • Hello Aidan,

    First, find the ascii character of the offending character.  Example:

    declare @badcharcode int

    select @badcharcode = ascii(substring(Notes, 6, 1))

    from mytable

    where mypk = 1

    This example assumes that one instance of the black box occurs at character 6 of a column named notes in a table named mytable, in a row with a primary key value that is known (here, mypk = 1).  You obviously will need to adapt for your own situation.

    Then, run something like this:

    select *

    from mytable

    where charindex(char(@badcharcode)) != 0

    This is pretty simple, but is it enough to get you going?

    Cheers,

    Chris

  • Thanks very much, pointed me in the right direction.

    The ASCII value of the bad character was 0.

    So I was able to locate records containing the offending character using

    select * from dbo.notes where charindex(char(0), note) > 0

    Many thanks,

    Aidan

     

     

     


    Aidan Mooney

  • Hi Aidan,

    Hoo-rah - glad to help! 

    So it was char(0), huh?  You might want to talk to your developers about that one.  Or talk to yourself (in a non-neurotic kind of way) if you're the developer.  There's some control or code someplace that's adding those.  Are they supposed to be newline (CRLF) characters, or something like that?

    Just curious...

    Cheers,

    Chris

  • Hi Chris,

    Using XML to post data from ASP pages to SQL database. A function in a schema to encrypt special characters in a URL did not so for % (pure oversight); and that is where the dodgy characters were generated.

     

    Many thanks for your help.

    Aidan.


    Aidan Mooney

  • Gotcha.  I know I didn't have to know, so thanks for taking time to satisfy my curiosity. 

    -Chris

  • char(0) is NULL .. but not DBNULL which is what sql puts in NULL columns. A tricky thing that I see from time to time, that makes results look really screwy in QA / EM.

    Usually it appears in your data when you import data. Most developers won't have gone to the effort to break things that badly. But certainly badly encoded XML will do that to you.

    As a side question, make sure your developers actually use the MSXML object to create their xml and to parse their xml. MSXML actually produces properly encoded xml files. Whereas most of the time when people "hand craft" an xml doc, or have a program create one they get some things wrong.

    A good way to give an XML file a quick check is open it in Internet Explorer who will complain if the xml doc isn't at least syntactly correct.


    Julian Kuiters
    juliankuiters.id.au

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply