String Comparison Explanation

  • hi -

    i have a SQL 2005 table (TABLE 1) with one record of varchar(500)

    i have a TEXT file with one record

    i imported the TEXT file into SQL 2005 (TABLE 2 varchar(500) ) using SQL 2000 DTS

    visually, the value in TABLE 1 is the same as the value of TABLE 2, however when i compare the two values using methods below, i get results as if the value is different.

    the "<>" string comparison does not reflect accurate results, regardless of trimming

    select * from table1, table2 where table1.field1 <> table2.field2

    the checksums are different, even though the string appears to be identical in both tables

    select checksum(table1.field1) from table1

    select checksum(table2.field2) from table2

    is this happening because i entered the value into TABLE 1 using T-sql in SQL2005 and imported the value into TABLE 2 via SQL 2000?

    thanks !

  • Question:

    Did you attempt to import the text file into say Table3 using the same T-SQL used to import the text into Table1?

    If so what where the results of that?

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • The method of import won't make a difference.

    What you might try is breaking it down, character by character, and comparing that.

    If you have a numbers table, you can do something like this:

    select t1.number, t1char, t2char

    from

    (select number, substring(myfield, number, 1) as T1Char

    from dbo.table1

    inner join dbo.Numbers

    on Number <= len(myfield)) T1

    inner join

    (select number, substring(myfield, number, 1) as T2Char

    from dbo.table2

    inner join dbo.Numbers

    on Number <= len(myfield)) T2

    on T1.Number = T2.Number

    and T1Char != T2Char;

    That'll show you exactly where they are different.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • i did try various methods of importing and T-sql, still all different. after trying the char to char script, here are my results:

    select checksum(t1char),checksum(t2char), t1.number, t1char, t2char

    from

    (select appcolname,id,number, substring(appcolname, number, 1) as T1Char

    from dbo.tmp_planbenefitColumns

    inner join dbo.Numbers

    on Number <= len(appcolname) where id = 18) T1

    inner join

    (select description,id,number, substring(description, number, 1) as T2Char

    from dbo.tbl_plancodecomponenttype

    inner join dbo.Numbers

    on Number <= len(description) where id = 18) T2

    on T1.Number = T2.Number

    -- and T1Char != T2Char

    the grid is probably hard to read, however the difference is the space between Group and ( in "Small Group (Y/N)". im not sure how to fix this or understanding why its happening.

    (No column name)(No column name)numbert1chart2char

    1601601SS

    1541542mm

    1421423aa

    1531534ll

    1531535ll

    006

    1481487GG

    1591598rr

    1561569oo

    16216210uu

    15715711pp

    [font="Arial Black"]10012 [/font]

    404013((

    16616614YY

    474715//

    15515516NN

    414117))

  • Could you add the ASCII values of the characters to your output?

  • I'm not sure why you would want the checksum of the characters. Why not something more generally useful, like the ASCII value? With the ASCII value, you could tell exactly what the difference was between the characters.

    Also, why remove the non-equality check in the Where clause. I would think you would want to remove the noise and just get the signal on this one. What matters are the differences, not the identities.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • AscT1AscT2CheckSumT1CheckSumT2numbert1chart2char

    83831601601SS

    1091091541542mm

    97971421423aa

    1081081531534ll

    1081081531535ll

    3232006

    71711481487GG

    1141141591598rr

    1111111561569oo

    11711716216210uu

    11211215715711pp

    [font="Arial Black"]103210012 [/font]

    4040404013((

    898916616614YY

    4747474715//

    787815515516NN

    4141414117))

  • ASCII 10 is a line-feed, ASCII 32 is a space. That's why they are different. They'll look the same when you just select them, but to the computer, they are quite different. The Wikipedia article on ASCII has a table of the values and characters. So do a few dozen other web pages.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • so does the method of import really not matter or was that just a coincidence?

    thanks for the replies - extremely helpful

    -- to answer your question on why i took out the

    "and T1Char != T2Char ", with it in, i didnt get any results.

  • Ry-SQL (2/18/2009)


    so does the method of import really not matter or was that just a coincidence?

    Well, you got 2 different results. Does it matter to you?



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • More likely, there was actually some difference in the text when it was imported. Possibly in copy-and-pasting it into the SQL query, it got reformatted to have a space instead of a line-feed. That's what's most likely, but there are other possibilities.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I haven't had such a problem in a while, but for things like BULK INSERT and BCP, I seem to remember this type of thing happening with the file type parameter.... CHAR would convert LF's to spaces where as RAW would bring in the identical values.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply