Comparing Text Fields

  • I need to perform a delete between two tables based on a text field as part of an ETL process. I just need to find the fields that are exact matches...

    Can I use full text indexing for this ?

  • Please post some sample data (just the relevant fields will do) and datatype/size of text fields.  No need for full text indexing.

    Regards

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • The two tables are ScanResultStage and ScanResultBuild

    Each contain a single result column which is text, because of the fact that the size can easily surpass 8000 chararacters. The data is typical results from network scanning tools.

    I need to delete the duplicates...

    Thanks

  • Aha - never had to work with huge text fields - over to someone else!

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Comparing a text field will be problematic once it gets too large, check out this new script for a easy way to determine if the text fields are an exact checksum match:

    http://www.sqlservercentral.com/scripts/contributions/1527.asp

    Not my script, but a great way to test for text field equality. Even then I'll bet that this process is still slow. I am also not sure if this will ignore "white space" differences.

    Try it and see...

    Andy

  • Thanks David,

    I broke the process into 2 parts...

    I process anything with a length less than 8000 using varchar. This is the bulk of those records.

    I used the example code from that function and actually broke the larger text values into smaller varchar chunks and stored them in an staging table to use for comparisons. It's slow, but it works.

    I move these records from the staging table into the final warehouse and insert the first "line" into the result table with a key tying it back to the table with the additional lines.

    The application simply looks at the main result table and adds a "(more)" if it has a link to the child table.

    If "(more)" is selected, the app will retrieve the lines from the child table.

    Not the best solution, but it works....

    Thanks

  • You might also look at SET TEXTSIZE in BOL, to work beyond the 8000 limit for a varchar.

    Andy

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply