June 28, 2005 at 9:34 am
I need to perform a delete between two tables based on a text field as part of an ETL process. I just need to find the fields that are exact matches...
Can I use full text indexing for this ?
June 28, 2005 at 9:56 am
Please post some sample data (just the relevant fields will do) and datatype/size of text fields. No need for full text indexing.
Regards
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 28, 2005 at 10:10 am
The two tables are ScanResultStage and ScanResultBuild
Each contain a single result column which is text, because of the fact that the size can easily surpass 8000 chararacters. The data is typical results from network scanning tools.
I need to delete the duplicates...
Thanks
June 28, 2005 at 10:17 am
Aha - never had to work with huge text fields - over to someone else!
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 29, 2005 at 12:06 am
Comparing a text field will be problematic once it gets too large, check out this new script for a easy way to determine if the text fields are an exact checksum match:
http://www.sqlservercentral.com/scripts/contributions/1527.asp
Not my script, but a great way to test for text field equality. Even then I'll bet that this process is still slow. I am also not sure if this will ignore "white space" differences.
Try it and see...
Andy
June 29, 2005 at 11:14 am
Thanks David,
I broke the process into 2 parts...
I process anything with a length less than 8000 using varchar. This is the bulk of those records.
I used the example code from that function and actually broke the larger text values into smaller varchar chunks and stored them in an staging table to use for comparisons. It's slow, but it works.
I move these records from the staging table into the final warehouse and insert the first "line" into the result table with a key tying it back to the table with the additional lines.
The application simply looks at the main result table and adds a "(more)" if it has a link to the child table.
If "(more)" is selected, the app will retrieve the lines from the child table.
Not the best solution, but it works....
Thanks
June 29, 2005 at 11:52 am
You might also look at SET TEXTSIZE in BOL, to work beyond the 8000 limit for a varchar.
Andy
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply