January 3, 2010 at 6:37 am
Actually, the issue of not updating multiple rows was happening because of the workaround second UPDATE query I wrote to set the duplicate_with field to the lowest id, or even the same id as InvoiceID if duplicate_with is greater. This worked fine when there were only 2 rows, but with multiple matching rows it caused this problem. I am pasting the second query here; please let me know what I can do to fix it:
UPDATE mjtest
SET duplicate_with = InvoiceID
where duplicate_type = 'TRUE DUPLICATE'
AND InvoiceID < duplicate_with
AND mjtest.duplicate_with is not null and mjtest.duplicate_with <> 0
January 4, 2010 at 3:06 pm
Try this:
UPDATE mjtest
SET duplicate_with = MyAlias.DupId
FROM mjtest
LEFT JOIN
(
    SELECT IDSrc.InvoiceID, DupID = MIN(DupSrc.InvoiceID)
    FROM MJTest IDSrc
    CROSS JOIN MJTest DupSrc
    WHERE IDSrc.VendorNumber LIKE '%' + DupSrc.VendorNumber + '%'
    AND IDSrc.CheckNumber LIKE '%' + DupSrc.CheckNumber + '%'
    AND IDSrc.InvoiceNumber LIKE '%' + DupSrc.InvoiceNumber + '%'
    AND IDSrc.InvoiceDate BETWEEN DupSrc.InvoiceDate - 15 AND DupSrc.InvoiceDate + 15
    AND IDSrc.projectid = DupSrc.projectid
    AND IDSrc.duplicate_with IS NULL
    GROUP BY IDSrc.InvoiceID
    HAVING COUNT(DupSrc.InvoiceID) > 1
) MyAlias
ON MJTest.InvoiceID = MyAlias.InvoiceID
January 4, 2010 at 8:44 pm
thanks. This worked great.
January 5, 2010 at 9:38 am
You're welcome.
- Les
P.S.
Just out of curiosity, rough guess, what kind of speedup did you see over using your cursor approach?
January 10, 2010 at 6:52 am
It's taking 3 minutes now compared to 25 minutes earlier.
January 11, 2010 at 7:40 am
If I may make a suggestion: if this is a query you run often, consider restructuring your tables so that, for the fields on which you are searching, the portions on which you are looking for a match are separated from the data which varies by record. That way you could change your test from "like '%' + <whatever> + '%'" to "= <whatever>". With proper indexing, your queries will be much, much faster, and the exercise will probably lead you to rethink some of your current data organization, perhaps resulting in a cleaner implementation.
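As a minimal sketch of what I mean, assuming made-up column names (VendorNumberKey, CheckNumberKey, InvoiceNumberKey) for the fixed portions you actually match on; I don't know your real layout, so treat this as an illustration only:

-- Hypothetical split columns holding just the portion that has to match,
-- populated during the import step.
ALTER TABLE MJTest ADD VendorNumberKey varchar(20) NULL;
ALTER TABLE MJTest ADD CheckNumberKey varchar(20) NULL;
ALTER TABLE MJTest ADD InvoiceNumberKey varchar(30) NULL;

-- One covering index over the columns used by the duplicate search.
CREATE INDEX IX_MJTest_DupSearch
ON MJTest (projectid, VendorNumberKey, CheckNumberKey, InvoiceNumberKey, InvoiceDate);

-- The duplicate check then becomes straight equality instead of LIKE '%...%'.
SELECT IDSrc.InvoiceID, DupID = MIN(DupSrc.InvoiceID)
FROM MJTest IDSrc
JOIN MJTest DupSrc
ON IDSrc.VendorNumberKey = DupSrc.VendorNumberKey
AND IDSrc.CheckNumberKey = DupSrc.CheckNumberKey
AND IDSrc.InvoiceNumberKey = DupSrc.InvoiceNumberKey
AND IDSrc.projectid = DupSrc.projectid
AND IDSrc.InvoiceDate BETWEEN DupSrc.InvoiceDate - 15 AND DupSrc.InvoiceDate + 15
WHERE IDSrc.duplicate_with IS NULL
GROUP BY IDSrc.InvoiceID
HAVING COUNT(DupSrc.InvoiceID) > 1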
If you don't wish to lose the fields you currently have in favor of separate fields for the fixed vs. variable portions, you could create the split fields and then change the definition of the current fields to computed fields, moving your indexes to the fields on which you actually search. If you do other forms of searches on the composite fields, you could either relax normalization and duplicate the data in normal, non-computed fields (I would add check constraints to ensure they stay in sync, or at least default constraints to create them automatically) or, if you are using SQL Server 2005 or later, leave them as computed fields and create indexes on the computed fields as well as on the component fields from which they are built.
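Just to sketch that second option with invented names (continuing the hypothetical VendorNumberKey split above, and adding a made-up VendorNumberSuffix; SQL Server 2005 or later assumed):

-- Store the pieces, then rebuild VendorNumber as a computed column; a column
-- cannot be altered into a computed column in place, so it is dropped and re-added.
ALTER TABLE MJTest ADD VendorNumberSuffix varchar(20) NULL;
-- (populate VendorNumberKey / VendorNumberSuffix from the existing data first)
ALTER TABLE MJTest DROP COLUMN VendorNumber;
ALTER TABLE MJTest ADD VendorNumber AS (VendorNumberKey + VendorNumberSuffix) PERSISTED;

-- Index the component column the search really uses...
CREATE INDEX IX_MJTest_VendorNumberKey ON MJTest (VendorNumberKey);
-- ...and, on 2005+, the computed column itself can also be indexed if other
-- searches still need the composite value.
CREATE INDEX IX_MJTest_VendorNumber ON MJTest (VendorNumber);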
In general, if you find the need to manipulate the data in the fields in your "where" clause (with the exception, perhaps, of ISNULL() -- I suspect the optimizer handles that pretty efficiently), you should consider restructuring your tables so that you are storing the data on which you actually search rather than something from which the search data must be derived. It is much more efficient to derive the output data once the search is done than to derive your search data for every single comparison on every single record in the table.
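As a trivial illustration of that last point, using a hypothetical @CheckNumber value and the invented CheckNumberKey column from the earlier sketch:

DECLARE @CheckNumber varchar(20)
SET @CheckNumber = '12345'  -- hypothetical search value

-- Leading-wildcard LIKE forces the string comparison to run on every row,
-- so an index on CheckNumber cannot be used for a seek.
SELECT InvoiceID FROM MJTest WHERE CheckNumber LIKE '%' + @CheckNumber + '%'

-- Storing the searchable value directly allows a simple index seek.
SELECT InvoiceID FROM MJTest WHERE CheckNumberKey = @CheckNumber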
January 12, 2010 at 12:42 am
This query is run only once for any of the tables; the data is received in a text format and then imported, so we have to use LIKE statements.