October 8, 2013 at 7:07 am
Good Morning!
I'm trying to update a column based on a WHERE statement that uses a SELECT statement as the qualifier.
My goal is to find duplicate records in my table, then have a value in one of the columns change so that the users can search for and view these duplicate records and decide how they'd like to deal with them.
Here's the SELECT statement that does a nice job of finding my duplicates:
select IDX_Invoice_Number, IDX_Vendor_Number, status from _obj_3
group by IDX_Invoice_Number, IDX_Vendor_number, status
having COUNT (*)>1
Here's what I'd use to accomplish my goal, if it were possible:
update _obj_3
set status='7'
where idx_invoice_number in
(
select IDX_Invoice_Number, IDX_Vendor_Number, status from _obj_3
group by IDX_Invoice_Number, IDX_Vendor_number, status
having COUNT (*)>1
)
But, I get an error because the SELECT statement yeilds multiple returns:
'Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.'
Which it will, because they're duplicates that it's finding.
I've tried using an INNER JOIN back to itself with no luck, and just created a VIEW using my SELECT statement, hoping to be able to update the VIEW.
No luck there, as the VIEW won't allow me to update it.
Well, thanks taking a look, and I can't wait to see what comes back.
Thanks!
October 8, 2013 at 7:17 am
cmcfarland 51322 (10/8/2013)
Good Morning!I'm trying to update a column based on a WHERE statement that uses a SELECT statement as the qualifier.
My goal is to find duplicate records in my table, then have a value in one of the columns change so that the users can search for and view these duplicate records and decide how they'd like to deal with them.
Here's the SELECT statement that does a nice job of finding my duplicates:
select IDX_Invoice_Number, IDX_Vendor_Number, status from _obj_3
group by IDX_Invoice_Number, IDX_Vendor_number, status
having COUNT (*)>1
Here's what I'd use to accomplish my goal, if it were possible:
update _obj_3
set status='7'
where
(
select IDX_Invoice_Number, IDX_Vendor_Number, status from _obj_3
group by IDX_Invoice_Number, IDX_Vendor_number, status
having COUNT (*)>1
)
But, I get an error because the SELECT statement yeilds multiple returns.
Which it will, because they're duplicates that it's finding.
I've tried using an INNER JOIN back to itself with no luck, and just created a VIEW using my SELECT statement, hoping to be able to update the VIEW.
No luck there, as the VIEW won't allow me to update it.
Well, thanks taking a look, and I can't wait to see what comes back.
Thanks!
You were getting an error (you should always post the error message btw) because you have nothing on one side of your condition. You can't just say Where (SubQuery).
Try this:
update _obj_3
set status='7'
where IDX_Invoice_Number in
(
select IDX_Invoice_Number from _obj_3
group by IDX_Invoice_Number, IDX_Vendor_number, status
having COUNT (*)>1
)
You might also refer to today's article by Stefan. It deals with finding and then deleting duplicates.
http://www.sqlservercentral.com/articles/duplicate+data/102383/[/url]
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
October 8, 2013 at 7:25 am
Hey, thanks for taking a look.
I added the error, and amended my query in the post, as I'd actually written it with the 'in' clause to start with but messed it up when I brought it into my post.
Thanks!
October 8, 2013 at 7:25 am
Since you are comparing 3 columns and it doesn't appear to have any different about them you could use a function like binary_checksum to create a single value for the duplicates. Like so:
create table Obj3 (InvoiceNumber int, VendorNumber int, StatusID int,);
go
insert into Obj3
values (1,1,1)
, (1,1,1)
, (2,2,2)
, (2,2,2)
, (3,3,3)
select InvoiceNumber, VendorNumber, StatusID
from Obj3
group by InvoiceNumber, VendorNumber, StatusID
having count(*) > 1
update Obj3
set StatusID = 7
where BINARY_CHECKSUM(InvoiceNumber, VendorNumber, StatusID) in
(select BINARY_CHECKSUM(InvoiceNumber, VendorNumber, StatusID)
from Obj3
group by InvoiceNumber, VendorNumber, StatusID
having count(*) > 1)
select InvoiceNumber, VendorNumber, StatusID
from Obj3
This should get you what you are looking for.
October 8, 2013 at 7:46 am
OK, it seems to work great!
I added some limiters to narrow it down to a single record that I knew should be modified by the query, and it did the trick.
Seems like it might just take some time to go through all of the rows.
I'll update the post once the run is complete.
Thanks!
October 8, 2013 at 7:56 am
For this type of problem I like the Windows functions to give each row a number which makes the row unique.
With CTE the 'double' rows can be removed.
CREATE TABLE dbo.SB_Dupli
(
Tekst varchar(300) NULL,
Number int NULL
) ON [PRIMARY]
insert into SB_Dupli values ('aaaa', 1)
insert into SB_Dupli values ('aaaa', 6)
insert into SB_Dupli values ('aaaa', 1)
insert into SB_Dupli values ('bbbb', 2)
insert into SB_Dupli values ('bbbb', 2)
insert into SB_Dupli values ('bbbb', 2)
insert into SB_Dupli values ('cccc', 3)
insert into SB_Dupli values ('dddd', 4)
insert into SB_Dupli values ('dddd', 5)
insert into SB_Dupli values ('dddd', 4)
-- show all rows.
select * from sb_dupli
-- show all distinct rows.
select distinct * from sb_dupli
-- show all rows with duplicates
select Tekst, number, COUNT(*) multiple_aantal from SB_Dupli
group by Tekst, number having COUNT(*) > 1
-- give the duplicates a 'number'
-- remove the double rows.
--------------------------------------------------------
-- Actual code to delete
--------------------------------------------------------
;with
A as (
select
row_number() OVER(PARTITION BY tekst, number ORDER BY (SELECT NULL)) volgnr
from sb_dupli
)
delete A where volgnr > 1
--------------------------------------------------------
-- End Actual code to delete
--------------------------------------------------------
-- show the table
select * from sb_dupli
-- drop the example table
drop table SB_Dupli
Variations. (Replace the row_number line with)
-- Duplicate texts are removed.
row_number() OVER(PARTITION BY tekst ORDER BY (SELECT NULL)) volgnr
-- Duplicate texts are removed Highest number is left.
row_number() OVER(PARTITION BY tekst ORDER BY number DESC) volgnr
-- Duplicate texts are removed Lowest number is left.
row_number() OVER(PARTITION BY tekst ORDER BY number) volgnr
-- Duplicate number are removed.
row_number() OVER(PARTITION BY number ORDER BY (SELECT NULL)) volgnr
With this set of code you can remove duplicates where the complete row is duplicated or a number of fields is duplicated.
The partition clause should contain all the fields to determine which are duplicates, first row in the order by is the row which is kept.
Ben
October 8, 2013 at 8:04 am
--------------------------------------------------------
-- Actual code to UPDATE
--------------------------------------------------------
;with
A as (
select
number,
row_number() OVER(PARTITION BY tekst, number ORDER BY (SELECT NULL)) volgnr
from sb_dupli
)
update A set number=number*100+volgnr where volgnr > 1
--------------------------------------------------------
-- End Actual code to UPDATE
--------------------------------------------------------
Rereading the question, you wanted an update, above an example of an update. Offcourse you have to come up with your own update strategy.
If the range is not know you could go for negative numbers for example:
set number = - (volgnr*100 +number)
This assumes you do not have more than 99 doubles.
Ben
October 8, 2013 at 8:05 am
Thanks!
This is my first brush with CET functions, and I'm sure I'll be using this more as time goes on.
October 8, 2013 at 8:08 am
Thanks.
That's closer to what I'm after this time around, but your previous post is also quite interesting to me.
I think I may have a large number of duplicates, so I guess I could probably just increase the number until the query is acting according to my wishes.
Does that sound right?
October 14, 2013 at 8:46 am
Use the following code to just see what is happening:
--------------------------------------------------------
-- Actual code to prepare for an UPDATE a Select
--------------------------------------------------------
;with
A as (
select
number,
row_number() OVER(PARTITION BY tekst, number ORDER BY (SELECT NULL)) volgnr
from sb_dupli
)
select Number*100+volgnr as New_Num, * where volgnr > 1
--------------------------------------------------------
-- End Actual code to prepare for an UPDATE a Select
--------------------------------------------------------
With the CTE functions it is always very easy to substitute in the last part with a SELECT, UPDATE or DELETE statement.
The above shows alle the doubles and not the 'first' row for each combination.
But you could also specify volgnr < 4, showing only for doubles for all situations or (volgnr > 1 and volgnr<4).
Ben
October 14, 2013 at 8:57 am
Thanks! I'll give that a try shortly!
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply