October 25, 2002 at 9:51 am
I have two tables
tbl_tracks
TrackID (identity)
TitleID (int)
ArtistID (int)
tbl_disc_tracks
DiscTrackID (identity)
TrackID (int)
DiscId (int)
I want to look in tbl_tracks for rows where titleID and ArtistID are the same for each row. I then want to get the min TrackID and update tbl_disc_tracks before deleting the duplicate rows.
with my update I update all the none min values but im unsure as to what I should do for my TrackID
UPDATE tbl_disc_tracks SET TrackID = ?????
WHERE (TrackID IN
(SELECT tbl_Temp.TrackID
FROM tbl_Tracks AS tbl_temp
WHERE EXISTS
(SELECT ArtistID, TitleID, COUNT(TrackID)
FROM tbl_Tracks
WHERE tbl_Tracks.ArtistID = tbl_Temp.ArtistID AND tbl_Tracks.TitleID = tbl_Temp.TitleID
GROUP BY tbl_Tracks.ArtistID, tbl_Tracks.TitleID
HAVING COUNT(tbl_Tracks.TrackID) > 1))) AND (TrackID NOT IN
(SELECT MIN(TrackID)
FROM tbl_Tracks AS tbl_temp
WHERE EXISTS
(SELECT ArtistID, TitleID, COUNT(TrackID)
FROM tbl_Tracks
WHERE tbl_Tracks.ArtistID = tbl_Temp.ArtistID AND tbl_Tracks.TitleID = tbl_Temp.TitleID
GROUP BY tbl_Tracks.ArtistID, tbl_Tracks.TitleID
HAVING COUNT(tbl_Tracks.TrackID) > 1)
GROUP BY ArtistID, TitleID))
ORDER BY ArtistID
then to delete delete all none min values Ive got ......
DELETE FROM
FROM tbl_tracks
WHERE (TrackID IN
(SELECT tbl_Temp.TrackID
FROM tbl_Tracks AS tbl_temp
WHERE EXISTS
(SELECT ArtistID, TitleID, COUNT(TrackID)
FROM tbl_Tracks
WHERE tbl_Tracks.ArtistID = tbl_Temp.ArtistID AND tbl_Tracks.TitleID = tbl_Temp.TitleID
GROUP BY tbl_Tracks.ArtistID, tbl_Tracks.TitleID
HAVING COUNT(tbl_Tracks.TrackID) > 1))) AND (TrackID NOT IN
(SELECT MIN(TrackID)
FROM tbl_Tracks AS tbl_temp
WHERE EXISTS
(SELECT ArtistID, TitleID, COUNT(TrackID)
FROM tbl_Tracks
WHERE tbl_Tracks.ArtistID = tbl_Temp.ArtistID AND tbl_Tracks.TitleID = tbl_Temp.TitleID
GROUP BY tbl_Tracks.ArtistID, tbl_Tracks.TitleID
HAVING COUNT(tbl_Tracks.TrackID) > 1)
GROUP BY ArtistID, TitleID))
ORDER BY ArtistID
October 25, 2002 at 10:08 am
Not sure what you mean about doing the update.
If there are duplicate title and artists, are you trying to get the trackid values in the child table set to the min(trackid)?
Won't this give you duplicates? I would think you want to remove the duplicates in the child first, then remove the parent. IS this not the case?
Steve Jones
October 25, 2002 at 11:51 am
I think you're making this overly complex for yourself. Try this:
UPDATE tbl_disc_tracks
SET TrackID = (SELECT MIN(TrackID)
FROM tbl_Tracks
WHERE tbl_Tracks.ArtistID = tbl_disc_tracks.ArtistID
AND tbl_Tracks.TitleID = tbl_disc_tracks.TitleID)
Obviously, this will update ALL rows, but aside from performance concerns, this won't give you any errant data. If performance is a major concern, i.e. if tbl_disc_tracks is a very large table, you could add the following WHERE clause to the update statement:
WHERE EXISTS (SELECT tbl_Tracks.TrackID
FROM tbl_Tracks
WHERE tbl_Tracks.ArtistID = tbl_disc_tracks.ArtistID
AND tbl_Tracks.TitleID = tbl_disc_tracks.TitleID
GROUP BY tbl_Tracks.TrackID
HAVING COUNT(*) > 1)
(I believe I got that right)
Then if all you're trying to do is delete
rows in tbl_Tracks where the TrackID is not the minimum trackID, you don't even need to join to tbl_disc_tracks:
DELETE tbl_Tracks
WHERE TrackID > (SELECT MIN(TrackID)
FROM tbl_Tracks AS A
WHERE tbl_Tracks.ArtistID = A.ArtistID
AND tbl_Tracks.TitleID = A.TitleID)
I hope this helps.
Matthew Burr
October 26, 2002 at 6:33 am
it is the tblTracks that have duplicates.
e.g.
TrackID, ArtistID, TitleID
1, 1, 1
2, 1, 2
3, 2, 3
4, 2, 3
5, 2, 1
In this example I would want to delete the row trackID = 4 and keep TrackID = 3.
But because I relate to it in my tbl_Disc_Tracks I need to replace TrackID's in
tbl_Disc_Tracks where the TrackID is 4 with 3.
would i be better with a cursor and doing a loop ?
e.g.
UPDATE tbl_disc_tracks SET TrackID = 3 WHERE TrackID = 4?
I was trying to avoid this.
I wanted to try and do the update in one query then the delete in another.
My original DELETE query creates two lists. one of all the duplicates then one of all the duplicate minimum ID's. it says delete the ones IN the duplicates but and NOT IN the minimum. this bit works, im lost with my UPDATE.
hope that clears it up.
October 27, 2002 at 12:11 am
Look a little more closely at the response. Let's look at your first problem: you need to update tbl_disc_tracks so that the value for the TrackId is the value of the track that you are keeping in tbl_Tracks, and the value that you are keeping in tbl_Tracks is the minimum value of a set of duplicates; therefore, the value that you need for TrackID is the the minimum TrackID from the set of duplicates in tbl_tracks. So, first of all: how do you define duplicates? In your case, they are rows that have the same TitleID and the same ArtistID. Therefore, by grouping on these two columns, you will "merge" those duplicates into one row. And how do you get the minimum value for TitleID? By using the MIN aggregate function. Thus, the query:
SELECT TitleID, ArtistID, MIN(TrackID)
FROM tbl_Tracks
GROUP BY TitleID, ArtistID
This returns the minimum trackid for every TitleID, ArtistID combination in your table.
Now, you want to take that value and update tbl_Disc_Tracks so that that value is in the TrackID column, because you are going to delete the other duplicate rows out of tbl_Tracks, and in order to maintain a relationship between the two tables, tbl_Disc_Tracks.TrackID must match what exists in tbl_Tracks for a given TitleID, ArtistID combination. So, first the basic update statement:
UPDATE tbl_Disc_Tracks
SET TrackID = (SELECT tbl_Tracks.TitleID, tbl_Tracks.ArtistID, MIN(TrackID)
FROM tbl_Tracks
GROUP BY tbl_Tracks.TitleID, ArtistID)
*Note: the above code won't work; this is just conceptual.
We've used our select statement that gives us the TrackID for the duplicate record that we'll be keeping, and we're going to update tbl_Disc_Tracks so that the TrackID in that table matches the TrackID of the record we're keeping; but notice that right now, there's no correlation between our select statement and our update statement. We must alter our statement thusly:
UPDATE tbl_Disc_Tracks
SET TrackID = (SELECT MIN(TrackID)
FROM tbl_Tracks
WHERE tbl_Disc_Tracks.TitleID = tbl_Tracks.TitleID
AND tbl_Disc_Tracks.ArtistID = tbl_Tracks.ArtistID)
Notice that in this case, I removed the GROUP BY. Think of what this query does. First, it SELECTS all of the data from tbl_Disc_Tracks. Next, starting with the first row, it finds all of the rows in tbl_Tracks that have the corresponding ArtistID and TitleID. Now, working with that set of matching rows from tbl_Tracks, it finds the smallest TitleID among them, and it returns that value. Finally, it takes that value and it updates Title_ID in tbl_Disc_Tracks so that it has that value. Later, you'll be deleting all of the duplicate rows in tbl_Tracks, leaving only the row that has this smallest TrackID; so, this is the value you want in tbl_Disc_Tracks.TitleID. Thus, we've met your first requirement; we've updated tbl_Disc_Tracks with the minimum trackID before deleting the duplicates from tbl_Tracks.
Now, as for deleting the duplicates. You have decided that you want to delete the rows that do not have the minimum trackID. So we'll first find the minimum trackID, as we did before:
SELECT TitleID, ArtistID, MIN(TrackID)
FROM tbl_Tracks
GROUP BY TitleID, ArtistID
Next, our basic delete statement:
DELETE tbl_Tracks
WHERE TitleID ????
And how do we finish that WHERE clause? We're looking to delete rows with TitleID's that are greater than the minimum titleID, so:
WHERE TitleID > (SELECT MIN(TrackID)
FROM tbl_Tracks AS A
WHERE tbl_Tracks.TitleID = A.TitleID
AND tbl_Tracks.ArtistID = A.ArtistID)
See once again how we've established a correlation between our query that selects the minimum value and our delete query? This meets your second requirement: it deletes all of the duplicates, saving only the rows that have the minimum TrackID.
So, give it a shot. Copy your data to a test environment and ensure that it does what you're trying to do. I'm sure you'll see that it does.
Thanks,
Matthew Burr
Edited by - mdburr on 10/27/2002 12:13:39 AM
October 27, 2002 at 4:43 am
quote:
WHERE tbl_Disc_Tracks.TitleID = tbl_Tracks.TitleID AND tbl_Disc_Tracks.ArtistID = tbl_Tracks.ArtistID)
these two tables are only related by trackID. in tbl_Disc_Tracks there is no artist or title ID only a trackID.
October 27, 2002 at 3:45 pm
Okay. I don't know how I missed that. Anyway, let's start by filtering down the set of rows you want to UPDATE:
UPDATE tbl_Disc_Tracks
SET TrackID = ????
WHERE EXISTS (SELECT *
FROM tbl_Tracks AS A
JOIN tbl_Tracks AS B
ON A.TitleID = B.TitleID
AND A.ArtistID = B.ArtistID
AND A.TrackID <> B.TrackID
WHERE A.TrackID = tbl_Disc_Tracks.TrackID)
I've used a somewhat different approach. I'm identifying duplicates as situations where rows exist that have the same TitleID and ArtistID but different TrackIDs when I do a self-join on tbl_Tracks. I'm correlating this back to the tbl_Disc_Tracks that we're updating.
Now, the real part of your question: what do we fill-in for the SET?
How about this:
SET TrackID = (SELECT MIN_TRACK_ID
FROM tbl_Tracks AS A
JOIN (SELECT TitleID, ArtistID, MIN(TrackID)
FROM tbl_Tracks
GROUP BY TitleID, ArtistID) AS B (TitleID, ArtistID, MIN_TRACK_ID)
ON A.TitleID = B.TitleID
AND A.ArtistID = B.ArtistID
WHERE A.TrackID = tbl_Disc_Tracks.TrackID)
So, we're finding the row in tbl_Tracks that has the same TrackID as the given row in tbl_Disc_Tracks. We're then using the TitleID and ArtistID row to join back to a subquery on tbl_Tracks that groups all the records by TitleID and ArtistID and provides their minimum TrackID. We're feeding this minimum trackID back to the SET clause.
The DELETE statement I provided should still be fine.
Sorry for the confusion; I hope this solution works for you.
Matthew Burr
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply