October 13, 2008 at 8:40 am
I am using an inner join between two source tables (Album and Tracks) to insert records into a destination table (NewTracks). The statement goes something like this:
INSERT NewTracks
(AlbumID, TrackTitle, TrackPosition, TrackDuration)
SELECT
distinct Album.ProductID, Tracks.Title, Tracks.Position, Tracks.duration
FROM
Album
INNER JOIN
Tracks on Album.ProductID = Tracks.ProductID
This kind of works, except that in some cases there is more than one row in the Album table per album. For example, for a given album, the Album table would have a separate row for the main artist, the producer and the remixer, but each of these rows would have the same Album.ProductID. So in this case, the Insert command ends up having multiple sets of tracks in the NewTracks table, one for each of these album contributors.
Is there a way of doing a kind of 'distinct join' so that the join inserts just one set of tracks per album, regardless of how many rows there are in the Album table?
October 13, 2008 at 8:46 am
thomas.lemesurier (10/13/2008)
This kind of works, except that in some cases there is more than one row in the Album table per album. For example, for a given album, the Album table would have a separate row for the main artist, the producer and the remixer, but each of these rows would have the same Album.ProductID.
That sounds like a flawed database design. Without seeing the structure, I can't say for sure, but is sounds like that needs to be normalised a bit. From the sound of things, it's a completely unnormalised design, which leads to problems like this.
You can work around it like this, but it would definitely be better if you could fix the design, as problems like this will keep cropping up.
SELECT UniqueProducts.ProductID , Tracks.Title, Tracks.Position, Tracks.duration FROM
(SELECT distinct ProductID FROM Album) UniqueProducts
INNER JOIN
Tracks on UniqueProducts.ProductID = Tracks.ProductID
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 13, 2008 at 8:48 am
INSERTNewTracks
(
AlbumID,
TrackTitle,
TrackPosition,
TrackDuration
)
SELECTProductID,
Title,
Position,
Duration
FROM(
SELECTAlbum.ProductID,
Tracks.Title,
Tracks.Position,
Tracks.duration,
ROW_NUMBER() OVER (PARTITION BY Album.ProductID ORDER BY ...) AS RecID
FROMAlbum
INNER JOINTracks ON Tracks.ProductID = Album.ProductID
) AS d
WHERERecID = 1
N 56°04'39.16"
E 12°55'05.25"
October 13, 2008 at 11:13 am
Thanks Gail. I know what you're saying about data normalisation. The issue here is that these are raw data dumps (from another database with a completely different structure) that are being imported from huge CSV files. A SQL DTS package is bringing the data into intermediate tables and then I'm running scripts to get the data from there into the destination tables. It's a one-off import and so I kind of need to make the data fit this one time.
Thanks for your suggestion, looks like it'll do the trick. Thanks,
Tom
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply