March 7, 2004 at 8:39 am
Hi,
I made a boo-boo with an import query and would like some help. I maintain a hockey stats database, and for somereason the IS NULL command did not work on my prod server (worked great on my test system) and as a result, I imported the stats of some 450 players in DUPLICATE.
So, I am hoping that I can write a query which will allow me to identify the duplicates and delete them. The stats table has a team_id, player_id and season_id. So I know that if there is more than 1 instance of the same values for those 3 columns, it is a duplicate.
I am ssuming I need to write a query that groups on team_id, player_id and season_id where the count for each of those items is > 1. But I have had no luck so far.
Any help appreciated
JW
March 7, 2004 at 10:58 am
There are other cooler ways to do this but the fastest way might be to do a select distinct (those 3 columns) into a second table, dump the original table and re-import from the second table into the original
------------
Ray Higdon MCSE, MCDBA, CCNA
March 7, 2004 at 7:47 pm
Discussion of some alternatives
March 7, 2004 at 7:48 pm
http://www.databasejournal.com/features/mssql/article.php/1438651
duh!! Sending the link might help.
March 7, 2004 at 8:02 pm
Hmm, that was too easy. I'm always looking for the hard way!
JW
March 8, 2004 at 7:45 am
You could save yourself a lot of heartache if you put a unique index (constraint) on team_id, player_id and season_id then you could not insert duplicates without an error.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
March 8, 2004 at 9:39 am
Deleting duplicates can be done in a single T-SQL command,
when the table has two unique keys that don't have any columns in common.
For a table with only one unique key, add an identity column,
perform the delete, then remove the identity column.
create table sample_tbl
(
id int identity,
key1 int,
key2 int,
attr1 varchar (12),
attr2 varchar (12),
attr3 varchar (12)
)
Remove the duplicate rows, keeping only one of each
delete a
from dbo.sample_tbl as a
join dbo.sample_tbl as b
on a.key1 = b.key1
and a.key2 = b.key2
where a.id < b.id -- "<" keeps the row with the highest identity value
-- ">" keeps the row with the lowest identity value
If you want to review the data before deleting...
select num_dups, *
from sample_tbl as a
-- get duplicate keys
join (select key1, key2, count (*) as num_dups
from dbo.sample_tbl
group by key1, key2
having count (*) > 1
) as b
on a.key1 = b.key1
and a.key2 = b.key2
order by a.key1, a.key2
March 11, 2004 at 9:02 pm
Hi !
If the table structure is like.
Tbl1
(
pid integer,
.... some fields
)
then u can delete the duplicates as
delete from tbl1 where pid not in (select max(pid) from tbl1)
If u don;t have one then u can just add one identity column with seed 1 and it will work like pid.
It shud work...
--Vijendra
"Beginning is Half Done"
March 12, 2004 at 3:27 am
"delete from tbl1 where pid not in (select max(pid) from tbl1)"
Wouldn't that delete all rows exept the one with the max PID?
------------
Ray Higdon MCSE, MCDBA, CCNA
March 12, 2004 at 5:39 am
I think that was needed !!!
regards
Vijendra...
"Beginning is Half Done"
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply