March 12, 2009 at 7:12 pm
Hi, when I first heard the requirement it sounded simple to me (I am a newbie), but as I get deeper and deeper into it, it's getting complicated. Can any one of you please help me out?
Req:
There is a table with
Col1 Col2
John trivolta
trivolta John
Kate winslet
As you can see there are two Johns, but the values are swapped between the two columns, so I need to remove one John (trivolta John).
Please help me out with the query...
Thanks
March 12, 2009 at 9:50 pm
What have you tried so far?
Tim Mitchell, Microsoft Data Platform MVP
Data Warehouse and ETL Consultant
TimMitchell.net | @Tim_Mitchell | Tyleris.com
ETL Best Practices
March 13, 2009 at 4:26 am
While this solution is probably not the most efficient one, it should work provided you have some kind of row id to tell different rows apart.
DELETE FROM #pom
WHERE id = ANY
(
    SELECT DISTINCT a.id
    FROM #pom a
    INNER JOIN #pom b
        ON ((a.var1 = b.var2 AND a.var2 = b.var1) OR (a.var1 = b.var1 AND a.var2 = b.var2))
        AND a.id < b.id
)
Of course you have to change the table name and the column names, but that's obvious.
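For example, against the sample data from the first post it would go something like this (just a sketch - I'm assuming a temp table called #pom with columns id, var1 and var2, and I flipped the comparison to a.id > b.id so that the first row of each pair is the one that gets kept):
-- sample data with a row id (the names #pom, id, var1, var2 are assumptions)
CREATE TABLE #pom (id INT IDENTITY(1,1), var1 VARCHAR(100), var2 VARCHAR(100));
INSERT INTO #pom (var1, var2) SELECT 'John', 'trivolta';
INSERT INTO #pom (var1, var2) SELECT 'trivolta', 'John';
INSERT INTO #pom (var1, var2) SELECT 'Kate', 'winslet';

-- delete the later row of every exact or swapped duplicate pair
DELETE FROM #pom
WHERE id = ANY
(
    SELECT DISTINCT a.id
    FROM #pom a
    INNER JOIN #pom b
        ON ((a.var1 = b.var2 AND a.var2 = b.var1) OR (a.var1 = b.var1 AND a.var2 = b.var2))
        AND a.id > b.id
);

SELECT * FROM #pom; -- leaves John/trivolta and Kate/winslet
DROP TABLE #pom;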
March 14, 2009 at 8:24 am
Hi Nimdil,
I tried your query, and it works perfectly.
Anyway, to delete duplicates you should have either a row number or one key column, right?
Dreamsz - can you tell us how exactly you want the query (without using a row number or key)?
Regards,
Ramu
Ramu
No Dream Is Too Big....!
March 15, 2009 at 12:04 pm
It would probably be a lot slower without a primary key in the table. And why do you have no primary key in the table? Primary keys are absolutely fundamental.
Let's see...
If you can run a script with a loop, this should work (provided that your table has col1 and col2 as its only columns):
DECLARE @i INT;
DECLARE @col1 VARCHAR(100);
DECLARE @col2 VARCHAR(100); -- or whatever length your columns are
-- one cursor row per distinct (col1, col2) pair, with the number of occurrences
DECLARE curs CURSOR FOR SELECT col1, col2, COUNT(1) FROM [table] GROUP BY col1, col2;
OPEN curs;
FETCH NEXT FROM curs INTO @col1, @col2, @i;
WHILE (@@FETCH_STATUS = 0)
BEGIN
    -- keep just one row of each exact duplicate group
    DELETE TOP (@i - 1) FROM [table] WHERE @col1 = col1 AND @col2 = col2;
    -- remove the rows that are the swapped version of the current pair
    DELETE FROM [table] WHERE @col1 = col2 AND @col2 = col1;
    FETCH NEXT FROM curs INTO @col1, @col2, @i;
END
CLOSE curs;
DEALLOCATE curs;
But I think it's kind of slow. You can probably also try it with a temporary table, something like this:
SELECT 'greater' = CASE WHEN col1 < col2 THEN col2 ELSE col1 END,
       'smaller' = CASE WHEN col2 < col1 THEN col2 ELSE col1 END,
       COUNT(1) AS no
INTO #temp1
FROM [table]
GROUP BY CASE WHEN col1 < col2 THEN col2 ELSE col1 END,
         CASE WHEN col2 < col1 THEN col2 ELSE col1 END
HAVING COUNT(1) > 1;
TRUNCATE TABLE [table]; -- or DELETE FROM [table];
INSERT INTO [table] (col1, col2) SELECT greater, smaller FROM #temp1;
Note that the second piece of code is a bit of a guess, as I don't have access to SQL Server until Wednesday, so you should test it and probably correct a mistake or two. Also, it's a bit tricky, as you literally move all the data to a temporary table, so you should perform the operation within a transaction. And it may be risky on a really HUGE table.
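If it helps, wrapping it in a transaction could look roughly like this (again a bit of a guess - I'm assuming SQL 2005 or later for TRY/CATCH, [table] stands in for your real table name, and I left the HAVING out of this sketch so the non-duplicated rows don't get lost):
BEGIN TRY
    BEGIN TRANSACTION;

    -- one canonical row per pair: exact and swapped duplicates collapse together
    SELECT CASE WHEN col1 < col2 THEN col2 ELSE col1 END AS greater,
           CASE WHEN col1 < col2 THEN col1 ELSE col2 END AS smaller
    INTO #temp1
    FROM [table]
    GROUP BY CASE WHEN col1 < col2 THEN col2 ELSE col1 END,
             CASE WHEN col1 < col2 THEN col1 ELSE col2 END;

    TRUNCATE TABLE [table];
    INSERT INTO [table] (col1, col2) SELECT greater, smaller FROM #temp1;

    DROP TABLE #temp1;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- if anything goes wrong, put the data back the way it was
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
END CATCH;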
March 15, 2009 at 3:25 pm
Thanks, Nimdil, for the query as well as for your time...
Nimdil, the query is good, but the thing is it deletes both of the duplicate rows, whereas we need to get rid of one duplicate row and keep the other...
March 18, 2009 at 2:56 am
Can you explain why you can't just add a primary key to the table? Every table -should- have a primary key. It is even - funny thing, really - written into rule 2 of Codd's rules for relational databases. Please seriously consider adding a PK - even a simple integer column (for an unbelievably huge database you can use bigint).
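For example, something like this should do it (just a sketch - dbo.Names and the column name id are made up, use your real table):
-- add a surrogate key column and make it the primary key
ALTER TABLE dbo.Names ADD id INT IDENTITY(1,1) NOT NULL;
ALTER TABLE dbo.Names ADD CONSTRAINT PK_Names PRIMARY KEY (id);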
I admit that my suggestions are flawed. I will try to correct them.
March 18, 2009 at 3:05 am
Edit:- I posted a query which was not correct. I have removed that query. I could not delete this post.
"Keep Trying"
March 18, 2009 at 3:35 am
If you can afford to create an identity column, or to transfer the rows to a temp table with an identity column, then this can work:
create table #t1 ( id int identity(1,1) ,cid1 varchar(100), cid2 varchar(200))
insert into #t1(cid1,cid2) select 'john','candy'
insert into #t1(cid1,cid2) select 'candy','john'
insert into #t1(cid1,cid2) select 'irfan','khan'
delete x from #t1 x,#t1 y where x.cid2 = y.cid1 and x.id > y.id
Select * from #t1
drop table #t1
Basically your problem is that you need to differentiate which one you want to delete.
(Note: SQL Server doesn't understand that the 'john' row came first and hence needs to be retained.)
Hey, I just forgot - are you using SQL 2005? Then ROW_NUMBER should solve your problem easily.
If it's 2000, then I guess the above one is the solution.
Regards,
Raj
March 18, 2009 at 7:37 am
I think that you can run out of memory with a temporary table.
March 18, 2009 at 8:08 am
It heavily depends on how much RAM you have, how many users are going to access it, how much data your query touches, what settings are on the SQL Server, etc. Too many things to put down in this context.
Btw, it was a small query issue - why bring such a huge topic out here? 😉
Regards,
Raj
March 18, 2009 at 9:12 am
For 2005 this one should work:
create table #t1(col1 varchar(200),col2 varchar(200))
insert into #t1 select 'john','trovolta'
insert into #t1 select 'trovolta','john'
insert into #t1 select 'carlos','moya'
delete x from
( select row_number() OVER (order by col1,col2) as id,col1,col2 from #t1 ) x,
( select row_number() OVER (order by col1,col2) as id,col1,col2 from #t1) Y
where x.col2 = y.col1 and x.id > y.id
select * from #t1
Regards,
Raj
March 18, 2009 at 9:36 am
create table #t1(col1 varchar(200),col2 varchar(200))
insert into #t1 select 'john','trovolta'
insert into #t1 select 'trovolta','john'
insert into #t1 select 'carlos','moya'
insert into #t1 select 'john','candy'
insert into #t1 select 'candy','john'
delete x from
( select row_number() OVER (order by getdate()) as id,col1,col2 from #t1 ) x,
( select row_number() OVER (order by getdate()) as id,col1,col2 from #t1) Y
where x.col2 = y.col1 and x.id > y.id
This one is better than the previous one as it ensures the order in which the rows were inserted,
i.e. it deletes the duplicate inserted last and not the first one, so it is not influenced by the alphabetical order of the data.
This one deletes candy/john and not john/candy like the last one.
Regards,
Raj
March 19, 2009 at 8:47 am
This doesn't work. If you add the following to the list of inserts and try it, you'll see why:
insert into #t1 select 'travolta', 'carlos'
insert into #t1 select 'sainz', 'john'
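FWIW, combining the identity column idea from earlier with a match on both columns (exact duplicate or swapped) seems to handle those rows - just a sketch, not heavily tested:
create table #t1 (id int identity(1,1), col1 varchar(200), col2 varchar(200))
insert into #t1 (col1, col2) select 'john','trovolta'
insert into #t1 (col1, col2) select 'trovolta','john'
insert into #t1 (col1, col2) select 'carlos','moya'
insert into #t1 (col1, col2) select 'travolta','carlos'
insert into #t1 (col1, col2) select 'sainz','john'

-- delete a row only when an earlier row is its exact or swapped duplicate
delete x from #t1 x, #t1 y
where ((x.col1 = y.col2 and x.col2 = y.col1) or (x.col1 = y.col1 and x.col2 = y.col2))
and x.id > y.id

select * from #t1
drop table #t1
With those rows it only removes the trovolta/john line and leaves travolta/carlos and sainz/john alone.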