May 31, 2005 at 4:54 am
que 1:if a table is populated then how to remove duplicate rows from the table in a single query in sql server?
Que 2: how to get top nth (e.g. 5th) record from a table in a single line query in sqlserver?
May 31, 2005 at 5:09 am
Counterquestion:
Q1: You are aware of SQL Server's Books Online?
Q2: You have searched this forum for answers?
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
May 31, 2005 at 5:13 am
i hv search this problem in sql help but could not find any result
May 31, 2005 at 5:24 am
Well, then...
Q1: Have a look at BOL for NOT EXISTS. You don't remove duplicates with it, but rather insert only those rows that are not already in your target table.
Q2: Several ways to skin that cat
Probably the "best" solution is a procedure like this:
USE PUBS
GO
CREATE PROC dbo.GetNthLatestEntry (@NthLatest INT)
AS
SET NOCOUNT ON
BEGIN
CREATE TABLE #Entry
(
ID INT PRIMARY KEY NOT NULL IDENTITY(1,1)
, Entry DATETIME NOT NULL
 
INSERT INTO #Entry (Entry) SELECT hire_date FROM employee ORDER BY hire_date DESC
SELECT
Entry
FROM
#Entry
WHERE
ID = @NthLatest
DROP TABLE #Entry
END
SET NOCOUNT OFF
GO
EXEC dbo.GetNthLatestEntry 3
DROP PROCEDURE dbo.GetNthLatestEntry
This one is too expensive:
SELECT TOP 1
hire_date
FROM
employee
WHERE
hire_date
NOT IN(
SELECT TOP 2
hire_date
FROM
employee
ORDER BY
hire_date DESC)
ORDER BY
hire_date DESC
And finally the ANSI SQL way of doing things:
SELECT
e1.hire_date
FROM
employee AS e1
INNER JOIN
employee AS e2
ON
e1.hire_date <= e2.hire_date
GROUP BY
e1.hire_date
HAVING COUNT(DISTINCT e2.hire_date) = 3
which is even more expensive than TOP, but portable...
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
May 31, 2005 at 5:36 am
thanks Frank,
but i hv heared that it is possible to remove duplicate rows in oracle as oracle keeps the track of each row and generate an unique rowid, and with the help of that rowid we can remove duplicate rows.
and according to u the possible is solution is to make a new table insert distinct rows from given table then remove all rows from parent table then insert rows from child table to parent table and then drop child table.
is it so?
May 31, 2005 at 5:48 am
Kirt,
Can you DELETE duplicate rows once they are INSERTED: YES,
Can you also set-up PRIMARY KEY to IGNORE DUPLICATE: YES
Question: IF you can ID them as DUPLICATE and NOT INSERT them WHY NOT?
I personally prefer to ONLY INSERT RECORDS that I need and NOT DELETE those I don't AFTER INSERTING them
Good Hunting!
AJ Ahrens
webmaster@kritter.net
May 31, 2005 at 5:49 am
If you have a unique column, this is also possible in SQL Server. And no, what I meant is that NOT EXISTS checks whether the keycolumn already exists in the target table or not. If it doesn't exists, you can take some action like INSERT that row. And INSERT works only on one table at a time. But I think I've misunderstood you. If you're only concerned about removing duplicates from a table, you should really search the script section here.
A very basic idea on your duplicates you might get using this query.
use northwind
select customerid, count(*)
from orders
group by customerid
having count(*)>1
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
May 31, 2005 at 5:51 am
Can you also set-up PRIMARY KEY to IGNORE DUPLICATE: YES
AJ, are you sure on this???
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
May 31, 2005 at 3:39 pm
Frank,
I have not done it but YES (scarily enough), I read it in BOL...
Good Hunting!
AJ Ahrens
webmaster@kritter.net
June 1, 2005 at 12:53 am
My understanding is that a PRIMARY KEY in SQL Server does never allow duplicates and no NULLs. When you read BOL at CREATE INDEX about IGNORE_DUP_KEY, you'll see it talks about indexes, however a PRIMARY KEY is a constraint no index. I would be very interested in the text passage from BOL. Can you post it?
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
June 1, 2005 at 7:02 am
I could post it but it would be in french so I'll pass on this occasion, but I created this little script.
create table #a
(
A int not null
)
GO
Insert into #a (a) values (1)
--success
GO
CREATE UNIQUE CLUSTERED INDEX PK_a ON dbo.#A (A) WITH IGNORE_DUP_KEY
Go
Insert into #a (a) values (1)
--fails with warning that duplicate entry was ignored
Select * from #a
1
(1 row(s) affected)
drop table #a
June 1, 2005 at 7:19 am
It's interesting to see what happens when you change your script to
A int not null
constraint pk_test PRIMARY KEY NONCLUSTERED(a)
)
and let it run and
A int not null
constraint pk_test PRIMARY KEY NONCLUSTERED(a)
)
...
--CREATE UNIQUE CLUSTERED INDEX PK_a ON dbo.#a (A) WITH IGNORE_DUP_KEY
Watch the error messages. And in both cases the INSERT fails.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
June 1, 2005 at 7:23 am
I was just prooving your point.. you can't have duplicate values on the clustered index...
June 1, 2005 at 7:33 am
I know I found it just interesting that both cases return a different error message but yield the same result.
you can't have duplicate values on the clustered index
To be picky here, "on the UNIQUE CLUSTERED INDEX". Though this is via the uniquifier also true for CLUSTERED INDEXES, this is done internally and therefore out of reach.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
June 1, 2005 at 11:37 am
If you have a column ID that is a unique row identifier (either through identity/rowguid/your own algorithm) you can get rid of duplicates by doing something like:
----------------------------------------------
create table duptest
(
id int identity(0,1),
data varchar(10)
)
go
insert into duptest values ('a')
insert into duptest values ('aaa')
insert into duptest values ('aaa')
insert into duptest values ('aaaaa')
insert into duptest values ('aaaaa')
insert into duptest values ('aaaaa')
go
select * from duptest
go
delete from duptest
where id not in
(
select max(id) from duptest
group by data
)
go
select * from duptest
go
----------------------------------------------
WARNING!!!! BACK UP YOUR DATABASE BEFORE TRYING STUFF LIKE THIS!
There. Got that out of my system. Anyway, this will leave one record for each combination of whatever columns you have that are being duplicated.
Steve G.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply