sql queries

Question

Post reply

sql queries

kirtikhare

Valued Member

Points: 54
More actions
May 31, 2005 at 4:54 am

#167353

que 1:if a table is populated then how to remove duplicate rows from the table in a single query in sql server?
Que 2: how to get top nth (e.g. 5th) record from a table in a single line query in sqlserver?

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Frank Kalis SSC Guru Points: 111183 More actions · Answer 1

Counterquestion:

Q1: You are aware of SQL Server's Books Online?

Q2: You have searched this forum for answers?

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

kirtikhare Valued Member Points: 54 More actions · Answer 2

i hv search this problem in sql help but could not find any result

Frank Kalis SSC Guru Points: 111183 More actions · Answer 3

Well, then...

Q1: Have a look at BOL for NOT EXISTS. You don't remove duplicates with it, but rather insert only those rows that are not already in your target table.

Q2: Several ways to skin that cat

Probably the "best" solution is a procedure like this:

USE PUBS

GO

CREATE PROC dbo.GetNthLatestEntry (@NthLatest INT)

AS

SET NOCOUNT ON

BEGIN

CREATE TABLE #Entry

(

ID INT PRIMARY KEY NOT NULL IDENTITY(1,1)

, Entry DATETIME NOT NULL

&nbsp

INSERT INTO #Entry (Entry) SELECT hire_date FROM employee ORDER BY hire_date DESC

SELECT

Entry

FROM

#Entry

WHERE

ID = @NthLatest

DROP TABLE #Entry

END

SET NOCOUNT OFF

GO

EXEC dbo.GetNthLatestEntry 3

DROP PROCEDURE dbo.GetNthLatestEntry

This one is too expensive:

SELECT TOP 1

hire_date

FROM

employee

WHERE

hire_date

NOT IN(

SELECT TOP 2

hire_date

FROM

employee

ORDER BY

hire_date DESC)

ORDER BY

hire_date DESC

And finally the ANSI SQL way of doing things:

SELECT

e1.hire_date

FROM

employee AS e1

INNER JOIN

employee AS e2

ON

e1.hire_date <= e2.hire_date

GROUP BY

e1.hire_date

HAVING COUNT(DISTINCT e2.hire_date) = 3

which is even more expensive than TOP, but portable...

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

kirtikhare Valued Member Points: 54 More actions · Answer 4

thanks Frank,

but i hv heared that it is possible to remove duplicate rows in oracle as oracle keeps the track of each row and generate an unique rowid, and with the help of that rowid we can remove duplicate rows.

and according to u the possible is solution is to make a new table insert distinct rows from given table then remove all rows from parent table then insert rows from child table to parent table and then drop child table.

is it so?

AJ Ahrens SSC-Insane Points: 20676 More actions · Answer 5

Kirt,

Can you DELETE duplicate rows once they are INSERTED: YES,

Can you also set-up PRIMARY KEY to IGNORE DUPLICATE: YES

Question: IF you can ID them as DUPLICATE and NOT INSERT them WHY NOT?

I personally prefer to ONLY INSERT RECORDS that I need and NOT DELETE those I don't AFTER INSERTING them

Good Hunting!

AJ Ahrens

webmaster@kritter.net

Frank Kalis SSC Guru Points: 111183 More actions · Answer 6

If you have a unique column, this is also possible in SQL Server. And no, what I meant is that NOT EXISTS checks whether the keycolumn already exists in the target table or not. If it doesn't exists, you can take some action like INSERT that row. And INSERT works only on one table at a time. But I think I've misunderstood you. If you're only concerned about removing duplicates from a table, you should really search the script section here.

A very basic idea on your duplicates you might get using this query.

use northwind

select customerid, count(*)

from orders

group by customerid

having count(*)>1

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Frank Kalis SSC Guru Points: 111183 More actions · Answer 7

Can you also set-up PRIMARY KEY to IGNORE DUPLICATE: YES

AJ, are you sure on this???

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

AJ Ahrens SSC-Insane Points: 20676 More actions · Answer 8

Frank,

I have not done it but YES (scarily enough), I read it in BOL...

Good Hunting!

AJ Ahrens

webmaster@kritter.net

Frank Kalis SSC Guru Points: 111183 More actions · Answer 9

My understanding is that a PRIMARY KEY in SQL Server does never allow duplicates and no NULLs. When you read BOL at CREATE INDEX about IGNORE_DUP_KEY, you'll see it talks about indexes, however a PRIMARY KEY is a constraint no index. I would be very interested in the text passage from BOL. Can you post it?

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

I could post it but it would be in french so I'll pass on this occasion, but I created this little script.

create table #a

(

A int not null

)

GO

Insert into #a (a) values (1)

--success

GO

CREATE UNIQUE CLUSTERED INDEX PK_a ON dbo.#A (A) WITH IGNORE_DUP_KEY

Go

Insert into #a (a) values (1)

--fails with warning that duplicate entry was ignored

Select * from #a

1

(1 row(s) affected)

drop table #a

Frank Kalis SSC Guru Points: 111183 More actions · Answer 11

It's interesting to see what happens when you change your script to

A int not null

constraint pk_test PRIMARY KEY NONCLUSTERED(a)

)

and let it run and

A int not null

constraint pk_test PRIMARY KEY NONCLUSTERED(a)

)

...

--CREATE UNIQUE CLUSTERED INDEX PK_a ON dbo.#a (A) WITH IGNORE_DUP_KEY

Watch the error messages. And in both cases the INSERT fails.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 12

I was just prooving your point.. you can't have duplicate values on the clustered index...

Frank Kalis SSC Guru Points: 111183 More actions · Answer 13

I know I found it just interesting that both cases return a different error message but yield the same result.

you can't have duplicate values on the clustered index

To be picky here, "on the UNIQUE CLUSTERED INDEX". Though this is via the uniquifier also true for CLUSTERED INDEXES, this is done internally and therefore out of reach.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

aureolin Hall of Fame Points: 3520 More actions · Answer 14

If you have a column ID that is a unique row identifier (either through identity/rowguid/your own algorithm) you can get rid of duplicates by doing something like:

----------------------------------------------

create table duptest

(

id int identity(0,1),

data varchar(10)

)

go

insert into duptest values ('a')

insert into duptest values ('aaa')

insert into duptest values ('aaaaa')

go

select * from duptest

go

delete from duptest

where id not in

(

select max(id) from duptest

group by data

)

go

select * from duptest

go

----------------------------------------------

WARNING!!!! BACK UP YOUR DATABASE BEFORE TRYING STUFF LIKE THIS!

There. Got that out of my system. Anyway, this will leave one record for each combination of whatever columns you have that are being duplicated.

Steve G.