sql queries

  • que 1:if a table is populated then how to remove duplicate rows from the table in a single query in sql server?

    Que 2: how to get top nth (e.g. 5th) record from a table in a single line query in sqlserver?

     

     

  • Counterquestion:

    Q1: You are aware of SQL Server's Books Online?

    Q2: You have searched this forum for answers?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • i hv search this problem in sql help but could not find any result

  • Well, then...

    Q1: Have a look at BOL for NOT EXISTS. You don't remove duplicates with it, but rather insert only those rows that are not already in your target table.

    Q2: Several ways to skin that cat

    Probably the "best" solution is a procedure like this:

    USE PUBS

    GO

    CREATE PROC dbo.GetNthLatestEntry (@NthLatest INT)

    AS

    SET NOCOUNT ON

    BEGIN

     CREATE TABLE #Entry

     (

      ID INT PRIMARY KEY NOT NULL IDENTITY(1,1)

      , Entry DATETIME NOT NULL

    &nbsp

     INSERT INTO #Entry (Entry) SELECT hire_date FROM employee ORDER BY hire_date DESC

     SELECT

      Entry

     FROM

      #Entry

     WHERE

      ID = @NthLatest

     DROP TABLE #Entry

    END

    SET NOCOUNT OFF

    GO

    EXEC dbo.GetNthLatestEntry 3

    DROP PROCEDURE dbo.GetNthLatestEntry

    This one is too expensive:

    SELECT TOP 1

     hire_date

    FROM

     employee

    WHERE

     hire_date

     NOT IN(

      SELECT TOP 2

       hire_date

      FROM

       employee

      ORDER BY

       hire_date DESC)

    ORDER BY

     hire_date DESC

    And finally the ANSI SQL way of doing things:

    SELECT

     e1.hire_date

    FROM

     employee AS e1

    INNER JOIN

     employee AS e2

    ON

     e1.hire_date <= e2.hire_date

    GROUP BY

     e1.hire_date

    HAVING COUNT(DISTINCT e2.hire_date) = 3

    which is even more expensive than TOP, but portable...

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • thanks Frank,

    but i hv heared that it is possible to remove duplicate rows in oracle as oracle keeps the track of each row and generate an unique rowid, and with the help of that rowid we can remove duplicate rows.

    and according to u the possible is solution is to make a new table insert distinct rows from given table then remove all rows from parent table then insert rows from child table to parent table and then drop child table.

    is it so?

  • Kirt,

    Can you DELETE duplicate rows once they are INSERTED:  YES,

    Can you also set-up PRIMARY KEY to IGNORE DUPLICATE:  YES

    Question:  IF you can ID them as DUPLICATE and NOT INSERT them WHY NOT? 

    I personally prefer to ONLY INSERT RECORDS that I need and NOT DELETE those I don't AFTER INSERTING them



    Good Hunting!

    AJ Ahrens


    webmaster@kritter.net

  • If you have a unique column, this is also possible in SQL Server. And no, what I meant is that NOT EXISTS checks whether the keycolumn already exists in the target table or not. If it doesn't exists, you can take some action like INSERT that row. And INSERT works only on one table at a time. But I think I've misunderstood you. If you're only concerned about removing duplicates from a table, you should really search the script section here.

    A very basic idea on your duplicates you might get using this query.

    use northwind

    select customerid, count(*)

    from orders

    group by customerid

    having count(*)>1

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Can you also set-up PRIMARY KEY to IGNORE DUPLICATE:  YES

    AJ, are you sure on this???

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Frank,

    I have not done it but YES (scarily enough), I read it in BOL...



    Good Hunting!

    AJ Ahrens


    webmaster@kritter.net

  • My understanding is that a PRIMARY KEY in SQL Server does never allow duplicates and no NULLs. When you read BOL at CREATE INDEX about IGNORE_DUP_KEY, you'll see it talks about indexes, however a PRIMARY KEY is a constraint no index. I would be very interested in the text passage from BOL. Can you post it?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • I could post it but it would be in french so I'll pass on this occasion, but I created this little script.

    create table #a

    (

    A int not null

    )

    GO

    Insert into #a (a) values (1)

    --success

    GO

    CREATE UNIQUE CLUSTERED INDEX PK_a ON dbo.#A (A) WITH IGNORE_DUP_KEY

    Go

    Insert into #a (a) values (1)

    --fails with warning that duplicate entry was ignored

    Select * from #a

    1

    (1 row(s) affected)

    drop table #a

  • It's interesting to see what happens when you change your script to

    A int not null

    constraint pk_test PRIMARY KEY NONCLUSTERED(a)

    )

    and let it run and

    A int not null

    constraint pk_test PRIMARY KEY NONCLUSTERED(a)

    )

    ...

    --CREATE UNIQUE CLUSTERED INDEX PK_a ON dbo.#a (A) WITH IGNORE_DUP_KEY

    Watch the error messages. And in both cases the INSERT fails.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • I was just prooving your point.. you can't have duplicate values on the clustered index...

  • I know I found it just interesting that both cases return a different error message but yield the same result.

    you can't have duplicate values on the clustered index

    To be picky here, "on the UNIQUE CLUSTERED INDEX". Though this is via the uniquifier also true for CLUSTERED INDEXES, this is done internally and therefore out of reach.

     

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • If you have a column ID that is a unique row identifier (either through identity/rowguid/your own algorithm) you can get rid of duplicates by doing something like:

    ----------------------------------------------

    create table duptest

    (

    id int identity(0,1),

    data varchar(10)

    )

    go

    insert into duptest values ('a')

    insert into duptest values ('aaa')

    insert into duptest values ('aaa')

    insert into duptest values ('aaaaa')

    insert into duptest values ('aaaaa')

    insert into duptest values ('aaaaa')

    go

    select * from duptest

    go

    delete from duptest

    where id not in

    (

    select max(id) from duptest

    group by data

    )

    go

    select * from duptest

    go

    ----------------------------------------------

    WARNING!!!! BACK UP YOUR DATABASE BEFORE TRYING STUFF LIKE THIS!

    There. Got that out of my system. Anyway, this will leave one record for each combination of whatever columns you have that are being duplicated.

    Steve G.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply