Need to remove duplicate records and get distinct values

Question

Post reply

Need to remove duplicate records and get distinct values

Viewing 15 posts - 16 through 30 (of 59 total)

You must be logged in to reply to this topic. Login to reply

Telammica Ten Centuries Points: 1288 More actions · Answer 1

Telammica

Ten Centuries

Points: 1288

July 16, 2008 at 6:56 am

#843733

just insert rest of values into table too.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 2

Dugi (7/16/2008)
Hi Jeff!
Your code here just selecting and ordering asc the data inside the table it doesn't do any elimination of duplicate records!
Can you explain little bit this!
Dugi

Heh... Sure! I shouldn't write code before coffee! I screwed up... I'm deleting the post.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 3

Dugi (7/16/2008)
Hi Jeff!
Your code here just selecting and ordering asc the data inside the table it doesn't do any elimination of duplicate records!
Can you explain little bit this!
Dugi

I believe that Telammica's code will do the trick... l appologize for my bit of bad code. More coffee is on the way! :hehe:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

Ok... now that I've had a bit of coffee... 😛

In the original problem posted, San didn't care about which name was returned so just use the MIN or Max name for each ID posted.

SELECT ID,MAX(Name) AS Name

FROM dbo.JBMTest

GROUP BY ID

You can do the ranking in SQL Server 2000 a couple of ways... some are a lot faster than others...

http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

karthik M One Orange Chip Points: 29139 More actions · Answer 5

karthik M

One Orange Chip

Points: 29139

July 16, 2008 at 7:45 am

#843761

Jeff,

how about my code ?

karthik

karthik M One Orange Chip Points: 29139 More actions · Answer 6

which one is good when we think performance ?

karthik

karthik M One Orange Chip Points: 29139 More actions · Answer 7

Because i saw more than 4 answer for this post so far.

karthik

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 8

karthikeyan (7/16/2008)
Jeff,
how about my code ?

What size column do you thing NAME = '' will make? Also, look at the update... which NAME shall it use to do the update?

And, just and FYI.... the basic form of update that you have used is an undocumented form that, under just the right conditions, will cause a 30 second query to slam a 4 CPU box into the wall for hours. Hard to duplicate the conditions for that, but I've made repairs for that several times. Here's the basis of what I'm saying...

--===== Wrong way to do a "Joined" update

UPDATE TableA

SET SomeTableAColumn = b.SomeTableBColumn

FROM TableB b

WHERE TableA.SomeOtherColumn = b.SomeOtherColumn

--===== Right way to do a "Joined" update

UPDATE TableA

SET SomeTableAColumn = b.SomeTableBColumn

FROM TableB b

INNER JOIN TableA a

ON a.SomeOtherColumn = b.SomeOtherColumn

In order for it to be "legal", TableA must also be in the FROM clause or the huge performance problem could take place. General rule to follow on joined updates is that the object of the update must also be in the FROM clause.

The hard part of this code is that it's very difficult to prove... I've not been able to make a test case using repeatable generated data that causes the problem every time so it can be studied. I only know that when I fix these types of updates that are real performance problems, the problem goes away.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

karthikeyan (7/16/2008)
which one is good when we think performance ?

Try it... let us know...

--===== Create and populate a 1,000,000 row test table.

-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers

-- Column "ID" has a range of 1 to 50,000 non-unique numbers

-- Column "Name" has a range of "AA" to "ZZ" non-unique 2 character stringsontains 12 random hex characters (ie, 0-9,A-F)

SELECT TOP 1000000

RowNum = IDENTITY(INT,1,1),

ID = ABS(CHECKSUM(NEWID()))%50000+1,

Name = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

+ CHAR(ABS(CHECKSUM(NEWID()))%26+65)

INTO dbo.JBMTest

FROM Master.dbo.SysColumns t1,

Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

--===== A table is not properly formed unless a Primary Key has been assigned

-- Takes about 1 second to execute.

ALTER TABLE dbo.JBMTest

ADD PRIMARY KEY CLUSTERED (SomeID)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 10

karthikeyan (7/15/2008)
use distinct keyword

Interesting solution - care to elaborate?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 11

San (7/15/2008)
Hi
I am sorry if someone has already posted this question and am asking again.
This problem is eating my head, I have a table
Table 1 -
1br
1er
2rt
3yh
2uj
3iu
3ol
Now I want only 1 row for each unique id, that row could be any row but id shouldn't repeat.
1br
2rt
3yh
or
1er
2uj
3ol
Thanks

San, do you have any more information regarding this puzzle? For instance, are you aiming to change an existing table to meet your criteria, or to return a satisfactory result set from an existing table? It's not yet clear from your posts.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

d_sysuk SSC Eights! Points: 809 More actions · Answer 12

Okay so the posters had part I published several times, with various answers... using min/max, a subquery, a temp-table.

The perfm. analysis can be made easily by Sam/others just run them across (expect the group by will be higher perf than temp tables, subquery join - but thats just an untested hypothesis - also depends on indexing of the subtable etc...[e.g. ID - clustered, value - non-clustered, or would a covering index be better here.. etc, I'm sure it will all affect the results])

Anyone up for programming PART II of the request from Sam?

I certainly know how to do it, want to see who comes up with a nice solution [Mine would involve two loops].

[SQL Server 2005 solution is neat and tidy using rank function].

Sanchita Old Hand Points: 399 More actions · Answer 13

San, do you have any more information regarding this puzzle? For instance, are you aiming to change an existing table to meet your criteria, or to return a satisfactory result set from an existing table? It's not yet clear from your posts.

Well initially I wanted just 1 row for each id, could be any row.

But yes now I want to control as to say i want the 2nd / 3rd... datavalue for each id and if an id doesnt have a 2nd/3rd value then the last value it has...

i hope you getting me..

also can someone provide me more links on 'Over' ? not the msdn ones..

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 14

San (7/16/2008)
San, do you have any more information regarding this puzzle? For instance, are you aiming to change an existing table to meet your criteria, or to return a satisfactory result set from an existing table? It's not yet clear from your posts.
Well initially I wanted just 1 row for each id, could be any row.
But yes now I want to control as to say i want the 2nd / 3rd... datavalue for each id and if an id doesnt have a 2nd/3rd value then the last value it has...
i hope you getting me..
also can someone provide me more links on 'Over' ? not the msdn ones..

So...you want a query to give you results from a table, and your primarily interested on how to do this using 'OVER'? Which version of SQL Server are you using?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

d_sysuk SSC Eights! Points: 809 More actions · Answer 15

This is the SQL Server 2005 version:

select distinct id,value, rank() over (partition by id order by value)

from

(select '1' id,'br' value

union

select '1','er'

union

select '2','rt'

union

select '2','uj'

union

select '3','iu'

union

select '3','ol') data

Output:

idvalue(No column name)

1br1

1er2

2rt1

2uj2

3iu1

3ol2

As said the SQL 2000 version, is a bit messy [waiting to check requirements - there is no OVER clause in SQL 2000 or RANK function].