How to bypass duplicate records ?

Question

How to bypass duplicate records ?

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

isuckatsql SSChampion Points: 10381 More actions · Answer 1

GilaMonster (7/2/2009)
ifila (7/2/2009)
I will grab the first record - in this case 'livelink, java'
The one with the lowest ID?

yes

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

ifila, we want to help, really. But the sample data doesn't match the table definition. The desired output doesn't match the table definition. Based on that, I cannot write a tested query.

If I've interpreted what you want, this query should get the data that you want.

SELECT OriginalEmail, OriginalResume

(SELECT OriginalEmail, OriginalResume, Row_Number() OVER (Partition by OriginalEmail Order By ID) as RowNo

FROM dbo.resumein) sub

Where RowNo = 1

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

isuckatsql SSChampion Points: 10381 More actions · Answer 3

I added a screen shot of my SQL DB.

All resumes are initially imported into dbo.emailaddress which has one big field 'emailaddresstxt'.

READ desktop text folder - WRITE to dbo.emailaddress

At this point the emails are parsed out and written, with a copy of the resume to dbo.outputresume3

READ dbo.emailaddress - WRITE parsed data to dbo.outputresume3

Apologies for my lack of clarity !

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

All we're asking for is the definitions of all tables involved (as CREATE TABLE statements), sample data for those tables (as insert statements), an example of the results you want, based on the sample data and a explanation of the rules involved.

Do the queries we provided do what you want? If not, please provide what I've requested above.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

isuckatsql SSChampion Points: 10381 More actions · Answer 5

I just tested Jeffrey's solution and it works ! 🙂

Many Thanks to both of you, for your assistance !

isuckatsql SSChampion Points: 10381 More actions · Answer 6

I do have a couple of questions. While testing Jeffrey's solution, is there any way to make it run quicker ?

It took 35 minutes to process 600 000 records !

How do i run this query so that it uses resources on a different drive ?

It was rapidly using ALL my 'C' drive space !

Thanks

Jeffrey Williams SSC Guru Points: 90304 More actions · Answer 7

ifila (7/2/2009)
I do have a couple of questions. While testing Jeffrey's solution, is there any way to make it run quicker ?
It took 35 minutes to process 600 000 records !
How do i run this query so that it uses resources on a different drive ?
It was rapidly using ALL my 'C' drive space !
Thanks

Is that returning 600,000 rows - or does the table contain 600,000 rows?

As for running out of space on your C: drive, that is going to happen because SSMS has to put the output results somewhere. This ends up being your temp directory - and if you want it redirected, you need to move your temp directory to another drive.

The query that I provided is not filtered in any way other than eliminating those rows you said you didn't want. If that equates to more than 600,000 rows - then it is going to take that long to return the results.

Oh, one more thing - if you had bothered with reviewing the article we mentioned and posted the table definitions (with indexes) we could offer better advise. As it is, we have no idea whether or not you even have any indexes on this table. If you don't have a clustered index on your ID, and a non-clustered index on your email1 column - I can't see how this query could be improved.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

isuckatsql SSChampion Points: 10381 More actions · Answer 8

Jeffrey Williams (7/2/2009)
ifila (7/2/2009)
I do have a couple of questions. While testing Jeffrey's solution, is there any way to make it run quicker ?
It took 35 minutes to process 600 000 records !
How do i run this query so that it uses resources on a different drive ?
It was rapidly using ALL my 'C' drive space !
Thanks
Is that returning 600,000 rows - or does the table contain 600,000 rows? I had to stop after 600k, the DB has 1.4 Million records.
As for running out of space on your C: drive, that is going to happen because SSMS has to put the output results somewhere. This ends up being your temp directory - and if you want it redirected, you need to move your temp directory to another drive.
Thanks
The query that I provided is not filtered in any way other than eliminating those rows you said you didn't want. If that equates to more than 600,000 rows - then it is going to take that long to return the results.
Oh, one more thing - if you had bothered with reviewing the article we mentioned and posted the table definitions (with indexes) we could offer better advise. As it is, we have no idea whether or not you even have any indexes on this table. If you don't have a clustered index on your ID, and a non-clustered index on your email1 column - I can't see how this query could be improved.

I did review the article and believed what i provided in my attachment was a table definition. This shows the only index field i have. I also mentioned i was using FTI on the resume field.

Thanks

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 9

ifila (7/2/2009)
I do have a couple of questions. While testing Jeffrey's solution, is there any way to make it run quicker ?
It took 35 minutes to process 600 000 records !
How do i run this query so that it uses resources on a different drive ?
It was rapidly using ALL my 'C' drive space !
Thanks

Are you saying that your database files (mdf/ldf) reside on the C drive and are growing to fill the entire disk ? What sort of disk io system have you ?

Clear Sky SQL
My Blog[/url]

isuckatsql SSChampion Points: 10381 More actions · Answer 10

No, i have :

SQL Server program files on - C drive

.ldf on an array on E drive

.mdf on an array on F drive

tempdb on an array on G drive

The query was filling up my C drive. It used 10 GB in 30 minutes, i only had 10 GB more available and my DB was 35 GB, so i did not want to use all my C drive space.

Thanks

Dave Ballantyne SSC-Dedicated Points: 33667 More actions · Answer 11

I would imagine that putting an index on the email column would do nothing but help with the performance as, internally, it will already have a ordered list to base the rownum on. As for it filling up your local disk , why do you want or need all the results in the SSMS results tab. Why not put them to a #table where they will be much easier to process to whatever your next step is.

Clear Sky SQL
My Blog[/url]

isuckatsql SSChampion Points: 10381 More actions · Answer 12

I am still trying to figure out the best solution !

It is a work in progress 🙂

The only reason i did not use a temp table is that i wanted to verify the results.

Since it is taking longer than i would like, i may have to wait until the user selects certain records and run this code on a much smaller record group or create the equivalent code in ASP.

Thanks