How to bypass duplicate records ?

  • GilaMonster (7/2/2009)


    ifila (7/2/2009)


    I will grab the first record - in this case 'livelink, java'

    The one with the lowest ID?

    yes

  • ifila, we want to help, really. But the sample data doesn't match the table definition. The desired output doesn't match the table definition. Based on that, I cannot write a tested query.

    If I've interpreted what you want, this query should get the data that you want.

    SELECT OriginalEmail, OriginalResume

    (SELECT OriginalEmail, OriginalResume, Row_Number() OVER (Partition by OriginalEmail Order By ID) as RowNo

    FROM dbo.resumein) sub

    Where RowNo = 1

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I added a screen shot of my SQL DB.

    All resumes are initially imported into dbo.emailaddress which has one big field 'emailaddresstxt'.

    READ desktop text folder - WRITE to dbo.emailaddress

    At this point the emails are parsed out and written, with a copy of the resume to dbo.outputresume3

    READ dbo.emailaddress - WRITE parsed data to dbo.outputresume3

    Apologies for my lack of clarity !

  • All we're asking for is the definitions of all tables involved (as CREATE TABLE statements), sample data for those tables (as insert statements), an example of the results you want, based on the sample data and a explanation of the rules involved.

    Do the queries we provided do what you want? If not, please provide what I've requested above.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I just tested Jeffrey's solution and it works ! 🙂

    Many Thanks to both of you, for your assistance !

  • I do have a couple of questions. While testing Jeffrey's solution, is there any way to make it run quicker ?

    It took 35 minutes to process 600 000 records !

    How do i run this query so that it uses resources on a different drive ?

    It was rapidly using ALL my 'C' drive space !

    Thanks

  • ifila (7/2/2009)


    I do have a couple of questions. While testing Jeffrey's solution, is there any way to make it run quicker ?

    It took 35 minutes to process 600 000 records !

    How do i run this query so that it uses resources on a different drive ?

    It was rapidly using ALL my 'C' drive space !

    Thanks

    Is that returning 600,000 rows - or does the table contain 600,000 rows?

    As for running out of space on your C: drive, that is going to happen because SSMS has to put the output results somewhere. This ends up being your temp directory - and if you want it redirected, you need to move your temp directory to another drive.

    The query that I provided is not filtered in any way other than eliminating those rows you said you didn't want. If that equates to more than 600,000 rows - then it is going to take that long to return the results.

    Oh, one more thing - if you had bothered with reviewing the article we mentioned and posted the table definitions (with indexes) we could offer better advise. As it is, we have no idea whether or not you even have any indexes on this table. If you don't have a clustered index on your ID, and a non-clustered index on your email1 column - I can't see how this query could be improved.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams (7/2/2009)


    ifila (7/2/2009)


    I do have a couple of questions. While testing Jeffrey's solution, is there any way to make it run quicker ?

    It took 35 minutes to process 600 000 records !

    How do i run this query so that it uses resources on a different drive ?

    It was rapidly using ALL my 'C' drive space !

    Thanks

    Is that returning 600,000 rows - or does the table contain 600,000 rows? I had to stop after 600k, the DB has 1.4 Million records.

    As for running out of space on your C: drive, that is going to happen because SSMS has to put the output results somewhere. This ends up being your temp directory - and if you want it redirected, you need to move your temp directory to another drive.

    Thanks

    The query that I provided is not filtered in any way other than eliminating those rows you said you didn't want. If that equates to more than 600,000 rows - then it is going to take that long to return the results.

    Oh, one more thing - if you had bothered with reviewing the article we mentioned and posted the table definitions (with indexes) we could offer better advise. As it is, we have no idea whether or not you even have any indexes on this table. If you don't have a clustered index on your ID, and a non-clustered index on your email1 column - I can't see how this query could be improved.

    I did review the article and believed what i provided in my attachment was a table definition. This shows the only index field i have. I also mentioned i was using FTI on the resume field.

    Thanks

  • ifila (7/2/2009)


    I do have a couple of questions. While testing Jeffrey's solution, is there any way to make it run quicker ?

    It took 35 minutes to process 600 000 records !

    How do i run this query so that it uses resources on a different drive ?

    It was rapidly using ALL my 'C' drive space !

    Thanks

    Are you saying that your database files (mdf/ldf) reside on the C drive and are growing to fill the entire disk ? What sort of disk io system have you ?



    Clear Sky SQL
    My Blog[/url]

  • No, i have :

    SQL Server program files on - C drive

    .ldf on an array on E drive

    .mdf on an array on F drive

    tempdb on an array on G drive

    The query was filling up my C drive. It used 10 GB in 30 minutes, i only had 10 GB more available and my DB was 35 GB, so i did not want to use all my C drive space.

    Thanks

  • I would imagine that putting an index on the email column would do nothing but help with the performance as, internally, it will already have a ordered list to base the rownum on. As for it filling up your local disk , why do you want or need all the results in the SSMS results tab. Why not put them to a #table where they will be much easier to process to whatever your next step is.



    Clear Sky SQL
    My Blog[/url]

  • I am still trying to figure out the best solution !

    It is a work in progress 🙂

    The only reason i did not use a temp table is that i wanted to verify the results.

    Since it is taking longer than i would like, i may have to wait until the user selects certain records and run this code on a much smaller record group or create the equivalent code in ASP.

    Thanks

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply