Compare tables with more than 4.9 million records

  • Hi,

    I want to compare ONLY 1 Column values from 2 tables having more than 4.9 million records. There is a difference of 4000 rows between the 2 tables.

    SELECT ID From TABLE1 where ID not in (SELECT DISTINCT ID From TABLE2)

    My above query took nearly 4.5 hours to run and I had to cancel it. Is there a better way to write the query . I just want to compare the ID - column values which are missing in TABLE2

  • Have you tried the EXCEPT clause? Do you have indexes in your tables?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • or maybe (From a database I had handy)

    SELECT p.ProtocolNo

    FROM Protocol p

    WHERE NOT EXISTS (SELECT 1 FROM Enroll e WHERE e.ProtocolNo = p.ProtocolNo);

    ProtocolNo is the primary key of Protocol, so that's indexed. e.ProtocolNo is a foreign key, so I indexed that too.

    Are your PK and FK's indexed?

  • Is EXCEPT Clause faster

    or

    is NOT EXISTS Clause faster ?

  • ms-techie (3/18/2014)


    Is EXCEPT Clause faster

    or

    is NOT EXISTS Clause faster ?

    Have you tested them? That's the best indicator.

    Someone has made some tests for you. You could thank Gail, but you should test as well in your environment http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • EXCEPT Clause worked like a charm. I got output in just 9 mins from 4.5 hours. Thank you

    Now when I do a simple select , it is taking more than 30 mins to execute . There are 4.9 million rows in the table.

    SELECT ID FROM TABLE where ID = '1234'

    --Is there a better than this SELECT QUERY.

  • Is there an index on the column "ID"? If not, there's your problem. Check the execution plan for the query - that will show you what's going on and where to start looking for causes. (If I were smart enough to understand it, I would go get a copy of Grant Fritchey's book on execution plans.) Actually, dig around for some of Grant's posts on reading execution plans - he might even have some one YouTube (look up ScaryDBA, and you'll find him).

  • food for thought.......note index

    SELECT TOP 5000000

    TranID = IDENTITY(INT, 1, 1)

    INTO Table1

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    ALTER TABLE [dbo].[Table1] ADD CONSTRAINT [t1_ID] PRIMARY KEY CLUSTERED ([TranID] ASC)

    GO

    SELECT TOP 5000000

    TranID = IDENTITY(INT, 1, 1)

    INTO Table2

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    CROSS JOIN sys.all_columns ac3

    ALTER TABLE [dbo].[Table2] ADD CONSTRAINT [t2_ID] PRIMARY KEY CLUSTERED ([TranID] ASC)

    GO

    DELETE FROM Table2

    WHERE (TranID % 1250 = 0)

    SELECT Table1.TranID

    , Table2.TranID AS Expr1

    FROM

    Table1 LEFT OUTER JOIN Table2 ON Table1.TranID = Table2.TranID

    WHERE (Table2.TranID IS NULL);

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply