Join to only one record

  • Back in the day, when I had to join tables manually through code, I could handle single row joins. but in SQL I don't know how.

    I want to join a driver table to a second table.. but really only as a check. I only want to see if a match exists. I don't need to join to all the records it matches. Is there are way to do this?

    Psuedoquery follows

    select * from customer

    join orders on 'first' orders.cust number = customer.cust-number

    i don't want to join to all the other records for efficiency reasons. this example above is not the real world situation, but essentially the problem is I don't have a master table to join to so when the join happens I have to make MANY unnecessary joins (thousands) just to see if there is a match. Both the driver and the joined table are huge. The combined joined result is obviously even larger and takes forever to complete (if it even does)

    Thanks

  • read Jeff Modens article on "tally table"

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • [font="Arial"]

    Hello,

    If I understand your issue correctly, you have a driver table and a related table.

    Let's say:

    create table #driverTable

    ( rowId int identity

    columnOne varchar(10) null,

    column99 int )

    Let's say the columnOne contains a customer's name as in the defined table Customer Table below.

    Lets say there's an existing Customertable you want to restrict the driver table to for only those records

    that have a matching customerName in it:

    Customer

    ( customer_table_id int identity,

    customerName varchar(10 ) null )

    select columnOne, column99

    from #driverTable

    where columnOne in ( select distinct customerName from Customer )

    Is that what you mean?

    Regards,

    Terry

    [/font]

  • Essentially, yes. However picture the customer table you refer to having 40 million records in it and thousands of potential matches on customer name. But perhaps this is still the most efficient way to do it. And I was under the impression i should avoid the 'IN' if the list was to be larger than say 100 total entries. I will have tens of thousands.

  • mrogers (6/19/2008)


    Back in the day, when I had to join tables manually through code, I could handle single row joins. but in SQL I don't know how.

    I want to join a driver table to a second table.. but really only as a check. I only want to see if a match exists. I don't need to join to all the records it matches. Is there are way to do this?

    Psuedoquery follows

    select * from customer

    join orders on 'first' orders.cust number = customer.cust-number

    i don't want to join to all the other records for efficiency reasons. this example above is not the real world situation, but essentially the problem is I don't have a master table to join to so when the join happens I have to make MANY unnecessary joins (thousands) just to see if there is a match. Both the driver and the joined table are huge. The combined joined result is obviously even larger and takes forever to complete (if it even does)

    Thanks

    What would define the 'first' row from Orders? You can do something like:

    SELECT *

    FROM Customer AS c

    JOIN Orders AS o ON o.cust_number = c.cust-number

    AND o.'column that defines first' = 'first value';

    Jeff

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • [font="Arial"]

    For a 40 million row process, I would create a single column table of the unique customer names/id's. Create a clustered key of those customer names/id's and use it as the table to compare to. That will get you fast as it can go with no duplicate entries to compare. Obviously this would have to be built each time you ran the process but that's not a problem.

    If you can add a column to the 40 million row table, I would also use an update instead of a table scan process to populate the 40 million row added colmun; (like customer_id_verified int null ). That way you could screen out the records in the 40 that weren't verified and then use the result for your process ( if any ).

    Regarding your comment of the limit of 'IN' use to 100 records...that's probably a comment that's is applicable to smaller table results and not big ones like 40 million. It's a good idea to keep it's use to smaller result sets but several thousand is more like it. If it gets to a design criteria above that, that's when the create a temp or processing table with the key field as explained above is better. You may have 40 million client names to validate but the escential question is how many unique customer names. The unique values can be quickly checked and then you can decide which way is best.

    Regards,

    Terry

    [/font]

  • See that's the thing. There's nothing in the orders file that I can limit to. I just want to know that there is at least one record in there that matches on customer number

  • I think you may be right... creating a temp table with the disctinct matches may be the best way. I was trying to get it in one statement and that's probably just not practical.

    Thanks

  • Easy.

    You need what?

    List of customers which have existing orders?

    Just ask SQL Server for it!

    select * from customer

    WHERE EXISTS (select 1 from orders where orders.cust-number = customer.cust-number)

    _____________
    Code for TallyGenerator

  • Then maybe something like:

    SELECT *

    FROM Customer c

    INNER JOIN (SELECT DISTINCT cust_number

    FROM Orders) o ON o.cust_number = c.cust-number;

    Now, if you also want to see the Customers without any orders you would change the join to an outer join (LEFT OUTER JOIN). Performance will probably not be the best - and if not, you could probably try using a temp table:

    CREATE TABLE #tOrders (cust_number int PRIMARY KEY CLUSTERED);

    INSERT INTO #tOrders

    SELECT DISTINCT cust_number FROM Orders);

    SELECT *

    FROM Customer c

    INNER JOIN #tOrders o ON o.cust_number = c.cust-number;

    Jeff

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Sergiy (6/19/2008)


    Easy.

    You need what?

    List of customers which have existing orders?

    Just ask SQL Server for it!

    select * from customer

    WHERE EXISTS (select 1 from orders where orders.cust-number = customer.cust-number)

    That's the ticket! Exactly what I was trying to do.

    Thanks for all the input from everyone

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply