June 19, 2008 at 12:54 pm
Back in the day, when I had to join tables manually through code, I could handle single row joins. but in SQL I don't know how.
I want to join a driver table to a second table.. but really only as a check. I only want to see if a match exists. I don't need to join to all the records it matches. Is there are way to do this?
Psuedoquery follows
select * from customer
join orders on 'first' orders.cust number = customer.cust-number
i don't want to join to all the other records for efficiency reasons. this example above is not the real world situation, but essentially the problem is I don't have a master table to join to so when the join happens I have to make MANY unnecessary joins (thousands) just to see if there is a match. Both the driver and the joined table are huge. The combined joined result is obviously even larger and takes forever to complete (if it even does)
Thanks
June 19, 2008 at 1:32 pm
read Jeff Modens article on "tally table"
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
June 19, 2008 at 1:36 pm
[font="Arial"]
Hello,
If I understand your issue correctly, you have a driver table and a related table.
Let's say:
create table #driverTable
( rowId int identity
columnOne varchar(10) null,
column99 int )
Let's say the columnOne contains a customer's name as in the defined table Customer Table below.
Lets say there's an existing Customertable you want to restrict the driver table to for only those records
that have a matching customerName in it:
Customer
( customer_table_id int identity,
customerName varchar(10 ) null )
select columnOne, column99
from #driverTable
where columnOne in ( select distinct customerName from Customer )
Is that what you mean?
Regards,
Terry
[/font]
June 19, 2008 at 1:45 pm
Essentially, yes. However picture the customer table you refer to having 40 million records in it and thousands of potential matches on customer name. But perhaps this is still the most efficient way to do it. And I was under the impression i should avoid the 'IN' if the list was to be larger than say 100 total entries. I will have tens of thousands.
June 19, 2008 at 1:46 pm
mrogers (6/19/2008)
Back in the day, when I had to join tables manually through code, I could handle single row joins. but in SQL I don't know how.I want to join a driver table to a second table.. but really only as a check. I only want to see if a match exists. I don't need to join to all the records it matches. Is there are way to do this?
Psuedoquery follows
select * from customer
join orders on 'first' orders.cust number = customer.cust-number
i don't want to join to all the other records for efficiency reasons. this example above is not the real world situation, but essentially the problem is I don't have a master table to join to so when the join happens I have to make MANY unnecessary joins (thousands) just to see if there is a match. Both the driver and the joined table are huge. The combined joined result is obviously even larger and takes forever to complete (if it even does)
Thanks
What would define the 'first' row from Orders? You can do something like:
SELECT *
FROM Customer AS c
JOIN Orders AS o ON o.cust_number = c.cust-number
AND o.'column that defines first' = 'first value';
Jeff
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 19, 2008 at 1:53 pm
[font="Arial"]
For a 40 million row process, I would create a single column table of the unique customer names/id's. Create a clustered key of those customer names/id's and use it as the table to compare to. That will get you fast as it can go with no duplicate entries to compare. Obviously this would have to be built each time you ran the process but that's not a problem.
If you can add a column to the 40 million row table, I would also use an update instead of a table scan process to populate the 40 million row added colmun; (like customer_id_verified int null ). That way you could screen out the records in the 40 that weren't verified and then use the result for your process ( if any ).
Regarding your comment of the limit of 'IN' use to 100 records...that's probably a comment that's is applicable to smaller table results and not big ones like 40 million. It's a good idea to keep it's use to smaller result sets but several thousand is more like it. If it gets to a design criteria above that, that's when the create a temp or processing table with the key field as explained above is better. You may have 40 million client names to validate but the escential question is how many unique customer names. The unique values can be quickly checked and then you can decide which way is best.
Regards,
Terry
[/font]
June 19, 2008 at 1:55 pm
See that's the thing. There's nothing in the orders file that I can limit to. I just want to know that there is at least one record in there that matches on customer number
June 19, 2008 at 1:58 pm
I think you may be right... creating a temp table with the disctinct matches may be the best way. I was trying to get it in one statement and that's probably just not practical.
Thanks
June 19, 2008 at 2:00 pm
Easy.
You need what?
List of customers which have existing orders?
Just ask SQL Server for it!
select * from customer
WHERE EXISTS (select 1 from orders where orders.cust-number = customer.cust-number)
_____________
Code for TallyGenerator
June 19, 2008 at 2:05 pm
Then maybe something like:
SELECT *
FROM Customer c
INNER JOIN (SELECT DISTINCT cust_number
FROM Orders) o ON o.cust_number = c.cust-number;
Now, if you also want to see the Customers without any orders you would change the join to an outer join (LEFT OUTER JOIN). Performance will probably not be the best - and if not, you could probably try using a temp table:
CREATE TABLE #tOrders (cust_number int PRIMARY KEY CLUSTERED);
INSERT INTO #tOrders
SELECT DISTINCT cust_number FROM Orders);
SELECT *
FROM Customer c
INNER JOIN #tOrders o ON o.cust_number = c.cust-number;
Jeff
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 19, 2008 at 2:16 pm
Sergiy (6/19/2008)
Easy.You need what?
List of customers which have existing orders?
Just ask SQL Server for it!
select * from customer
WHERE EXISTS (select 1 from orders where orders.cust-number = customer.cust-number)
That's the ticket! Exactly what I was trying to do.
Thanks for all the input from everyone
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply