June 1, 2006 at 9:30 pm
Hi,
I've started working on this new project. I've a big query below that is joining around 20 table. Developer before me had written it and i've to correct it.How can i make possible improvement in this query? If i remove some tables or some AND conditions from query then i got thousands of records. I've already created good indexes on all the tables in query. There are no table scans in the execution plan. There are NO book marks. There are some Index scans in the plan but can't take care of that because there are more than 20 tables i'm trying to join here.I will really appreciate your feedback.
Thanks
SELECT DISTINCT
Orders.Order_ID, Orders.Order_DateOrdered, Orders.Order_Taxes, Orders.Order_Commission,
Orders.Order_Comments, Orders.Order_OrderNumber, Orders.Order_CustomerNumber,
Orders.Order_Supplier, Orders.Order_CCUsed, Orders.Order_OrderedBy , Orders.Order_Shipping,
Orders.Order_AddressInfo,Orders.Order_FullAddressInfo,GiftPurchase.GiftPurchase_Quantity, GiftPurchase.GiftPurchase_Price,
GiftPurchase.GiftPurchase_Shipping, GiftPurchase.GiftPurchase_Taxes, UserProduct.UserProduct_Name,
UserProduct.UserProduct_Description,UserProduct.UserProduct_ModelType, UserProduct.UserProduct_DeliveryOption,
UserProduct.UserProduct_ID,
Shipping.Shipping_ShippedBy, Shipping.Shipping_DateShipped, Shipping.Shipping_ShipmentArrivalDate,
Shipping.Shipping_Tracking, Shipping_TransactionFee, Shipping.Shipping_ShipmentMethod, Shipping.Shipping_Cost,
Request.Request_Event, ShippingCollection_TS, ShippingCollection_ID,ShippingCollection_MerchantOrderNumber,
ShippingCollection_OrderNumber, SupplierOrder_OrderNumber , SupplierOrder_ID, Store_ID,Person_ID, Person_Forename,
Person_Surname, Person_Email
FROM
Orders, GiftPurchase, UserProduct, SupplierOrderOrders, SupplierOrder, Shipping, Request, ShippingCollection, Store,
Supplier,Reqfill, Transactions, Person, Event , Purchaser , GiftSupplierFilled, AllocatedGift,
AllocateGiftToSupplier
WHERE GiftPurchase.GiftPurchase_Request = UserProduct.UserProduct_Request AND
Request.Request_ID = UserProduct.UserProduct_Request AND Orders.Order_ID= GiftPurchase.GiftPurchase_Order
AND SupplierOrderOrders.SupplierOrderOrders_Orders = Orders.Order_ID
AND SupplierOrderOrders.SupplierOrderOrders_SupplierOrder = SupplierOrder.SupplierOrder_ID
AND SupplierOrder.SupplierOrder_Supplier = Supplier_ID
AND GiftPurchase.GiftPurchase_Shipping = Shipping.Shipping_ID AND
ShippingCollection.ShippingCollection_Shipping = Shipping.Shipping_ID
AND Supplier_ID = 703
AND Supplier_Store = Store_ID
AND Request_ID = Reqfill_Request
AND Request_Event = Event_ID
AND Purchaser_Person = Person_ID
AND Purchaser_Transaction = Transaction_ID
AND Reqfill_Transaction = Transaction_ID
AND Orders.Order_ID = GiftSupplierFilled.GiftSupplierFilled_Order
AND UserProduct.UserProduct_ID=GiftSupplierFilled.GiftSupplierFilled_UserProduct
AND UserProduct.UserProduct_ID = AllocatedGift.AllocatedGift_UserProduct AND
Transactions.Transaction_ID = AllocatedGift.AllocatedGift_Transaction
AND AllocatedGift.AllocatedGift_ID = AllocateGiftToSupplier.AllocateGiftToSupplier_AllocatedGift
AND GiftSupplierFilled.GiftSupplierFilled_AllocateGiftToSupplier = AllocateGiftToSupplier.AllocateGiftToSupplier_ID
AND Person_Surname like '%Takaba%'
ORDER BY ShippingCollection_ID DESC, Request_Event, SupplierOrder_ID, Orders.Order_ID, UserProduct.UserProduct_ID;
June 2, 2006 at 1:13 am
You can make a tremendous improvement for this query! As it is right now, you are selecting the cartesian product (all possible combinations) for the 18 tables you have. That would easily bring the query optimizer to it's knees. Multiply the number of records in every 18 tables you have and post the number here. I'll bet the product is some billions and more. I'll write down some things for you to do now. Remember to use fully qualified named (Order.OrderID, not just OrderID) in the future, for your own sake. It will be much easier to debug sometime in the future. If succeeding to implement the things written below, I would estimate that this query would perform at least 100 times better.
1. Rewrite for using INNER JOINs
SELECT Orders.Order_ID
...
FROM Orders
INNER JOIN GiftPurchase GP ON GiftPurchase.GiftPurchase_Order = Orders.Order_ID
INNER JOIN ...
I tried to do that for you, but since there are many unqualified names, it wasn't possible for me, since I can't see the table layouts.
2. Place everything in a VIEW, except
AND Supplier_ID = 703
AND Person_Surname like '%Takaba%'
Then call the view from your stored procedure as
SELECT * FROM MyView WHERE MyView.Supplier_ID = 703 AND MyView.Person_Surname LIKE '%Takaba%'
This way, the query optimizer in SQL Server doesn't have to recompile the joins every time!
3. If possible, make the view indexed.
That's my opninion.
N 56°04'39.16"
E 12°55'05.25"
June 2, 2006 at 7:54 am
I agree that you should use the new JOIN syntax to make the code easier to read, but it won't necessarily make any difference to the query plan - it certainly isn't needed to prevent a physical Cartesian product! The optimiser has been using WHERE conditions as join predicates since it was first developed. It might help, by giving the optimiser a good start so that it will have time to generate a decent plan before it gets fed up and starts the query (yes, that's how it works!).
But you have circular joins, so putting them in the 'wrong' join could prevent the optimiser from generating a good plan. They are presumably needed to limit records correctly, but they make the optimisers job much harder. they are marked ---**** in the accompanying code.
I think you can also get rid of the Events and Stores tables, unless joins to them are needed to check referential integrity of the FK that points to it (I hope it's not!). You also use SupplierOrderOrders to join Supplier to SupplierOrders, which is then joined to orders. Doesn't SupplierOrders have a foreign key to Supplier? And shouldn't SupplierOrderOrders be used to join SupplierOrders to Orders? Just a question, as I obviously don't know your schema.
I've attached a proposed rewrite of your code - it's not checked or tested in any way, so it might not even compile, but I hope it gives an idea of how to proceed.
Another thing is that half your tables are used only for joining, which might mean that they could be eliminated (like the Events and Stores tables can if my suggestion above is correct), if you can find another join path which defines the correct recordset. This might be the case if you are joining two table to the same PK on a third table - in which case the (indexed) foreign keys could possibly be joined directly to each other Or there might be denormalised foreign keys on your table - it looks as though there may be, since the joins around SupplierOrderOrders seem a bit odd, as I suggest.
Finally, the DISTINCT could be expensive, so you might need to look at why it is needed - could be more joins I'm afraid! Can you post an execution plan?
Also, consider whether the query could be broken down into smaller views by subject matter, then joined. It might make it easier to optimise, and once the units are optimised (and prefereable each one has a good WHERE clause attached), they can be reused with greater confidence in other boig queries. It might just make preformnce worse though...
distinct
.Order_ID, ord.Order_DateOrdered, ord.Order_Taxes, ord.Order_Commission, ord.Order_Comments, ord.Order_OrderNumber, ord.Order_CustomerNumber, ord.Order_Supplier, ord.Order_CCUsed, ord.Order_OrderedBy , ord.Order_Shipping, ord.Order_AddressInfo, ord.Order_FullAddressInfo,
.GiftPurchase_Quantity, gp.GiftPurchase_Price, gp.GiftPurchase_Shipping, gp.GiftPurchase_Taxes,
.Request_Event,
dbo.Supplier su
join dbo.SupplierOrder so
on so.SupplierOrder_Supplier = su.Supplier_ID
join dbo.SupplierOrderOrders soo
on soo.SupplierOrderOrders_SupplerOrder = so.SupplierOrder_ID
join dbo.Orders ord
on soo.SupplierOrderOrders_Orders = ord.Order_ID
join dbo.GiftPurchase gp
on ord.Order_ID= gp.GiftPurchase_Order
join dbo.Request re
on gp.Request_ID = re.Request_ID
join dbo.UserProduct up
on re.Request_ID = up.UserProduct_Request
join dbo.Reqfill rf
on rf.Reqfill_Request = re.Request_ID
join dbo.Transactions tr
on tr.Transaction_ID = rf.Reqfill_Transaction
join dbo.Purchaser pu
on pu.Purchaser_Transaction = tr.Transaction_ID
join dbo.Person pe
on pu.Purchaser_Person = pe.Person_ID
join dbo.AllocatedGift ag
on ag.AllocatedGift_Transaction = tr.Transaction_ID
and ag.AllocatedGift_UserProduct = up.UserProduct_ID ---****
--if you had a supplier ID to join on, that would be helpful.
join dbo.GiftSupplierFilled gsf
on gsf.GiftSupplierFilled_UserProduct = up.UserProduct_ID
and gsf.GiftSupplierFilled_Order = ord.Order_ID --*****
on gp.GiftPurchase_Shipping = sh.Shipping_ID
join dbo.ShippingCollection sc
on sc.ShippingCollection_Shipping = sh.Shipping_ID
su.Supplier_ID = 703
pe.Person_Surname like 'Takaba%' --don't use a leading wildcard - it means you can't use an index.
BY sc.ShippingCollection_ID DESC,
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 2, 2006 at 10:24 pm
Hi Stax68 and Peter,
Thanks a lot for both of your reply.
Stax I worked on your suggestions.
(i)I checked the schema but the whole database is a mess. SupplierOrders doesn't have a foreign key to Supplier and that's true for all these 18 tables.There are no foreign key constraint defined on any of these 18 tables used in these joins. So I guess we can not eliminate joins based up on the foreign key assumptions you told me.
(ii) Yes I was able to remove join re.Request_Event = ev.Event_I but it didn't give much drastic improvement.
(iii) In the original query userproduct table was costing the maximum and that was 28%.It was doing cluster index scan because userproduct is a big table for around 600000 records.
After I changed the query to the way you mentioned below
pe.Person_Surname like 'Takaba%'(removed the % sign before t)
the cost for userproduct table was reduced to 5% from 28%.Makes me wonder how such a small change in person's table column criteria did so much drastic improvement in the userproduct table. If you have answer then please do explain to me.
Also I would also like to get some benifit from peter's suggestion. If i put this code in to view will that be helpful? As peter is mentioning that if i put everything like..
-------------------------------------------------------------------
Place everything in a VIEW, except
AND Supplier_ID = 703
AND Person_Surname like '%Takaba%'
Then call the view from your stored procedure as
SELECT * FROM MyView WHERE MyView.Supplier_ID = 703 AND MyView.Person_Surname LIKE 'Takaba%'
This way, the query optimizer in SQL Server doesn't have to recompile the joins every time!
----------------------------------------------------------
Questions i've is that are the views that much helpful? If i've view then is it stored just similar to stored procedure and don't need to recompile them after once they are created? And in my case putting the code which i changed the way Stax68 mentioned in a view will be more faster?
Please let me know.
Thanks for your help on this.
June 3, 2006 at 4:14 am
The reason a leading wildcard slows things down is because you can't use an index. The index is arranged in (an) alphabetical order, so to find things quickly you need to know how the word starts. If you don't know that, SQL has to look in each data value to see if it matches your pattern.
Re. the view. This will probably give some performance improvement, and it's certainly good practice, but it may not make a huge difference. Fully (schema + table) qualifying column and table names would meanyou can reproduce the advantages of the view simply by running exactly the same statement more than once. It's to do with execution plan reuse. While you're developing the code, this is fairly irrelevant since the plan would be discarded when the view changed, but once it's live, go ahead.
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply