November 30, 2007 at 12:46 pm
I have this fat query runnig against our production database and taking huge resources to execute:
At the query execution, i noticed from task manager - isql taking 1GB of memory and the CPU utilization is 99% for this session (from where the query is ran).
I was wondering, if someone please take a look at this query and suggest, what can be done to make it run better. The SQL version is 2000 and server is configured with 10 GB of memory. The hardware platform is quiet robust and have multiple file groups defined on the database.
Query is as follows:
SELECT dimRptBasis.SEQ1, dimDataSrc.SEQ4, dimDataSrc.SEQ3, dimDataSrc.SEQ2, dimDataSrc.SEQ1, dimEmployee.SEQ2, dimEmployee.SEQ1, dimAccount_Expense.SEQ5, dimAccount_Expense.SEQ4, dimAccount_Expense.SEQ3, dimAccount_Expense.SEQ2, dimAccount_Expense.SEQ1, dimBusOrg.SEQ11, dimBusOrg.SEQ10, dimBusOrg.SEQ9, dimBusOrg.SEQ8, dimBusOrg.SEQ7, dimBusOrg.SEQ6, dimBusOrg.SEQ5, dimBusOrg.SEQ4, dimBusOrg.SEQ3, dimBusOrg.SEQ2, dimBusOrg.SEQ1, dimBusPartner.SEQ3, dimBusPartner.SEQ2, dimBusPartner.SEQ1,
dimCostCenter_Expense.SEQ5, dimCostCenter_Expense.SEQ4, dimCostCenter_Expense.SEQ3, dimCostCenter_Expense.SEQ2, dimCostCenter_Expense.SEQ1, dimTime.TIMEKEYNAME1, dimTime.TIMEKEYNAME2, dimTime.TIMEKEYNAME3, dimCategory.SEQ1,
dimProduct.SEQ4, dimProduct.SEQ3, dimProduct.SEQ2, dimProduct.SEQ1, dimProject.SEQ2, dimProject.SEQ1, tblFACTExpense.SIGNEDDATA FROM "dbo"."tblFactExpense", "dimRptBasis", "dimDataSrc", "dimEmployee",
"dimAccount_Expense", "dimBusOrg", "dimBusPartner", "dimCostCenter_Expense", "dimTime", "dimCategory", "dimProduct", "dimProject" WHERE ("dbo"."tblFACTExpense"."RptBasis"="dimRptBasis"."ID1")
AND ("dbo"."tblFACTExpense"."DataSrc"="dimDataSrc"."ID1") AND
("dbo"."tblFACTExpense"."Employee"="dimEmployee"."ID1") AND
("dbo"."tblFACTExpense"."Account_Expense"="dimAccount_Expense"."ID1") AND
("dbo"."tblFACTExpense"."BusOrg"="dimBusOrg"."ID1") AND
("dbo"."tblFACTExpense"."BusPartner"="dimBusPartner"."ID1") AND
("dbo"."tblFACTExpense"."CostCenter_Expense"="dimCostCenter_Expense"."ID1")
AND ("dbo"."tblFACTExpense"."TIMEID"="dimTime"."TIMEID") AND
("dbo"."tblFACTExpense"."Category"="dimCategory"."ID1") AND
("dbo"."tblFACTExpense"."Product"="dimProduct"."ID1") AND ("dbo"."tblFACTExpense"."Project"="dimProject"."ID1")
November 30, 2007 at 1:06 pm
First: Make sure you take a look at the execution plan and you are seen no table scans...
Second: Check that you are actually retrieving what you need and no more.
... that will get you started ....
* Noel
November 30, 2007 at 2:09 pm
After looking I cannot see anything glaring as I don't know enough however the first thing I would do is write this way and see if improves performance. But the other suggestions is def next step.
SELECT
dimRptBasis.SEQ1, dimDataSrc.SEQ4, dimDataSrc.SEQ3,
dimDataSrc.SEQ2, dimDataSrc.SEQ1, dimEmployee.SEQ2,
dimEmployee.SEQ1, dimAccount_Expense.SEQ5,
dimAccount_Expense.SEQ4, dimAccount_Expense.SEQ3,
dimAccount_Expense.SEQ2, dimAccount_Expense.SEQ1,
dimBusOrg.SEQ11, dimBusOrg.SEQ10, dimBusOrg.SEQ9, dimBusOrg.SEQ8,
dimBusOrg.SEQ7, dimBusOrg.SEQ6, dimBusOrg.SEQ5, dimBusOrg.SEQ4,
dimBusOrg.SEQ3, dimBusOrg.SEQ2, dimBusOrg.SEQ1, dimBusPartner.SEQ3,
dimBusPartner.SEQ2, dimBusPartner.SEQ1,
dimCostCenter_Expense.SEQ5, dimCostCenter_Expense.SEQ4,
dimCostCenter_Expense.SEQ3, dimCostCenter_Expense.SEQ2,
dimCostCenter_Expense.SEQ1, dimTime.TIMEKEYNAME1,
dimTime.TIMEKEYNAME2, dimTime.TIMEKEYNAME3, dimCategory.SEQ1,
dimProduct.SEQ4, dimProduct.SEQ3, dimProduct.SEQ2,
dimProduct.SEQ1, dimProject.SEQ2, dimProject.SEQ1,
tblFACTExpense.SIGNEDDATA
FROM
dbo.tblFactExpense
INNER JOIN
dbo.dimRptBasis
ON
dbo.tblFACTExpense.RptBasis = dimRptBasis.ID1
INNER JOIN
dbo.dimDataSrc
ON
dbo.tblFACTExpense.DataSrc = dimDataSrc.ID1
INNER JOIN
dbo.dimEmployee
ON
dbo.tblFACTExpense.Employee = dimEmployee.ID1
INNER JOIN
dbo.dimAccount_Expense
ON
dbo.tblFACTExpense.Account_Expense = dimAccount_Expense.ID1
INNER JOIN
dbo.dimBusOrg
ON
dbo.tblFACTExpense.BusOrg = dimBusOrg.ID1
INNER JOIN
dbo.dimBusPartner
ON
dbo.tblFACTExpense.BusPartner = dimBusPartner.ID1
INNER JOIN
dbo.dimCostCenter_Expense
ON
dbo.tblFACTExpense.CostCenter_Expense = dimCostCenter_Expense.ID1
INNER JOIN
dbo.dimTime
ON
dbo.tblFACTExpense.TIMEID = dimTime.TIMEID
INNER JOIN
dbo.dimCategory
ON
dbo.tblFACTExpense.Category = dimCategory.ID1
INNER JOIN
dbo.dimProduct
ON
dbo.tblFACTExpense.Product = dimProduct.ID1
INNER JOIN
dbo.dimProject
ON
dbo.tblFACTExpense.Project = dimProject.ID1
November 30, 2007 at 2:15 pm
Even if it's "auto-generated" code, someone someday is going to have to troubleshoot it... take some time out to format it! And, the use of table aliases would clear a great deal of the clutter as would removing the unnecessary double quotes and parenthesis from this, well, mess! Following a standard as to when to "break" a line would be a huge help, as well.
Since most of the table names begin with "dim", I can only assume that this is for some sort of OLAP processing so I don't know if everything I say in the following is 100% true.
1. I checked all of the joins... they are all viable and no cross-joins seem to be present. However, you have to make sure that ID1 is unique in the "dim" tables are you're gonna spawn a lot of rows that you hadn't inteneded to spawn almost as bad as if mini-cross-joins were built in.
2. Take Noeld's advice... at least run an estimated execution plan on it. Look for table scans and try to figure out some indexes that would help.
3. Try using the Index Tuning Wizard on this bad boy.
4. Make sure that all the tables have a Primary Key... I'm thinking that will probably be on the "ID1" column of most of your tables... If so, you may want to make it a CLUSTERED primary key.
5. Since I think the "dim" tables are for OLAP, when you create your indexes and primary key (especially if clustered), you might wanna try using a Fill Factor of 100... no sense searching blank space if you don't have to.
6. There's a pretty good chance of this building a rather large working table in TempDB... make sure that TempDB is the proper size to handle the working table and all the other casual traffic TempDB is going to see.
7. Unless you are using it for some type of filtering, I see no "dimReportBasis" columns in the SELECT list... are you 100% sure you need to join to it?
Please don't post any more unformatted code of this length... next time, I'm not even going to look at something like this. If you want help, help us... make the code easy to read. If you really don't care what it looks like, then we won't either... we'll just go to the next post 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
November 30, 2007 at 6:18 pm
Would like to thank everyone for there suggestions. I started looking into this and keep you posted, if i find anything intresting.
With regards to the post, please accept my sincere apology for the unformated code. i have taken a note here and promise this wont happen again.
Thanks again,
Cali
November 30, 2007 at 7:38 pm
Thanks, Cali. I appreciate the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 2, 2007 at 1:54 am
I have some updates to share with you folks. Here are couple of things i have done to make the query run faster - but tough luck. Here are the approachs i took:
First approach: I ran the SQL Query what Antares686 suggested and unfortunately this is taking way longer than my original query.I.E - 2 hours 40 minutes to complete where as my unformated query takes around 1 hour 40 minutes...
Second approach: I saved the query to a sql table and used index tuning advisor to recommend as to what needed to make this query run faster. unfortunately index advisor doesnt recommend anything exciting.
Third approach: I did look at the query execution plan and i dont see full table scan. but what i see, lot's of hash - match inner joins with a Cost of 10%. additionally i see index seek scans as well. but dont know for sure, what hash match are related too?
If you guys could suggest something, that will be wonderful.
Thanks,
Cali
December 2, 2007 at 4:41 am
Did I read correctly when you said isql was taking 1GB of RAM?? That's the client tool you use to run the query, not SQL Server itself.... How much RAM was the sqlservr.exe process consuming?
What indices do you currently have on the tables. Any field that you use for joining should have some sort of an index. If this is your main use for the tables then you could make the indices clustered. If not, then try covering indices...
I haven't looked at the query in depth (others have already complained about the formatting... Crystal Reports makes ugly looking code too - I hate debugging it!) but you could try an indexed view if you're willing to suffer a slight drop in insert/update/delete performance. I could only see inner joins so this would lend itself to an indexed view (or at least trying one). How many rows (roughly) are we talking about in the various tables anyway?
December 2, 2007 at 10:38 am
Hello Ian,
Yes the sql query tool takes over 1 GB of ram (precisely it took 1.8 GB to complete) in addition to sqlservr.exe which takes 2.1 GB. The rows returned are close to 42 million.
The dim and tblfactexpense tables has clustered indexes and the application is microsoft olap.
appreciated any input to makes this thing runs fast.
thanks,
cali
December 3, 2007 at 1:12 am
Did you try any of the things I posted?
Also, why are you returning 42 million rows instead of a summary of the rows?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 3, 2007 at 8:11 am
California (12/2/2007)
First approach: I ran the SQL Query what Antares686 suggested and unfortunately this is taking way longer than my original query.I.E - 2 hours 40 minutes to complete where as my unformated query takes around 1 hour 40 minutes...
Sorry I wasn't looking for a significant increase just to make sure I managed to correctly line up with inner joins for readability. But under the hood they should have had the same execution plan.
If there is that much difference then review the execution plan of each NOTE: you can get the estimated execution plan which will save you waiting for an entire run. If the proposed (see updated below) is different then tell me how please. It could also be yours used the cache more becuase mine was just run once those execution plans in memory can be funny that way. But the real question is how many rows do you need returned? If you are returning 10 Million ros depending on the data the client GUI may be the bottleneck itself becuase of the memory needed to display.
Here is the final corrected version I would use personally as a starting point
SELECT
dimRptBasis.SEQ1
, dimDataSrc.SEQ4
, dimDataSrc.SEQ3
, dimDataSrc.SEQ2
, dimDataSrc.SEQ1
, dimEmployee.SEQ2
, dimEmployee.SEQ1
, dimAccount_Expense.SEQ5
, dimAccount_Expense.SEQ4
, dimAccount_Expense.SEQ3
, dimAccount_Expense.SEQ2
, dimAccount_Expense.SEQ1
, dimBusOrg.SEQ11
, dimBusOrg.SEQ10
, dimBusOrg.SEQ9
, dimBusOrg.SEQ8
, dimBusOrg.SEQ7
, dimBusOrg.SEQ6
, dimBusOrg.SEQ5
, dimBusOrg.SEQ4
, dimBusOrg.SEQ3
, dimBusOrg.SEQ2
, dimBusOrg.SEQ1
, dimBusPartner.SEQ3
, dimBusPartner.SEQ2
, dimBusPartner.SEQ1
, dimCostCenter_Expense.SEQ5
, dimCostCenter_Expense.SEQ4
, dimCostCenter_Expense.SEQ3
, dimCostCenter_Expense.SEQ2
, dimCostCenter_Expense.SEQ1
, dimTime.TIMEKEYNAME1
, dimTime.TIMEKEYNAME2
, dimTime.TIMEKEYNAME3
, dimCategory.SEQ1
, dimProduct.SEQ4
, dimProduct.SEQ3
, dimProduct.SEQ2
, dimProduct.SEQ1
, dimProject.SEQ2
, dimProject.SEQ1
, tblFACTExpense.SIGNEDDATA
FROM
dbo.tblFactExpense tblFactExpense
INNER JOIN
dbo.dimRptBasis dimRptBasis
ON
tblFACTExpense.RptBasis = dimRptBasis.ID1
INNER JOIN
dbo.dimDataSrc dimDataSrc
ON
tblFACTExpense.DataSrc = dimDataSrc.ID1
INNER JOIN
dbo.dimEmployee dimEmployee
ON
tblFACTExpense.Employee = dimEmployee.ID1
INNER JOIN
dbo.dimAccount_Expense dimAccount_Expense
ON
tblFACTExpense.Account_Expense = dimAccount_Expense.ID1
INNER JOIN
dbo.dimBusOrg dimBusOrg
ON
tblFACTExpense.BusOrg = dimBusOrg.ID1
INNER JOIN
dbo.dimBusPartner dimBusPartner
ON
tblFACTExpense.BusPartner = dimBusPartner.ID1
INNER JOIN
dbo.dimCostCenter_Expense dimCostCenter_Expense
ON
tblFACTExpense.CostCenter_Expense = dimCostCenter_Expense.ID1
INNER JOIN
dbo.dimTime dimTime
ON
tblFACTExpense.TIMEID = dimTime.TIMEID
INNER JOIN
dbo.dimCategory dimCategory
ON
tblFACTExpense.Category = dimCategory.ID1
INNER JOIN
dbo.dimProduct dimProduct
ON
tblFACTExpense.Product = dimProduct.ID1
INNER JOIN
dbo.dimProject dimProject
ON
tblFACTExpense.Project = dimProject.ID1
December 3, 2007 at 8:30 am
True or False:
This query was captured as the SQL sent from MS Analysis Services ? As such you have no control over how the SQL is constructed and can't convert old style join syntax to INNER JOIN.
If True:
Where does isql and its CPU/memory utilization factor into your problem ?
December 3, 2007 at 8:55 am
PW (12/3/2007)
True or False:This query was captured as the SQL sent from MS Analysis Services ? As such you have no control over how the SQL is constructed and can't convert old style join syntax to INNER JOIN.
Never saw anything about Analysis Services in the post just the query he posed which I altered to INNER JOIN version of the same thing.
If True:
Where does isql and its CPU/memory utilization factor into your problem ?
Any method of display records will require memory to manage the dataset, the client cursor (depending on server side or client side will have a different impact), display the values and will require using the OS page file even in cases where it may not use all of the system memory. This along with other processes running will impact overall speed and performance of the data being displayed. Especially if the system has to juggle data in an out of the page file and the system is swamped. I have been on machines with 1 GB of physical ram on the client and when looking at the server it showed the connection sleeping but the client machine was plodding along several mintues after to just display the data.
Have to understand the need for a lot of records in one go before I can determine what courses of action may help. First place is clean up the query and determine what is happening in the execution plan. But this is a straight forward query of joins such that either the joins are slow due to inadequte indexes for the join to manage, the server is mismanaging the query in the execution plan (known issues with parallel processing in SQL 2000 where MAX DOP 0 could have huge performance gains), there is a network bottleneck (data onto, across or off the wire can be the issue), or the client workstation is being swamped by trying to visually manage too much data. For the later I would suggest output to a file them check the file might show the results they are after if the are just trying to testing for an extract, if they need to display anything visually then do they really need all records, or if not then complete out the query and see what you finally get. But overall this one really boils down to indexes, and infrastructure as best I can tell.
December 3, 2007 at 9:04 am
California (12/2/2007)
I have some updates to share with you folks. Here are couple of things i have done to make the query run faster - but tough luck. Here are the approachs i took:First approach: I ran the SQL Query what Antares686 suggested and unfortunately this is taking way longer than my original query.I.E - 2 hours 40 minutes to complete where as my unformated query takes around 1 hour 40 minutes...
Second approach: I saved the query to a sql table and used index tuning advisor to recommend as to what needed to make this query run faster. unfortunately index advisor doesnt recommend anything exciting.
Third approach: I did look at the query execution plan and i dont see full table scan. but what i see, lot's of hash - match inner joins with a Cost of 10%. additionally i see index seek scans as well. but dont know for sure, what hash match are related too?
If you guys could suggest something, that will be wonderful.
Thanks,
Cali
Hash match usually means that the indexing isn't all that wonderful, so SQL server is building a hash table to match things up. In other words - it's guessing the smaller of two tables and build a unique value list from it to use that to match things up.
Since you're in 2005 - have you tried building covering indexes? One for each table, adding the join criteria being used to the key portion and including the non-join fields (this makes an actual difference in 2005).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 3, 2007 at 9:06 am
See yesterday's reply from the OP:
It is "microsoft olap".
I recognize the query (the object name double quoting & old style joins) from having to deal with similar problems in queries originating from cube processing.
I'm questioning isql and memory usage because I think it's a red herring that has us focusing on the wrong issue:
I'm assuming:
Slow query in cube building process was causing concern
The SQL text of the query was captured in profiler & copied to isql for disgnostic purposes
Isql was swamped with large resultset and then clouded the whole issue by causing CPU and memory issues that are due to use of isql.
Need the OP to clarify what's going on and how isql fts into this.
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply