February 12, 2009 at 4:21 am
I have a problem where the optimiser chooses a poor plan. The query is on a view and the problem is the choice of indexes in one of the tables in the view. Is there any way I can provide an index hint in the query that uses the view, rather than inside the view itself? There may be other queries that use the view where that hint would be unhelpful.
What I have is basically a list of cash transactions with an account_id and a ledger_date and there is an index on both of these. There is then a table of account groups which maps each group to a range of accounts. There are usually 3-10 accounts per group.
The query is looking for transactions for a single account group over a range of dates (the range can be quite large) and sometimes for no reason that I can fathom it seems to think it is a good idea to scan the range of dates and then filter by account rather than the other way around.
So I want to do something like this...(but the hint is ignored presumably because this index doesn't actually exist on the view)
select sum(amount)
from vwCashTransactions with (index (idx_account))
where groupaccountid = @P1 and ledgerdate between @P2 and @P3
Now I could do something hideous like this to get round it, but I thought it would be clearer to other people what I was doing if I could do it with hints or something else.
select sum(amount)
from vwCashTransactions
where groupaccountid = @P1 and coalesce(ledgerdate,ledgerdate) between @P2 and @P3
Any other bright ideas?
Thanks
Ken
February 12, 2009 at 4:24 am
I would never mess with index hints; let MSSQL choose the best plan for you.
Try updating statistics: poor query plans are often due to outdated statistics or bad indexes.
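Something along these lines is usually enough. The view hides the real tables, so the table name below is just a placeholder; point it at the cash transactions table that vwCashTransactions reads from:
UPDATE STATISTICS dbo.CashTransactions -- default sampling on the base table
-- or refresh every table in the database in one go:
EXEC sp_updatestats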
-- Gianluca Sartori
February 12, 2009 at 4:31 am
Sorry, should have mentioned: been there, done that. It does make a difference of sorts, but only for a day at most, and there isn't enough activity on the table to make day-old statistics invalid. I find it odd that it prefers a BETWEEN range condition over an equality one. It seems to get much worse when we have some light days and the number of rows at the top end of the query date range is low, as if it guesses that all days in the range have that small number of rows. I really don't want to have to go down the road of updating statistics every day on a table where the row distribution doesn't change that much :crazy:
February 12, 2009 at 5:57 am
You might not want to go down that road, and I don't blame you, but you might have to. Are you doing full scans on the statistics or just using the default sampling? A full scan might make a difference.
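If you want to check, something like this shows when the stats were last updated and how many rows were sampled, and then forces a full scan. The table and statistic names are placeholders:
DBCC SHOW_STATISTICS ('dbo.CashTransactions', 'idx_account') WITH STAT_HEADER -- look at Updated and Rows Sampled
UPDATE STATISTICS dbo.CashTransactions idx_account WITH FULLSCAN -- sample every row instead of the default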
Just so you know, we have a couple of tables that we update statistics on several times a day. Partly this is because of bad design that we're not allowed to "fix" because it's a financial system and they'd have to go through a ninety-day test cycle. Partly it's because the data is just volatile enough, and just odd enough, that it needs VERY good statistics. Most of the rest of our databases run just fine with statistics updated weekly.
Also, have you checked the index fragmentation levels?
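Something like this will list the fragmentation per index (again, the object name is a placeholder):
SELECT i.name, ps.avg_fragmentation_in_percent, ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.CashTransactions'), NULL, NULL, 'LIMITED') AS ps
INNER JOIN sys.indexes AS i
ON i.object_id = ps.object_id AND i.index_id = ps.index_id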
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 12, 2009 at 6:07 am
OK thanks, but that surely can't be right! We have a bit of bad design here and there too, but this particular section looks OK and it isn't an unreasonable query. I'm not even sure updating statistics will help in all cases either. I have seen it happen where the end date in the range is today and, if we have no transactions yet for today, it goes by date even though the start date of the range is a month or more ago. No matter how many times I recalc the stats, it won't make any difference until we actually have some transactions!
The whole thing just seems broken and I don't understand why I have to put so much work into something so simple!
Maybe I'll go with the coalesce hack for now after all :unsure:
Oh, and I defragment every week too if an index is more than 25% fragmented. Fragmentation doesn't make any difference to query plan choice though, does it?
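The weekly job is roughly this shape, per index, once it goes over the threshold (simplified, and the table/index names here are just examples):
ALTER INDEX idx_account ON dbo.CashTransactions REORGANIZE -- moderate fragmentation
-- ALTER INDEX idx_account ON dbo.CashTransactions REBUILD -- heavier fragmentation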
February 12, 2009 at 6:19 am
You could post the execution plan & query to see if anyone can spot something. I'd do that in a separate post though so you get more eyeballs.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 12, 2009 at 6:25 am
Have you tried some kind of query refactoring to force the query optimizer to do what you expect?
Sometimes the same query written with a different syntax gets a different query plan...
select sum(amount)
from (
select amount, ledgerdate
from vwCashTransactions
where groupaccountid = @P1
) as a
where ledgerdate between @P2 and @P3
-- Gianluca Sartori
February 12, 2009 at 6:50 am
Actually no, but now that I've been forced to think about it logically to explain it here, I might have been making too many assumptions about what it is doing. I can rarely if ever replicate the problem, and the only reason I know what it is doing is by pulling the query plan out of the DB. So the problem is likely that the cached plan has aged out and the first person to call this statement does so with a date range in the future, or just for one day with no transactions. That plan is then cached for the rest of the day unless I recalc the stats and a new plan is generated, until tomorrow when the same thing happens!
So the question now boils down to: is there a way to invalidate the query plan for one statement? (I can't go down the recompile route because it isn't supported in our data access layer, don't ask!)
Thanks for listening, maybe I should start learning to talk to myself 😀
Ken.
February 12, 2009 at 6:57 am
I talk to myself all the time, to the horror of all my co-workers.
OK. You could try a query hint in this case; the OPTIMIZE FOR hint might help out. Experiment with it. In 2008 you can also OPTIMIZE FOR UNKNOWN, which can work really well.
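Something like this, using the query you posted earlier (the date literals are only placeholders for a representative range):
select sum(amount)
from vwCashTransactions
where groupaccountid = @P1 and ledgerdate between @P2 and @P3
option (optimize for (@P2 = '20080101', @P3 = '20090101'))
-- or, on 2008:
-- option (optimize for unknown)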
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 13, 2009 at 3:07 am
It would help if you posted the view definition.
Maybe something is seriously wrong there?
SELECT SUM(Amount)
FROM vwCashTransactions
WHERE GroupAccountID = @P1
AND LedgerDate BETWEEN @P2 AND @P3
N 56°04'39.16"
E 12°55'05.25"
February 13, 2009 at 4:35 am
OK, here you go, all the gory details; it all looks OK to me. This is the view...
ALTER VIEW [dbo].[vwConsolidatedCLT]
AS
SELECT CLT.TransID, CLT.LedgerDate, CON.ClAccountID, CLT.ClAccountID AS subclaccountid,
CLT.LineNumber, CLT.MovementType, CLT.MovementSource, CLT.CCYCode, CLT.Amount,
CLT.Balance, CLT.Reference, CLT.Narrative, CLT.DisplayToClient, CLT.Capital,
CLT.ProductCapital, CLT.UserID, CLT.DateCreated, CLT.LedgerSource, CLT.LedgerSourceID,
CLT.FromCCYCode, CLT.ToCCYCode, CLT.BaseCCYCode, CLT.FXRate, CLT.RestrictSweepUntil,
CLT.AdjustmentUserID, PD.ProductType, COALESCE(PD.ProductDisplayName, PD.ProductType) as ProductDisplayName,
CLT.ExcludeFromCalc
FROM ClientAccount.dbo.CashLedgerTransactions as CLT
INNER JOIN ClientAccount.dbo.Consolidate as CON
ON CON.SubClAccountID = CLT.ClAccountID
LEFT OUTER JOIN Discovery.dbo.ProductDetails as PD
ON PD.ClAccountID = CON.SubClAccountID
and this is the query...
SELECT [TransID], [LedgerDate], [ClAccountID], [subclaccountid], [LineNumber], [MovementType], [MovementSource],
[CCYCode], [Amount], [Balance], [Reference], [Narrative], [DisplayToClient], [Capital], [ProductCapital],
[UserID], [DateCreated], [LedgerSource], [LedgerSourceID], [FromCCYCode], [ToCCYCode], [BaseCCYCode], [FXRate],
[RestrictSweepUntil], [AdjustmentUserID], [ProductType], [ProductDisplayName], [ExcludeFromCalc]
FROM ClientAccount..vwConsolidatedCLT
WHERE [ClAccountID] = @0 AND [LedgerDate] > @1 AND [LedgerDate] <= @2 AND
([MovementType] Like @3 OR [MovementType] = @4) AND [ExcludeFromCalc] = @5
I'm heading for the OPTIMIZE FOR hint for now.
The density for each index is similar, so the incorrect assumption is that the join to Consolidate returns more rows than the distinct ledgerdates in the range do. From the stats on Consolidate the largest EQ_ROWS for any value is 96, but the majority are less than 24.
The stats on the cashledger transaction table indicate an average of 30K-50K rows per ledgerdate and 2K-4K per account.
So a worst case account query could get about 384K rows (96 * 4K), which would mean that any date range of more than 12 days would be better done by account; the average case would be about 72K rows (24 * 3K), which would make going by date less efficient for any range of more than about 3 days. It is usual for the date range to be a month or two.
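(Those figures come straight from the statistics, e.g. with something like this; the statistics names are from memory so may not be exact:)
DBCC SHOW_STATISTICS ('ClientAccount.dbo.Consolidate', 'IX_Consolidate_ClAccountID') WITH HISTOGRAM
DBCC SHOW_STATISTICS ('ClientAccount.dbo.CashLedgerTransactions', 'idx_ledgerdate') WITH HISTOGRAM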
February 13, 2009 at 4:53 am
Ken Gaul (2/12/2009)
OK thanks, that surely can't be right though!
It does tend to happen with date columns, especially when the column in question defaults to getdate() and queries are looking for the current or previous day's data.
I had one system where it took 2 weeks of inserts to trigger an auto-update of the stats, but only 2 days for the stats to get stale enough to generate a really, really bad plan.
Could you perhaps post the actual execution plan (saved as a .sqlplan file, zipped and attached)? There are some regulars who know their way around the plan and may be able to give you a better idea why the optimiser's making a mistake.
Fragmentation can affect the plan choice. A badly fragmented index makes scans more expensive. I haven't investigated details though.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 13, 2009 at 5:24 am
Yup can do, I've put the good and the bad plan in there for comparison.
February 13, 2009 at 5:34 am
Those are estimated plans. They don't have the information that I need. Please can you capture the actual plan and post that.
There's no way to identify a stats issue from the estimated plan. It does look like that may be the issue; the estimated row count of 1 on the index seeks looks suspicious, but without the actual rows property (which is only present in the actual plan) I can't say for sure.
Run the query with the execution plan option on and save the resulting plan.
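If it's easier from a query window, something like this returns the actual plan as XML, which you can save off as a .sqlplan file:
SET STATISTICS XML ON
-- run the problem query here with representative parameter values
SET STATISTICS XML OFF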
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 13, 2009 at 5:37 am
Ken Gaul (2/13/2009)
SELECT [TransID], [LedgerDate], [ClAccountID], [subclaccountid], [LineNumber], [MovementType], [MovementSource],
[CCYCode], [Amount], [Balance], [Reference], [Narrative], [DisplayToClient], [Capital], [ProductCapital],
[UserID], [DateCreated], [LedgerSource], [LedgerSourceID], [FromCCYCode], [ToCCYCode], [BaseCCYCode], [FXRate],
[RestrictSweepUntil], [AdjustmentUserID], [ProductType], [ProductDisplayName], [ExcludeFromCalc]
FROM ClientAccount..vwConsolidatedCLT
WHERE [ClAccountID] = @0 AND [LedgerDate] > @1 AND [LedgerDate] <= @2 AND
([MovementType] Like @3 OR [MovementType] = @4) AND [ExcludeFromCalc] = @5
Where are you calling that from? A stored procedure?
What are @1, @2, ...? Variables? Procedure parameters? Constants?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability