March 26, 2007 at 12:00 pm
I have a view that joins 5 tables - Sales and payment have about 1M rows - customer a few hundred thousand - the other 2 tables only about 100 rows - all joins based on clustered indexes......the view limits data to the last 15 days....SalesDate is a varchar(8) with less than 0.5% of data on any selected date. Stats were just updated....
When the date is hardcoded, the view does an index seek on SalesDate and then uses the clustered indexes on the other tables to join appropriate data elements. When the select uses the date formula, the execution plan shows that a full scan is performed on all tables.....
Does any one have any ideas as to why the data function causes the optimizer to bypass the index - the transformation is on the correct side of the equation?
SELECT
TRD......,
CD.......,
PD.......,
Cur......,
EV.......
FROM
dbo.Tbl_SalesTransactions TRD
JOIN dbo.Tbl_CustomerData CD ON TRD.CID = CD.CID
JOIN dbo.Tbl_PaymentData PD ON TRD.PID = PD.PID
JOIN dbo.Tbl_Currencies Cur ON TRD.ISOCurrencyCode = Cur.ISOCurrencyCode
JOIN dbo.Tbl_ExecutingVendor EV ON TRD.EVID = EV.EVID AND EV.Active ='Y'
WHERE
(TRD.XLINK = '' OR TRD.XLINK IS NULL)
AND TRD.SalesType IN (1,11)
AND TRD.TxFlag<>'C'
--AND TRD.SalesDate >= CONVERT(CHAR(8),DATEADD(DAY,-15,GetDate()),112)
--AND TRD.SalesDate >= '20070311'
Thanks,
Harley
March 26, 2007 at 1:28 pm
There have been statements in similar incidents of implicit conversion might be at the root of this.
Try making the data varible, set the variable and use the variable to see what happens.
Try this also
AND TRD.SalesDate >= CONVERT(VARCHAR(8),DATEADD(DAY,-15,GetDate()),112)
the other way might be causing TRD.SalesDate to implicitly convert to CHAR(8) according to the other side which would slow things a bit.
I always suggest using the correct data type to represent your data and I would say use DATETIME instead.
March 26, 2007 at 4:20 pm
What's datatype for TRD.SalesDate?
_____________
Code for TallyGenerator
March 26, 2007 at 4:51 pm
Sergiv,
TRD.SalesDate is a char(8).........
March 26, 2007 at 4:59 pm
Antares,
The system has been around for many years - and the decision re the data type was apparently made at inception - the data is imported from another system in a fixed length format so was set to a char(8) to match the input.
I tried changing the conversion from a char(8) to a varchar(8) in the view and the change had no impact on the query path........I don't have the option of using a variable since there is no way to create and set an initial value to one in a view.......
Thanks for the ideas,
Harley
March 26, 2007 at 5:23 pm
Then you need just make the index on this column clustered.
_____________
Code for TallyGenerator
March 26, 2007 at 5:43 pm
The clustered index is on SalesID which is a sequential ID that is the primary means of accessing the records......there are several other indexes on various fields including SalesDate, InsertDate, CloseDate, etc. - I'm afriad I don't have the latitude to change the clustered index.....
March 26, 2007 at 6:03 pm
Bad for you.
Do you have at least one query selecting sequential SalesID's?
WHERE SalesID between ### and ###
?
If not then there is no use for that clustered index in your system.
It's just useless overhead.
Clustered index is required on a column which is mainly used for range selections. Normal indexes are typically ignored for this kind of selections.
It's long to explain why, but if you read some books about mechanics behind indexing and draw yourself a picture what it takes to read normal index in query WHERE SalesDate > @SomeDate you'll see that is longer and takes more resources than table scan.
When you have parameter hardcoded optimizer can estimate amount of rows to scan, and if it's below 1% it will choose to use index.
But when you supply variable with any possible value in it optimizer will choose more safe plan, best in most cases, not only for some specific values you are interested in.
_____________
Code for TallyGenerator
March 27, 2007 at 9:54 am
What is the final rowcount estimate for each way?
March 27, 2007 at 5:12 pm
Jeff,
Displaying the estimated query plan using the convert shows 4150 rows - doing the same with a hardcoded date shows 1 row - the actual number of rows for today using the conditions listed is 0 rows.........
Harley
March 27, 2007 at 6:01 pm
Harley, do simple exercise:
CREATE TABLE dbo.SalesChanges (
SalesID int,
ActionID tinyint, -- Say, 1 = CREATE, 2 = Insert, 3 = Close, etc.
ActionDate
)
CREATE CLUSTERED INDEX CX_ActionDate ON dbo.SalesChanges(ActionDate)
CREATE INDEX IX_SalesID ON dbo.SalesChanges(SalesID)
CREATE INDEX IX_ActionID ON dbo.SalesChanges(ActionID)
Insert into this table all your dates from original table.
Then do SELECT from these 2 tables joined :
WHERE ActionID = 1 AND ActionDate > GETDATE() - 15
See what query plan will say you.
_____________
Code for TallyGenerator
March 28, 2007 at 8:14 am
So which one is more realistic for an average run of the query? You might force the rowcount estimate using a fast X query hint.
Try running the query with the convert, but add
OPTION (Fast 1)
at the end of the query. Look at the plan and see if it is better than the one you get without the rowcount hint.
March 28, 2007 at 9:27 am
Dumb question and usually the obvious first I ask but what is you maintenance like. Do you rebuild the indexes periodically, update usage, update stats and such to be sure the optimizer has the correct current data for index decisions?
March 28, 2007 at 10:14 am
Thanks for everyone's input --
Sergiv - I'm afraid the result won't matter - there are serveral indexed date fields on the table and I need to be able to run similar views using each date field as the primary selection criteria. Since I can only have one clustered index, several views would still function poorly.
Jeff - I tried the fast 1 option - the estimated rows then go to 1 for each WHERE statement - but the query path with the calculated date still performs full table scans.
I also tried an index hint on SalesDate - the optimizer uses that index but still scans the other table. When the date is hardcoded (with or without the hint), the path uses index seeks on all tables.
Antares - the tables are reindex about every 60 days - autostats is on and the update stats was just run. I checked the estimated records returned, ran update usage and then checked them again - they were still the same.
This seems to be an anomally in the the function of the query optimizer - it makes a poor decision in determining a path when the date is not hardcoded.
March 28, 2007 at 10:20 am
Try sp_updatestats as well. Large tables tend to fall behind with AUTO STATS because that is based on a percentage of change before it actually runs, there was a thread about this a year or more back.
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply