December 3, 2007 at 12:50 pm
declare @xdate varchar(8)
select @xdate = '20071130'
SELECT
a.Field1, a.Field2, a.Field3, max(c.Field1)
FROM
TableA a
INNER JOIN TableB b ON a.FieldX = b.FieldX
LEFT JOIN TableC c on a.FieldY = c.FieldY AND a.FieldZ = c.FieldZ
WHERE
a.Verified IN ('X', 'F', 'N', '')
AND a.XDate = '20071130'
--AND a.XDate = @xdate
GROUP BY
a.Field1, A.Field2, A.Field3
>Table =10M rows, TableB=10M rows, TableC=19M rows
>There is a 1 to 1 relationship between TableA and Table B - FieldX is the PK in both tables
>TableC has a compound index on FieldA, FieldB - it is not the PK in the table
>There is an index in place on TableA.XDate - approx 20,000 records per day out of 10M total
When the query is run with a hardcoded date, the query plan indicates that the XDate index is used on TableA to limit the rows selected. The optimizer then joins properly to TableB on the PKs - but 77M rows are pulled from TableC (which is 4x the number of actual rows in the table - one row for each item in the IN statement???) The 500 summary records return in about 8 minutes.
When the hardcoded date is removed and the date is passed in through a variable, the query uses the indexes on all 3 tables properly and it returns the same 500 summary records in 2-3 seconds.
I'm interested in thoughts as to why the query runs in 3 seconds when a variable is used - and 8 minutes when the date is hardcoded. I have found that several queries in the system are suffering the same strange behavior.
Please don't critique the design - it has been in place for years and I'm not able to make any changes. The system has a rule-based front-end and many of the queries are dynamic SQL generated by the rules engine.
Thanks,
Harley
December 3, 2007 at 1:01 pm
Is is the same for all dates? Meaning hard coding different dates.
It's possible that there's a different level of selectivity and a different plan is used.
December 3, 2007 at 1:06 pm
Steve,
Dates from mid-Nov back all behave the same and seem to operate properly when run either way - the more recent dates are subject to the odd pattern - the strange thing though is that the same date, say '20071130', behaves badly when hardcoded and performs excellent when passed in a variable....
By the way, I enjoy your columns.
Thanks for the response,
Harley
December 3, 2007 at 1:49 pm
Could it be because of the execution plan re-use? I.e., you execute the query first for mid-november and the plan is cached. Then when you run it for the recent dates as parameters SQL is re-using the plan but if you run for hard-coded dates it generates a new non-optimal plan every time.
December 3, 2007 at 4:51 pm
that is very strange. Can you look at the plans for one good and one bad execution and see a difference?
Or post them?
December 3, 2007 at 5:02 pm
Steve,
The plans are very different - index seek on xDate in both - then loop join to TableB - then
> the poor plan scans TableC and shows 77M records retrieved
>the other plan shows a loop join to TableC and 500 records retrieved.......
It may have been bad stats - we have been updating stats nightly with default sampling - updating the stats with 100% sampling seems to have brought the plan back to normal with identical performance with either a hardcoded date or a date passed in a variable.
Thanks,
Harley
December 3, 2007 at 5:14 pm
Thanks for the update. You might have some weird selectivity there with the data.
Good to know it's working.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply