June 15, 2015 at 11:21 am
Thanks Lynn. I updated the post to reflect the correct name of that table. You caught my refactoring error.
Some of the tables this query deals with are a decent size (1 million and 10 million+ rows), so I will look into what you pointed out. What initially didn't make sense to me, though, was the discrepancy between the time it takes to run this iteratively in a dynamic query versus the time it takes with static parameters. I guess what you're saying is that this query just needs to be optimized further? Like I said in the original post, it runs in 2-5 seconds when I call the history function directly, which seems completely reasonable to me.
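To be clear about terminology: by "static" I mean calling the history function with literal values, and by "dynamic" I mean the parameterized batch my application sends. Roughly like the sketch below, assuming the function signature posted further down (the parameter values are only illustrative):
-- "Static" call: literal values baked into the statement; this is the 2-5 second case.
SELECT * FROM dbo.history('2015-05-01', DEFAULT, DEFAULT, NULL, NULL, DEFAULT);

-- "Dynamic" call: the same statement sent as a parameterized batch; this is the slow case.
EXEC sys.sp_executesql
    N'SELECT * FROM dbo.history(@since, DEFAULT, DEFAULT, @number, @year, DEFAULT);',
    N'@since DATE, @number FLOAT, @year INT',
    @since = '2015-05-01', @number = NULL, @year = NULL;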
June 15, 2015 at 2:34 pm
No promises, but here is one possible rewrite:
CREATE FUNCTION [dbo].[SubQuery] (
    @OnDate DATE,
    @EarliestDated DATE,
    @LatestDated DATE,
    @Number FLOAT,
    @Year INT,
    @Substitution INT
)
RETURNS TABLE
AS
RETURN (
    SELECT
        Object.*,
        Sums.ModifiedSize AS Amt,
        year(Object.Date) AS [Year]
    FROM (
        SELECT
            Transactions.Object,
            sum(ModifiedSize) AS ModifiedSize
        FROM (
            SELECT
                FilteredS.Object,
                CASE TypeIndicator
                    WHEN 'P' THEN isnull(Amt, @Substitution) * -1
                    WHEN 'S' THEN isnull(Amt, @Substitution)
                    ELSE 0
                END AS ModifiedSize
            FROM (
                SELECT
                    o.*
                FROM
                    Object o
                    CROSS APPLY (SELECT min(o1.Number), max(o1.Number), min(o1.Date), max(o1.Date) FROM Object o1) dt (MinNumber, MaxNumber, MinDate, MaxDate)
                WHERE
                    o.DatedDate >= @EarliestDated AND o.DatedDate <= isnull(@LatestDated, cast('9999-12-31' AS date)) AND
                    --AND (
                    --    @LatestDated IS NULL
                    --    OR DatedDate <= @LatestDated
                    --    )
                    o.Number BETWEEN isnull(@Number, dt.MinNumber) AND isnull(@Number, dt.MaxNumber) AND
                    --AND (
                    --    @Number IS NULL
                    --    OR Object.Number = @Number
                    --    )
                    o.Date >= dateadd(year, isnull(@Year, year(MinDate)) - 1900, 0) AND o.Date < dateadd(year, (isnull(@Year, year(MaxDate)) - 1900) + 1, 0)
                    --AND (
                    --    @Year IS NULL
                    --    OR year(Object.Date) = @Year
                    --    )
                    AND FinalSize IS NOT NULL
            ) FilteredS
            LEFT JOIN (
                SELECT
                    *
                FROM
                    transactions
                WHERE
                    TypeIndicator != 'D' AND
                    TransDate <= isnull(@OnDate, cast('9999-12-31' AS date))
                    --AND (
                    --    @OnDate IS NULL
                    --    OR TransDate <= @OnDate
                    --    )
            ) Transactions
                ON FilteredS.Object = Transactions.Object
        ) AS Transactions
        GROUP BY
            Transactions.Object
    ) Sums
    LEFT JOIN Object
        ON Object.Object = Sums.Object
);
June 15, 2015 at 4:09 pm
Lynn Pettis (6/15/2015)
No promises, but here is one possible rewrite:
Thanks for taking the time to do this, but for some reason it actually made performance worse with both methods: the dynamic query took 1:43 and the static query took 2-5 seconds. I was, however, able to get the static query down to 1 second by removing the year() function, per your earlier suggestion.
I'm still leaning towards this issue having something to do with the CTE/tally table/cross apply in the outer query. The execution plan shows the same operation taking up the majority (~85%) of the cost regardless of whether the query takes 1 second or 1:30 to run.
Here's the outer function I'm currently working with (changing "where" to "top" reduced the dynamic query to 20 seconds, which is still 5-10x longer than running the static query). I was not able to move the cross apply onto nums, since the subquery depends on num. I also tried converting this to a multistatement TVF and declaring the variables locally, which I've read helps in some cases, but that made performance worse as well.
ALTER FUNCTION [dbo].[history] (
    @Since DATE,
    @EarliestDated DATE = '6/1/2014',
    @LatestDated DATE = NULL,
    @Number FLOAT,
    @Year INT,
    @Substitution INT = 5000000
)
RETURNS TABLE
AS
RETURN (
    WITH N0 AS (
        SELECT 1 AS n
        UNION ALL
        SELECT 1
    ),
    N1 AS (
        SELECT 1 AS n
        FROM N0 t1, N0 t2
    ),
    N2 AS (
        SELECT 1 AS n
        FROM N1 t1, N1 t2
    ),
    N3 AS (
        SELECT 1 AS n
        FROM N2 t1, N2 t2
    ),
    nums AS (
        SELECT TOP (DATEDIFF(day, @Since, GETDATE()) + 1)
            ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num
        FROM N3
    )
    SELECT
        DATEADD(day, num - 1, @Since) AS thedate,
        *
    FROM nums
    CROSS APPLY (
        SELECT
            count(*) AS objects,
            sum(totalamt) AS totalamt,
            sum(size) AS totalsize
        FROM dbo.SubQuery(DATEADD(day, num - 1, @Since), @EarliestDated, @LatestDated, @Number, @Year, @Substitution)
        GROUP BY Number, Year
    ) thetable
)
June 15, 2015 at 4:38 pm
Maybe this:
ALTER FUNCTION [dbo].[history] (
    @Since DATE,
    @EarliestDated DATE = '6/1/2014',
    @LatestDated DATE = NULL,
    @Number FLOAT,
    @Year INT,
    @Substitution INT = 5000000
)
RETURNS TABLE
AS
RETURN (
    WITH
    N0(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) dt (n)), -- 10 rows
    N1(n) AS (SELECT 1 FROM N0 a CROSS JOIN N0 b),                                    -- 100 rows
    N2(n) AS (SELECT 1 FROM N1 a CROSS JOIN N1 b),                                    -- 10,000 rows
    Dates(TheDate) AS (
        SELECT TOP (DATEDIFF(day, @Since, GETDATE()) + 1)
            DATEADD(day, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, @Since)
        FROM N2 a CROSS JOIN N1 b -- up to 1,000,000 dates (would probably overflow the DATE data type)
    )
    SELECT
        d.TheDate,
        thetable.*
    FROM
        Dates d
    CROSS APPLY (
        SELECT
            count(*) AS objects,
            sum(totalamt) AS totalamt,
            sum(size) AS totalsize
        FROM
            dbo.SubQuery(d.TheDate, @EarliestDated, @LatestDated, @Number, @Year, @Substitution)
        GROUP BY
            Number,
            Year
    ) thetable
);
I still think the subquery function needs to be rewritten, and perhaps some indexing added to support it.
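If it helps, here is a speculative sketch of the kind of supporting indexes I would start with. The column choices are based purely on the predicates in the SubQuery code above, so the names may not match your real schema; test against your own workload before creating anything:
-- Speculative indexes based on the filters and join in the posted SubQuery function.
-- Column names (DatedDate, Number, Date, FinalSize, Object, TypeIndicator, TransDate, Amt)
-- are taken from that code and may not match the actual tables.
CREATE NONCLUSTERED INDEX IX_Object_DatedDate_Number_Date
    ON dbo.Object (DatedDate, Number, Date)
    INCLUDE (FinalSize);

CREATE NONCLUSTERED INDEX IX_transactions_Object_TypeIndicator_TransDate
    ON dbo.transactions (Object, TypeIndicator, TransDate)
    INCLUDE (Amt);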
June 16, 2015 at 8:26 am
Lynn Pettis (6/15/2015)
Maybe this: ... I still think the subquery function needs to be rewritten, and perhaps some indexing added to support it.
Gave this a try, but it really didn't change anything. It seems like no matter what I do, when I call the function from a parameterized query it takes dramatically longer. I added some indexes (eliminating that 80% key lookup from the query plan), which dropped the static call to 1 second consistently, but they had no impact on the parameterized call. I'd expect the two calls to perform the same or close to it, especially with the same query plan?
All of these changes have had some impact on the static call, but the only one that really helped the parameterized call was changing that "where" clause to "top".
I'm running 2008 R2 RTM and wondering whether that could have something to do with it. At this point I'm ready to write my application to just make the static call, but that will make things much more difficult when I eventually add a TVP for additional filtering.
June 16, 2015 at 12:10 pm
So it seems this is definitely an issue of parameter sniffing.
While OPTION (RECOMPILE) did nothing, OPTION (OPTIMIZE FOR UNKNOWN) brought the dynamic query down to 3 seconds. Still slower, but I can live with this.
Is there a more correct way to fix this, or would simply leaving that hint as part of the query be the best solution?
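For anyone finding this later, the hint just gets appended to the statement my application sends; a minimal sketch, assuming the parameterized form looks roughly like this (parameter names are illustrative):
-- The hint goes at the end of the parameterized statement. OPTIMIZE FOR UNKNOWN makes the
-- optimizer use average density statistics instead of the sniffed parameter values;
-- OPTION (RECOMPILE) would instead compile a fresh plan for each execution.
SELECT h.*
FROM dbo.history(@since, DEFAULT, DEFAULT, @number, @year, DEFAULT) AS h
OPTION (OPTIMIZE FOR UNKNOWN);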
July 7, 2015 at 3:25 pm
phosplait (6/16/2015)
So it seems this is definitely an issue of parameter sniffing. While OPTION (RECOMPILE) did nothing, OPTION (OPTIMIZE FOR UNKNOWN) brought the dynamic query down to 3 seconds. Still slower, but I can live with this.
Is there some more correct way to fix this, or would simply leaving that as part of the query be the best solution?
Just curious if you were to use a couple CTEs that create virtual tables with all the parameter values as records, whether that might eliminate the sniffing problem. It would add a join to every query where a parameter appears. It would go something like this:
WITH DATE_PARMS AS (
SELECT 'Since' AS PARM_NAME, @Since AS DATE_VALUE
UNION ALL
SELECT 'EarliestDated', @EarliestDated
UNION ALL
SELECT 'LatestDated', @LatestDated
),
FLOAT_PARMS AS (
SELECT 'Number' AS PARM_NAME, @Number AS FLOAT_VALUE
),
INT_PARMS AS (
SELECT 'Year' AS PARM_NAME, @Year AS INT_VALUE
UNION ALL
SELECT 'Substitution', @Substitution
)
...
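To show the join part, a hypothetical continuation of that WITH block might look like the snippet below; the Object table and DatedDate column are borrowed from the SubQuery code earlier in the thread, purely to illustrate the shape:
-- Hypothetical use of one of the parameter CTEs: each reference to a parameter becomes
-- a join against the CTE instead of a direct reference to the variable.
SELECT o.*
FROM Object o
CROSS JOIN (SELECT DATE_VALUE FROM DATE_PARMS WHERE PARM_NAME = 'EarliestDated') ep
WHERE o.DatedDate >= ep.DATE_VALUE;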
Steve (aka sgmunson)
July 8, 2015 at 4:33 am
phosplait (6/16/2015)
So it seems this is definitely an issue of parameter sniffing. While OPTION (RECOMPILE) did nothing, OPTION (OPTIMIZE FOR UNKNOWN) brought the dynamic query down to 3 seconds. Still slower, but I can live with this.
Is there some more correct way to fix this, or would simply leaving that as part of the query be the best solution?
The first posting of the History function looks like this:
CREATE FUNCTION [dbo].[history] (
@First DATE,
@DatedStart DATE = '6/1/2014',
@DatedEnd DATE = NULL,
@Number FLOAT,
@Year INT,
@Sub INT = 5000000
)
...
SELECT DATEADD(day, num - 1, @Sub) AS thedate,
which of course throws an error. I might be wrong, but this suggests that you're in the middle of designing the process rather than tuning a query that has already been identified as a bottleneck. If that's the case, you might want to reconsider the design: configure the inner iTVF as a bog-standard query and reconfigure the outer iTVF as a new function returning a bunch of dates. I'm confident you'd have a far less messy structure to work with. Rejigging your inner iTVF results in something like this:
SELECT
[Year] = YEAR(o.Date),
o.Object,
Amt = SUM(x.ModifiedSize)
FROM [Object] o
INNER JOIN transactions t
ON t.object = o.object
CROSS APPLY (
SELECT ModifiedSize = CASE t.TypeIndicator
WHEN 'P' THEN isnull(Amt, @Substitution) * - 1
WHEN 'S' THEN isnull(Amt, @Substitution)
ELSE 0 END
) x
WHERE o.DatedDate >= @EarliestDated
AND (@LatestDated IS NULL OR o.DatedDate <= @LatestDated)
AND (@Number IS NULL OR o.Number = @Number)
AND (@Year IS NULL OR YEAR(o.Date) = @Year)
AND o.FinalSize IS NOT NULL
AND t.TypeIndicator != 'D'
AND (@OnDate IS NULL OR t.TransDate <= @OnDate)
GROUP BY
YEAR(o.Date),
o.Object
which is simple, easy to adapt, and easy to mark as a problem catch-all query to which the usual caveats and cures apply. The point to remember is this: if you run the process with a single date and then with the widest date range which makes sense with your data, the two execution plans are unlikely to be the same.
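In case it helps, a minimal sketch of the standalone dates-returning iTVF I mentioned; the function name and the size of the tally are illustrative, so size it to the longest range you actually need:
-- Sketch of a separate inline TVF that only returns dates; name and row cap are illustrative.
CREATE FUNCTION dbo.DateRange (@StartDate DATE, @EndDate DATE)
RETURNS TABLE
AS
RETURN (
    WITH
    N0(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (n)), -- 10 rows
    N1(n) AS (SELECT 1 FROM N0 a CROSS JOIN N0 b),                                   -- 100 rows
    N2(n) AS (SELECT 1 FROM N1 a CROSS JOIN N1 b)                                    -- 10,000 rows, roughly 27 years of days
    SELECT TOP (DATEDIFF(day, @StartDate, @EndDate) + 1)
           TheDate = DATEADD(day, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, @StartDate)
    FROM N2
);
-- The outer query then becomes a plain CROSS APPLY over this function, e.g.
-- SELECT d.TheDate, x.* FROM dbo.DateRange(@Since, GETDATE()) d CROSS APPLY (...) x;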
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden