April 11, 2013 at 11:31 pm
Today I got a mail from one of our developers. It is a little complicated to explain and I cannot post the DDL. So, let me post a piece of the code ('cleaned up') and explain what was going wrong.
CTE definition 'cleaned up'
;with t2 as
(
select
tb.product,
r.category,
p.date,
sum(p.value) as value,
ROW_NUMBER() over (partition by tb.product, r.category order by SUM(p.value)) as cnt
FROM
--tables
)
select t3.product, t3.category, SUM(t3.value) * -1 as Value
from
(
select t2.product, t2.category, t2.Value * @Ratio1 as value from t2 where cnt = @Count1
union all
select t2.product, t2.category, t2.Value * @Ratio2 as value from t2 where cnt = @Count2
) as t3
--where t3.category = 'ABC'
group by t3.product, t3.category
order by t3.product
Here t2 is a CTE which does an aggregation (SUM) and a row_number() generation. When the developer ran this query, he was getting incorrect output for column Value. When he added the filter by category, he got the correct output.
My analysis:
I removed the GROUP BY and ran the query, got the correct values for the column value (say Val1 and Val2).
When I added the GROUP BY, the output was wrong, i.e. SUM(t3.value) was not showing
Val1 + Val2.
I got a hunch that this had something to do with UNION ALL on CTE, so I modified the query as below to eliminate the UNION ALL op.
select t3.product, t3.category, SUM(t3.value) * -1 as Value
from
(
select t2.product, t2.category,
CASE
WHEN cnt = @Count1 THEN t2.value * @Ratio1
WHEN cnt = @Count2 THEN t2.value * @Ratio2
ENDas value
from t2 where cnt = @Count1 OR cnt = @Count2
) as t3
--where t3.category = 'ABC'
group by t3.product, t3.category
order by t3.product
This worked just fine, gave the expected output. What I would like to know is what could have caused a wrong aggregation when the query had a UNION ALL. I know the CTE gets evaluated twice but the CTE is deterministic. Any help here would be really great. I will post the CTE definition below.
https://sqlroadie.com/
April 11, 2013 at 11:33 pm
--removed duplicate post
https://sqlroadie.com/
April 12, 2013 at 1:27 am
This is an interesting one Arjun! Can you post the actual plan please?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 12, 2013 at 1:39 am
Yes, very interesting. I will need more time to prepare the sample tables. I am not supposed to post the original plan, can get fired for that.
https://sqlroadie.com/
April 14, 2013 at 9:53 pm
I tried to reproduce the problem using the following query, which does pretty much the same thing (warning: the query will not make any logical sense). However, I could not reproduce it. I'm still able to reproduce the problem on our development server, using the original query.
An interesting observation is the original query worked fine when I specified MAXDOP as 1 in the final select.
;WITH cte
AS
(
SELECT
o.name [tablename],
SUM(ac.max_length) [ColLength],
ROW_NUMBER() OVER (PARTITION BY o.name ORDER BY SUM(ac.max_length)) AS cnt
FROM
sys.objects o
INNER JOIN sys.COLUMNS c
ON c.OBJECT_ID = o.object_id
INNER JOIN sys.all_columns ac
ON ac.NAME = c.NAME AND ac.OBJECT_ID = c.OBJECT_ID
GROUP BY
o.name,
ac.max_length
)
SELECT
tablename, SUM(ColLength) * -1 [ColLength]
FROM
(
SELECT tablename, Collength * 2 [ColLength] FROM cte WHERE cnt = 1
UNION ALL
SELECT tablename, ColLength * 4 [ColLength] FROM cte WHERE cnt = 2
) t
GROUP BY tablename
https://sqlroadie.com/
April 17, 2013 at 3:53 am
Update on this:
I and my teammate have concluded that this a SQL bug - not easily reproducible. The initial query, however, always produces wrong result. We ran the same query with MAXDOP(1) and higher values. When MAXDOP(1) was specified, we got correct results. However, for each other number specified, we got a different value.
There is a similar problem which was reported by a use to MS, however, in that case, the CTE definition was non-deterministic as the ORDER BY in his ROW_NUMBER() was on duplicate values. We do plan to report this to MS, once we find time to produce sample data. Cheers!
https://sqlroadie.com/
April 17, 2013 at 6:19 am
Are the tables partitioned in any way?
April 17, 2013 at 6:22 am
I think yes; I'm quite new to this project, so, I will have to check before I confirm. I will update tomorrow.
https://sqlroadie.com/
April 18, 2013 at 3:28 am
I checked this up and yes, the tables which are used in the CTE definition (very first query in my first post) are partitioned. The partition is on a column BatchID. This column appears in the ON clause of a JOIN in the CTE.
https://sqlroadie.com/
April 18, 2013 at 6:26 am
Can you reproduce the error by setting up tables that look similar in structure and data type and recreate the issue?
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply