Query using a view as a subquery is failing to apply correct criteria

Question

Query using a view as a subquery is failing to apply correct criteria

cmerr5705

Old Hand

Points: 394
More actions
April 4, 2017 at 4:19 pm

#401897

Hello! First time poster and relatively new Sql Server user here!
Just came across a logic failure that I don't understand, hoping someone can help me understand what's happening here.

select sq.* from
(select a, b, c, costperunit
from myVIEW
where costperunit > 0) as sq;
Clearly, the outer select statement doesn't serve a purpose here, I'm actually doing an insert, but for testing purposes the extra select duplicates the issue. When I run the statement above, I get a divide by zero error. If I run the inner statement of just: (select a, b, c, costperunit from myview where costperunit > 0) the query runs fine!!
I know why I get the divide by zero error, because in the underlying view I'm dividing cost by volume to get cost per unit, and there are cases with zero volume. But for some crazy reason the parser is ignoring my inner view's filtering logic when this outer where-clause is applied.
So let me start from the beginning, I have a table of volume and cost (this table is joined to other tables, but not for the core data, more for filtering and looking up attributes, etc):
A, B, C, D, Volume, Cost
I have a view that does a simple sum and drops a couple columns:
select A, B, C, SUM(Volume) as Vol, SUM(Cost) as Cst FROM Table GROUP BY A, B, C
This leads me to the view giving me problems. This one simply divides cost/volume to get a per unit cost, and I have volume checks in place to ensure I don't divide by zero:
select A, B, C, (Cst/Vol) as CostPerUnit FROM View1 WHERE Vol > 0 AND Cst > 0
That's it. This view runs fine by itself, but blows up if I query it and (redundantly I know) verify that costperunit is indeed > 0.
Finally, why am I using views? It's a design decision. Our group migrated from Access and to keep people comfortable we are treating Views = Access Queries (they're visible, can be built in graphic editor, etc)
I've googled and found nothing. I played around with parser hints and forcing options etc, but nothing has helped so far. Obviously I can fix this by moving the final "where" check upstream, but I won't be comfortable with sql server until I understand why this is happening.
Thanks for any help!

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

The optimizer can do (almost) anything it wants with your query as long as boolean and algebraically it still gives the same output or effect. Predicates can be pushed up or down, joins rearranged, etc. In this case in one query you are getting filtering before the division and in the other you are not. That does not make the second version incorrect.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

cmerr5705 Old Hand Points: 394 More actions · Answer 2

Thanks for the reply. I guess after years of MS Access experience using nested queries, I had never run across the case where a "WHERE" got processed in an order I didn't expect. The results of the first query always pass to the second, and so on.

Is this some peculiarity of working with views perhaps?
If I were to change from a WHERE to a HAVING, would that have any effect?

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

Here's another example from my Common TSQL Mistakes session (originally from Plamen Ratchev):

Use tempdb set nocount on go

IF OBJECT_ID(N'Accounts', N'U') IS NOT NULL DROP TABLE dbo.Accounts;CREATE TABLE dbo.Accounts ( account_nbr INT NOT NULL PRIMARY KEY, account_type VARCHAR(20) NOT NULL CHECK (account_type IN ('Personal', 'Business Basic', 'Business Plus')), account_reference VARCHAR(30) NOT NULL);

INSERT dbo.Accounts VALUES(1, 'Personal', 'abc'); INSERT dbo.Accounts VALUES(2, 'Business Basic', '101'); INSERT dbo.Accounts VALUES(3, 'Personal', 'def'); INSERT dbo.Accounts VALUES(4, 'Business Plus', '5');

SELECT account_nbr, account_type, account_reference FROM dbo.Accounts;

-- errors SELECT account_nbr, account_reference AS account_ref_nbr FROM dbo.Accounts WHERE account_type LIKE 'Business%' AND CAST(account_reference AS INT) > 20;

Error: Conversion failed when converting the varchar value 'abc' to data type int.

--also errors SELECT account_nbr, account_ref_nbr FROM (SELECT account_nbr, CAST(account_reference AS INT) AS account_ref_nbr FROM dbo.Accounts WHERE account_type LIKE 'Business%') AS A WHERE account_ref_nbr > 20;

Error: Conversion failed when converting the varchar value 'abc' to data type int.

--CASE ROCKS!! SELECT account_nbr, account_reference AS account_ref_nbr FROM dbo.Accounts WHERE account_type LIKE 'Business%' AND CASE WHEN account_reference NOT LIKE '%[^0-9]%' THEN CAST(account_reference AS INT) END > 20;DROP TABLE dbo.Accounts;

/* KEY TAKEAWAYS

1) Remember that the optimizer can do almost anything it wants with your query as long as algebraically and logically it gives you the same output/effect

2) Learn and use the power of the CASE keyword!*/

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

drew.allen SSC Guru Points: 76989 More actions · Answer 4

One way to get around this is to ensure that you never divide by zero by making any zero denominator NULL.

SELECT a, b, c, cost/NULLIF(vol,0) as costperunitvol FROM view1 WHERE vol > 0

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

cmerr5705 Old Hand Points: 394 More actions · Answer 5

Thanks Drew. I know I could use various techniques to fix this particular situation, but I'm concerned that this issue will raise its head in other queries as a subtle error and I'll never realize it. My group's use case is a lot of nested queries/views, each building upon the other, and we operate on the assumption that query #1's logic is executed fully before you get to query #2, etc.

I guess I'm trying to build my knowledge up to a set of rules we can follow: ie if you're joining nested queries and using where clauses, you have to do x, y, and z, etc.

drew.allen SSC Guru Points: 76989 More actions · Answer 6

cmerr5705 - Wednesday, April 5, 2017 12:37 PM
My group's use case is a lot of nested queries/views, each building upon the other, ...

That is a recipe for poorly performing queries. It leads to redundant calls to tables and inefficient processing of records.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

drew.allen - Wednesday, April 5, 2017 12:59 PM
cmerr5705 - Wednesday, April 5, 2017 12:37 PM
My group's use case is a lot of nested queries/views, each building upon the other, ...
That is a recipe for poorly performing queries. It leads to redundant calls to tables and inefficient processing of records.
Drew

+100

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

cmerr5705 Old Hand Points: 394 More actions · Answer 8

That could be true, but, this is not a real time transactional system, our small team is focused on reporting and analysis. We're working with tables of less than 1M rows mostly, so not a lot of heavy lifting involved.

Say we have six months of sales history at a part number and general ledger account level. Some of those accounts are just high level journal entries with no part number. We want to calculate unit revenue but we want to include 100% of the revenue and spread it across the "real" part numbers. How do you do that?

1. One query that links to a part number table to give you the real part numbers, sum volume and quantity for those parts. Insert this data in to final table

2. One query to give sum up the revenue of the journal entries (yes, re-querying the same table as #1)

3. Now we have to allocate that revenue across the real part numbers, so we need a query using query #1 as a source that says this "x" part number is 5% of the total

4. Now we need a final query that joins #2 and #3, mutiplying #2 total revenue by #3 each part's share of that revenue, and perform an insert into our final table

That's an example of our business process, and I do realize I could use an Over-Partition on #1 to eliminate one of these queries, but this is just a basic example.

drew.allen SSC Guru Points: 76989 More actions · Answer 9

cmerr5705 - Wednesday, April 5, 2017 1:23 PM
That could be true, but, this is not a real time transactional system, our small team is focused on reporting and analysis. We're working with tables of less than 1M rows mostly, so not a lot of heavy lifting involved.
Say we have six months of sales history at a part number and general ledger account level. Some of those accounts are just high level journal entries with no part number. We want to calculate unit revenue but we want to include 100% of the revenue and spread it across the "real" part numbers. How do you do that?
1. One query that links to a part number table to give you the real part numbers, sum volume and quantity for those parts. Insert this data in to final table
2. One query to give sum up the revenue of the journal entries (yes, re-querying the same table as #1)
3. Now we have to allocate that revenue across the real part numbers, so we need a query using query #1 as a source that says this "x" part number is 5% of the total
4. Now we need a final query that joins #2 and #3, mutiplying #2 total revenue by #3 each part's share of that revenue, and perform an insert into our final table
That's an example of our business process, and I do realize I could use an Over-Partition on #1 to eliminate one of these queries, but this is just a basic example.

From your description, it sounds like ALL of it can be calculated efficiently in a single query. Of course, without sample data, it's hard to come up with a solution. Something like the following (using a single read of the tables)
; WITH part_totals AS ( SELECT part_id, SUM(revenue) AS part_revenue FROM journal_entries GROUP BY part_id ) , grand_totals AS ( SELECT pt.part_id, pn.part_name, pt.part_revenue, SUM(pt.part_revenue) OVER() AS grand_total FROM part_totals AS pt INNER JOIN parts AS pn ON pt.part_id = pn.part_id ) SELECT gt.part_id, gt.part_name, gt.part_revenue, 100.0 * gt.part_revenue / gt.grand_total AS part_pct FROM grand_totals gt ;

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

cmerr5705 Old Hand Points: 394 More actions · Answer 10

Thanks Drew, I see how much more efficient that would be. I'll have to ponder this, I was really hanging on to using views because that will allow the Access members of my team to almost seamlessly migrate over. Most of them will use the graphical query tool until they get comfortable writing sql by hand. If I go the CTE route most of our procedures will end up all text (which granted they're already full of deletes, inserts, etc so you have to know the text version anyways)

Regarding my original problem of unexpected query results, I actually tried moving my inner query to a CTE and still got the same error.

I ran an explain plan, and you can see exactly what is happening. The parser is combining the WHERE from my inner query with the WHERE from my outer query all in to one clause:

|--Filter(WHERE:(round([Expr1015]/[Expr1012],(2))>(0.000000000000000e+000) AND [Expr1012]>(0.000000000000000e+000) AND [Expr1015]>(0.000000000000000e+000)))

So it's easy to see in cases with zero volume why I'm getting a divide by zero error. I guess I would have expected the right half of that statement to be processed much earlier, and the left half to be processed near the end of the chain, but that's not the case.

I've spent all day trying to find a rule or behavior that would cause this to happen, or some way to prevent it. I think I'm giving up and going with what Kevin said to begin with: WHEREs can occur in any order, regardless of nesting, whether it's a subquery, a view, a CTE, two tables, etc. But I do think the special case here is each query was referencing the same base fields (vol + cost), and that's why the parser combined them. So any time there's a situation like this, I need to accommodate by using a workaround such as the NULLIF() you guys proposed, or write intermediate results to a temp table, or make sure my WHEREs refer to physically different fields.