January 23, 2008 at 8:31 am
Having done most of my programming in conventional procedural languages (and not much SQL), I have a real aversion to "rape and paste" programming. But I find some aspects of SQL seem to require it, and wonder if I am just missing something. There are at least two areas where I am frustrated:
1. Duplicating subqueries: OK, CTE's have made redundant references to the same subquery expression unnecessary, and I appreciate it.
2. Duplicating tests and conditions:
Example 1: Repeating the same test (conversion, whatever) in an order by clause because I am unable to refer to a column alias.
Example 2: Repeating the same test where multiple output columns depend on the contents of some other column. For example, the source table uses the same Amount field to represent (under different conditions) either Hours or Dollars. To move the Amount value to either of the Hours or Dollars output column (and leave the other null) I need to repeat the test (or its inverse).
Both examples would seem to call for either a Case statement, local variable or an If - Else clause or some sort of macro expansion, but they seem to not apply within a single SQL statement (Or, in the case of a Case statement, across output column boundaries.)
What is the accepted approach to avoiding redundancy within a query? UDFs?
January 23, 2008 at 1:40 pm
For example 1 you can use also CTEs or subqueries.
Example:
SELECT a+b AS c FROM dbo.Table
ORDER BY a+b
--> SELECT c FROM (SELECT a+b FROM dbo.Table) SubQ ORDER BY c
--> WITH A AS (SELECT a+b FROM dbo.Table) SELECT c FROM A ORDER BY c
I dont understand what you want to improve with your 2nd example.
Do you have a more detailed explanation, e.g what how you would solve this in a .NET program?
Thanks!
Best Regards,
Chris Büttner
January 23, 2008 at 2:23 pm
While you may be able to do away with some amount of redundancy in your code, blind refactoring to wring all of that out will usually be done at the expense of performance. Meaning - most methods that would remove all of the duplicated T-SQL code will often times run (quite a bit to a WHOLE LOT) slower than the "less elegant" code.
Functions are an example of that. They can make your code very pretty, but used incorrectly, they will suck the performance right out of your high-powered server. Of course - there are also a few scenarios where they perform just fine.
Between CTE's, sub-queries, and or the judicious use of constructs like temp tables, you should still be able to keep it down to a manageable level.
So - in short - I would start getting used to the fact that you might have some amount of duplicated code, and some amount of "multiple places in code to update" when logic changes.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 24, 2008 at 4:55 am
Christian: re example 2:
Instead of:
...
,case when (long-complex-conditional-expression)
then valuefield else 0 end as [$]
,case when not (long-complex-conditional-expression)
then valuefield else 0 end as [Hrs]
I would like to avoid the duplication with something like
set @isCondition when (long-complex-conditional-expression)) -- evaluated for each row
if @isCondition then
,valuefield as [$]
,0 as as [Hrs]
else
,0 as [$]
,valuefield as [Hrs]
end
or whatever alternate syntax would work...
Actually, the query in question involved an expression with a (long-complex-literal-list), which, in some other universe, could have been assigned to a local string or array variable. (If it wasn't in a view.) But while that would avoid the danger of having "...in @theList..." not match the " not in @theList...", the query would seem to still have to do lookup twice. (Or is the engine smart enough to see that the second lookup could be avoided?) (Second choice, if the list was in, e.g. a #temp table, would the engine avoid the second lookup?)
Guess I just have to admit "I'm not in Kansas anymore", and (as Matt suggested) and just get over it!
Thank you both.
January 24, 2008 at 9:31 am
I'm a big fan of derived table for this purpose.
consider the following code that runs agains the AdventureWorks database
Declare @StartDate datetime, @EndDate datetime
Select @StartDate = '1/1/2003', @EndDate = '12/31/2003'
Select *,
Commission = TotalSales *
case
When PctAttained < 25 Then 0
When PctAttained < 50 Then CommissionPct * 0.50
When PctAttained < 75 Then CommissionPct * 0.75
ElseCommissionPct
End
From
(
Select TotalSales
, P.FirstName
, P.LastName
, SP.SalesPersonID
, SP.TerritoryID
, SP.SalesQuota
, SP.CommissionPct
, PctAttained = TotalSales / SalesQuota * 100.00
, OrderYear
, OrderMonth
From
(
Select
SOH.SalesPersonID,
sum(SOH.SubTotal) TotalSales,
Year(OrderDate) OrderYear,
month(OrderDate) OrderMonth
From Sales.SalesOrderHeader SOH
Where OrderDate Between @StartDate and @EndDate
--AND SalesPersonID = 280
group by
Year(OrderDate) ,
month(OrderDate) ,
SOH.SalesPersonID
) tab1
INNER JOIN Sales.SalesPerson SP ON tab1.SalesPersonID = SP.SalesPersonID
INNER JOIN Person.Contact P ON tab1.SalesPersonID = P.ContactID
) tab2
Where PctAttained >= 50
order by SalesPersonID, OrderYear, OrderMonth
While its a contrived example, I'm calculating a "Pct Attained" column, giving it a name, and referencing it in an outer query, so if the calc for the field ever changes, I change it in one place in the inner query, and that calculation automatically propagates to all outer levels. In the non-factored version, I would need to repeat the same calculation 4 times for the case statement and the where clause. You can take this concept with any complex case logic that you would want to encapsulate.
The downside of this is that if you have logic that is truly repeated across procedures, you still have to cut and paste, because as previously mentioned, functions can absolutely kill performance, but with some planning and testing, some of your common code could be made into views. Also, procedures that populate temp tables are another decent way to encapsulate logic.
January 24, 2008 at 9:46 am
Thanks Jeremy, nice example! (Would I lose anything by using CTE's just to keep from getting so confused?)
January 24, 2008 at 9:50 am
As with most things Jim - "it depends". A lot of thing can be written using either notation with no difference in execution (derived table vs CTE), but there are exceptions on both sides.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 24, 2008 at 10:07 am
Matt Miller (1/24/2008)
As with most things Jim - "it depends". A lot of thing can be written using either notation with no difference in execution (derived table vs CTE), but there are exceptions on both sides.
Unfortunately with SQL you have 1 major component that you don't have in other types of programming. The data.
SQL like most languages has multiple ways of handling the same thing. Unfortunately unlike most languages sometimes what works best depends more on what the data looks like now .. and what it will look like in the future .. than on any hard and fast rules.
You will see in a lot of posts where even the most experienced posters will tell you "test the code on your own data". Mostly because the results can differ fairly significantly from one data set to another.
Kenneth
Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]
January 24, 2008 at 10:25 am
Jim Russell (1/24/2008)
Would I lose anything by using CTE's just to keep from getting so confused?
I'm not yet a big fan of CTEs. I personally find this more confusing
With Tab1(SalesPersonID
,TotalSales
,OrderYear
,OrderMonth)
as
(
Select
SOH.SalesPersonID,
sum(SOH.SubTotal) TotalSales,
Year(OrderDate) OrderYear,
month(OrderDate) OrderMonth
From Sales.SalesOrderHeader SOH
Where OrderDate Between @StartDate and @EndDate
--AND SalesPersonID = 280
group by
Year(OrderDate) ,
month(OrderDate) ,
SOH.SalesPersonID
),
Tab2(TotalSales
, FirstName
, LastName
, SalesPersonID
, TerritoryID
, SalesQuota
, CommissionPct
, PctAttained
, OrderYear
, OrderMonth)
AS
(
Select TotalSales
, P.FirstName
, P.LastName
, SP.SalesPersonID
, SP.TerritoryID
, SP.SalesQuota
, SP.CommissionPct
, PctAttained = TotalSales / SalesQuota * 100.00
, OrderYear
, OrderMonth
From
Tab1
INNER JOIN Sales.SalesPerson SP ON Tab1.SalesPersonID = SP.SalesPersonID
INNER JOIN Person.Contact P ON Tab1.SalesPersonID = P.ContactID
)
Select *,
Commission = TotalSales *
case
When PctAttained < 25 Then 0
When PctAttained < 50 Then CommissionPct * 0.50
When PctAttained < 75 Then CommissionPct * 0.75
ElseCommissionPct
End
From
Tab2
Where PctAttained >= 50
order by SalesPersonID, OrderYear, OrderMonth;
But I guess it comes down to what you like looking at and what you find more maintainable.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply