Duplicate code (rape and paste) in SQL Queries

Question

Post reply

Duplicate code (rape and paste) in SQL Queries

Jim Russell-390299

SSCrazy Eights

Points: 9333
More actions
January 23, 2008 at 8:31 am

#178419

Having done most of my programming in conventional procedural languages (and not much SQL), I have a real aversion to "rape and paste" programming. But I find some aspects of SQL seem to require it, and wonder if I am just missing something. There are at least two areas where I am frustrated:
1. Duplicating subqueries: OK, CTE's have made redundant references to the same subquery expression unnecessary, and I appreciate it.
2. Duplicating tests and conditions:
Example 1: Repeating the same test (conversion, whatever) in an order by clause because I am unable to refer to a column alias.
Example 2: Repeating the same test where multiple output columns depend on the contents of some other column. For example, the source table uses the same Amount field to represent (under different conditions) either Hours or Dollars. To move the Amount value to either of the Hours or Dollars output column (and leave the other null) I need to repeat the test (or its inverse).
Both examples would seem to call for either a Case statement, local variable or an If - Else clause or some sort of macro expansion, but they seem to not apply within a single SQL statement (Or, in the case of a Case statement, across output column boundaries.)
What is the accepted approach to avoiding redundancy within a query? UDFs?

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Christian Buettner-167247 SSChampion Points: 13729 More actions · Answer 1

For example 1 you can use also CTEs or subqueries.

Example:

SELECT a+b AS c FROM dbo.Table

ORDER BY a+b

--> SELECT c FROM (SELECT a+b FROM dbo.Table) SubQ ORDER BY c

--> WITH A AS (SELECT a+b FROM dbo.Table) SELECT c FROM A ORDER BY c

I dont understand what you want to improve with your 2nd example.

Do you have a more detailed explanation, e.g what how you would solve this in a .NET program?

Thanks!

Best Regards,

Chris Büttner

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 2

While you may be able to do away with some amount of redundancy in your code, blind refactoring to wring all of that out will usually be done at the expense of performance. Meaning - most methods that would remove all of the duplicated T-SQL code will often times run (quite a bit to a WHOLE LOT) slower than the "less elegant" code.

Functions are an example of that. They can make your code very pretty, but used incorrectly, they will suck the performance right out of your high-powered server. Of course - there are also a few scenarios where they perform just fine.

Between CTE's, sub-queries, and or the judicious use of constructs like temp tables, you should still be able to keep it down to a manageable level.

So - in short - I would start getting used to the fact that you might have some amount of duplicated code, and some amount of "multiple places in code to update" when logic changes.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jim Russell-390299 SSCrazy Eights Points: 9333 More actions · Answer 3

Christian: re example 2:

Instead of:

...

,case when (long-complex-conditional-expression)

then valuefield else 0 end as [$]

,case when not (long-complex-conditional-expression)

then valuefield else 0 end as [Hrs]

I would like to avoid the duplication with something like

set @isCondition when (long-complex-conditional-expression)) -- evaluated for each row

if @isCondition then

,valuefield as [$]

,0 as as [Hrs]

else

,0 as [$]

,valuefield as [Hrs]

end

or whatever alternate syntax would work...

Actually, the query in question involved an expression with a (long-complex-literal-list), which, in some other universe, could have been assigned to a local string or array variable. (If it wasn't in a view.) But while that would avoid the danger of having "...in @theList..." not match the " not in @theList...", the query would seem to still have to do lookup twice. (Or is the engine smart enough to see that the second lookup could be avoided?) (Second choice, if the list was in, e.g. a #temp table, would the engine avoid the second lookup?)

Guess I just have to admit "I'm not in Kansas anymore", and (as Matt suggested) and just get over it!

Thank you both.

Jeremy-475548 Hall of Fame Points: 3065 More actions · Answer 4

I'm a big fan of derived table for this purpose.

consider the following code that runs agains the AdventureWorks database

Declare @StartDate datetime, @EndDate datetime

Select @StartDate = '1/1/2003', @EndDate = '12/31/2003'

Select *,

Commission = TotalSales *

case

When PctAttained < 25 Then 0

When PctAttained < 50 Then CommissionPct * 0.50

When PctAttained < 75 Then CommissionPct * 0.75

ElseCommissionPct

End

From

(

Select TotalSales

, P.FirstName

, P.LastName

, SP.SalesPersonID

, SP.TerritoryID

, SP.SalesQuota

, SP.CommissionPct

, PctAttained = TotalSales / SalesQuota * 100.00

, OrderYear

, OrderMonth

From

(

Select

SOH.SalesPersonID,

sum(SOH.SubTotal) TotalSales,

Year(OrderDate) OrderYear,

month(OrderDate) OrderMonth

From Sales.SalesOrderHeader SOH

Where OrderDate Between @StartDate and @EndDate

--AND SalesPersonID = 280

group by

Year(OrderDate) ,

month(OrderDate) ,

SOH.SalesPersonID

) tab1

INNER JOIN Sales.SalesPerson SP ON tab1.SalesPersonID = SP.SalesPersonID

INNER JOIN Person.Contact P ON tab1.SalesPersonID = P.ContactID

) tab2

Where PctAttained >= 50

order by SalesPersonID, OrderYear, OrderMonth

While its a contrived example, I'm calculating a "Pct Attained" column, giving it a name, and referencing it in an outer query, so if the calc for the field ever changes, I change it in one place in the inner query, and that calculation automatically propagates to all outer levels. In the non-factored version, I would need to repeat the same calculation 4 times for the case statement and the where clause. You can take this concept with any complex case logic that you would want to encapsulate.

The downside of this is that if you have logic that is truly repeated across procedures, you still have to cut and paste, because as previously mentioned, functions can absolutely kill performance, but with some planning and testing, some of your common code could be made into views. Also, procedures that populate temp tables are another decent way to encapsulate logic.

Jim Russell-390299 SSCrazy Eights Points: 9333 More actions · Answer 5

Thanks Jeremy, nice example! (Would I lose anything by using CTE's just to keep from getting so confused?)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

As with most things Jim - "it depends". A lot of thing can be written using either notation with no difference in execution (derived table vs CTE), but there are exceptions on both sides.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Kenneth.Fisher SSC-Insane Points: 20406 More actions · Answer 7

Matt Miller (1/24/2008)
As with most things Jim - "it depends". A lot of thing can be written using either notation with no difference in execution (derived table vs CTE), but there are exceptions on both sides.

Unfortunately with SQL you have 1 major component that you don't have in other types of programming. The data.

SQL like most languages has multiple ways of handling the same thing. Unfortunately unlike most languages sometimes what works best depends more on what the data looks like now .. and what it will look like in the future .. than on any hard and fast rules.

You will see in a lot of posts where even the most experienced posters will tell you "test the code on your own data". Mostly because the results can differ fairly significantly from one data set to another.

Kenneth

Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

Jeremy-475548 Hall of Fame Points: 3065 More actions · Answer 8

Jim Russell (1/24/2008)
Would I lose anything by using CTE's just to keep from getting so confused?

I'm not yet a big fan of CTEs. I personally find this more confusing

With Tab1(SalesPersonID

,TotalSales

,OrderYear

,OrderMonth)

as

(

Select

SOH.SalesPersonID,

sum(SOH.SubTotal) TotalSales,

Year(OrderDate) OrderYear,

month(OrderDate) OrderMonth

From Sales.SalesOrderHeader SOH

Where OrderDate Between @StartDate and @EndDate

--AND SalesPersonID = 280

group by

Year(OrderDate) ,

month(OrderDate) ,

SOH.SalesPersonID

),

Tab2(TotalSales

, FirstName

, LastName

, SalesPersonID

, TerritoryID

, SalesQuota

, CommissionPct

, PctAttained

, OrderYear

, OrderMonth)

AS

(

Select TotalSales

, P.FirstName

, P.LastName

, SP.SalesPersonID

, SP.TerritoryID

, SP.SalesQuota

, SP.CommissionPct

, PctAttained = TotalSales / SalesQuota * 100.00

, OrderYear

, OrderMonth

From

Tab1

INNER JOIN Sales.SalesPerson SP ON Tab1.SalesPersonID = SP.SalesPersonID

INNER JOIN Person.Contact P ON Tab1.SalesPersonID = P.ContactID

)

Select *,

Commission = TotalSales *

case

When PctAttained < 25 Then 0

When PctAttained < 50 Then CommissionPct * 0.50

When PctAttained < 75 Then CommissionPct * 0.75

ElseCommissionPct

End

From

Tab2

Where PctAttained >= 50

order by SalesPersonID, OrderYear, OrderMonth;

But I guess it comes down to what you like looking at and what you find more maintainable.