Joe Celko tackles the subject of the Stored Procedure and its place in database design. What he writes is food for thought, even for experienced database developers.
In level one, we named data elements for what they are and classified them. In level two, we modeled the data elements with data types and simple row or column constraints in SQL to give us valid rows for each occurrence of the entity. In level three, we put these rows into tables for entities, relationships and auxiliary data. In level four we had the base tables, connected them together, added secondary indexes and then build VIEWs.
This should give you a wonderful, but false, feeling of being done. In a very simple database, this is true. But in a real database there are more structures to consider. Among these other schema level things are: cursors, triggers and stored procedures. There are more things like collations, translations, privileges and suchlike. I am going to deal with only those three things - cursors, triggers and stored procedures - I named and only in general ways. T-SOL and other products can get highly proprietary in spite of the ANSI/ISO Standards. The reason is simple; these were things that were built on top of the preexisting file systems that early SQL products used. These procedural constructs were ways of making up for the lack of declarative code in the old products. The vendors have users who are locked into this "code museum" today and cannot abandon their client base.
Procedure Headers
SQL allows procedural code modules to be kept in the schema. While there is a SQL/PSM (Persistent Stored Module) language in Standard SQL, you will probably use a proprietary vendor language like T-SQL. These proprietary languages are usually members of the Algol family; that mean they have an IF-THEN-ELSE, a WHILE loop and BEGIN-END blocks with scoping rules.
Most of these proprietary languages were never meant to be used for applications development. The rules of thumb for T-SQL are not to write any procedure over 50 lines (one page) and not to use PRINT. But in fact, you could avoid all set oriented programming,treat each table like a file and code as if the database is a procedural file system. You can also drive nails into your body if you like pain.
The purpose of stored procedures is much like VIEWs. They guarantee that all the users will do the same task the same way all the time. VIEWs encapsulate a query and give it a name, thus creating a virtual table. A procedure encapsulates UPDATEs, INSERTs, DELETEs and SELECTs in a similar way, but it adds parameters.
A parameter is the "place holder" in the procedure header and an argument is the actual value passed to the procedure. T-SQL parameters used to be limited to simple scalar values. Today, they can be table valued and XML strings. Let's start with the simple scalar parameters.
The compiler reads an argument and checks its data type, range and other things to be sure it is valid. The compiler is doing a lot more work than you might think. Consider this bit of T-SQL:
SELECT ISDATE ('2010-01-01'); -- TRUE
SELECT CAST ('2010-01-01' AS DATE); -- no problem
SELECT ISDATE ('010-01-01'; --TRUE
SELECT CAST ('010-01-01' AS DATE); -- error
Likewise, floating point notation works funny. If you cast the exponential notation as DECIMAL or FLOAT, you have no surprises.
SELECT CAST (62.3E8 AS FLOAT); -- returns 6230000000
But now put the exponential notation in a string and try to cast it to DECIMAL or FLOAT, you do have a surprise, thus:
SELECT CAST ('62.3E8' AS DECIMAL(18,5)); -- error
SELECT CAST ('62.3E8' AS FLOAT); -- returns 6230000000
If you were parsing a string, you have to allow for the exponential notation with code like this:
SELECT CAST(CAST ('62.3E8' AS FLOAT) AS DECIMAL(18,5)); -- 6230000000.00000
Now try passing these test values as parameters and see how they behave.
CREATE PROCEDURE Test
(@in_test_date DATE,
@in_test_decimal DECIMAL(18,5))
AS
BEGIN
SELECT @in_test_date, @in_test_decimal;
END;
EXEC Test '2010-01-01', 62.3E8;
Returns (2010-01-01, 6230000000.00000) as desired.
You cannot pass an expression as an argument, but you can pass a local variable, which is previously set with the value of an expression. That means this fails:
EXEC Test '2010-01-01', 62.3 * POWER(10, 8) ;
But this will succeed:
BEGIN
DECLARE @local_decimal DECIMAL (18,5);
SET @local_decimal = 62.3 * POWER(10, 8);
EXEC Test '2010-01-01', @local_decimal ;
END;
You will see new SQL programmers try to pass arguments as XML or as CSV (Comma Separated Values) list strings. They have added the overhead of a parser (the XML parser or whatever code they wrote for splitting up the CSV string) and endangered their data integrity. SQL Server can handle over 2,000 arguments, which is more than enough power for any realistic situation.
Another underused feature in the parameter list is the default value. The syntax is quite simple. The parameter declaration is followed by an equal sign and an appropriate constant value.
If an argument is not provided, the default value is used instead.
CREATE PROCEDURE Test
(@in_test_date DATE = '2010-01-01',
@in_test_decimal DECIMAL(18,5) = 0.00000
)
AS
BEGIN
SELECT @in_test_date, @in_test_decimal;
END;
Unless you specifically assign the argument to the parameter, they are assigned in left to right order,
EXEC Test; -- returns (2010-01-01, 0.00000)
EXEC Test '2010-12-25' -- returns (2010-12-25, 0.00000)
EXEC Test @in_test_date = 789; -- returns (2010-01-01, 789.00000)
Finally, a parameter can be used for output. This means it has to be a local variable in the calling module, so the return value(s) have some place to go. Here is an example that shows the syntax:
CREATE PROCEDURE Test
(@in_test_date DATE OUTPUT,
@in_test_decimal DECIMAL(18,5) OUTPUT)
AS
BEGIN
SET @in_test_date = '2010-12-25';
SET @in_test_decimal = 789;
END;
BEGIN
DECLARE @local_date DATE;
DECLARE @local_decimal DECIMAL(18,5);
SELECT @local_date, @local_decimal; --returns (NULL, NULL)
EXEC Test @local_date OUTPUT, @local_decimal OUTPUT
SELECT @local_date, @local_decimal; -- returns (2010-12-25, 789.00000)
END;
Basics of Software Engineering
So much for the header; what about the body of a procedure? Well, let's not jump into that right away. We want to worry about principles first, then worry about the particular tools. We need a high level view of how to approach coding -- Software Engineering.
The basics of Software Engineering do not change in SQL. But they are realized differently. Most of us (with the exceptions of those rare programmer who started in LISP, APL, FP, Haskell or other exotic languages) learned a block structured programming language that evolved from Algol-60. The same principles that apply to procedural code still apply to SQL procedures.
In the late 1970's, we found that we could write better code (i.e. faster, provably correct, easier to maintain, etc) in languages that had local scoping rules and code modules with one entry and one exit point. We eliminated the GO TO statement and used a simple set of control structures. This was the Structured Programming revolution.
Cohesion
Cohesion is how well a module does one and only one thing; that it is logically coherent. The modules should have strong cohesion. You ought to name the module in the format "<verb><object>", where the "<object>" is a specific logical unit in the data model and “<verb>” is a single clear action. There are several types of cohesion. We rank them going from the worst form of cohesion to the best.
- Coincidental cohesion is when parts of a module are grouped arbitrarily. A Coincidental module is a train wreck of unrelated actions. It is a “Lady GaGa, Squids and Automobiles” module whose description would be a compound or complex sentence. The best example of this in SQL is the OTLT (One True Lookup Table) design flaw. This is a loook-up table with ALL the codes for the entire schema in it.
- Logical cohesion is when parts of a module are grouped because they logically do the same thing, even if they are different by nature. In SQL the most common example is a general module that does an update, insert or delete on any table – it works on Lady GaGa, Squids or Automobiles. Look for this to be implemented with dynamic SQL, XML and CLRs.
- Temporal Cohesionis when parts of a module are grouped by when they are processed. For example, a module which does all the initialization for the whole system.
- Procedural Cohesion is when parts of a module are grouped because they always follow a certain sequence of execution. For example, when a user logs on the database, we check user privileges and log the sign-in. Instead of putting these things in separate modules under the control of a superior, we write a monolithic block that does it all.
- Communicational cohesion is when parts of a module are grouped because they operate on the same data element. Imagine a series of UPDATE statements that affect the same column in one procedure, based on what the previous module did.
- Sequential cohesion is when parts of a module are grouped because the output from one part is the input to another part like an assembly line. In SQL, look for the use of temporary tables as a replacement for scratch tapes in a magnetic tape file system.
- Functional cohesion is when a module always does a single well-defined task, like a mathematical function. This is what we want in a module and it is the basis for functional programing languages. There are no side effects or external data.
You can look up the detailed definitions, if you missed them in your intro Software Engineering classes.
Coupling
Coupling is how dependent modules are on each other. If modules have to be used in a certain order, then they are strongly coupled. If they can be executed independently of each other and put together like Lego blocks, then they are loosely or weakly coupled. There are several kinds of coupling, which are ranked from worse to best:
- Content coupling is when one module modifies or relies on the internal workings of another module. Control jumps in and out of module. In SQL, the use of UDFs and CLRs are the prime example. You don't know what is going to happen until run time.
- Common coupling is when two modules share the same global data. That sounds like tables in SQL, but it refers to having two or ways to do the same task on common data. Doing the same task soon becomes doing almost the same task.
- External coupling occurs when two modules share an externally imposed data format, communication protocol, or device interface. Ideally, we do not want to worry about the external environment. This is why a tiered architecture does not do any formatting or packaging in the database.
- Control coupling is one module controlling the execution by passing flags. You have no idea what the module will do until its master tells it. This is one of the reasons that BIT flags are not a good SQL programming.
- Stamp coupling is when modules share a composite data structure and use only a part of it. This can be done with VIEWs in SQL.
- Data coupling is when modules share simple data elements. Think about passing parameters; these are the only data elements which are shared.
- Message Coupling is the loosest type of coupling. Modules are not dependent on each other, instead they use a public interface to exchange parameter-less messages. This is more of an OO approach but you see it in triggers, exception handlers and other SQL features.
This is covered briefly in a chapter on writing stored procedures in my book on SQL PROGRAMMING STYLE. In the meantime, you can read DeMarco, Yourdon, Constantine, Myers or several other of the pioneers of Software Engineering. This is far more basic than SQL programming. This is what you are supposed to know before you write any code in any language.
Good SQL Procedures
As a generalization, a good stored procedure has high cohesion and loose coupling which it gets by not using the flow of control structures unless absolutely needed. This is a surprise to procedural programmers. The reason is to put as much "programming stuff" into the pure SQL so that the optimizer can do its thing with the code.
How does that work? That is the next level on the stairway.