How to lay out SQL Code

Question

How to lay out SQL Code

Tony Davis

SSCarpal Tunnel

Points: 4405
More actions
March 1, 2008 at 8:09 am

#184392

Comments posted to this topic are about the item How to lay out SQL Code

Viewing 15 posts - 1 through 15 (of 90 total)

You must be logged in to reply to this topic. Login to reply

Tom Garth SSCertifiable Points: 6193 More actions · Answer 1

Be careful what you wish for! or so they say. To me it's a matter of who makes the rules. I wouldn't want to force my GREAT structuring on anyone else, and I wouldn't want someone else forcing theirs on me. I agree that some standards might be a good thing. The problem would be with the number of idiots that end up in positions where they might have a say in it.

I would be interested in knowing some of the standards that the other readers have. Here are some of mine.

For the last 5 years I haven't worked a project as part of a team, but alone, and have had full control for all coding conventions. However there are other programmers in our organization, and having common standards helps us out when we are asked to assist with another's project.

When laying out a new database, I use simple descriptive names for table names, and 2, 3, or 4 character prefixes plus an underscore in all field names. The complexity of the application determines the number of prefix characters for me to use.

So a simple example would be:

Customers (CM_ID, CM_Number, CM_Name, CM_TaxID)

Addresses (CM_ID, CA_ID, CA_Code, CA_Line1, CA_Line2, CA_City, CA_State, CA_Zip)

The added prefixes do a lot for me when writing the stored procedures and application code.

Then stored procedure names will use the same prefixes like:

CM_Select

CA_GetBy_CM_ID (@CM_ID)

and those names are reused as procedure names in the code for the interface. In my case that is VB 2005.

I always give a foreign key the identical name of the unique index that it links to. This will make it's purpose fairly obvious to someone looking at the structure, and again, helps me to remember when writing code.

FROM Customers CM

LEFT JOIN Addresses CA

ON CM.CM_ID = CA.CM_ID

I don't remember exactly how I settled on these conventions, but I've used them long enough to know that returning to a project at a later date is not painful, and I've had good feedback from other programmers when they have worked on my projects.

Tom Garth
Vertical Solutions[/url]

"There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers

Jeff Moden SSC Guru Points: 1004504 More actions · Answer 2

Tom,

Just curious... Are you saying that "CM" is the abbreviation for "Customers" and that "CA" is the abbreviation for {customer}"Addresses"?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Robert Nabolotnyj SSC Veteran Points: 246 More actions · Answer 3

A distinction that I’ve observed between programmers that understand and implement abstract concepts and those that haven’t is obvious in your naming convention.

Consider “Name” as an object having certain properties, FirstName, MiddleName, LastName, Suffix as is the case of a real entity, a person, or just Name as in a fictitious entity such as a business. Prefixing an identifier to a columnname makes that prefix superfluous. The alias prefix uniquely identifies the column and is universally understood.

Also think of Address as an object with its own properties. By assigning the primary key from the customer table to the primary key of the address table you’ve introduced a potential for managing redundant data. If, in addition to Customers, you had Contacts at the same Address, each contact will require a separate record. Rather, you should create a unique primary key without any reference to the Customer table.

Then create an Intersect table that contains the primary key from the Customer table and the primary key from the Address table. You are then able to join the Customer and the Contact to the same Address.

So, now that you have a unique Address table you can write Sql procedures and program code in the form of classes that uniformly handle that abstarct object.

So whether the Address is for a business customer or for a person, the rules are the same.

I personally use the CamelCase convention for column names in both programs and Sql and do not use any underscores. Underscores make the name unncessarily long.

Tom Garth SSCertifiable Points: 6193 More actions · Answer 4

Tom,
Just curious... Are you saying that "CM" is the abbreviation for "Customers" and that "CA" is the abbreviation for {customer}"Addresses"?
--Jeff Moden

Sorry Jeff. I forgot to mention it. CM for Customer Master and CA for Customer Addresses.

Thanks,

Tom Garth
Vertical Solutions[/url]

"There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers

Tom Garth SSCertifiable Points: 6193 More actions · Answer 5

Robert,

I wasn't trying to present a specific application, but rather a part of the approach to DB design that I use.

When I or another programmer sees something like Fields("CM_Name") in code, there is little doubt what it is referencing, compared to Fields("Name").

Please understand that mostly I am designing every piece of a given project. My methods are partly geared to that aspect of my own work. My projects may take 1 to 6 months of work, or in some cases may require that I am in and out in a week. Abstraction is a wonderful concept when there is time to implement it. In some cases however, you could be spinning your own wheels for no good reason. In other words the project requirements don't demand a high level of complexity.

The client gets what they ask for. It works well and is easily maintainable, and stays within their budget. My methods help me to achieve these goals, and still be flexible.

Tom Garth
Vertical Solutions[/url]

"There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers

Nathan Allan Newbie Points: 6 More actions · Answer 6

Hi Tony,

I am certainly on board with there at least being recommended conventions, realizing that most of the current generation will ignore them.

Over the years, my colleagues and I have evolved what I consider to be a consistent and readable set of conventions. By nature, some of these will be subjective, but on the other hand there is more science to conventions than many admit.

Here are some of the conventions we have adopted for DDL:

-All lowercase keywords. e.g. select ... from ...

-Pascal casing for other identifiers. e.g. ... from MyTable where Age < 19 ...

-Acronym based alias names for tables. e.g. ... from SaleItemTransaction SIT where SIT.Amount ...

-(Sub)clauses always on their own line or on the same line (but not both).

-Double indent for projection, as though there is an implicit "project" clause

-Block indent for multi-line sub-expressions (parens for block)

-No casing exception for functions (Pascal cased) and no space between the function name and the parens

-Always whitespace before sub-expressions.

--Note that exists is a keyword, not a function, and should thus read: ... where exists (select ...)

-Lists in the form: (Item, Item) (one space after each separator, but not adjacent to the parens)

I would have liked to give more examples, but this comment system seems keen to remove my whitespace.

As for DML:

-Same casing and clause rules as DDL

-Always use the "constraint" keyword for FKs and PKs and name each of them

-Do not repeat the name of the table in column names.

-Only include a foreign table name in a column name if a "has a" relationship exists (not "is a").

The convention to not attempt to globalize column names is controversial I know. SQL, however, defines the column name space as being local to tables, and experience has shown that attempting to globalize column names is problematic at best for several reasons. As an example, I would advocate the following names:

Contact(ID); Person(ID); PersonAddress(PersonID, TypeID); PersonAddressType(ID)

rather than:

Contact (ContactID); Person(ContactID); PersonAddress(ContactID, PersonAddressTypeID); PersonAddressType(PersonAddressTypeID);

You will note that the advocated convention conveys more accurate information and more tersly. I find that expressions involving this convention are clearer too:

select * from PersonAddress join Person on Person.ID = PersonAddress.PersonID

Thanks for raising the topic Tony; there is certainly much to say on this matter. It was hard to suppress a long-winded explanation on each point, and I'll stay entirely away from imperative code since each system has its own language.

Best regards,

--

Nathan Allan

fuller.artful Right there with Babe Points: 795 More actions · Answer 7

First off, this is a good discussion to raise, Tony.

Second, within any organization a standard practice is or should be mandatory. In practice, that means Select Max(Power) From Employees Where Team = 'Database'. I most certainly impose standards in the case where I have Max(Power), and in such cases I take them all the way down to the templates. I want sproc headers to look a certain way, and all the code too. Where I do not have Max(Power), then I am more than willing to adhere to the organization's conventions. Thanks to SQL Refactor, I have no trouble at all reformatting code to suit the conventions of my boss (although intelligent camel case is a tough nut to crack without human intervention).

There is one issue where I do take a strong position, however, and that concerns prefixes. I hate them. I have moved to a convention which I have named "Hungarian suffix notation." The common practice is to prefix various appellations with indicators of data type and so on. To me, this only decreases the signal-to-noise ratio of the code. It means that in order to deduce the meaning and affected entities of an object, I have to read past several characters, sometimes including an underscore, before I get to the meat of the object in question.

In writing this last paragraph, I realize that actually I have two standards that I'm willing to fight for. The second involves the action-object convention versus its opposite, the object-action convention. I stand with both feet firmly in the latter camp: I hate names such as GetCustomerInfo, regardless of whatever noisy prefix you wish to prepend. I want this object to be named CustomerInfoGet, and the reason is very simple -- the objects so named sort "naturally" by the objects they affect. Suppose for example that I wanted to see a list of the sprocs and udfs that touch the Customer table in some way. Following the action-object convention, I would need to look for words beginning with Get, Set, Update, Insert, Delete... and whatever else you might imagine. In the object-action convention, these things sort naturally: I might expect a list to resemble something like this:

CustomerInfoDelete

CustomerInfoInsert

CustomerInfoSelect

CustomerInfoUpdate

This, regardless of the separate discussion about prefixes versus suffixes. I still hold to suffixes, and perhaps a tad religiously. For example, for udfs (which can be scalar or table), I indicate the difference in the suffix:

CustomerBalanceUDFS -- returns the balance for a given Customer as a scalar UDF

CustomerInfoUDFT -- returns a table UDF

The names sort alphabetically by their principal object (table). The suffixes tell me what sort of beast they are. In cases where multiple tables might be affected or at least touched by a given sproc or udf, I tend to mention the tables in order of significance, or if that proves too ungainly to use a name related to the business process the procedure models.

When I'm the boss, expect such conventions. When you're the boss, I'll adhere to yours, insofar as they are consistent. If they are inconsistent, I will mention this.

Arthur

Arthur Fuller
cell: 647-710-1314

Only two businesses refer to their clients as users: drug-dealing and software development.
-- Arthur Fuller

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 8

Did I see my name mentioned? next to some amazon gift cert? woot!

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Ian Yates SSCoach Points: 19738 More actions · Answer 9

I have to agree with fuller.artful and Nathan Allan - both have good standards. I too am a sole coder at the moment but it's still very nice to have conventions followed. The grouping of stored procs by Noun-Verb is much easier to use than Verb-Noun. Since I code 90% in Delphi I tend to use Pascal case 🙂

SqlNightOwl SSCrazy Points: 2333 More actions · Answer 10

Nathan Allan is a guy I could work with. I like and already use most but not all of his DML style. For constrains on columns I use CK_Table_Column, if it's on the table (enforcing relations between two columns) then I use CK_Table_SomeGenericDescription (i.e. EndOnGTStartOn).

I try to end most date columns with an "On" and if it's a time I use "At". Examples are: PostedOn, CreatedOn, StartingAt etc. Exceptions: BirthDate and others where date is a common part of the vernacular.

I really like using schema's to group objects but I do use mnemonics like sls for Sales, mkt for Marketing, hr for well HR, etc. This gives me sls.Orders and sls.OrderItem. Yep, I try to keep tables as singular nouns unless I hit a reserved word then I may go with a plural noun.

I try to use a prefix of "Is" or "Has" for bit columns and try to use a name where the "true" side is the most likely desired result. Examples IsActive, IsEnabled as opposed to IsInactive or IsDeleted.

For Relationships, I try to use UML-like notation of Is or Has such as Parent_has_Child. But in cased where the "parent" is a lookup table I'll reverse the roles. Two examples of this are Customer_has_Orders and Customer_has_CustomerType.

Now comes the sticky wicket, how should those pesky stored procs be named. I've seen some very complicated prefix schemes. IMHO, they all suck. A usp_ActionObject (ups_GetCustomer) does nothing for me even if I used a schema. If I reverse the flow and use Customer_get then I can get all the Customer related objects together and layering a schema makes it all that much better.

I like suffixing views with "_v". That's just because I like to distinguish base objects from views and you can perform CRUD on views if you follow specific rules.

I've talked to a lot or application developers that like the idea of abstracting columns so that you have Customer.ID and Customer.Name. OK but then you always wind up aliasing them in your code. I believe abstraction should be left in the application layer and not in the database. I'd go with Customer having the CustomerId and then instead of Name I'd use Customer.

Finally, I believe any coding convention should be spelled out and written down. Then you can hand it to a new developer (app or db) and they will know what's expected. I don't believe it should be so rigid that it can't change but, it shouldn't be so flexible that it could mean anything. Remember the convention is supposed to help maintain consistency. It benefits both the employer and employed.

I look forward to others input.

--Paul Hunter

Cliff O'Reilly Grasshopper Points: 13 More actions · Answer 11

I may be missing something, but it seems to me that only one of the posts so far have been about How To Lay Out SQL Code. Instead people seem to be writing about Naming Conventions. These are two different things.

Naming Conventions are highly contentious, mostly because they are influenced by, and influence, application development as well as DBA-type work. I believe that clarity is the only determining factor that should be applied. All others should be decided by the specific project. For example if a database is being created to manage a small book collection and will not have any application front end then the naming convention would not need to be the same as a project to develop a CRM with hundreds of tables, views, stored procedures and with a complex application on the front.

The very nature of a naming convention is determined by the data, environment and use and cannot be a blanket technique applied to all situations.

As a previous post mentioned, an Entity-oriented approach (with one eye on Relational Theory) is probably best, however all the answers given so far all work for the specific developer so why change them?!

The important thing (again, as someone already said) is to agree a standard and stick to it!

As for SQL Code layout I think that clarity is even more important. I have lost count of the number of times I have had to reformat SQL before I can understand it. My main problem is distinguishing specific sections within statements. For example working out where keywords are (FROM, GROUP BY etc) compared with column names etc.

Faced with a long string of text the important things to understand are what is the code doing and which section is doing what. Here is an example (ignore naming convention and rubbish syntax).

Example 1:

select id, min(name), max(date1), max(date2), max(time1), max(time2), max(location), max(place), max(how), max(when), max)[exact date time]), count([number of orders]), avg([customer count]), count(customers), max(comment) from t_somecustomerstuff where id>7657 group by id

Example 2:

SELECT id, MIN(name), MAX(date1), MAX(date2), MAX(time1), max(time2), MAX(location), MAX(place), MAX(how), MAX(when), MAX)[exact date time]), COUNT([number of orders]), AVG([customer count]), COUNT(customers), MAX(comment)

FROM t_somecustomerstuff

HAVING id>7657

GROUP BY id

For me the judicious use of capitalised keywords and line breaks creates a much more readable script. Sections are more easily defined and the time it takes to understand the code is reduced significantly.

I'm sure there are more techniques that would improve this still further.

Cliff.

fuller.artful Right there with Babe Points: 795 More actions · Answer 12

You're right. I drifted into naming conventions and I think led a few others down my garden path. Sorry. Regarding layout, I most definitely agree with your "KEYWORD on a new line" approach. I also devote a whole line to each new column in the select statement. Your example with all the aggregrates is a great illustration of why I do this. Finally (and coincidentally I just had this discussion with a colleague, who hates my layout method), I place all the commas in a select list at the beginning of the line, not at the end. Then it becomes a breeze to re-order the columns, or insert or delete one, etc. You don't have to go find the end of the last line and lose its comma, and it's much simpler to see a line that's missing a comma. But again, my friend Joe hates that approach, and thanks to SQL Refactor I just click a button and presto, he has his way. Working for him, I would continue to write the code my way, and then at the end of the day click my mouse and deliver the code his way. 🙂

Arthur

Arthur Fuller
cell: 647-710-1314

Only two businesses refer to their clients as users: drug-dealing and software development.
-- Arthur Fuller

Cliff O'Reilly Grasshopper Points: 13 More actions · Answer 13

That's a very good point about putting each column on a new line. I use that method when I'm writing SQL (for an INSERT statement perhaps), but if I'm just reading the script then I don't usually bother putting each column on a new line.

I guess this brings up a new point which is: are there specific layout approaches that apply only in certain coding scenarios? It would seem that way and that obviously complicates the answer to this question.

By the way I always put commas at the end of the line, but only for me own readability. I think it is better from a programming standpoint to put it at the beginning for the reasons you mention.

Tom Garth SSCertifiable Points: 6193 More actions · Answer 14

I have to agree with fuller.artful and Nathan Allan - both have good standards. I too am a sole coder at the moment but it's still very nice to have conventions followed. The grouping of stored procs by Noun-Verb is much easier to use than Verb-Noun.

I forgot to mention grouping. Yes, absolutely, the right naming conventions lead to natural groupings for the objects. Saves tons of time when you're sorting through hundreds of stored procedures

Tom Garth
Vertical Solutions[/url]

"There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers