August 2, 2003 at 6:12 am
Naming Conventions
There are a lot of books and reference material out there that talk about variable naming conventions for application programmers, but hardly any about Database object nomenclature. The few that do exist say no more than be consistent with whatever style you decide to follow.
Do DBAs out there follow their own, home brewed standards? Is there a Database equivalent of the Hungarian notation or Pascal naming conventions that programmers using particular programming languages follow?
These are the conventions I follow:
1. Master Tables are named in the singular (eg: Customer, not Customers)
[Exception: Orders as ORDER is a reserved keyword in SQL]
2. Detail Tables named in the plural. (Eg OrderDetails)
3. No spaces or special characters allowed (only A-Z, a-z, and underscore "_")
4. Relationships named as follows: FK_ParentTableName_ChildTableName
(eg: FK_Country_Customer)
5. Primary Keys named as follows:
PK_TableName (eg: Customer Table --> PK_Customer)
6. Indexes named as follows:
IX_ColumnName (Index on the CountryID column of the Country table = IX_CountryID)
7. Unique Indexes: IX_UN_ColumnName
"UN" Used for UNIQUE indexes
8. Use Mixed casing capitalizing the first letter of each new word that makes up a name. Eg: ThisIsOneLongDatabaseName
9. Avoid abbreviations (except common ones such as Amt/No etc)
If you've stayed with me thus far, great! I find it amazing that there is so much said and written about naming conventions in programming languages and so little when it comes to databases.
Please mention what you do and don't and maybe this thread can help a lot of people out there ( like me π ) evolve best practices and standards of their own.
August 4, 2003 at 12:39 am
Hi vivian123,
quote:
Do DBAs out there follow their own, home brewed standards? Is there a Database equivalent of the Hungarian notation or Pascal naming conventions that programmers using particular programming languages follow?
not that I know of!
quote:
These are the conventions I follow:1. Master Tables are named in the singular (eg: Customer, not Customers)
[Exception: Orders as ORDER is a reserved keyword in SQL]
2. Detail Tables named in the plural. (Eg OrderDetails)
3. No spaces or special characters allowed (only A-Z, a-z, and underscore "_")
4. Relationships named as follows: FK_ParentTableName_ChildTableName
(eg: FK_Country_Customer)
5. Primary Keys named as follows:
PK_TableName (eg: Customer Table --> PK_Customer)
6. Indexes named as follows:
IX_ColumnName (Index on the CountryID column of the Country table = IX_CountryID)
7. Unique Indexes: IX_UN_ColumnName
"UN" Used for UNIQUE indexes
8. Use Mixed casing capitalizing the first letter of each new word that makes up a name. Eg: ThisIsOneLongDatabaseName
9. Avoid abbreviations (except common ones such as Amt/No etc)
obviously you already have some kind of naming convention.
Best thing IMHO is to document them and apply them consequently.
But like you, I'm interesting what the others may say
Cheers,
Frank
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
August 4, 2003 at 3:47 am
Vivian,
I've seen such standards documented at various sites, but not for public consumption. Why not? Beats me.
But, one SQL organisation here in Oz called SSW has a number of links on their web site. One points to a SQL coding standards document at http://www.nyx.net/~bwunder/dbChangeControl/standard.htm
Whilst I don't agree with everything in the document, and - who knows - maybe SSW doesn't either, it is quite comprehensive and something that should be encouraged at all sites.
BTW. I must disagree with you about the singular and plural for master and detail tables respectively. Methinks it should be plural in all cases (some designers may say singular in all cases - depending on methodology). Who knows when a new entity will be introduced that reduces a master table to be considered a detail table?
PS. Can I use a copy of your standards when it's complete?
Cheers,
- Mark
Cheers,
- Mark
August 4, 2003 at 3:51 am
quote:
PS. Can I use a copy of your standards when it's complete?
kind of a lazybone today ?
Cheers,
Frank
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
August 4, 2003 at 4:13 am
Tonight
Cheers,
- Mark
Cheers,
- Mark
August 4, 2003 at 4:15 am
quote:
Tonight
ooh, I see, in this case....
Cheers,
Frank
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
August 4, 2003 at 5:28 am
I'm coming at this as a development manager not a DBA. I think that the database names, specifically the column and table names, are much more important than the naming conventions inside the applications. Most programs tend to be localized, whereas for any database that has a long (and changing) life, the column names tend to spread to related tables and be used across very diverse and barely related applications. Consistent naming makes it more likely that programmers will use the columns for the right purpose.
One of the hardest thing to convince DBA/Programmers of when they are designing a table is that their names chosen are not obvious to the entire world. I just had almost a shouting match with one particular developer the other day because in two tables, where data flowed over time from #1 to #2, the same field name (OperatorCd) was used in different ways. In one it was the origin of the row, in the other it was who was going to be responsible for reviewing it. So the same basic data as it went through transformations had the same column name, but with different meanings.
We have adopted standards that are heavily oriented to three concepts:
- The same concept can only get one name; the exception is specializations (e.g. OldCustCd, NewCustCd) where each is distinguished. Work hard on the modeling aspects, ensure each business concept is very clearly defined, and never pollute tables with different entities just to save a table (this is modeling more than naming).
- Use consistent abbreviation and spelling. For exmaple, if you are going to abbreviate "Distributor" when it is part of a name, it must always be done exactly the same. We have a table of abbreviations -- you can NOT abbreivate part of a name unless it is in there (or you add it there) and it is unique. So District and Distributor (for example) have unique and identifiable names.
- Use suffixes for data type concepts. Data types are not important but not the most important. A great example in many business systems is money -- what is a unit price vs. an extended amount, which are at retail (MSRP) price, selling (discounted to customer) price, cost price (discounted to supplier). So anything with money for us must have:
Final suffix:
Cst = unit price
Amt = extended, total price
Secondary suffic:
AP = cost to us
AR = cost to our customer
Rtl = MSRP (Retail)
So if I see MagazineRtlCst I know that it is the MSRP price, for a specific item.
We also find this useful in things like:
TK = Technical key (part or all of key, don't show it to the user)
Str = arbitrary string, not validated by any kind of master table
Cd = not arbitrary string, validated against some table (or could be)
Nm = Descriptive string that is the friendly name (but not key) of a row
Nbr = arbitrary number, not validated against another table, e.g. PONbr
Qty = Ordinal count of something, copies, boxes, cartons, etc.
Pct = Percentage of something, expressed as a percentage (100% is represented as 100)
We have a bunch of other rules about structure and names, but the little bit above sums up a lot of the philosophy. Someone ought to be able to glance at a name and know something about it, and more importantly some mis-use ought to jump out at them (e.g. adding together xxxPct fields).
We have found that it is MUCH easier for people to look at older database problems and know what they mean if we follow this, and have several programmers working on the same project and end up with more consistent code.
The other things that fall into these category is how to represent key concepts. Some of the biggest deals are:
- Flags -- yes, no; booleans (T/F) or int (0/1) or char (Y/N). Pick one, stick with it everywhere. Creativity here is bad.
- Nulls -- decide whether to use them, then do. Never allow null to be "unknown" in one place and blank to mean "unknown" in another. Creativity is bad here too.
- Settings (e.g. Quoted_Identifier) -- pick one. Everyone pick one. Don't let different people use different settings.
Incidentally, some comments on the above:
- Master singular, detail plural -- won't work, one relationship's detail is another's master. We haven't been consistent here, I regret, but in my opinion pick one. I prefer singular.
- Index names -- Not a bad idea, but not nearly as relevant as these are a physical design issue that programmers should pay negligable attention to. They come, they go. Programmers write bad code when they try to write (initially) to indexes. I try not to even show them to them. Save it for tuning phases.
Finally, my own question to add:
I'm finding it VERY hard to read some SQL code from some programmers. We have some who are horizontally inclined, e.g.
Select x.f1, x.f2, y.f3, y.f3, etc.
From Table x
Inner Join Table2 y on y.f1=x.f1 and y.f2=x.f2
where something
group by etc.
We have a couple others who never put anything more than one expression on a line:
Select
x.F1,
x.F2,
y.F3,
etc.
From Table x
Inner Join
Table2 y
on y.f1=x.f1
and y.f2=x.f2
where
something
group by etc.
Drives me crazy (simple selects can run on to 3 pages), but apparently some prefer. I'm thinking of insisting on a coding standard with regard to this as well, but hate to suppress style issues that have no real impact.
Thoughts?
August 4, 2003 at 1:16 pm
As for the vertical selects, etc. versus the horizontal selects, I prefer using the vertical selects for inserts and updates and horizontal selects for selects.
Here's why.
When you're doing an insert or an update and you have blanks or variable names that are different from the column names, it's nearly impossible to line them up, especially if you are saving it to a text file to deploy to the live server. I've done both and haven't "truly" standardized, so I'd like to know which works best for others. I agree with you that the data should be somewhat self-describing. I've worked with programmers who used AR$ to hold the customer name and the invoice amount in the same script. WOW! How does anyone debug that?
When I'm trying to figure out a problem with an insert statement, it's helpful when debugging by cutting and pasting because sometimes SQL Server actually places the cursor RIGHT next to the offending column and then I can count down using the arrow key to figure out where the problem exists in the other part of the sql script.
I use the tab key pretty extensively, but I also limit my row length to 80 characters, so the "reader" doesn't have to scroll right to read my code. The only exception is a select statement which I'll run off the right side of the screen. So, basically, my "standards" aren't really "standard", but it works for me and the 4 other guys I code with.
I REALLY like the abbreviation table concept. That would have really helped out with our coding. We have some fields where Amount is AMT or Amount, or Number becomes No, or Num, or Nbr, or NMBR, or even Number and it's maddening when you're trying to work with it because you can't guess the field name. We WILL be implementing the abbreviation tables in our work. Could you post your tables, or is that considered "intellectual property"?
August 4, 2003 at 1:26 pm
Well, I don't quite know how to post anything here that looks like a table. Here's something all run together that you might be able to make sense out of. There are two pieces, the first are abbreviations for use inside column names, the second are suffixes for use appended to the end of table names.
I am less than happy with some of this, e.g. the ambiguity between a suffix for a business concept and a suffix for a key that collide in the same field. But anything is better than nothing and these have helped a lot. Just for example being able to visually distinguish a dateTime that includes time from one that does not.
Sorry for the messy structure.
PS. I should note that these are likely to be quite industry specific.
Acct
Account, Accounting
Adv
Advance
Allot
Allotment(s)
AP
Accounts Payable (i.e. agency cost is APCst)
AR
Accounts Receivable (i.e. Customer Cost is ARCst)
Auth
Authorized, Authorization
Avg
Average
Balance
Balance
Bipad
Bipad
Cat
Category
Closed
Closed
Curr
Current
Cust
Customer
Del
Delivery
Dev
Device
Disc
Discount
DP
Distributor payables (as in subsystem, or related to subsystem)
Dist
Distribution, distributor, distribute
Div
Division
Dstct
District
Entry
Entry (not Entered, e.g. Allotment Entry Date)
Freq
Frequency
Hist
History
Inv
Invoice
Invtry
Inventory
Issue
Issues or Issue
Mag
Magazine
Max
Maximum
Mgr
Manager
Min
Minimum
Mth
Month
Nat
National (e.g. NatDist = National Distributor)
Orig
Original (very inconsistently used at present, strive toward βorigβ)
POS
Pay On Scan
Prod
Product
Pub
Publisher
RDA
Retail Display Allowance
Rcv
Received, Receiving
Redist
Redistribution, Redistributed
Rtl
Retail
Rte
Route
Rtn
Return
Ship
Ship
Short
Shortage
Srv
Service
Stmt
Statement
TA
Transportation Allowance
Title
Title
Trans
Transaction(s)
Wk
Week
Yr
Year
------------------------------------------------------------
Suffixes
Suffix
Description
Amt
Extended or total cost, dollars, 2 decimals. If appropriate modify as needed with AP, AR or Rtl e.g. ARAmt.
APCst
A/P (Levy) Cost (unit price β varying number of decimals).
ARCst
A/R (Customer) Cost (unit price, 4 decimals).
Cd
Code pointing to some master table (real or understood). May be numeric or non-numeric. Generally this will be a foreign or primary key of some table (or portion thereof). This is one of the less consistent aspects of the naming convention, as it masks other aspects and should not be used if the alternative is yet more clear (for example TK is a specialized name for a key and should be used where appropriate instead).
Desc
Long description (as opposed to name, this would be more elaborate if a table had bother a name and explanation/description).
DcNbr
Decimal Number (i.e. with quantity left and right of decimal, not money).
Dt
Date only (time part must be zero if it exists).
Dtm
Date and time.
Flg
A Boolean indication (null may or may not be allowed, but by convention we always represent this as βYβ or βNβ as a single character field.
Len
Length (may also use another suffix for units).
Nbr
Arbitrary number, not pointing to some master table; integer value without more specific suffix, as an example a Ref(erence)Nbr is a specific number not counting or quantifying something but that also is not (significantly) a key to some other master table like a Customer Number would be (which is CustCd instead).
Nm
String which is an identification description of an entity, i.e. a description. A common use might be for a table of xyz to have xyzCd as a key, and xyzNm as the descriptive name (and it might have a xyzDesc if there is a more elaborate description in addition)
Pct
Percent[age], always stored multiplied by 100, i.e. 20% is stored as 20.00 not as 0.20.
Qty
Quantity (integer) or count, where other usages are not applicable.
RtlCst
Retail Cost (unit price, 2 decimals).
Str
Non-specific alpha-numeric data, not (generally) referring to any master table real or implied (or any practical one, for example we call CityStr but StateCd as there is a definitive and available validation source for State but while one is theoretically possible for City we do not have or expect to have one).
TK
Technical Key, i.e. an arbitrary number that is used to identify a row, but is not attached to meaningful information, and is generally not visible to the user (and generally should not be), usually system assigned.
Tm
Time only (date part must be zero if it exists); this is a time of day indication.
Vol
Volume (may also use another suffix for units).
Wgt
Weight (may also use another suffix for units).
August 5, 2003 at 8:54 am
The thing I find missing from this discussion is how to handle object names. I've had great success using the format [object]_[action].
Object is the object which is being acted upon e.g. Customer, Order, Receipt... The action refers to the type of action that will be performed e.g.
ins = insert
del = delete
upd = update
get = select
sav = combo of insert/update (see next)
Most of the time an update and insert procedure have the same list of parameters, so I've combined the insert and update scripts to form a save routine. e.g.:
create procedure Customer_sav
@CustomerID int = null output
, @Customer varchar(75) = null
...
if exists (select * from Customer
where CustomerID = @CustomerID)
begin
update Customer
set Customer = nullif(isnull(@Customer, Customer), '')
...
where CustomerID = @CustomerID
end
else
begin
insert Customer
( Customer
, ... )
values
( nullif(@Customer, '')
, ... )
end
This has two advantages. It saves creating/maintaining two procedures. It removes from the calling code having to figure out if an insert or update is required, simply call the save.
I don't use prefixes [sp_] or [usp_], since all of these object are collected/displayed together in both SQL EM and SQL QA.
I use the "get" to retrieve by the primary key and a "sea" [search] procedure to retrieve records by alternate criteria. e.g.
(note: I always explicitly name columns and never use *)
create proc Customer_sea
@CustomerID = null
, @Customer = null
, @State = null
select *
from Customer
where CustomerID = isnull(@CustomerID, CustomerID)
and Customer = isnull(@Customer, Customer)
and State = isnull(@State, State)
I use a verticle format to my procedures/views and use a preceeding comma (, ColumnName) rather than a trailing comma (ColumnName ,). This makes it easier to comment out lines of code.
For views I append the suffix [_v]. This is done more to denote that a view rather than a base table is being used in TSQL.
For the most part I like fully spelled out column names and rarely use abbreviations. This is more for consistency on my part. I rarely include the word "date" in a column name (Posted instead of DatePosted) as the data type identifies it as a date. The same goes for logical columns (Active instead of ActiveFlag). Also I would not use DatePosted and PostedFlag. These would be redundant. If the Posted column is null then it hasn't been posted and there is no benefit gained by having a boolean flag to duplicate the logic.
This is a good thread. I like collecting what others have implemented in their style and incorporating what they are using that will work for me.
pax
--Paul Hunter
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply