Abstract:
The purpose of this document is to detail standards for coding shell scripts. Following these standards will help create shell scripts that are easy to read, understand, and maintain.
The intent is that the database admins & developers across organizations would follow these standards to create a consistent and high standard of coding practices.
Keywords: Shell Script, Standards
Introduction to Shell Script Standards
A shell is a program that interfaces between the user and the Unix operating System; it takes the commands entered by the user and calls the Unix operating system to run those commands.
There are many types of shells. Some of the well-known ones are
- Bourne Shell
- C Shell
- Korn Shell
- Bash Shell (Bourne again Shell)
Unix commands placed in a file to be executed by a shell is called shell script.
Shell scripting is very powerful and fast. It is used in for various task. Some of the examples are as follows:
Some of the places where Shell Scripts are used are:
- Starting and Stopping a Server
- Taking backups of databases and various other administrative tasks.
- Utility scripts for performing various transformation on files in a data warehousing environment where traditional ETL tools (DataStage/Informatica) might not be efficient.
- For one time use only requirements, for example, Update or Delete records for fixing data quality issues.
Shebang Line
The first line of a shell script should always identify which shell is being used. This line is called the Shebang line.
Shebang Examples for Various Shells
Korn Shell:
#!/bin/ksh
Bash Shell:
#!/bin/bash
Awk Shells:
#!/bin/awk -f
Perl:
#!/usr/bin/perl
Header & Revision History
The header of the script should provide following details:
- Name of the Shell Script/Program
- Short Description about the script
- Detailed Description about the script with the details of required input parameters
- Name of the Developer and Date created.
- Revision History with any updates made to the script.
In organizations today, there are multiple developers who work on the same script at different times as part of various projects/tasks. It is necessary to track these changes in the revision history.
Sample Header
#!/bin/ksh #|-----------------------------------------------------------------------------------------------------------------| #| Program Name: InsertDQ_db.ksh #|-----------------------------------------------------------------------------------------------------------------| #| Description: This script inserts data into control tables for Data Quality Monitoring Process #|-----------------------------------------------------------------------------------------------------------------| #| Description: This script inserts data into control tables for Data Quality Monitoring Process #| 1) This script checks if the database is available #| 2) Executes Inserts statements against DQ_Monitoring Tables #| 3) Does an Audit count check to verify all records are loaded in table #| 4) Logs entry into process complete table for traceability purposes. #| #| Note: #| a) This script takes config file as a parameter and inserts into Control Table #| #|-----------------------------------------------------------------------------------------------------------------| #| Author: Imran Syed #| Date: 2017/01/01 #|-----------------------------------------------------------------------------------------------------------------| #|Revision History: #|Date | Developer Name: 2019/01/01 | Imran Syed #|Description of Change Made: Added additional tables for Inserts. #|-----------------------------------------------------------------------------------------------------------------|
Indentation
Indentation helps identify sections of code much easier increasing the readability and understandability of the scripts. If the scripts are not indented properly, it would be very challenging for a person to understand someone else's code and make modifications.
Code blocks should be indented. Whether the indentation is two spaces, three spaces, four spaces or a tab, it should be decided on as a group and then adhered to.
Annotations
The code should be well commented. The annotations should be very clear on why a specific piece of code was written, and what it is trying to achieve. The code should be self-documenting. Each section of code shall have appropriate comments, so that the reviewer gets a good understanding of what the code is trying to achieve than guessing at author’s intent.
Examples of Indentation and Annotation
#-------------------- Check for Config ----------------------------------------------------# if [[ ! -s ${CFGFILE} ]]; then echo “Error: Could not find Config file: ${CFGFILE}” echo “Exiting with Status Code of 10” exit 10 fi
Configuration Files
Configuration Files are used to pass parameters to shell scripts. It is highly discouraged to hard code any of the parameters in shell scripts. Hard coding parameters may result in code change if any of the parameters change.
Config files can be sourced in the main shell script to provide input parameters. Config files can be located in a directory where all config files are located for that department/organization on that server. Config files are used to pass details like source file directory, target file directory, database name, log file directory, and many others.
Credentials can be stored in vaults. When connecting to Teradata Database, Teradata vault could be invoked in the shell script to get the credentials.
Sample Config File
DB_Name= ‘DQ_DB_DEV’ ControlDir =/apps/dev/customer/controlThan LogDir=/apps/dev/customer/log
Validate All Parameters Passed Through Config Files:
Validate all the parameters that are passed to shell script by sourcing configuration file. For example, if the directory path is passed, check if it is a valid directory location. If the source file name is passed, check to see if the file exists.
All parameters sourced from configuration file need to be validated if possible.
Examples
To check if the config file exists:
if [[ ! -s ${CFGFILE} ]]; then echo “ERROR: Could not find config file: ${CFGFILE}” echo “Exiting with Status Code of 10” exit 10 fi
To check if the log directory exists:
if [[ -d ${LOGDIR} ]];then echo “Log Directory exists” else echo “Invalid Log Directory: ${LOG_DIR}” echo “Exiting with Status code of 10” exit 10 fi
Usage Function
It is a good practice to write scripts that are largely self-explanatory. Developers should be able to look at the script and figure out what is being accomplished. The usage function shall be invoked when the script is called with the option of “-h”, i.e. “scriptname -h”. The usage function shall clearly explain which parameters are required and which parameters are optional. Use the getopts function to implement Usage functionality.
USAGE Function Example
#-------------------------------Help Function -----------------------------------------------------------# function help { USAGE=”USAGE:${PROG} -h[Help] -c Config File “ Print $USAGE cat <<-!EOF -h see this help -c Config File <fullpath> (required) !EOF exit 0 }
Parsing Command Line Parameters
Use the getopts function to parse the command line parameters.
while getopts hc:p: OPT 2>/dev/null; do case $OPT in h) help ;;c) c) CFGFILE=$OPTARG echo “CFGFILE : “${CFGFILE} ;; *) help ;; esac done
Shell script Logs
Every shell script creates a log file in the corresponding project log directory with all the details of execution. The name and location of the log file could be read from configuration file.
Many times, shell scripts are called using ETL tools (DataStage/Informatica). In those cases it would be ideal to log the entries both in the ETL tool logs and also a specific log file generated by a shell script. This could be achieved by using a small custom function.
Custom Function Example
#-------------------------------Function to Echo and Log to Log File------------------------------------------# echot() { echo “$*” | tee -a ${LOGFILE} }
Instead of calling the echo function, if you call echot function this way, you can send the log entries the ETL tools logs as well create a separate log file for the shell script.
Variable Names
The Korn Shell and POSIX standards (also referred to as the Open Systems Standards) have many more reserved names that the original Bourne shell. All these reserved variable names are spelled in upper case like PWD, SECONDS, etc.
Avoid using user defined variable names in all upper case. User defined variable names preferably should be prefixed with “v_” to distinguish easily reserved variables and user defined variables.
Always enclose variable names with in flower (also known as curly or squiggly) brackets {Reserved_Variable_Name} or {v_UserDefined_Variable_Name} for any variable name longer than one character unless it's a simple variable name.
Variable Examples
Not Recommended:
echo $USER_ID
Recommended:
echo ${v_USER_iD}
Names of local, non-environment, non-constant variables should be lowercase. Names of variables local to the current script which are not exported to the environment should be lowercase, while variable names which are exported to the environment should be uppercase.
Do not use variable names which are reserved keywords in C/C++/JAVA or the POSIX standards in order to avoid confusion and/or future changes/updates to the shell language.
Sourcing Other Shell Scripts
Use the keyword “source” instead of ‘.’ (dot) to source other shell scripts in the current shell script. Using the keyword “source” is easy to read when compared with the tiny dot, and a failure can be caught easily within the script.
Source Example
Not Recommended:
.${CFGFILE}
Recommended:
source ${CFGFILE}
Use Proper Exit Code
Exit codes must be explicitly defined in a shell script. If no exit codes are defined, the shell script will show the return code of the last command executed which can be misleading for the debugging of shell.
A shell script should exit with a return code of 0 when successfully completed. Exit codes between 1 and 4 can be used to exit with warnings. Fatal errors shall have an exit code greater than 4 and shall abort the script.
It would be preferable to have different sections of the code exit with different error codes, so, that it’s easy to pin point the failure cause.
Exit Code Examples
- When help function is invoked and completed successfully, it exits with return code 0
# Help Function help () { print $USAGE cat <<-!EOF -h see this help -c Config File <fullpath> (required) !EOF exit 0 }
When the configuration file is not found, the shell exits with exit code of 10
#-----------------------Check for Config------------------------------------------# if [[ ! -s ${CFGFILE} ]];then echot “Error: Could not find Config file: ${CFGFILE}” echot “Exiting with Status Code of 10” exit 10 fi
Avoid Using System Exit Codes
Avoid using system exit codes during coding. System exit codes are reserved exit codes used by Unix system and they should not be used for any other purpose. The system exit codes can be found at /usr/include/sysexits.h
Below is the List of System Exit Codes
# define EX_OK 0 /* successful termination */# define EX_BASE 64 /* base value for error messages */# define EX_USAGE 64 /* command line usage error */# define EX_DATEERR 65 /* data format error */# define EX_NOINPUT 66 /* cannot open input */# define EX_NOUSER 67 /* addressee unknown */# define EX_NOHOST 68 /* host name unknown */# define EX_UNAVAILABLE 69 /* service unavailable */# define EX_SOFTWARE 70 /* internal software error */# define EX_OSERR 71 /* system error (e.g., can’t fork) */# define EX_OSFILE 72 /* critical OS file missing */# define EX_CANTCREAT 73 /* can’t create (user) output file */# define EX_IOERR 74 /* input/output error */# define EX_TEMPFAIL 75 /* temp failure; user is invited to retry */# define EX_PROTOCOL 76 /* remote error in protocol */# define EX_NOPERM 77 /* permission denied */# define EX_CONFIG 78 /* configuration error */# define EX_DB 79 /* database access error */
Functions
Functions are used in order to break the code into readable and reusable sections. Do not use function names which are reserved keywords or function names in C/C++/JAVA or the POSIX shell standard in order to avoid confusion and/or future changes/updates to the shell language.
Function Examples
ksh-style function
function help {….}
Bourne -style functions
foo() {….}
Use Local Variables Instead of Global Variables
Local Variables are variables that are only available in the script in which they are defined. These variables are not available to any child process that may be started from the main shell script.
Global/Environmental variables are variables that are available in the main script and to any child process that are started from the main script.
There are times were environmental variables are required but unnecessary use of environmental variables in shell scripts should be avoided.
Most commonly environmental variables are used in .profile
Local Variable Example
# Defining local Variable i=5; # echo the value in variable. echo $5
Global Variable Example
# Defining global Variable export PATH=/dev/in:/test/in
Test Command
Use the new test command “[[expr]]” instead of the old test command “[expr]”
With the use of new test command [[ … ]], additional operators are available. Wild cards can be used in string-matching tests, and many of the errors from the old test command have been eliminated.
Test Command Example
if [[ $i = [Yy]* ]]
Here the contents of variable “i” are tested to see if the matches anything starting with Y or y.
Let Command
Use the let command “((….))” for numeric expressions. Use the let command “((….))” instead of “[ expr ]” or “[[ expr ]]” for numeric expressions
Let Command Example
Avoid using
i=5 if [ $i -gt 5 ]; then
Use
i=5 if (( $i > 5 )); then
Compare Exit Codes Using Arithmetic Expressions
Use the POSIX arithmetic expressions to test for exit/return codes of commands and functions.
Exit Code Example
if (( $? > 0)); then
Instead of
If [ $? -gt 0 ]; then
True Command
Use built-in commands in conditions for while endless loops. Make sure that your shell uses “true” built-in (ksh93) when executing endless loops.
True Command Example
while true ; do sleep 10 ; done
Temporary Files
Avoid creation of temporary files and store the values in variables where possible
Temporary File Examples
Not Recommended:
ls -1 > filename for i in $(cat filename); do sleep 10; done
Recommended:
x=”$(ls -1)” for i in ${x} ; do sleep 10; done
Command Name and Arguments Before Redirections.
Put the command name and arguments before redirections.
Command Name and Argument Example
Use
date > file
Instead of
> file date
Conclusion
By Implementing the above shell scripting standards, developers could create shell scripts that are consistent, easy to read, understand and maintain.