In this article, we are going to talk about nullable and non-nullable columns specifically related to Postgres databases. While the reasons to use nullable or non-nullable columns may seem obvious at first, there are pitfalls associated with each decision that either affect development velocity, affect application performance, or cause downtime.
Let's start with definitions. Nullable and non-nullable columns are terms used to describe the ability of a column in a database table to accept or not accept null values. 'NULL' represents unknown or missing data. 'NULL' is not the same as an empty string or the number zero. For example, you need to insert an email address of a contact into a table. If you don’t know whether the contact has an email address, you can insert NULL into the email address column. In this case, NULL indicates that the email address is unknown. NULL does not equal anything, even itself. The expression 'NULL == NULL' returns 'NULL' since two unknown values should not be equal. To check if a value is 'NULL' you use the 'IS NULL' boolean operator. You use this as below, and it returns true, if NULL, or false, if not.
email_address IS NULL
Non-nullable columns have an added constraint that restricts INSERT or UPDATE operations from inserting NULL. If an attempt is made to insert a NULL, the result will be an error. To control whether a column can accept NULL, you will use the 'NOT NULL' constraint.
CREATE TABLE table_name( ... column_name data_type NOT NULL, ... );
Adding NOT NULL columns to a new table
The CREATE TABLE statement below creates a new table, named invoices, with the not null constraint
CREATE TABLE invoices( id SERIAL PRIMARY KEY, product_id INT NOT NULL, qty numeric NOT NULL CHECK(qty > 0), net_price numeric CHECK(net_price > 0) );
If you use NULL instead of NOT NULL, the column will accept both null and non-null values. If you don’t explicitly specify NULL or NOT NULL, the column will accept NULL by default.
Adding NOT NULL constraint to existing columns
Add the NOT NULL constraint to a column of an existing table with this code:
ALTER TABLE table_name ALTER COLUMN column_name SET NOT NULL;
Issues using nullable and non-nullable columns
If you don’t explicitly specify the NOT NULL constraint, the column will accept NULL by default. As a result, nullable columns are more widely used than non-nullable columns. There are some downsides with using nullable columns specifically pertaining to query performance. When a column is non-nullable, the database engine can make certain assumptions about the data in that column, which can lead to faster query execution. The query optimizer knows that a column cannot have a NULL value and can exclude special tests for such values like the NOT IN vs NOT EXISTS case.
It is always best to enforce data integrity as near to the data as possible. This ensures data integrity in spite of what happens in the application layer. By requiring that certain columns always have a value, you can ensure that the data in those columns is accurate and consistent. For example, a table might have a non-nullable column for a person's first and last name, to ensure that every record in the table has a complete name. In addition to ensuring completeness, you also get the performance gains I speak about earlier by using non-nullable columns.
The biggest risk with adding non-nullable columns is doing so without causing downtime to your application. When the column has a non-null constraint, the table is entirely rewritten, which is an expensive operation depending on the table size. This operation applies a lock on the table, which prevents insert and update operations causing potential downtime. To add non-nullable columns to a database table without downtime, you can use a technique called "online schema migration." This involves using a database tool or script to modify the table's schema without locking the table or disrupting read and write operations.
Online Schema Migration
Let’s take the example of a simple to-do list application. You want to allow users to decide if a task is small or large.
1. Begin by creating a new column in the table with the desired data type and the NULL constraint. This will allow the column to accept null values initially.
ALTER TABLE `tasks` ADD COLUMN `type` VARCHAR(50) NULL;
2. Next, run an UPDATE statement to populate the new column with data. This can be done in batches to avoid affecting the performance of the database. For this example, we will update the entity to always set the type column to the value `small`. This ensures all new records have a proper value set. You can do this in the same commit as the previous step or in separate commits.
UPDATE tasks SET `type` = 'small WHERE `type` IS NULL;
3. Once the new column has been populated with data, use the ALTER TABLE statement to remove the NULL constraint from the column, making it non-nullable.
ALTER TABLE `tasks` CHANGE COLUMN `type` `type` VARCHAR(50) NOT NULL;
4. Finally, run a series of test queries to verify that the new column is working as expected and that the data in the column is accurate and consistent.
By following these steps, you can add non-nullable columns to a database table without downtime or disruption to your application. Online schema migration is a useful technique for making schema changes in a production environment without affecting the availability or performance of the database. If something goes wrong with the new changes, you can easily revert to the previous application version without any changes to the database structure.
You can revert the column by following these steps
- Make the column nullable in the database
- Stop using the column in the application and deploy it
- Remove the column from the database
Check Constraint
If you want to add the NOT NULL constraint in a large table for new records without causing locking issues you can do this by using a check constraint rather than SET NOT NULL
ALTER TABLE foos add constraint id_not_null check (bar_id is not null) not valid;
This will still require an ACCESS EXCLUSIVE lock on the table but it is very quick because Postgres doesn’t validate the constraint by doing a table scan. This will make sure the new rows or changes rows cannot insert or update null values onto that column.
After committing the ALTER TABLE above you can do
alter table foos validate constraint id_not_null;
This does not require an ACCESS EXCLUSIVE lock and still allows access to the table
Conclusion
Database schema design is an integral component of designing a system. Nullable and non-nullable columns are basic but fundamental decisions that affect data integrity, system performance, and uptime. While non-nullable constraints improve query performance and ensure data integrity, adding them to large production databases can cause downtime if you don’t do an online schema migration or use the `CHECK` constraint.