Blog Post

How to find free space in Azure PosgreSQL

,

Photo by NASA on Unsplash

I wanted to figure out how big (or approximately how big) my dump file would be. In QA, I have (relatively) large dbs (compared to prod). Enter the pgstattuple extension to help me determine how much free space is in my tables.

Step 1: Installing the pgstattuple Extension

First, you need to install the pgstattuple extension. This extension allows you to gather statistics about the physical storage of tables and indexes, including free space and dead tuples.

CREATE EXTENSION IF NOT EXISTS pgstattuple;

This will enable the pgstattuple extension, allowing you to analyze your tables’ physical layout. The function pgstattuple_approx() is handy for approximating free space and dead tuple size.

Step 2: Analyzing Table Bloat

Now, let’s dive into the query that helps us understand bloat and free space in tables:

SELECT
    n.nspname AS schemaname,
    c.relname AS tablename,
    pg_total_relation_size(c.oid) AS total_size,
    (pgstattuple_approx(c.oid)).approx_free_space AS free_space_bytes,
    (pgstattuple_approx(c.oid)).dead_tuple_len AS dead_tuples_bytes,
    pg_total_relation_size(c.oid) - (pgstattuple_approx(c.oid)).approx_tuple_len AS bloat_size,
    CASE 
        WHEN pg_total_relation_size(c.oid) > 0 
        THEN (100.0 * (pg_total_relation_size(c.oid) - (pgstattuple_approx(c.oid)).approx_tuple_len) / pg_total_relation_size(c.oid))::NUMERIC(5, 2)
        ELSE 0.00
    END AS bloat_ratio
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE
    c.relkind = 'r' -- Only tables
    AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY bloat_ratio DESC;

Explanation of the Query

  1. Table and Schema Names:
    • n.nspname AS schemaname: Retrieves the schema name.
    • c.relname AS tablename: Retrieves the table name.
  2. Table Size:
    • pg_total_relation_size(c.oid) AS total_size: This gives the total size of the table, including data, indexes, and TOAST (out-of-line storage for large values).
  3. Free Space and Dead Tuples:
    • (pgstattuple_approx(c.oid)).approx_free_space AS free_space_bytes: This returns the approximate free space within the table in bytes.
    • (pgstattuple_approx(c.oid)).dead_tuple_len AS dead_tuples_bytes: This returns the size of dead tuples (rows marked for deletion but not yet vacuumed).
  4. Bloat Size:
    • pg_total_relation_size(c.oid) - (pgstattuple_approx(c.oid)).approx_tuple_len AS bloat_size: The difference between the total table size and the live tuple size represents the “bloat” caused by unused space within the table.
  5. Bloat Ratio:
    • The CASE statement calculates the bloat ratio as a percentage:

    CASE WHEN pg_total_relation_size(c.oid) > 0 THEN (100.0 * (pg_total_relation_size(c.oid) - (pgstattuple_approx(c.oid)).approx_tuple_len) / pg_total_relation_size(c.oid))::NUMERIC(5, 2) ELSE 0.00 END AS bloat_ratio

    This gives you the percentage of space in the table wasted due to bloat. A higher ratio indicates a higher level of bloat.

  6. Filtering and Sorting:
    • WHERE c.relkind = 'r': Filters the results only to include regular tables (ignores indexes, views, etc.).
    • AND n.nspname NOT IN ('pg_catalog', 'information_schema'): Excludes system schemas.
    • ORDER BY bloat_ratio DESC: Orders the tables by their bloat ratio, with the most bloated tables appearing at the top.

Step 3: Interpreting the Results

The result of this query gives you a clear view of table bloat in your PostgreSQL database:

  • schemaname: The schema the table resides in.
  • tablename: The name of the table.
  • total_size: The total size of the table, including data, indexes, and TOAST.
  • free_space_bytes: The approximate amount of unused space in the table.
  • dead_tuples_bytes: The size of dead tuples (rows marked for deletion but not yet cleaned up).
  • bloat_size: The amount of space wasted due to dead tuples and fragmentation.
  • bloat_ratio: The percentage of the table’s size that is consumed by bloat.

Step 4: Addressing Table Bloat

If you find that some tables have a significant amount of bloat, it may be time to run a VACUUM or REINDEX operation. This is a topic for another time. Right now, I just wanted to find out how much free space there is to determine how big my dump file will be.

  • VACUUM: This cleans up dead tuples, allowing space to be reused.
  • REINDEX: Rebuilds the indexes to reclaim space consumed by fragmentation.

The post How to find free space in Azure PosgreSQL appeared first on sqlkitty.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating