In a recent post, I talked about SQL Server reads and how data is stored in pages. One of the points I made in that post was that you should always use the smallest type that makes sense for a given set of data, in order to minimize the IO required. I once worked on a system that constantly used data types that were greedier than needed (e.g. INT instead of BIT, or CHAR(200) where CHAR(80), or better still VARCHAR(80), would have sufficed) and had all sorts of performance issues. One of the things we did to improve this was to go through the application/database and set everything to use the correct types. What we noticed was that initially page reads went up, not down, after correcting the data types. Let's look at why that was and how to fix it…
Run this to create a demo environment….
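The original script isn't reproduced here, so treat this as a sketch; the flag column names and the CHAR length are my own assumptions, but the types match the issues described below:

```sql
CREATE TABLE dbo.Users
(
    Username  CHAR(100), -- fixed width, far larger than any real username
    IsAdmin   INT,       -- only ever holds 1 or 0
    IsDeleted INT,       -- only ever holds 1 or 0
    IsEnabled INT        -- only ever holds 1 or 0
);
```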
Hopefully you can already spot some potential issues with this…
- Username won't always be the same length, so VARCHAR is a better fit than CHAR. In this case I also know we never use usernames longer than 50 characters.
- The 3 Boolean fields have all been given the type INT when they will only ever contain a 1 or 0; they should have used the BIT data type.
Before we correct these issues, let's insert some dummy data…
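The exact insert from the demo isn't shown, so this is just one way to generate a batch of rows (TOP here is picked to match the row count in the output below):

```sql
INSERT INTO dbo.Users (Username, IsAdmin, IsDeleted, IsEnabled)
SELECT TOP (140220)
    'User' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),
    ABS(CHECKSUM(NEWID())) % 2, -- random 0 or 1
    ABS(CHECKSUM(NEWID())) % 2,
    ABS(CHECKSUM(NEWID())) % 2
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;
```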
Let's then turn on STATISTICS IO and run a query selecting all the rows to see how many reads that causes…
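For example:

```sql
SET STATISTICS IO ON;

SELECT * FROM dbo.Users;
```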
(140220 rows affected)
Table ‘Users’. Scan count 1, logical reads 2200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So reading 140k records caused 2200 page reads.
Let’s now alter our column types to be more suitable for this data…
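Assuming the schema sketched above, that's one ALTER COLUMN per offending column:

```sql
ALTER TABLE dbo.Users ALTER COLUMN Username VARCHAR(50);
ALTER TABLE dbo.Users ALTER COLUMN IsAdmin BIT;
ALTER TABLE dbo.Users ALTER COLUMN IsDeleted BIT;
ALTER TABLE dbo.Users ALTER COLUMN IsEnabled BIT;
```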
Now let’s run our select again and see if the read counts have dropped…
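Running the same SELECT as before:

```sql
SELECT * FROM dbo.Users;
```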
(140220 rows affected)
Table ‘Users’. Scan count 1, logical reads 4393, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
They almost doubled!! How can this be if we’ve shrunk the data types? It's because SQL Server won't reclaim the space used in every row by the old column type; it is in fact storing our new BIT/VARCHAR data alongside the space already allocated for the old data types. The only way to reclaim this space is to REBUILD the table if it’s a heap, or rebuild the clustered index if it is not…
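Our demo table is a heap, so we can rebuild it directly (if it had a clustered index, you would rebuild that instead; PK_Users below is a hypothetical index name):

```sql
ALTER TABLE dbo.Users REBUILD;

-- With a clustered index it would be something like:
-- ALTER INDEX PK_Users ON dbo.Users REBUILD;
```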
If we then try our select again…
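Same query once more:

```sql
SELECT * FROM dbo.Users;
```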
(140220 rows affected)
Table ‘Users’. Scan count 1, logical reads 1191, physical reads 0, read-ahead reads 21, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
In our simple example we now have over 1000 fewer reads just by changing the types of 3 columns. On the system I was working on at the time, IO dropped nearly 50% after correcting greedy data types, as our largest and most frequently run queries were constantly hitting these fields. One point to note is that unless you are using Enterprise edition, rebuilds must be done offline, so make sure you do this in scheduled downtime.
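For reference, on Enterprise edition the rebuild can be kept online with the ONLINE option; something like this, using the hypothetical names from above:

```sql
-- Online rebuild (Enterprise edition only)
ALTER TABLE dbo.Users REBUILD WITH (ONLINE = ON);

-- Or, for a clustered index:
-- ALTER INDEX PK_Users ON dbo.Users REBUILD WITH (ONLINE = ON);
```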