Rob Volk (blog|twitter) is our host for TSQL Tuesday this month and wants us to Explain a database subject like he’s five. Which, given what I know of Rob is oddly appropriate. So here is my favorite analogy for a table.
A Library.
Libraries contain books. Sometimes a few, sometimes a lot. There is a lot of information about each book. The author, when it was written, when it was published, the subject, etc.
If you have a pile of books (a heap) and want to find books by a specific author you have to go through every book and look for that author.
But what if you take those books and shelve them in some type of order? Say alphabetical by author and then title. Now, if you want to find all of the books by one author you can find them pretty quickly. Then you can scan through just those books. If you happen to need a specific title by a given author you can go directly there. This type of sort is the clustered index. It’s the physical location of each book. Note that because this is the physical order of the books we can only have the one clustered index.
Now let’s say you want all of the books about SQL Server databases. Again, you could just go through all of the books in the library. Or, you could take them down and re-sort them into a different order. Neither of those options is terribly efficient though.
So how about instead, we get an index card for each book. The first thing we will do is write down the author and title. It’s how we have the books physically sorted and will let us find the physical copy of the specific book we are looking for quickly and easily. Then we add the subject of the book to the card. Now we can sort all of those cards by subject and put them in a box.
At this point it’s easy to go to the box, find a given subject and pull a list of all the books on that subject. This is called a non-clustered index. It isn’t the physical copy of the books but it does have enough information to make it easy to find books A book could even be listed on several cards if they happen to cover multiple subjects. Unlike the clustered index, where we can only have one, we can have as many non-clustered indexes as we want. All we have to do is create new index cards for each one. And of course each index takes up space, and the more information we need to write down on the card the more space we take up.
From here I could go on about covering indexes: adding information to the index cards that we don’t need to sort on, but makes it so we don’t have to actually go back to each book. Or I could talk about how each shelf is like a storage area in a database called a page and how when the page gets full we have to split it and move some of the books to a new shelf. And so on and so on.