A Case for Document Storage

One of the challenges for both database developers and administrators is doing more, often with less. Many companies continue to grow their database estate, both in width with more platforms, and in depth with more instances of the platforms they have. Some companies will look to shrink their staff, especially when adopting a cloud platform, while others may add more databases, but not increase staffing to match the additional load.

In either case, what many have found over the years is that the cost of labor is high, both for developers that write code against databases and for administrators that manage those platforms. While licensing can seem to be a large number, it usually isn't significant compared to the cost of labor.

Often it seems administrators would prefer more of the same database platform. Developers often seem to ask for new types of database platforms, often some type of NoSQL data store. I ran across an article that makes a case for adding document data stores to your environment, instead of just choosing an RDBMS. Labor is one of the big reasons for doing this. The other is that for a given workload, the hardware cost is lower.

The article opens by talking about the object/relational mapping problems. There is some truth to the time and effort needed to map an object in an application to a table (or set of tables) in an RDBMS. There is some knowledge required to do this, but I also think it's an important skill for many developers. The same type of object mapping to a serialized JSON document is shown as being easier, and it is. However, if you add to or change your object, the application code to handle the document from the data store gets complex. Over time, you will have lots of "new" fields that don't exist in older documents. How do you handle those? It's not hard, but labor is required to write this code. And this code has to be maintained over time.
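As a rough sketch of the kind of code I mean, here is one common way to cope with older documents: upgrade them to the current shape when they are read. Everything in this example is an assumption made for illustration, including the customer document, the `schema_version` field, and the `loyalty_tier`, `first_name`, and `last_name` fields. None of it comes from the article or from any particular document database.

```python
import json
from typing import Any

# Hypothetical: the latest shape of a "customer" document in this sketch.
CURRENT_VERSION = 3


def upgrade_customer(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of doc brought up to the current shape."""
    upgraded = dict(doc)  # shallow copy, so the caller's document is untouched
    # Documents written before versioning existed are treated as version 1.
    version = upgraded.get("schema_version", 1)

    if version < 2:
        # Version 2 added a loyalty tier; older documents never stored one.
        upgraded.setdefault("loyalty_tier", "none")

    if version < 3:
        # Version 3 split the single "name" string into two fields.
        first, _, last = upgraded.pop("name", "").partition(" ")
        upgraded.setdefault("first_name", first)
        upgraded.setdefault("last_name", last)

    upgraded["schema_version"] = CURRENT_VERSION
    return upgraded


# An old document, as it might come back from the data store.
old = json.loads('{"id": 7, "name": "Ada Lovelace"}')
new = upgrade_customer(old)
```

Every change to the object adds another branch like these, and each branch has to stay in place for as long as documents of that age might still exist in the store. That is the maintenance cost I'm describing.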
The other argument is that less hardware is needed, made by noting that all the data you may need can be co-located with your object. This is what we would call denormalization in an RDBMS, and it leads to data duplication. Whether that is a problem or not depends on the amount of duplication. Certainly an application that mostly sends or retrieves singleton rows is easier to build on a document database. However, nontrivial queries, which the author postulates are hard for developers to write, are likely hard for a document database to run. The load of querying across lots of rows, or updating them, is much higher in a document database. Depending on how often you update data, this can be an issue, and require more hardware.

Which is better? The classic "it depends" applies here. Database modeling is important in both cases. As I've worked with people that move to NoSQL databases, I find they struggle to model in that world as much as many of us struggle to model in the RDBMS world. I also find that a NoSQL database is often going to require some sort of data warehouse or other structure that is built for reporting across documents. I'm not against the various types of NoSQL databases, but I also don't think they are a panacea of any sort that magically makes building and operating an application easier.

Steve Jones - SSC Editor