One of the questions I constantly see asked on SQLServerCentral is how to scale out the database: how can you easily keep multiple copies of your data in sync with one another on different servers, and allow clients to read from any of those servers? The Always On features in SQL Server 2012 allow readable secondaries, something that wasn't possible with database mirroring, giving you up-to-date reads of data that is in sync with your primary database.
However, is that really necessary for many applications? Some of the largest applications in the world, including Google's search engine and Facebook, are turning to NoSQL databases and storage to handle their loads. I ran across a nice piece on Ars Technica that talks about how these companies handle their large data storage challenges. The most interesting thing in the piece is the way the consistency challenges are described. Phrases like "designed with less concern for consistency of data across the system", "jobs in progress will still hit stale data", and "(it) is entirely okay with serving up stale data" would scare most DBAs I know.
But should they? For some applications, it is important that we maintain consistency across nodes. The classic bank account example requires that a withdrawal from one account and a deposit to another are both committed or both rolled back to ensure the proper balances. However, that doesn't mean every node in the banking system knows your balance at all times. Only when a transaction affects the balance do we need to be sure of the actual value. Deposits are sometimes not reflected immediately on another node, say at a remote ATM. Withdrawals are sometimes approved, even with modern check-processing systems, when there are insufficient funds, resulting in bounced checks.
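The all-or-nothing transfer in that bank example is just a database transaction: both updates succeed together or neither does. Here is a minimal sketch in Python using the built-in sqlite3 module; the `accounts` table, its schema, and the `transfer` function are illustrative assumptions, not anything from a real banking system.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Atomically move `amount` from account src to account dst.

    Either both UPDATEs are committed, or the whole transaction is
    rolled back, so balances never end up half-changed.
    """
    try:
        cur = conn.cursor()
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                    (amount, src))
        # Refuse the transfer if the source account would go negative.
        cur.execute("SELECT balance FROM accounts WHERE id = ?", (src,))
        if cur.fetchone()[0] < 0:
            raise ValueError("insufficient funds")
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                    (amount, dst))
        conn.commit()
    except Exception:
        # Undo the partial withdrawal so the books still balance.
        conn.rollback()
        raise
```

Note that the atomicity here is per-node: the commit guarantees consistent balances on this server, which says nothing about when a replica or a remote ATM will see the new values.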
In many of our applications, we are told we need consistency, but I think that's a goal rather than a requirement. It's like 100% uptime, which is rarely met and almost never funded. Think about a report. A user might run the report assuming the data is accurate, and it may be, but 14ms later it might not be because of changes. If the report had been run 14ms earlier, it might have missed other changes. Most people understand that, even when they ask for consistent behavior from the system.
With that in mind, should we be looking at more distributed architectures for our applications? I think replication and Service Broker, both excellent techniques for scaling out, should become more regular tools in our applications and should receive more attention in future versions of SQL Server. We are only acquiring more and more data, and while hardware continues to get more powerful, data growth is outpacing developments in storage bandwidth. We should start thinking about how to anticipate future challenges in application load, not react to them later.
Steve Jones
The Voice of the DBA Podcasts
We publish three versions of the podcast each day for you to enjoy.
- Watch the Windows Media Podcast - 30.8MB WMV
- Watch the iPod Video Podcast - 25.7MB MP4
- Listen to the MP3 Audio Podcast - 5.1MB MP3
The podcast feeds are available at sqlservercentral.mevio.com, where you can subscribe to the overall RSS feed, and the podcast is now on iTunes! Comments are definitely appreciated and wanted.
Today's podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. Support this great duo at www.everydayjones.com.
You can also follow Steve Jones on Twitter: