MySpace has been the premier SQL Server site, and probably the largest SQL Server site, on the Internet. It’s a fascinating case study, and they’ve been surprisingly candid at times about how their systems are built.
Recently someone posted a link to a presentation from their lead DBA on how they’ve architected their 150 database servers. It’s over an hour long, but it’s very interesting.
I’ve seen some of this before in articles and presentations, but it’s always fascinating to hear how they moved through architectures as they ran into extreme scaling issues. The evolution of their architecture makes sense, and I think they have done a good job growing over time, despite the various pain points.
Thinking about scalability is something you always want to do, even though most of us will never reach the point where we need thousands of spindles. A few of us will, but most of us won’t. So while you want to think about scaling out early, you don’t want to get too wrapped up in it or build too much of it up front. Scaling out is hard, takes administrative work, and slows development.
So how do you do it?
I’ve never had a huge system, but we always talked about scale as we moved along. Mostly we identified the places where scale would affect us and tried to be aware of where we could start to re-architect things if we ran into scaling issues. At one financial company we worried about growth, and we were having issues, but upgrading from SQL Server 6.5 to 2000 and then going from a two-way to an eight-way box masked the problems, and that was the best way to handle things.
However, along the way we also identified that pricing was an issue for us. A constant stream of quotes for securities overwhelmed the system, so we planned for pricing to live separately: the app used a separate connection to get pricing, which meant we could move that data to a new database, instance, or server if needed. We never grew that far, but we thought about it and built those small changes into the architecture.
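The trick with that separate pricing connection is that the application asks for a connection by what it’s for, not by where it lives. Here’s a minimal sketch of that idea in Python, assuming a hypothetical `ConnectionRegistry` class and made-up server and database names (`db01`, `db02`, `Trading`, `Pricing`). It isn’t what we actually ran, just an illustration of the pattern.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionRegistry:
    """Hypothetical registry mapping a functional area to a connection string.

    Areas with no explicit entry fall back to the default (main) database,
    so splitting one area out later is a configuration change, not a code change.
    """
    default: str
    overrides: dict = field(default_factory=dict)

    def connection_string(self, area: str) -> str:
        # Use a dedicated target for this area if one is configured,
        # otherwise share the main database.
        return self.overrides.get(area, self.default)


# Day one: pricing lives in the main database, but the app already asks for it by name.
registry = ConnectionRegistry(default="Server=db01;Database=Trading")
print(registry.connection_string("pricing"))  # Server=db01;Database=Trading

# Later: move pricing to its own server without touching the calling code.
registry.overrides["pricing"] = "Server=db02;Database=Pricing"
print(registry.connection_string("pricing"))  # Server=db02;Database=Pricing
print(registry.connection_string("orders"))   # Server=db01;Database=Trading
```

The cost up front is tiny, one extra lookup, but it keeps the option of moving pricing to a new database, instance, or server open.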
This is a great video to watch, but it’s long, over an hour, and you have to pay attention to it. The big takeaway? Scaling out requires most of the logic to live in the application, not the database. That’s not to say you can’t have some logic in the database, but you need to be sure that you aren’t depending on a particular database, and that you replicate that logic to every instance of the database.
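To make that takeaway concrete, here’s a minimal sketch in Python that uses in-memory SQLite databases as stand-ins for SQL Server instances. The application decides which database holds a given user, and the same schema and view are deployed to every database so none of them is special. The range-based split (`USERS_PER_SHARD`) and all the function names are hypothetical choices for illustration, not a description of how MySpace actually partitions its data.

```python
import sqlite3

USERS_PER_SHARD = 1_000_000  # hypothetical partition size

# The same schema and in-database logic (a view) goes to every shard.
SHARED_DDL = """
CREATE TABLE IF NOT EXISTS profile (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE VIEW IF NOT EXISTS profile_summary AS
    SELECT user_id, upper(name) AS display_name FROM profile;
"""


def deploy_logic(shards, ddl=SHARED_DDL):
    """Replicate the database-side logic to every instance."""
    for conn in shards:
        conn.executescript(ddl)


def shard_for(user_id, shards):
    """Application-side routing: user IDs 1..N map to shards by range."""
    index = (user_id - 1) // USERS_PER_SHARD
    if not 0 <= index < len(shards):
        raise ValueError(f"no shard configured for user {user_id}")
    return shards[index]


def save_profile(user_id, name, shards):
    conn = shard_for(user_id, shards)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR REPLACE INTO profile VALUES (?, ?)", (user_id, name))


def load_display_name(user_id, shards):
    row = shard_for(user_id, shards).execute(
        "SELECT display_name FROM profile_summary WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


shards = [sqlite3.connect(":memory:") for _ in range(2)]
deploy_logic(shards)
save_profile(42, "alice", shards)        # lands on shard 0
save_profile(1_500_000, "bob", shards)   # lands on shard 1
print(load_display_name(42, shards))         # ALICE
print(load_display_name(1_500_000, shards))  # BOB
```

Note that the view only works because `deploy_logic` put it on every shard; if one instance were missing it, queries for the users on that instance would fail.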