Planning for growth -oltp - what strategies are people using?

  • We have an on-line store type of application.We currently have one instance with several databases. A few of the databases are central to the system others are specific to an area.

    We are trying to plan for growth.

    I was curious what other large SQL Server shops do to balance load as the systems get larger.

    Have people replicated core databases across different data centers?

    Is anyone using distributed partitioning across servers?

    What types of design decisions have people made and why

    We would be using SQL Server 2008

  • How large is large?

    Typically SQL scales better up than out. Good hardware, optimised queries, appropriate indexes will get you a long way.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • We drove secondary ads for Expedia.com. A single SQL Server handled things just fine. It was more important to balance 3 identical web servers that had a pretty good cache each.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We have about 25 million customers. The number of distinct visits to our sites each day varies between 25,000 and 30,000.

    We are only using about 600 Gig of database disk space right now - but expect to double in size in a few months, number of visits should be more than double. As we grow and plan, we are wondering about architecting for multiple data centers. Web servers , apps servers in several data centers - what about the database? We are trying to decide if it should be in one data center only (with failover), if we should partition data across servers (in multiple data centers eventually?). Should we replicate core databases to various data centers or have readonly copies of some data? We are wondering what has worked well for other people and what has not

    (I am also working in inefficiencies in the current queries, archiving old data, reviewing hardware needs, etc)

  • KarenD (7/7/2010)


    We have about 25 million customers. The number of distinct visits to our sites each day varies between 25,000 and 30,000.

    We are only using about 600 Gig of database disk space right now - but expect to double in size in a few months, number of visits should be more than double. As we grow and plan, we are wondering about architecting for multiple data centers. Web servers , apps servers in several data centers - what about the database? We are trying to decide if it should be in one data center only (with failover), if we should partition data across servers (in multiple data centers eventually?). Should we replicate core databases to various data centers or have readonly copies of some data? We are wondering what has worked well for other people and what has not

    (I am also working in inefficiencies in the current queries, archiving old data, reviewing hardware needs, etc)

    60,000 visits in 10 hours (assuming local logins, here) is less than two visits per second. Unless the hardware is horrible and the queries were written by an embedded cursor zealot, you shouldn't have much of a problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The generic approach is to upgrade to a SQL Server cluster (where multiple SQL Server instances concurrently access the suite of databases), and implement asynchronous database mirroring (to an failover site).

  • charlesbc (7/8/2010)


    The generic approach is to upgrade to a SQL Server cluster (where multiple SQL Server instances concurrently access the suite of databases)

    SQL Server doesn't support scaleout-clusters like you describe. There's a way to make multiple instances read one set of data files, but only if the database is read-only. Or are you talking about something like peer-to-peer replication to get multiple copies of the database on multiple servers?

    SQL Clustering is solely about HA, where if one cluster node fails the other will become active.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • A cluster scenario is not solely for the purpose of HA. Within a multi-node cluster deployment multiple active SQL Server instances have concurrent read and write access to a given database (or suite of databases) in a resource group. The transaction load against the resource group is thereby balanced across the collection of active SQL Server instances (while appearing to be a single SQL service to the application layer).

  • charlesbc (7/8/2010)


    Within a multi-node cluster deployment multiple active SQL Server instances have concurrent read and write access to a given database (or suite of databases) in a resource group. The transaction load against the resource group is thereby balanced across the collection of active SQL Server instances (while appearing to be a single SQL service to the application layer).

    Unless you're thinking about SQL 2008 R2 parallel datawarehouse edition (which, afaik, hasn't even been released yet and is solely for star/snowflake schemas), no. SQL Server has no built-in scale-out capabilities. Multiple instances cannot read and write to a single database in a cluster or any other setup.

    It sounds like you're talking about Oracle RAC, which has no equivalent in SQL Server.

    The only way multiple SQL instances can use the same files is to use something called Scalable Shared Databases, and that requires that all databases are read-only.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • charlesbc (7/8/2010)


    A cluster scenario is not solely for the purpose of HA. Within a multi-node cluster deployment multiple active SQL Server instances have concurrent read and write access to a given database (or suite of databases) in a resource group. The transaction load against the resource group is thereby balanced across the collection of active SQL Server instances (while appearing to be a single SQL service to the application layer).

    I don't think this is correct. SQL Server doesn't support "load balancing" in any of its current incarnations. In a multi-node cluster, each SQL Server instance has read and write access only to its own databases. If there's failover, those instances are activated on other servers, but aren't used until that point.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Jeff Moden (7/7/2010)


    Unless ...the queries were written by an embedded cursor zealot, you shouldn't have much of a problem.

    Weren't you just teaching how to do this? 😀 (ducking and running!!!)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • I think you are confused about how mutli-instance clusters works. You can spread out the load of a cluster by having multi-instance clusters with different instances active on different nodes, but at no time can more than 1 instance have access to a database.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • SQL Server does have read only, concurrent access on a SAN (http://msdn.microsoft.com/en-us/library/ms345392.aspx)

    There is, as mentioned, no concurrent read/write access to multiple databases. Even the Parallel Data Warehouse spreads the load out in an interesting way, but you are not accessing multiple databases from separate instances.

    In Active/ Active, node 1, with db1, completely controls that databases. Node 2, has db2 on it and completely controls that. Node 2 cannot access db1 until Node 1 fails.

  • GilaMonster (7/8/2010)


    It sounds like you're talking about Oracle RAC, which has no equivalent in SQL Server.

    Agreed. Exactly what I was thinking.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply