March 19, 2008 at 12:13 pm
Hello,
I've been discussing a DB design question with some colleagues and I would like your input. We are setting up an architecture that will host an application with potentially hundreds of users. The question is how best to manage the users, taking into account the following:
- data maintenance operations (backup/restore) should have minimal impact on the users who are not concerned by the maintenance;
- performance is very important, as we are talking about large amounts of data (about 200 GB per user; we store files in the DB).
Three options were suggested:
- make one shared DB and map each user to a schema. This means duplicating all the objects per schema, and I am against it because it would mean thousands of objects to manage in the DB;
- one DB which hosts everybody, then use partitioning etc. to split the data across filegroups; if necessary, some users' data could later be moved to dedicated DBs. This would mean user management at the application level, and it is my favourite option;
- one DB per user, meaning we might end up with hundreds of DBs. The arguments for this option are 'isolated maintenance', which should impact only one customer at a time (if they are not on the same filegroup), and the security that there is no way one user could see another user's data...
This last option is the one I am trying to drive them away from, but I need arguments other than the huge administrative effort. Any idea what the drawbacks (advantages?!) could be in terms of performance? Does SQL Server store one execution plan per DB, meaning that even if there are 10 identical stored procedures in 10 DBs, the server will store the plan 10 times in the cache? Any other ideas?
Thanks a lot!
March 20, 2008 at 1:03 am
Pavle (3/19/2008)
Does SQL Server store one execution plan per DB, meaning that even if there are 10 identical stored procedures in 10 DBs, the server will store the plan 10 times in the cache?
Correct. Plans are specific to a DB.
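If you want to see that for yourself (SQL 2005 and later), the plan cache DMVs show which database each cached plan belongs to. A rough query, with a made-up procedure name:

-- With identical procedures in N databases you'll get N rows,
-- one per dbid (usp_GetUserFiles is a hypothetical name).
SELECT st.dbid,
       DB_NAME(st.dbid) AS database_name,
       cp.usecounts,
       cp.size_in_bytes
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype = 'Proc'
  AND st.text LIKE '%usp_GetUserFiles%';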
Will you need referential integrity? It's not possible to do foreign keys across databases.
Read through this. It's an argument against splitting VLDBs into several smaller ones.
Split a VLDB into filegroups or smaller databases for backups?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 20, 2008 at 4:16 am
Thanks for replying.
Correct. Plan's are specific to a DB.
-> so I suppose that having hundreds of DB's with the same exact DB structure but different data could kill the server for this very reason
Will you need referenctial integrity? It's not possible to do foreign keys across databases.
We will need referential integrity but not across DBs. I forgot to mention that the idea was to have every DB identical in structure but different in data.
Rad through this. It's an argument against splitting VLDBs into several smaller ones.
Thanks for that link; I found the following comments interesting:
You have multiple transaction logs to manage instead of one. This means you need to be doing log backups of ALL the databases, vastly increasing the number of backup files to manage.
This is why I think of the administrative work as gigantic: not only the log files, but also the data files that DBAs will need to manage. I've never come across such an environment (a few servers hosting hundreds of DBs), but then I don't have that much experience. I was wondering if there is a way I'm not aware of to manage so many DBs easily, or if it is simply not recommended.
Point-in-time recovery becomes very hard as you have to restore ALL the databases to a single point-in-time. Now, this may not be too much of a problem if the data in the VLDB is essentially read-only, and gets updated en-masse every so often from your OLTP system - but for changing data it's a nightmare.
Our DB will not be read-only, but I don't think point-in-time restores will be a common task (except of course in a disaster recovery scenario).
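Just to make the cost concrete for myself: per database, a point-in-time restore is roughly the two steps below (names and paths are made up), and with hundreds of DBs this would have to be scripted to hit the exact same instant everywhere.

-- Restore the last full backup, then roll the log forward to a point in time.
RESTORE DATABASE Client042
    FROM DISK = 'H:\Backup\Client042_full.bak'
    WITH NORECOVERY;
RESTORE LOG Client042
    FROM DISK = 'H:\Backup\Client042_log.trn'
    WITH STOPAT = '2008-03-20 10:00:00', RECOVERY;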
All in all, the main drawbacks I see are:
- the administrative cost (or are there similar system running fine?)
- performance (i.e. memory consumption)
Any other ideas, use cases, or best practices for VLDBs?
thanks
March 20, 2008 at 6:58 am
I would add to the administrative costs a very high cost to implement changes. Tune one procedure, deploy it 200 times. Modify one column, rebuild 200 tables in 200 databases... [shudder]
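Even the "easy" way of pushing one change everywhere is something like the sketch below, assuming the client DBs follow a naming convention. sp_MSforeachdb is undocumented, and the prefix and procedure here are made up:

-- Run one ALTER against every database whose name starts with 'Client_'.
-- sp_MSforeachdb substitutes each database name for the '?' token;
-- calling a database's own sp_executesql runs the batch in its context.
EXEC sp_MSforeachdb N'
IF ''?'' LIKE ''Client[_]%''
    EXEC [?].dbo.sp_executesql N''
        ALTER PROCEDURE dbo.usp_GetUserFiles
            @UserID int
        AS
            SELECT FileID, FileName
            FROM dbo.UserFiles
            WHERE UserID = @UserID;
    '';
';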
I can't see this as a viable solution, especially when you consider that most of your load can be distributed across multiple disks by partitioning storage. Instead of looking at splitting between lots of databases, what about splitting between servers? You might then see 5 or even 10 copies of your database, but on 5 or 10 different servers. You'll still run into issues like those above, but not nearly as severe.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 20, 2008 at 7:28 am
Yep thanks,
a friend of mine suggested solving the 'customer isolation' problem by making one filegroup per user (ending up with one DB split across hundreds of filegroups).
These could be spread across servers so that, for example, each server hosts a DB with no more than 200 filegroups. There would be no more backup/restore/maintenance issues (a filegroup-level backup sketch follows after these questions), but I wonder:
- has anybody used a large number of filegroups for VLDBs?
- how does this fit on a SAN, and are there any SAN-specific considerations to take into account?
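For reference, the kind of isolated maintenance we're after would look roughly like this (names are hypothetical; the filegroup-level restore needs the log backups taken since to bring the filegroup current, and doing it online requires Enterprise Edition):

-- Back up only one user's filegroup.
BACKUP DATABASE AppDB
    FILEGROUP = 'FG_User42'
    TO DISK = 'H:\Backup\AppDB_FG_User42.bak';

-- Later, restore just that filegroup while the rest of the DB stays available.
RESTORE DATABASE AppDB
    FILEGROUP = 'FG_User42'
    FROM DISK = 'H:\Backup\AppDB_FG_User42.bak'
    WITH NORECOVERY;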
March 20, 2008 at 11:41 pm
Ooooh fun!
Initial thoughts: single database, not 200; files in the file system, on a shared volume or clustered shared volume; the database holds a pointer to the file system...
Next thought: SQL 2008's FILESTREAM/file system support might be valuable here.
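As a minimal sketch of that, assuming FILESTREAM is enabled on the instance and a FILESTREAM filegroup named FSGroup exists (names are made up):

-- File content lives in the NTFS file system but is managed and
-- backed up with the database; FILESTREAM requires a ROWGUIDCOL
-- uniqueidentifier column with a unique constraint.
CREATE TABLE dbo.UserFiles
(
    FileID   uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    UserID   int              NOT NULL,
    FileName nvarchar(260)    NOT NULL,
    Content  varbinary(max)   FILESTREAM NULL
)
FILESTREAM_ON FSGroup;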
Final thought - put the actual file content "in the cloud" and keep nothing but a pointer to "the cloud" in your database, make the cloud do all the work. This might be timely, http://www.microsoft.com/sql/dataservices/default.mspx, though other folks like Amazon (S3) offer similar services today.
Joe
March 21, 2008 at 6:49 am
Yes, it is possible to run and maintain hundreds of DBs on a server where all are used by the same application code; I have worked on a system like this before. But I didn't work on the admin side, so my knowledge of the hardware architecture used there is limited. The user logins drive which DB a given user ends up connected to.
For maintenance and procedural/table structure version updates, rollouts were done via scripts, and a programmatic process executed those scripts across all the DBs. To simplify things there was one DB containing a list of all the DBs used by the application (and which server each was on; there were a few servers set up with bunches of client DBs), plus mappings of which user goes with which DB (basically a master list). While that explanation is very oversimplified, it's the general idea. Having a "master list" DB and a simple "upgrade utility" that can read the master list and execute files from a directory against each DB in the list dramatically reduces the maintenance overhead.
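Very roughly, the shape of it was something like this (all names are hypothetical, and it is simplified to databases on the local server only):

-- The "master list": one control DB knows every client DB and
-- which user maps to which one.
CREATE TABLE dbo.ClientDatabases
(
    ClientDbID   int     IDENTITY PRIMARY KEY,
    ServerName   sysname NOT NULL,
    DatabaseName sysname NOT NULL
);
CREATE TABLE dbo.UserDatabaseMap
(
    UserLogin  sysname NOT NULL PRIMARY KEY,
    ClientDbID int     NOT NULL REFERENCES dbo.ClientDatabases
);
GO
-- The "upgrade utility": run the same batch against every local client DB.
DECLARE @db sysname, @proc nvarchar(300);
DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT DatabaseName FROM dbo.ClientDatabases
    WHERE ServerName = @@SERVERNAME;
OPEN dbs;
FETCH NEXT FROM dbs INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Calling a database's own sp_executesql runs the batch in its context.
    SET @proc = QUOTENAME(@db) + N'.dbo.sp_executesql';
    EXEC @proc N'PRINT DB_NAME();';  -- the real upgrade script goes here
    FETCH NEXT FROM dbs INTO @db;
END
CLOSE dbs;
DEALLOCATE dbs;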
One thing on the hardware side: make sure TempDB does not share a drive with any of the other DBs. If it does, in this type of environment you will end up with write contention around TempDB, and the blocking, while processes wait to write to that drive, can escalate rapidly into a serious performance problem when multiple users access their DBs simultaneously.
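Relocating TempDB is just a metadata change plus an instance restart; something like this, with a hypothetical dedicated T: volume:

-- New file locations take effect the next time SQL Server starts.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');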
March 21, 2008 at 3:03 pm
I don't think it makes any sense to have one DB per user; it means that if you have 2,000 users, you will create 2,000 databases on the server.
I think you can achieve the same goals in other ways: you can partition the database, the tables, and the indexes.
In our company we have created different filegroups for the database, and those filegroups are on different drives. We have also partitioned the tables, and the indexes are on different drives too.
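For example (names, drive letters, and boundary values are made up; table and index partitioning requires Enterprise Edition):

-- Spread filegroups across drives, then partition the big table by
-- UserID so each range of users lands on its own filegroup/drive.
ALTER DATABASE AppDB ADD FILEGROUP FG1;
ALTER DATABASE AppDB ADD FILEGROUP FG2;
ALTER DATABASE AppDB ADD FILEGROUP FG3;
ALTER DATABASE AppDB ADD FILE (NAME = AppDB_1, FILENAME = 'E:\Data\AppDB_1.ndf') TO FILEGROUP FG1;
ALTER DATABASE AppDB ADD FILE (NAME = AppDB_2, FILENAME = 'F:\Data\AppDB_2.ndf') TO FILEGROUP FG2;
ALTER DATABASE AppDB ADD FILE (NAME = AppDB_3, FILENAME = 'G:\Data\AppDB_3.ndf') TO FILEGROUP FG3;
GO
CREATE PARTITION FUNCTION pfUserRange (int)
    AS RANGE RIGHT FOR VALUES (1000, 2000);  -- three ranges of UserID

CREATE PARTITION SCHEME psUserRange
    AS PARTITION pfUserRange TO (FG1, FG2, FG3);

CREATE TABLE dbo.UserFiles
(
    UserID  int            NOT NULL,
    FileID  int            NOT NULL,
    Content varbinary(max) NULL
) ON psUserRange (UserID);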
March 21, 2008 at 3:49 pm
I think one database per user will result in an impossible-to-administer application and horrible application performance.
You will regret it immediately and every moment after that.
March 24, 2008 at 10:24 am
Thank you all for your input! It helped!
March 25, 2008 at 12:02 pm
My preference would be a single database using partitioned views to allow segregated access to the data. There is a possibility of a minor performance hit, but it would make maintenance a lot easier (and I mean a LOT easier!): adding new clients/users would simply be a matter of implementing a new partitioned view or set of partitioned views, and filegroups could be added/removed as needed.
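A stripped-down sketch of what I mean (names are hypothetical; the CHECK constraints are what let the optimizer skip the member tables a query cannot touch):

-- One member table per client (each could live on its own filegroup),
-- glued together by a view.
CREATE TABLE dbo.Files_Client1
(
    ClientID int NOT NULL CHECK (ClientID = 1),
    FileID   int NOT NULL,
    PRIMARY KEY (ClientID, FileID)
);
CREATE TABLE dbo.Files_Client2
(
    ClientID int NOT NULL CHECK (ClientID = 2),
    FileID   int NOT NULL,
    PRIMARY KEY (ClientID, FileID)
);
GO
CREATE VIEW dbo.Files
AS
SELECT ClientID, FileID FROM dbo.Files_Client1
UNION ALL
SELECT ClientID, FileID FROM dbo.Files_Client2;
GO
-- Only dbo.Files_Client2 is read here, thanks to the CHECK constraints.
SELECT FileID FROM dbo.Files WHERE ClientID = 2;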
Hope this helps!
Cogiko ergo sum (I geek, therefore I am).
March 25, 2008 at 12:20 pm
I'm not sure I saw an answer to this: how is the data being used? Is it being accessed by a common front-end? Does each customer have their own front-end? Where are those being stored?
Depending on how your users use the database, you may find you don't have a "choice" in how to implement. For example, if there's a remote client component, you may HAVE to have separate databases, unless you plan on dictating to thousands of customers PRECISELY when they will need to upgrade their clients to talk to an updated DB. Also, if you license/charge "a la carte" for modules, maintaining the logic dynamically for "who has access to what" can quickly become a bigger drain than the admin task of manually updating/upgrading individual database instances. Never mind all of the logic you need to make sure that each customer only sees their own stuff, etc. You may end up with a simpler, cleaner, and faster model with individual databases in those scenarios.
It also becomes much harder to customize any one thing for a specific customer in that way.
Also - change control becomes tricky with lots of customers, regardless of whether you have one DB or individual ones. One "bad" change in a single DB, and you take down EVERYONE. Not so for the individual DB model.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 26, 2008 at 4:22 am
I thought the topic was dead, but I guess people still like it 🙂
To answer your questions:
- data will be accessed through a single web app which will offer the same services to all customers. Customization will be minimal, and it should be part of the DB design requirements. The idea is that if we ever start making major changes for a single user, that user's data will be migrated to a separate DB and become independent. If all the user rights management is done on the app side, I see the DB as having only a single user with exec rights on the stored procedures, and that's it; the app makes sure the right sp is called at the right time (a minimal sketch of this permission model follows after these points).
- regarding change control, it's an interesting point: I agree that deploying code on a single DB which contains everybody is dangerous, and that's why I thought of spreading users out across DBs/servers so as to be able to maintain them (DBs and/or servers) one by one. The number of DBs would not be 1:1 with the number of users, but would instead be scaled depending on the size of users' data, their criticality as customers, time zones & peak times, special requests (if they are ready to pay for an individual DB, why not), etc.
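To sketch the permission model from the first point above (names are hypothetical and the password is a placeholder):

-- One principal for the web app: it can execute the stored procedures
-- and nothing else, so no user ever touches the tables directly.
CREATE LOGIN WebAppLogin WITH PASSWORD = 'Pl@ceholder123';  -- placeholder only
CREATE USER WebAppUser FOR LOGIN WebAppLogin;
GRANT EXECUTE ON SCHEMA::dbo TO WebAppUser;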
And on another note, I'm glad I started posting on this forum; it seems more active than MSDN and full of bright ideas! Thank you all.