October 9, 2013 at 10:46 am
Hey all,
I need some help in trying to decide a few things; mainly if I should create a single DB or multiple.
There are 6 database files that are currently setup in MS access most of which do not communicate with one another.
These databases all share a common theme which is basic client info, employment info, etc... and then the rest of each one of those database files are specific tables to that DB only.
I would like to move these into SQL and have a singular source for the client and employment info.
So my question leads into do I create a "Client" database that has all the tables that these other applications access and then separate databases for each of the other MS Access DBs? Or just put them all in one DB and be smart with naming conventions to logically separate them out.
It's important to note that most of these databases are for programs based off of grant money. So when the grant is over, that data will just sit dormant. Also, when we get new grants, I will have to add to it which ever way I go, be it another database or just more tables in the singular DB.
I was leaning the 1 DB way with all the tables for the programs in it, but am wondering if anyone can give me some real life experiences or something so I know better of which way to go.
Thanks!!!!!
October 9, 2013 at 1:37 pm
Here are a couple of things that I routinely deal with:
1) If you routinely refresh data in a test server, it's much easier to restore a database than to more over just some of the tables. The former requires scripting out security, the latter requires scripting out foreign keys, and dropping them first before the refresh.
2) If different backup cycles are needed for different portions of the data. And if some of the data is static, then read-only filegroups would save on backup space but make for a more complicated restore.
October 10, 2013 at 1:58 am
October 10, 2013 at 7:31 pm
Sean Pearce (10/10/2013)
Data that needs to be transactionally consistent must be in the same database.
That's a remarkable statement. MS technology allows reliable transactional consistency across multiple databases in multiple servers, including other server types as well as MS SQL Server instances, so it is really amazing if it doesn't provide it on a single server. I Believe there isn't a transactional consistency issue, but there is a recovery issue: if you recover database A as it was last Tuesday and database B as it was last Wednesday you will not get consistent results, just as you won't get consistent results if you restore several several filegroups of a database to several different times. That means that a degree of discipline is needed in handling backup and recovery, it can't be done at random; it doesn't mean you can't have transactional consistency. Of course if you want to use snapshot isolation, it must be enabled for all the databases involved in the transaction (that is explicitly stated in BoL), but the other isolation levels don't require prior enablement. I guess you may need to use full recovery model and maybe also marked transactions to ensure that recovery of each database is to the same point if you have a continuous operation (ie can't quiesce things to take a backup).
I guess I may have this all wrong - it's years since I last had to worry about this stuff, but I think I remember it all working just fine (not with snapshot isolation, SQL Server didn't yet have that, we used serializable isolation for the particular system) and without needing to produce any special Active-X or COM gadgets. If you can point at any documentation that says transactional consistency is a problem with multiple databases it would be useful.
Tom
October 11, 2013 at 1:11 am
Hi Tom,
Here is a blog post regarding this:
http://sqlblog.com/blogs/merrill_aldrich/archive/2011/11/25/one-database-or-ten.aspx
The most important idea about the log is this: the only way to get a transactionally consistent set of data over time is to have either just one stream of journal (log records) for the data, or to have multiple streams of journal that are robustly tied together, lock step. Because each database in SQL Server has its own, independent log file, the first situation is simple to implement – just use one database – while the second is very, very difficult. Splitting data into more than one database implies multiple log files, which in turn implies an independent log sequence. Having separate transaction log sequences means that, while it’s initially possible to perform a transaction across the two databases consistently and correctly, there’s very little guarantee after that transaction, in the space of backup and restore or disaster recovery, especially out in the real world.
I have a client that suffers from inconsistencies due to having dependent data on multiple databases. Perhaps you are correct that it is more a recovery issue than a transactional consistency issue, but the fact remains that it is very difficult to ensure consistent data across multiple databases.
October 11, 2013 at 7:05 am
Sean Pearce (10/11/2013)
Hi Tom,Here is a blog post regarding this:
http://sqlblog.com/blogs/merrill_aldrich/archive/2011/11/25/one-database-or-ten.aspx
Yes, that's quite a good piece.
Apart from thinking that people usually don't use the features that make recovery in the presence of multi-database transactions work, his point of view seems to be much the same as mine - there is no problem with transactional consistency until you run into the need for recovery, and then you need to have set things up correctly and have the necessary procedures in place to make recovery possible - it won't be possible if you don't do that. And his remark (for the multi-teneant scenario)
There are some administrative challenges to managing a whole collection of databases, but that can be solved with semi-clever code, tools, and rigor around implementation
suggests that we are not even too far apart in emphasis. But my previous post probably wasn't suffficiently clear about the need for "code, tools, and rigor". I said "a degree of discipline" when I meant "careful design, choosing the right tools, and having the right procedures in place and ensuring that they are followed rigorously".
But he appears to think that simple recovery model is much more commonly used than I have seen, and that no-one uses the full recovery model and recovery to point in time (or to marked transaction) and as a result of that view he expects the problems of restoring to be generally much nastier than they acually are for people who alreadyuse those features. Even if people don't normally use full recovery, it is possible to switch all the databases from simple recovery model to full recovery model immediately before starting to take any full backups, and after all the full backups have completed take a log backup for each database and then switch back to simple recovery - that allows you to use point in time (or marked transaction) recovery to a point shortly after the last full backup end time without suffering the backup storage (and IO) penalty of running all the time with full recovery model - it gives you equivalent capability to what you would have for a single database using simple recovery model, in return for not much storage and a slightly more complex backup system (which can easily be automated, of course).
Also, he doesn't consider systems which have chunks of time when no user transactions which write to database are allowed - for example systems where a period overnight is reserved for MI queries (which may write reports into a separate database) and for administrative tasks like index rebuilds; doing full (or incremental) backups during such a period ensures transactional consistency on recovery, since the databases concerned are in a static transactionally consistent state while the backups are happening. This works perfectly even with the sort of simple recovery model only system he thinks is far more common than full recovery.
Another point is that transactional consistency is maintained automatically if each transaction updates only one of the several databases. In my experience it is very often possible to split the data in such a way that that is true. He mentions a couple of generic scenarios for that, but there are specific ones that don't fit into either of those generic descriptions (Brent Ozar's comment on the blog mentions two of them; another one is where history deletions in an active database is delayed until confirmation of secure backup of an archive database containing the data is available, which is actually a pretty common technique, often done manually but sometimes automated).
Tom
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply