July 23, 2008 at 11:51 am
I have a project where there is a need to create a read-only database for reporting, from the transactional tables in the application DB. Principal reason is to eliminate resource contention between application processes, and user reporting activity (separation of work loads). Secondary reasons involve ease of report creation, and report execution speed on a schema designed for reporting.
In a normal scenario, we would simply choose either replication or snapshots to provide copies of transactional tables, and use SSIS to transform data as necessary. But there are some circumstances that get in the way:
1) The application is "boxed" - instead of the locally managed instances I am more familiar with, this is an application that is installed at client sites and (hopefully) managed by their IT staff. In many instances, however, local DBAs are not available, so most maintenance must be automated (Agent jobs, etc.)
2) Use of replication is a concern, from the standpoint of implementation and support complexity (we are not in the business of Remote DBA services). We sometimes see the current simple, single-database implementations run into problems due to lack of local attention, or the occasional unintentional interference. Thus, I am concerned about replication adding more calls from the field for support.
3) Many of the sites are licensed for SQL Server 2005 Standard, not Enterprise, which limits the ability to use database snapshots.
4) There is a requirement for low latency between the application/transactional data and the reporting data. "Real-time" is a term used by the business, but that expectation is being successfully managed. However, in some cases "live" data must be present in the reports, with an optimal delay of no more than 5 minutes.
5) Currently, of the 350 tables in the application DB, roughly 70 (or ~20%) are consumed by the reports. The physical size on disk of the data varies greatly between clients.
I believe the use of an ETL mechanism, such as SSIS, would minimize the support and implementation complexity, but have not personally considered it in the past for a low latency scenario (as replication and snapshots were an option locally). Running packages every few minutes may not be as efficient as transactional replication that continuously moves data.
Given the above environmental concerns, can anyone offer opinions, or point to articles dealing with such scenarios?
July 23, 2008 at 12:08 pm
My experience with replication and DTS (not so much SSIS) is that they work great until they don't. Then it's a pain.
If you are worried about support calls, I'd think about setting up a separate filegroup, hopefully on separate disks, and then denormalizing or copying some data for reporting into this area. It doesn't help with all of the contention - unless you move to separate physical boxes, you'll have resource contention - but it is simpler than a reporting database on the same box, and a separate DB still brings tempdb contention with it.
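A sketch of that filegroup setup (the database name, file path, sizes, and reporting table below are all hypothetical, just to show the shape of it):

```sql
-- Hypothetical names and paths throughout; adjust for the actual install.
ALTER DATABASE AppDB ADD FILEGROUP Reporting;

ALTER DATABASE AppDB
ADD FILE (
    NAME = AppDB_Reporting1,
    FILENAME = 'E:\SQLData\AppDB_Reporting1.ndf',  -- ideally a separate disk
    SIZE = 512MB,
    FILEGROWTH = 256MB
) TO FILEGROUP Reporting;

-- Denormalized copies of the transactional data then live on that filegroup:
CREATE TABLE dbo.rpt_OrderSummary (
    OrderID      int          NOT NULL PRIMARY KEY,
    CustomerName varchar(100) NOT NULL,
    OrderDate    datetime     NOT NULL,
    OrderTotal   money        NOT NULL
) ON Reporting;
```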
July 23, 2008 at 12:26 pm
Steve Jones - Editor (7/23/2008)
My experience with replication and DTS (not so much SSIS) is that they work great until they don't. Then it's a pain.
My experience as well. In house it can be bad enough...
If you are worried about support calls, I'd think about setting up a separate filegroup, hopefully on separate disks, and then denormalizing or copying some data for reporting into this area. It doesn't help with all of the contention - unless you move to separate physical boxes, you'll have resource contention - but it is simpler than a reporting database on the same box, and a separate DB still brings tempdb contention with it.
Contention I can design for, and is the reason for this thread. The goal is to separate reporting to a separate physical instance.
What mechanism would you use for copying data in your above example?
July 23, 2008 at 1:32 pm
It will depend on some of the details of the requirements.
It has been my experience that complete real-time is often not really what is needed (or even wanted). Do you need EVERYTHING real-time, or just some of the data? If it is only some of the data, and you have a nightly window to re-sync everything else, you can make simpler decisions.
Something like:
- a nightly backup / restore process
- during the day keep track of data changes on the important data elements with triggers and using service broker or just a scheduled job to transfer the data to your reporting instance
With SQL 2008, you will get CDC - which is basically a way of tracking what has changed so you can update this type of near-real-time reporting solution. CDC is nice because it uses the transaction log and an agent, but doing the same thing with some simple triggers that queue what has changed, plus a separate process to move that data, is not hard to build. I would tend to use Service Broker, just because I am used to using it, but some triggers that write to a table and a SQL Agent job that runs a stored procedure would work pretty well.
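A minimal sketch of that trigger-plus-queue idea (the Orders table and all of the object names here are made up for illustration):

```sql
-- A narrow queue table records only which rows changed.
CREATE TABLE dbo.OrderChangeQueue (
    QueueID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderID   int      NOT NULL,
    ChangedAt datetime NOT NULL DEFAULT GETDATE()
);
GO

-- The trigger writes just the keys, keeping the hit on the application small.
CREATE TRIGGER dbo.trg_Orders_TrackChanges
ON dbo.Orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.OrderChangeQueue (OrderID)
    SELECT OrderID FROM inserted
    UNION
    SELECT OrderID FROM deleted;
END;
GO

-- A SQL Agent job running every few minutes would then call a procedure
-- that drains the queue, pushes the changed rows to the reporting instance
-- (e.g. via a linked server), and deletes the queue rows it handled.
```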
I like replication and use it regularly, and I am a big fan of SSIS, but for something that will get deployed where there will not be an on-staff DBA, that's a pretty big risk. They are too flexible - and there is a GUI to mess with them. Even if it works flawlessly, you may get Joe the server admin going in and clicking a checkbox that blows the whole thing up. And he will, of course, uncheck the thing before calling support claiming it broke on its own.
July 23, 2008 at 2:42 pm
Michael Earl (7/23/2008)
It will depend on some of the details of the requirements. It has been my experience that complete real-time is often not really what is needed (or even wanted). Do you need EVERYTHING real-time, or just some of the data? If it is only some of the data, and you have a nightly window to re-sync everything else, you can make simpler decisions.
In my experience as well, real-time reporting is often unnecessary, but in this case, certain business activities require recent data to support decision and approval processes. In some cases it is a dashboard type of activity; in others, reports run from tools such as SSRS or Crystal (desktop). Thus a reasonable definition of "Near Real Time" is in discussion here, i.e. relatively low latency. Nightly resync will not be frequent enough.
Of course we can join nightly updated reporting tables with live transactional data where necessary, but we then limit the benefit of separating the reporting load from the transactions.
I would tend to use service broker
How would you use Service Broker in a scenario as this? It is one of the 2005 components I have not become familiar with as yet.
July 23, 2008 at 3:57 pm
Matt (7/23/2008)
Michael Earl (7/23/2008)
It will depend on some of the details of the requirements. It has been my experience that complete real-time is often not really what is needed (or even wanted). Do you need EVERYTHING real-time, or just some of the data? If it is only some of the data, and you have a nightly window to re-sync everything else, you can make simpler decisions.
In my experience as well, real-time reporting is often unnecessary, but in this case, certain business activities require recent data to support decision and approval processes. In some cases it is a dashboard type of activity; in others, reports run from tools such as SSRS or Crystal (desktop). Thus a reasonable definition of "Near Real Time" is in discussion here, i.e. relatively low latency. Nightly resync will not be frequent enough.
Of course we can join nightly updated reporting tables with live transactional data where necessary, but we then limit the benefit of separating the reporting load from the transactions.
I would tend to use service broker
How would you use Service Broker in a scenario as this? It is one of the 2005 components I have not become familiar with as yet.
Without getting into replication, or trying to roll your own pseudo-replication, I don't see any way you can get near-real-time reporting for all of the data.
When faced with this type of requirement, I have found that the truly real-time data requirements are usually only the data that has been inserted/updated in the past xx hours. This kind of reporting is usually related to turnaround/performance type reporting (e.g. how many orders were filled in the last hour).
For the most part, those queries can be designed to be very efficient and generally don't block.
All of the other reporting is where we end up with blocking and performance issues. Reports that aggregate across the last month/quarter/year, etc... Those types of reports do not need even near real time data available.
So, what I have done is set up a reporting system (using SSIS or backup/restore) that is basically a lag system. Then we move all reporting that does not really need real-time data over to the lag system.
I then focus on optimizing the reports that really do need real-time data and have those reports available on the production system.
I know this is not the best option, but it generally works well for me.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
July 24, 2008 at 5:21 am
As far as Service Broker goes:
Assuming you have a window at night to allow you to do a complete ETL to your reporting database, that would be my base process.
For everything that needed to be real-time, I would create a trigger on the tables that sends the ID for the row to Service Broker. Although there would be a performance hit, sending a message to Service Broker is a reasonably small one, and the SB queue would be in the same database, making it always available and reducing the chance of a messaging failure rolling back the updates being made by the application.
The service broker queue would have a process that pops off the incoming messages, builds the necessary data set, and sends it to a service broker queue on the second server. The queue on this server would put the data into the reporting database.
Now, you could do this just by putting changes into a table and having a stored procedure run every 5 minutes from the job agent, but SB gives you messaging that makes the process more reliable, gives you more on-demand processing of the queues, and allows you to multi-thread the process so multiple messages can be processed at the same time. You could do all of this manually in a process you write, but SB is already written, I am used to using it, and I would rather use something that is already written.
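Roughly, the Service Broker version would look like this (the contract, service, queue, and message type names are invented; a real setup also needs the CREATE MESSAGE TYPE / CONTRACT / QUEUE / SERVICE statements and routes between the two instances):

```sql
-- Inside the trigger: open a conversation and send the changed key.
DECLARE @handle uniqueidentifier;

BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//App/ChangeSender]
    TO SERVICE   '//Reporting/ChangeReceiver'
    ON CONTRACT  [//App/ChangeContract]
    WITH ENCRYPTION = OFF;

SEND ON CONVERSATION @handle
    MESSAGE TYPE [//App/RowChanged]
    (N'<change table="Orders" id="12345" />');

-- On the receiving side, an activation procedure pops messages off the queue:
DECLARE @body xml;
WAITFOR (
    RECEIVE TOP (1) @body = CAST(message_body AS xml)
    FROM dbo.ChangeReceiveQueue
), TIMEOUT 5000;
-- ...then builds the data set for that key and applies it to the reporting DB.
```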