Almost real-time data for a reporting database.

  • Hi,

    I have this bizarre situation where there are about 350 DBs with the same structure and design, each DB having 20,000 or so objects. The size of each DB ranges anywhere from 30 GB to 150 GB.

    Now there are some issues: the reports are being fetched from these OLTP databases, and the report execution times are unacceptable. The client wants to segregate each OLTP database into a separate reporting database, which should be in read & write mode, and the reports should be executed on those reporting databases. The data in the reporting databases should be almost real time, with the maximum acceptable latency being 5 minutes.

    So what are the better ways to achieve this? I have thought of snapshots, log shipping and replication.

    --------------------------------------------------------------------------------------------------
    I am just another naive wannabe DBA trying to learn SQL Server

  • With log shipping, you can't write to the database.

    So are the reports required from all the databases or only some specific databases? How big are the reporting databases going to be?

    ----------
    Ashish

  • Thanks for the reply Ashish.

    The client needs a separate reporting database for each OLTP DB. So if he has 350 DBs, he wants 350 reporting databases, each containing 20,000 tables (I'm not sure why his DBs contain so many objects, and he does not want to review or redesign the structure).

    The most irritating part is that the client gives a new, separate DB to each customer he gets for his product. So the number of DBs might increase as his customer base grows.

    I also thought that log shipping might be the best solution, but I was a bit hasty given that any latency beyond 5 minutes is unacceptable to the client. Are there any drawbacks to this approach, and if we do go ahead with this implementation, what kind of infrastructure will we need and what precautions will have to be taken?

    --------------------------------------------------------------------------------------------------
    I am just another naive wannabe DBA trying to learn SQL Server

  • This is definitely not a well-designed requirement. Given the database sizes and numbers you mentioned, you will definitely need a spacious lab to host these reporting databases.

    Having said that, I don't agree with this approach and would suggest creating one lab environment for reports, where the required database can be restored as needed and the reports generated from it.

    ----------
    Ashish

  • Hi,

    As the size of each DB is huge and there are 350 DBs, OLAP will definitely take time to load and it won't be real time. The ETL will take time.

    Another way is to point the report sprocs directly at the OLTP DB (which is not a good practice), keeping the query optimization rules and hints in mind, as sketched below.
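
    For illustration only, a minimal sketch of that approach; the database, table and procedure names are hypothetical, and the NOLOCK hint trades accuracy (dirty reads) for less blocking of the OLTP workload:

        USE Customer001;   -- hypothetical OLTP database
        GO
        CREATE PROCEDURE dbo.rpt_DailyOrderTotals
            @ReportDate date
        AS
        BEGIN
            SET NOCOUNT ON;

            -- Read directly from the OLTP table; NOLOCK avoids blocking writers
            -- at the cost of possibly reading uncommitted rows.
            SELECT o.CustomerID,
                   SUM(o.TotalDue) AS DayTotal
            FROM dbo.Orders AS o WITH (NOLOCK)
            WHERE o.OrderDate = @ReportDate
            GROUP BY o.CustomerID;
        END;
        GO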

    regards

    PG

  • If you want the reporting DBs to be in R/W mode, you can't use LS or DB mirroring. You need to use replication, but the setup would be a huge task. The design you are referring to is not a good one. 350 DBs need at least a team of DBAs to maintain them.

    Thank You,

    Best Regards,

    SQLBuddy

  • sqlbuddy123 (9/13/2011)


    If you want the reporting DBs to be in R/W mode, you can't use LS or DB mirroring. You need to use replication, but the setup would be a huge task. The design you are referring to is not a good one. 350 DBs need at least a team of DBAs to maintain them.


    But what if I implement LS, set the secondary server as my reporting server, and set the recovery state to RESTORE WITH RECOVERY?

    --------------------------------------------------------------------------------------------------
    I am just another naive wannabe DBA trying to learn SQL Server

  • LS won't work. Once a log backup is restored WITH RECOVERY, the secondary database comes online and no further transaction log backups can be applied, so the log shipping restore chain is broken and the copy stops being refreshed.
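
    To illustrate why (database, share and file names below are just placeholders): a log shipping secondary stays restorable only if logs are applied WITH NORECOVERY or WITH STANDBY; WITH RECOVERY ends the restore sequence.

        -- Readable secondary that can still accept the next log restore:
        RESTORE LOG Customer001
        FROM DISK = N'\\BackupShare\Customer001_0900.trn'
        WITH STANDBY = N'D:\LogShip\Customer001_undo.tuf';

        -- WITH RECOVERY brings the database fully online, but no further
        -- log backups can be applied; log shipping would have to be
        -- reinitialised from a new full backup afterwards:
        RESTORE LOG Customer001
        FROM DISK = N'\\BackupShare\Customer001_0915.trn'
        WITH RECOVERY;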

    Thank You,

    Best Regards,

    SQLBuddy

  • Your key issues are these three requirements:

    - 350 duplicated databases

    - Near Real Time

    - R/W mode

    In my experience, it's going to be very difficult to provide all of these from a single source instance.

    Log shipping is read-only, but you can get around this using a logical database with views onto the source data (sketched below). It also requires that connections be disconnected in order to restore additional transaction logs. You aren't clear on the required availability of this data, but I'm going to assume we can discount log shipping.
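
    A hedged sketch of that logical-database idea (all names are placeholders): a small writable shell database holds views over the log-shipped, read-only copy, plus any local working tables the reports need.

        CREATE DATABASE Customer001_ReportShell;   -- small, writable shell
        GO
        USE Customer001_ReportShell;
        GO
        -- View onto a table in the read-only, log-shipped secondary
        CREATE VIEW dbo.Orders
        AS
        SELECT OrderID, CustomerID, OrderDate, TotalDue
        FROM Customer001_LS.dbo.Orders;
        GO
        -- Report scratch tables stay local, so what the reports connect to is writable
        CREATE TABLE dbo.ReportRunLog
        (
            RunID    int IDENTITY(1,1) PRIMARY KEY,
            RunAtUtc datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
        );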

    This leaves replication or mirroring. Mirroring is not readable without database snapshots (Enterprise only, $$$) and will also be read-only in a similar way to log shipping. Not ideal and probably not feasible.
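
    For completeness, a hedged example of the snapshot approach (Enterprise Edition only; the logical file name and path are placeholders); the snapshot is static, so it would have to be dropped and recreated on a schedule to refresh the data:

        -- Readable, point-in-time snapshot of the mirrored database
        CREATE DATABASE Customer001_ReportSnap
        ON ( NAME = Customer001_Data,                          -- logical data file name
             FILENAME = N'D:\Snapshots\Customer001_ReportSnap.ss' )
        AS SNAPSHOT OF Customer001;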

    This leaves replication. Transactional would be my preference for near real time. The database itself will be writable, but unless you implement updating subscribers or peer-to-peer, the updates will not be reflected at the source. You WILL begin to hit issues running 350 distribution agents and log reader agents on a single server, so you will want to look at distributing this load over many servers. This would be my preference, but 350 databases is well above the tested limits of the technology.
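
    A hedged sketch of what one such publication might look like (database, publication, server and table names are placeholders); repeat this for every article in every one of the 350 databases, which is exactly where the agent count becomes the problem:

        USE Customer001;
        GO
        EXEC sp_replicationdboption
             @dbname = N'Customer001', @optname = N'publish', @value = N'true';

        EXEC sp_addpublication
             @publication = N'Customer001_Reports',
             @sync_method = N'concurrent',
             @repl_freq   = N'continuous',
             @status      = N'active';

        EXEC sp_addarticle
             @publication   = N'Customer001_Reports',
             @article       = N'Orders',
             @source_owner  = N'dbo',
             @source_object = N'Orders';

        EXEC sp_addsubscription
             @publication       = N'Customer001_Reports',
             @subscriber        = N'REPORTSRV01',
             @destination_db    = N'Customer001_Reports',
             @subscription_type = N'push';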

    Another thought: you could look at scalable shared databases with "maximising availability". This is a little-known feature. I've not used it, but it could provide a balance between log shipping and availability.

  • This describes the scalable shared database refresh phase:

    http://technet.microsoft.com/en-us/library/ms345367.aspx

    It could work but would require some logic to make the data switch.
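
    A hedged sketch of what that switch could look like on each reporting server (database name and file paths are placeholders); the read-only volume itself is swapped at the SAN/OS level between the detach and the attach:

        USE master;
        GO
        -- Drop the stale copy of the shared, read-only reporting database
        EXEC sp_detach_db @dbname = N'Customer001_Reports';
        GO
        -- ... unmount the old volume and mount the freshly refreshed one here ...
        -- Re-attach the refreshed, read-only data files
        CREATE DATABASE Customer001_Reports
        ON (FILENAME = N'R:\SSD\Customer001_Reports.mdf'),
           (FILENAME = N'R:\SSD\Customer001_Reports_log.ldf')
        FOR ATTACH;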

  • How about SQL replication? Or, if you are on a SAN, talk to your SAN guys; they can replicate the data from the SAN side.

  • Transactional replication is the closest thing you can think of implementing at this stage.

    You can also wait a while for the new version of SQL Server to be released, in which you can have read-only replicas of the databases (see the sketch after the link below).

    http://msdn.microsoft.com/en-us/library/ff878253(v=sql.110).aspx
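
    For reference, a hedged sketch of that upcoming syntax (the availability group name, server names and endpoints are placeholders); note that such a secondary is readable but still read-only, so it would not satisfy the R/W requirement on its own:

        CREATE AVAILABILITY GROUP Customer001_AG
        FOR DATABASE Customer001
        REPLICA ON
            N'OLTPSRV01' WITH (
                ENDPOINT_URL      = N'TCP://OLTPSRV01.corp.local:5022',
                AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                FAILOVER_MODE     = AUTOMATIC),
            N'REPORTSRV01' WITH (
                ENDPOINT_URL      = N'TCP://REPORTSRV01.corp.local:5022',
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE     = MANUAL,
                SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));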



    Pradeep Singh

  • As previously mentioned, you will get issues with distribution agents on the distributor or subscriber once you hit 50-80 agents depending on your system.

    I'm pretty sure replicating the files at the SAN level will make them available, but you won't be able to mount them unless you use the scalable shared database option.

  • MysteryJimbo (9/16/2011)


    Your key issues are these three requirements:

    - 350 duplicated databases

    - Near Real Time

    - R/W mode

    - 10 to 52 Terabytes of space for the OLTP instances.

    - Unknown activity; particularly, unknown peak activity (do they all get reindexed at the same time?)

    You'll need to explain read/write mode in more detail; axiomatically, you can't keep a single given DW object both in sync with the original _and_ let changes cause it to diverge from the original simultaneously.
