Implementing a reporting database off a live database

  • sjimmo

    just wondered, did you also replicate DDL as well, as OLTP requirements are different from reporting requirements, so things like indexing, how would you go about doing this ?

  • Gift Peddie (5/8/2009)


    Datawarehouse more expensive that replication with limited benefit if you don't have dimension modeling expert to use your data for cubes, it also comes with the risk for the project failing.

    Expense - Hardware aside, it depends on the complexity of the source data and what is required from the data warehouse.

    Cubes - These are optional and even without cubes you still have a very well structured source of report data

    Risk - All projects come with risk, however a well setup data warehouse will greatly reduce the risk of reports being badly written and wrong figures being quoted

  • ok fair enough, data warehouse. what exactly is a data warehouse and even if we were to do this, we need to get data from live to the warehouse in one manner or the other.

  • Just wondered how this would be done when you have loads of tables on the database.

    How would you track the following:

    Inserts

    Deletes

    Updates

    Its fine when the changes coming from the billing db to the reporting db is just inserts, that is kind of straight forward as you can easily have a marker on the table or use the new CDC on sql 2008, but i would be inclined to know how you would do this in situations where you have read/writes on the billing db.

    I have not used 2008 CDC but before 2008 you need a trigger to get update changes so you need a combination statements executed as a package to update the reporting database. Check below for known issues and options.

    http://sqlserver2000.databases.aspfaq.com/how-do-i-audit-changes-to-sql-server-data.html

    http://sqlserver2000.databases.aspfaq.com/can-sql-server-tell-me-which-row-was-inserted-most-recently.html

    Kind regards,
    Gift Peddie

  • Dean Jones (5/8/2009)


    ok fair enough, data warehouse. what exactly is a data warehouse and even if we were to do this, we need to get data from live to the warehouse in one manner or the other.

    Have a look here:

    http://blog.logeek.fr/2008/1/19/a-beginner-s-guide-to-datawarehouse

    He explains it better than me 🙂

    Of course you might find that dimensional modelling and all it entails a waste of time.

    In which case a data mart might be more useful...

  • If you've got SANs and/or other high-end infrastructure, then one thing that you should seriously consider is Mirror-splitting. I have implemented Reporting Databases for a number of clients and the high-end ones mostly use mirror-splitting as their replication method. (I am talking about SAN mirrors, not DB Mirrors).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • As I. That is why I did not elaborate. I was able to monitor exactly what Crystal does through a Lanalyzer - it was amazing, but keep in mind that this is a system that is built to go across many types of databases, not just one.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • I setup a table with the same field names, and primary key as the table which I am replicating from. This is required in order to perform replication.

    Fortunately though, that is the end of the requirements. I can add my own indexes on the replicated side of the database, indexes which are optimized for the various reports or other functions which I may be doing.

    I use this scenario frequently, especially in the event of some of the new 3rd party systems which we are using. If we perform reports directly against their database, we will negatively impact performance on the user community. So, we replicate (push or pull depending upon the system) to another server and database the tables which we need to perform BI portions of our business.

    It works well, without a lot of overhead on either server, yet when we want to do reports, we do not impact the other side.

    You can get into warehousing, but with your explanation, that is a lot of overkill and can easily leady one astray into a real quagmire of efficiencies.

    Just make sure that when you set up replication, that you do not want to create the table(s) or else every time you do a snapshot, it could recreate the table blowing away all of your custom indexes.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • Thanks for the point, points noted.

    So what you saying is that you have got the PK on every table which is required for transactional replication, but you are suggesting custom index on the subscriber as this is for reporting purposes and would obviously require seperate or same indexes for reporting services.

    My other problem is that, if the data is being moved across with the Pk, us there a way not to move the non clustered index as well.

    Also, there are Stored Procs on the subscriber already which is servicing web requests/reports, my idea is to have a new publication and not have the non clustered index DDL replicated accross, does this mean that I need to have a look at every stored proc and work out the required indexes on the respective table ?

    Is there an easy/straightforward way to do this ?

    Also, what is meant by the word sargable columns as candidates for non clustered indexes.

    thx

  • "Sargable" effectively means "can be used to search via an index". The general rule being that a column wrapped in a function is not sargable (there are some exceptions like the LEFT function and LIKE if there are no leading wildcards).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks for the reply, was just thinking of a database which is being replicated, what would be the best approach to finding out the best columns that might benefit from having a particular kind of index on it.

    Also, what are the things to watch out for when performing maintenance on replicated objects. My concerns are, replicated tables will have inserts/updates/deletes on them which will effectively cause fragmentation and page splits on the database.

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply