Workload splitting?

  • We have a SQL Server that was originally designed for ad hoc reporting, based on various source systems. Later it slowly evolved into a data warehouse for company-wide ad hoc, on-demand, and scheduled reporting.

    However, it continued to take on new functions as new business needs appeared. Now it is also becoming a business processing platform, and rapidly growing into an EDI server.

    The server has always been vastly underpowered, as the hardware group refuses to consider increasing the hardware (same hardware for 5+ years, and it was low end at that point). Now that all these other functions are expanding rapidly, I am seeing definite problems with not only performance, but scheduling, competition for resources, and so on, typical for any overloaded box.

    What are some of the more common ways/splits used to handle these sorts of issues? My hunch/instinct would be splitting the functions out to different boxes by similar function/workload. The catch here is that the hardware people have entirely virtualized all servers. I am not entirely convinced that breaking the load out to different VMs on the same hardware/SAN set would make much of a difference.

    Anyone have any thoughts or suggestions? At the moment, with careful tuning and timing, the server works without many bumps in the road. However, I am seeing a likely major upswing in processes and projects that will be needed here, and planning ahead seems to be called for.

    Thanks!

  • David Lester (6/22/2011)


    We have a SQL Server that was originally designed for ad hoc reporting, based on various source systems. Later it slowly evolved into a data warehouse for company-wide ad hoc, on-demand, and scheduled reporting.

    However, it continued to take on new functions as new business needs appeared. Now it is also becoming a business processing platform, and rapidly growing into an EDI server.

    The server has always been vastly underpowered, as the hardware group refuses to consider increasing the hardware (same hardware for 5+ years, and it was low end at that point). Now that all these other functions are expanding rapidly, I am seeing definite problems with not only performance, but scheduling, competition for resources, and so on, typical for any overloaded box.

    What are some of the more common ways/splits used to handle these sorts of issues? My hunch/instinct would be splitting the functions out to different boxes by similar function/workload. The catch here is that the hardware people have entirely virtualized all servers. I am not entirely convinced that breaking the load out to different VMs on the same hardware/SAN set would make much of a difference.

    Anyone have any thoughts or suggestions? At the moment, with careful tuning and timing, the server works without many bumps in the road. However, I am seeing a likely major upswing in processes and projects that will be needed here, and planning ahead seems to be called for.

    Could you elaborate on how you are planning to split/distribute the workload?

    As far as I can see, no matter how the workload gets split, I/O will keep pounding the very same SAN-based I/O subsystem.

    On the other hand, as far as I know, SQL Server technology does not allow having two or more instances actively pointing to the same "database" the way Oracle RAC does; therefore, no matter how processing gets split, all database requests will keep hitting the same SQL Server instance.

    Last but not least, I can see a gain in terms of processing power IF some ETL and reporting tools are third-party based, like moving your (making up names here) *Informatica* and *Cognos* logical servers out of the SQL Server "host". If this is the case, I would work hard on having the database host running only the SQL Server instance.

    Hope this helps to initiate a discussion on this matter.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Thanks!

    I don't have a direct plan of any sort yet, I am looking to get the information and options first.

    I was thinking the same with regard to the SAN: the actual I/O will still be the same load and I/O channels no matter how many or few SQL VMs are created. Similarly, CPU and RAM strike me as a similar situation. That is, having one instance with two CPUs or two instances with one CPU each is still using the same hardware, and still needs the same level of processing.

    Not being exceptionally well versed in VMs, I see them as a large pool of resources: if a process needs X amount of those resources to function, splitting it up into multiple VM instances will not reduce or offset that resource amount.

    My thinking so far is that there is "no way around" bigger hardware, or more of the VM pool allocated to the SQL instance. My preference would be returning to a physical box which is sized and configured correctly, though I would need a Mt. Fuji of absolute proof to manage that one.

  • David Lester (6/24/2011)


    Not being exceptionally well versed in VMs, I see them as a large pool of resources: if a process needs X amount of those resources to function, splitting it up into multiple VM instances will not reduce or offset that resource amount.

    Pretty much that, yes.

    Virtualization as VMware does it (the underlying technology of many "cloud computing" providers) was designed as a way to facilitate resource assignment.

    The standard text book case goes like this...

    Scenario:

    You run two applications in two servers, App "A" and App "B".

    Both applications require at peak time four entire CPUs and 32 GB of memory, so in a non-virtualized environment you have to have two 4-CPU, 32 GB servers.

    Further analysis shows App "A" has a peak between 2AM and 6AM while App "B" has its peak activity around 3PM.

    Virtualization allows you to have probably a single 4-CPU, 32 GB physical server divided into two virtual servers: you assign as many resources as you can to VM Server A from 2 AM through 6 AM, then balance resources between the servers until around 3 PM, when you switch resources to VM Server B to take care of the load.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • David Lester (6/22/2011)


    What are some of the more common ways/splits used to handle these sorts of issues? My hunch/instinct would be splitting the functions out to different boxes by similar function/workload. The catch here is that the hardware people have entirely virtualized all servers. I am not entirely convinced that breaking the load out to different VMs on the same hardware/SAN set would make much of a difference.

    As with any group, "hardware people" believe they know everything there is to know about their area of expertise and telling them what you need/think probably won't work. So, stick to the facts. In order to stick to the facts, you have to have some facts. 😉 You need to document the workload, the throughput, and the bottlenecks. Then have your boss give their boss a copy of the facts with the request of "please fix it". Then, publish the report every bloody week. If it goes on too long, get the GM involved.
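    As a sketch of the kind of fact gathering described above (a hypothetical starting point, not a full monitoring solution), the built-in DMVs already hold cumulative wait statistics and per-file I/O latency. These counters accumulate since the last service restart, so snapshot them on a schedule and report the deltas each week:

    ```sql
    -- Top waits since last restart: shows whether the box is bound on
    -- I/O, CPU, memory, or blocking.
    SELECT TOP (10)
           wait_type,
           wait_time_ms / 1000.0 AS wait_time_sec,
           waiting_tasks_count
    FROM   sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;

    -- Average read/write latency per database file: hard numbers to hand
    -- to the SAN/VM group.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM   sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN   sys.master_files AS mf
           ON  mf.database_id = vfs.database_id
           AND mf.file_id     = vfs.file_id
    ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;
    ```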

    Now is the time to justify the need for hardware so they can get it into the budget for next year.

    The documentation will serve a second purpose... if the GM or other "C"-level manager comes after you about the performance of the reports and the user experience, just hand him a copy of the documentation and say, "Sure... but can we get a little help, please?"

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul - Thank you! That confirms what I was thinking on how the virtualization would/should work. Which helps me determine the best place of focus.

    Jeff - Thanks as well! I have started compiling performance data with reported problems so as to tie the problems with a specific job or process to the hardware usage. Though, I must say fighting to get all these various jobs to run and finish in reasonable times has given me one heck of a crash course on performance tuning queries.

    To make things more fun, in the last week the two managers who can decide such things are "no longer employed" here. Yay, fun! (*Sarcasm*)

  • Actually, I think it may be possible for multiple SQL Server instances to share the same database files, but one requirement is that the database must be on a read-only volume. I've never used this setup, but it may be worth looking into.

    MSDN: Scalable Shared Databases Overview

    http://technet.microsoft.com/en-us/library/ms345392.aspx

    ...The scalable shared database feature allows you to scale out a read-only database built exclusively for reporting purposes (a reporting database). The reporting database must reside on a set of dedicated, read-only volumes whose primary purpose is hosting the database. Using commodity hardware for servers and volumes, you can scale out a reporting database that provides an identical view of the reporting data on multiple reporting servers...
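    For reference, the moving parts look roughly like this (database name, drive letter, and file paths are made up; the actual procedure, including the SAN mount/dismount steps, is detailed in the TechNet article above):

    ```sql
    -- Build phase, on the production/build server: make the database
    -- read-only and detach it so the volume can be re-presented.
    ALTER DATABASE ReportDB SET READ_ONLY;
    EXEC sp_detach_db @dbname = N'ReportDB';

    -- (SAN step: dismount the volume, then mount it read-only on each
    -- reporting server.)

    -- Attach phase, repeated on every reporting server against the same
    -- shared read-only volume:
    CREATE DATABASE ReportDB
        ON (FILENAME = N'R:\Data\ReportDB.mdf'),
           (FILENAME = N'R:\Data\ReportDB_log.ldf')
        FOR ATTACH;
    ```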

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (6/27/2011)


    Actually, I think it may be possible for multiple SQL Server instances to share the same database files, but one requirement is that the database must be on a read-only volume. I've never used this setup, but it may be worth looking into.

    MSDN: Scalable Shared Databases Overview

    SSDO may help with reporting if the bottleneck is processing power and not I/O, but it would not help with ETL.

    Either way, nice to know Microsoft is circling around the issue.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • PaulB-TheOneAndOnly (6/29/2011)


    Eric M Russell (6/27/2011)


    Actually, I think it may be possible for multiple SQL Server instances to share the same database files, but one requirement is that the database must be on a read-only volume. I've never used this setup, but it may be worth looking into.

    MSDN: Scalable Shared Databases Overview

    SSDO may help with reporting if the bottleneck is processing power and not I/O, but it would not help with ETL.

    Either way, nice to know Microsoft is circling around the issue.

    I guess maybe it would be useful in a scenario where multiple OLTP servers or BI data marts need to share a common repository of metadata that is periodically updated. Each SQL Server instance would maintain its own separate buffer cache.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • VMware allows a wide variety of tuning; if you don't know how your guest is tuned, and what else is on the host (and whether or not it's being moved among hosts), you'll never be able to predict performance. You need to work with the VMware folks to understand things if you're to be able to make accurate predictions (however good or bad those predictions are).

    A brief, non-comprehensive, and very simplistic overview:

    VMware allows the following simple configurations:

    1) Selection of a maximum number of vCPUs (up to 8) per guest.

    2) Selection of a maximum amount of RAM.

    3) Selection of a maximum amount of CPU (in MHz) to be allocated among the vCPUs.

    4) The usual disk config, either managed through VMware or direct passthrough to a SAN, with all the usual "figure out what spindles are involved, and how they're shared" requirements to understand performance.

    And some more advanced configurations:

    A) Reserved (guaranteed) amount of CPU

    B) Reserved (guaranteed) amount of RAM

    C) "Shares" to determine priority among many guests when there's contention for more CPU

    i) If you have two guests with 1000 shares each, 10 GHz total CPU available, and equal load on each guest, each guest gets 5 GHz.

    ii) If you have two guests, one with 1000 shares and one with 3000 shares, in the situation above, one would get 2.5 GHz and one would get 7.5 GHz.

    Thus, while all guests on the same host might be called equal... some guests can be more equal than others.
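    The arithmetic behind examples (i) and (ii) above is proportional-share scheduling: under contention, each guest's slice of the host's total CPU capacity C is its share count divided by the sum of the shares of all competing guests,

    ```latex
    \mathrm{CPU}_i = C \cdot \frac{s_i}{\sum_j s_j}
    ```

    so with C = 10 GHz and shares of 1000 and 3000, the guests get 10 x 1000/4000 = 2.5 GHz and 10 x 3000/4000 = 7.5 GHz, further capped by any configured maximums and floored by any reservations.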

    On the up side, your VM can be moved to another, more powerful/less utilized host and/or disk setup without worrying about drivers* or reinstalling SQL Server within the guest.

    *Unless you have pass-through SAN access and change SAN configs.

  • Thanks all, great information!

    It does appear to me that I/O is the problem for the most part, with some network traffic issues as well. From what I have been able to find out, it looks like we have two configured sets of hardware: one for VM/dumb-terminal users, and the other housing all 30+ SQL Servers in the organization.

    Even the VM group notices the I/O issues; however, rather than fixing it, they claim the code running is the problem, as none of the other servers have issues. The catch, of course, is that the other SQL instances are for OLTP types of work, where no more than 10-50 records are touched in each transaction and fewer than 200 users are actively doing anything at once. On the other hand, this server is a warehouse, with vast amounts of analysis and summary-type work, where the typical transaction will be pulling a year or more of data, easily a million or more rows. There is the catch: the users and management using this server need to run these hefty sorts of queries, but we are a subdivision, and the parent company and the VM group rarely if ever directly interact with us or our projects, so they have no experience (nor a willingness to gain any) dealing with larger datasets.

    (The last suggestion they had was to build everything as a .NET process and pull all the source data at run time down to the local PC, using .NET to compile all the data, because that works for them when they need to create a report listing out 100 records or so. Apparently they believe a dozen people who run a report at the same time, each pulling at most 5 million rows of data to the local machine, will not cause any network or performance problems.)

    IT even has hired three different consultants to come in, and each told them 90% of the same things I have. (It was handy to learn some of those things I did not know.) Did they change? Nope, they are hiring a fourth consultant. (I sense a pattern...)

    Here is to wishing the next IT director hired will understand the situation better.

  • From an architectural point of view, you may want to look at splitting across functional areas. So have a dedicated resource for ad-hoc BI, another for EDI, etc. Possibly split tactical/operational reporting versus strategic BI.

  • It sounds like this reporting / EDI server is perhaps just a snapshot of a production OLTP database, serving the purpose of an Operational Data Store (ODS). The query plans against it (long and complex) are going to be very different from what the application is doing (short and simple), so you need to re-evaluate what indexing is needed on this instance. Run some DMV queries or use the DTA tool to determine what indexes are not being utilized and what new indexes need to be added. Each time the database is restored from production, you may need a process to drop unneeded indexes and create new ones. That would involve less effort than designing an ODS data model specifically for the purpose, and it may work just as well, especially if you already have all of your reporting and EDI code developed.
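    A sketch of the kind of DMV check described above (sys.dm_db_index_usage_stats resets on restart, so let the instance run through a representative workload before trusting the numbers):

    ```sql
    -- Nonclustered indexes in the current database that have been
    -- maintained (updated) but never read since the last restart:
    -- candidates to drop after the next restore from production.
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           i.name                   AS index_name,
           s.user_updates           AS writes_maintained
    FROM   sys.dm_db_index_usage_stats AS s
    JOIN   sys.indexes AS i
           ON  i.object_id = s.object_id
           AND i.index_id  = s.index_id
    WHERE  s.database_id = DB_ID()
      AND  i.index_id > 1   -- nonclustered only
      AND  s.user_seeks + s.user_scans + s.user_lookups = 0
    ORDER BY s.user_updates DESC;
    ```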

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Bruce, that was what I was originally considering: a split across functional areas. I am concerned that when such a split is done on the same underlying hardware, the performance gain it should achieve would be a wash, since the same amount of hardware resources is needed overall.

    Eric, originally this was just a snapshot, used as an ODS. In the last couple of years I have redesigned it to fit a warehouse structure: some denormalizing, indexing based upon usage patterns, and so on.

    The behind-the-scenes "let's make this difficult" issue truly is a political/educational one. IT, which is part of the parent company, is only given lightly resource-intensive projects (e.g., a company phone directory), and this becomes their basis for what is normal and acceptable. I report to the subdivision, which is more sophisticated in its use of IT; nearly all its projects are highly resource-intensive ones (analyze a decade of enrollment 200+ ways). IT has made the light-resource projects the hardware level they support and condone. They have decided any need for a SQL box beyond 2 cores at 2.5 GHz and 2 GB of RAM is a problem in the coding, not the needs, requirements, or hardware. (At this point, I wish it were a coding problem; that would be so much easier to fix.)

  • David Lester (7/6/2011)


    Bruce, that was what I was originally considering: a split across functional areas. I am concerned that when such a split is done on the same underlying hardware, the performance gain it should achieve would be a wash, since the same amount of hardware resources is needed overall.

    No, I think if you split across functional areas, you are likely to need more hardware resources overall. However, it does allow you to target the right hardware (virtual or real) to the right job, as well as to plan for different growth patterns and usage cases for different things.

    For example, your DW platform may only need extended business hours, whereas the operational reporting needs 24x7. Very different usage cases, and you can tailor the hardware and software mix to match. Whereas if you put them both on the same platform, you end up having to treat your DW as 24x7.
