large table queries - using views - how do I get better performance?

  • I am currently investigating a few performance bottlenecks I have - this is the layout:

    I have 10 tables, each identical in definition (different names), all containing approx 1.5 million records each. These tables are basically one per day and currently sit in a data warehouse type database.

    I dynamically create a new view each day (the view only defines the fields a report may require, i.e. not all fields available in the table) that combines the contents of the tables so that I can search 15 million records to gather information. Several reports use a 24-hour search window, which can span two tables (e.g. midday through midday the following day).
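
    Roughly like this (table and column names here are illustrative, not my real schema):

        CREATE VIEW dbo.vw_CurrentWindow
        AS
        -- one SELECT per daily table, listing only the columns the reports need
        SELECT TransactionTime, RegionId, StatusCode, RefString, Amount
        FROM dbo.Trans_Day01
        UNION ALL
        SELECT TransactionTime, RegionId, StatusCode, RefString, Amount
        FROM dbo.Trans_Day02
        -- ... repeated for each of the 10 daily tables ...
        UNION ALL
        SELECT TransactionTime, RegionId, StatusCode, RefString, Amount
        FROM dbo.Trans_Day10
        GO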

    I have created indexes on specific fields (that match the WHERE clauses of the queries), but due to the volume I am not sure any performance boost is gained there.
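
    The indexes are along these lines on each daily table (again, column names are only illustrative, the real ones differ):

        -- clustered index matching the WHERE clause (datetime plus a couple of int codes)
        CREATE CLUSTERED INDEX IX_Trans_Day01_Time_Codes
        ON dbo.Trans_Day01 (TransactionTime, RegionId, StatusCode)
        GO
        -- plus a non-clustered index on the long reference string
        CREATE NONCLUSTERED INDEX IX_Trans_Day01_RefString
        ON dbo.Trans_Day01 (RefString)
        GO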

    What should I be looking at to get a faster response? This is what I have considered:

    Multiple CPUs (already at 4) and higher-speed CPUs (currently 3 GHz)

    More memory (yes, I am using the /PAE and /3GB switches and AWE where appropriate) (currently at 16 GB)

    Maybe reducing the number of tables that the view encompasses (i.e. dynamically creating the view to match the query when it is called - say 3 days' worth of tables as opposed to 10 when doing a weekend query)

    My queries perform a few mathematical functions like SUM and COUNT and generally return < 300 rows (but have actually read 1-2 million rows to generate the totals). My response time is approx 2 mins (I am using Reporting Services to generate these reports - I know the SQL part takes less time and the report generation takes longer). Reporting Services is running on the same server (both the db and the IIS part), and I am intending to move the IIS part to a separate server but leave the Reporting Services db on the same engine as the original data. So, any suggestions are welcome, along with ideas on what else to consider that may help speed things up. Thanks in appreciation

    ** What you see, Depends on what you Thought, Before, You looked! **

  • When you have 16 GB of memory, do not use /3GB... MS recommends not using /3GB if you have 16 GB or more of memory... but some gurus advise not to use /3GB when you have 12 GB or more...

    http://support.microsoft.com/kb/283037/

    If your query performs mathematical functions, first dump the required data into a temp table and then run the mathematical functions against that to improve performance...
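
    For example (view/column names and dates are placeholders only):

        -- step 1: pull only the rows and columns the report needs into a temp table
        SELECT RegionId, StatusCode, Amount
        INTO #ReportData
        FROM dbo.vw_CurrentWindow
        WHERE TransactionTime >= '20070305 12:00'
          AND TransactionTime <  '20070306 12:00'

        -- step 2: run the SUM/COUNT against the much smaller temp table
        SELECT RegionId, StatusCode,
               SUM(Amount) AS TotalAmount,
               COUNT(*)    AS TxnCount
        FROM #ReportData
        GROUP BY RegionId, StatusCode

        DROP TABLE #ReportData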

     

    MohammedU
    Microsoft SQL Server MVP

  • Thanks for the /3GB heads-up - I was unaware of that.

    I am assuming I just run the SELECT into the temp table to pull all the necessary rows (should I do the grouping here or later, and should I create any indexes on it?) and then run a second query to perform the SUMs and counts.

    Is it a good or bad idea to put tempdb on a RAM drive? Has anyone done this, and is it supported/recommended?

    Currently, as I am using SQL 2000, I cannot create indexes on my views - but if I later migrate to SQL 2005 will that improve performance in my data warehouse type scenario? (This may come to pass if I can't get any improved performance, perceived or otherwise.)

    I also had a thought: would partitioning the tables onto different disks help, or not much? (My current tables are all in one partition on an EMC SAN, running RAID 1 striped across 10 physical disks.)

    ** What you see, Depends on what you Thought, Before, You looked! **

  • I'm not sure a RAM disk is supported for tempdb, but I know some people have done it. It should definitely help your queries, since lots of temp operations in your views will be sped up somewhat.

    Partitioning may help, but you'd have to test. Definitely work on the memory options and use Perfmon to be sure you are using all the memory.

    Can you post some of the DDL or give us an idea of what the view encompasses? Also, maybe which fields are indexed?

  • AH!!!!! That was a killer - I just typed a large response and hit the reply button, it all disappeared, and now I must do it again. I will be sure to copy before I hit reply next time...

    The tables contain approx 40 fields - all varchar and int.

    The views that are generated daily probably use only 10-15 fields.

    The data is initially loaded using DTS to import approx 1.5 million records daily. The individual tables generally cover transactions from midnight to the following midnight, i.e. 4 days = 4 data tables.

    I did run Perfmon and found that the Avg. Disk Queue Length hits the ceiling while the query is being performed and comes back down as it completes.

    The report SQL is dynamically generated using several stored procedures and some configuration information that is derived from the arguments passed in.

    Most of the reports in essence perform a single SELECT statement (no JOINs) and may have 1 or 2 SUM actions being performed, with grouping on specified fields.
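
    In essence something like this (again with the illustrative names rather than the real ones):

        SELECT RegionId, StatusCode,
               SUM(Amount) AS TotalAmount,
               COUNT(*)    AS TxnCount
        FROM dbo.vw_CurrentWindow
        WHERE TransactionTime >= '20070305 12:00'
          AND TransactionTime <  '20070306 12:00'
          AND StatusCode = 3
        GROUP BY RegionId, StatusCode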

    The clustered index is based upon a datetime field and a couple of int fields (the int fields hold approx 10 distinct values, i.e. numbers 1 through 10).

    I have another index on a string field (varchar(24)) which usually contains 16 characters and unfortunately is not very meaningful and cannot be broken down, e.g. WHERE strvalue = '1HGTY456GTHY2HU'. Yuck, I know. If I perform a count on this field I get back about 40K distinct values, and one value accounts for approx 30% of the total rows.

    Hope this is helpful in allowing further comments, suggestions and/or recommendations.

    (Sorry this is a shorter response - wish I had copied the last one.)

    Thanks in appreciation for all feedback so far

    ** What you see, Depends on what you Thought, Before, You looked! **

  • Having IIS separated out will indeed lessen the contention.

    Is there another candidate for the clustered index? Datetime plus the int fields seems large. Could you exchange it for a non-clustered index?
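
    E.g. something like this, purely as a sketch (using your example index and column names from above):

        -- replace the wide clustered index with a narrow one on the datetime alone
        DROP INDEX Trans_Day01.IX_Trans_Day01_Time_Codes
        CREATE CLUSTERED INDEX IX_Trans_Day01_Time
        ON dbo.Trans_Day01 (TransactionTime)
        -- and keep the int codes in a separate non-clustered index
        CREATE NONCLUSTERED INDEX IX_Trans_Day01_Codes
        ON dbo.Trans_Day01 (RegionId, StatusCode)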

    Any possibility of putting together a top 10 of the most-used queries? Perhaps there can be some improvement in the report-query generation process.

  • Run the queries in QA after enabling the execution plan and SET STATISTICS IO, and see from the plan where you need indexes and which ones are being used...
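
    i.e. in Query Analyzer (Ctrl+K turns on the graphical execution plan):

        SET STATISTICS IO ON
        SET STATISTICS TIME ON
        -- run the report query here, then check the Messages tab for
        -- logical reads per table and the CPU/elapsed times
        SET STATISTICS IO OFF
        SET STATISTICS TIME OFF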

    Tempdb in RAM is not supported and I believe the option has been removed... back in the 6.5 days many people were using it, but not any more after the 7.0 release...

    Separate tempdb's mdf and ldf files from those of the other databases...
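
    For example (the file paths are just placeholders for your own drives):

        USE master
        GO
        ALTER DATABASE tempdb
        MODIFY FILE (NAME = tempdev, FILENAME = 'T:\SQLData\tempdb.mdf')
        GO
        ALTER DATABASE tempdb
        MODIFY FILE (NAME = templog, FILENAME = 'U:\SQLLogs\templog.ldf')
        GO
        -- the new locations take effect after the SQL Server service is restarted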

    MohammedU
    Microsoft SQL Server MVP

  • In the overall scheme of things a couple of minutes might not be too bad. You need to have baselines to performance tune; otherwise you can't, and you'll essentially be thrashing about in the dark.

    You can have indexed views in SQL 2000, but as these materialise the data you might actually degrade performance.
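
    Just to illustrate what's involved - they require schema binding, two-part names, COUNT_BIG(*) when there's a GROUP BY, specific SET options, and no UNION, so it would be one view per daily table rather than over your combined view (names made up):

        SET ANSI_NULLS ON
        SET QUOTED_IDENTIFIER ON
        GO
        CREATE VIEW dbo.vw_Day01_Totals
        WITH SCHEMABINDING
        AS
        SELECT RegionId, StatusCode,
               SUM(Amount)  AS TotalAmount,  -- Amount assumed NOT NULL here
               COUNT_BIG(*) AS TxnCount      -- required when the view has GROUP BY
        FROM dbo.Trans_Day01
        GROUP BY RegionId, StatusCode
        GO
        -- the unique clustered index is what materialises the data
        CREATE UNIQUE CLUSTERED INDEX IX_vw_Day01_Totals
        ON dbo.vw_Day01_Totals (RegionId, StatusCode)
        GO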

    I always suggest the SQL 2000 performance tuning guide as a good starting point.

    There are just too many variables, as is well illustrated by all the suggestions above - you need to find your bottlenecks, and make sure you don't have a hardware bottleneck first.

    As far as tempdb is concerned, it is possible to use a SATA controller which will take solid state disks - this has been tried: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/08/24/958.aspx

    You should monitor your server to establish the reads and writes per database and compare these to the I/O stats on your disk subsystem.
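
    In SQL 2000 you can pull cumulative per-file counters with fn_virtualfilestats, e.g. (database name is a placeholder):

        -- cumulative reads/writes and I/O stall time for file 1 (the primary data file)
        DECLARE @dbid int
        SET @dbid = DB_ID('YourWarehouseDB')
        SELECT NumberReads, NumberWrites, BytesRead, BytesWritten, IoStallMS
        FROM ::fn_virtualfilestats(@dbid, 1)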

    On DW servers I've set up, I'd normally have tempdb on a separate disk array with performance equivalent to the data array(s). Doing work on disk is always slow; make sure you're not reading from and writing to the same drive(s) at the same time (as you can't actually do both at once).

     

     

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Thanks for all the positive feedback and the reference links - they are all helping. From many of the comments above it looks like adding hardware is going to give the biggest performance increase. I have run the Profiler and noticed that if I create indexes they do get used, but there is only a minimal difference in total time at the end of the day. The sheer amount of data is just what slows down these reports. I was mostly curious when posting this, as I had seen mention in other posts of people dealing with multi-million record sets, my expectation of the performance was generally better than what I was getting, and I wondered whether there was some "magical" method for optimizing it.

    It comes down to patience, observation and attention to detail to try and eke out those extra seconds. Much is determined by the design of the data tables, and how optimally the data is extracted determines the performance. Hardware plays a big part of this but is really not an excuse for poorly designed databases. And ultimately, as I am finding out, the more data you process, the longer it will take - performing a SUM on 10 million records does actually take longer than on 100K records.

    Once again, thanks for all the feedback - I am certainly learning more as I read this and many other threads.

    ** What you see, Depends on what you Thought, Before, You looked! **
