Clustered Indexes on which columns

  • Thanks for reading and hopefully helping,

    I have worked as a database developer for some time and never really got into the DBA side of it. I am now looking for performance improvements and need to understand how to apply indexes.

    I have a table without primary keys but with a unique identifier inv_no and also month_no that contains the month, additionally there are other colummns supplier, customer, and sales amount.

    inv_nomonth_nosupplier customer sales_amt

    1200801Fredbert£1.50

    2200802DavePete£2.50

    3200801MarkSid£2.85

    Now I believe that I need to set up a clustered index first and then non clustered indexes. How do I identify the columns for clustered and those for non clustered. Iss there any advantage in partitions biggest table is 3 million rows.

    Users querying will be selecting column month_no, supplier, sum(sales_amt) and using clauses containg month and suppler ie where month = 200801 it will be very rare for them to use the unique id.

    In addition to just the performance I clear and reload the data using SSIS so may need to drop / disable the indexes and rebuild after.

    Thanks.

    Ells

  • I am not a big expert in tuning. But from what you say, it would be preferable to have the Clustered Index on month and suppler together.

    Maybe one of the top posters here can give you more ideas. They all have written articles and Blogs regarding performance tuning and about idetifying which is the best candidate for Cluster and non clustered Indexes.

    -Roy

  • Please post the full table definition (as a create table's best) and the most common queries that will run against the table

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I have avoided posting the create statement as the number of columns is excessive. There are 176 columns of which there is one uniqure identifier and several columns of interest particularly the month and provider.

    I am just trying to understand what the criteria is for picking the columns for use in clustered and non clustered indexes. No one will query using the unique identifier but all queries will contain month and provider in the where clause.

    Thanks.

    Mark.

  • ... all queries will contain month and provider in the where clause.

    Then the combination makes a good clustered index !


    * Noel

  • I'd say if most queries are coming through on the inv_no & month_no columns, then those are probably the best candidates for the clustered index. But, with 176 columns, you're not going to be able to build non-clustered indexes with that many INCLUDE columns, so you're going to want to be sure about the clustered index. Nonclustered indexes with this many columns & rows could cause problems with key lookups, something to keep an eye out for.

    Without code & structure, these are just estimates though. Testing, testing, testing should be the main things as you work through this.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • we have a clustered index on a column in a table where the data is not unique

    reason is that we used to have it on another column but we run huge selects that return up to 7 million rows or more in some cases and it would go to index scan on anything over 1.5 million rows. table ranges from 200 - 300 million rows depending on the time of the month. now the clustered index is on the column that is in the where clause and it's always a clustered index seek because the data is always physically together

    not by the book, but it works in this case

  • We are having a few problems that I am investigating. Now I would like to believe this is not just the new indexes that I put on?

    We run a series of SSIS packages of which there are two that are misbehaving. Package 'Out' takes between 1000 seconds and 10000 seconds to run. On a bad day there is a lot of IO showing in the activity monitor and the server is always using nearly all of its RAM. The package 'Out' takes data from one instance and moves it to the data warehouse.

    The other package 'flag' is looping thropugh tables and flagging files. Now this has slowed down a lot and this week just ground to a halt. The only difference for 'flag' package is that there were new clustered and non lustered indexes. The indexes may have been disabled and rebuilt prior to 'flag' being run but I believe them to be up and ok as 'Out' package used them before 'flag' package was executed.

    The main question is how could I have made a mess of the the indexes?

    How would disabling or out of date indexes give me a problem?

    Dont think they would be out of date as package 'out' updates the indexes.

    What about statisics?

    Many Thanks

    Ells

    P.S are there any good resources for a newbie to read on indexes and query performance?

  • If the OUT process changes a lot of the data, especially if it does a lot of deletes, you could be seeing index fragmentation affecting the next process. Hard to say without being there. It could be statistics too.

    There are lots of articles on indexes available here on SSC. Just search for them using the box in the upper right. Two books I'd recommend are both from Kalen Delaney and part of the Inside SQL Server series: Internals & Performance Tuning and Optimization.

    If you're really stuck to determine what is causing the processes to slow down, I'd recommend reading through this white paper from Microsoft.

    http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/Performance_Tuning_Waits_Queues.doc

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks Grant,

    The 'Out' package has been running without indexes for months and the processing time is getting longer. We added indexes recently and I think I will need to do some more testing. The 'flag' package probably was affected by indexes but I still have no step by step approach to problem solving. The flag package I looked at the sql statements used and checked the QEP and it all seemed very very good. I do believe that we have cpu and memory issues as the cpu can be running at 100% and also the memory can be 90% used by three instances.

    Thanks for the links I will try and find something on diagnosing bad performance as well.

    Thanks.

    Ells

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply