What Index do I need?

  • Hi all. I'm rather new to being a SQL Server DBA and I'm struggling with what indexes I need. We've just switched from multiple Access DBs to SQL Server 2005 for our data warehousing, but we're still using Access as a front end for our users.

    One of the tables has 10.5 million+ records in it, and is substantially slower for users to query than the old Access table was. At the moment I've created nonclustered, non-unique indexes on the customer and product fields separately (idx_Cust and idx_Prod), as the records aren't unique. These are the two fields that people predominantly link on, and I'm told they aren't applying criteria to other fields or linking on other fields in their queries.
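    Roughly, what I've got at the moment is along these lines (PriceHistory is just a stand-in name for the real table):

      CREATE NONCLUSTERED INDEX idx_Prod ON dbo.PriceHistory ([Product Code]);
      CREATE NONCLUSTERED INDEX idx_Cust ON dbo.PriceHistory ([Full Customer Code]);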

    Should the indexes be clustered? Would that help speed things up for them? I also understand that they're linking local Access tables to this problem table. Is there a problem or performance impact in linking a linked table to a local table?

    If a clustered index is the way forward, how much longer will the CREATE INDEX statement take? This table is imported daily from a CSV file, and the package that runs drops the indexes and recreates them after the import has finished.

    Many thanks for your help, sorry for the noobish questions.

  • Can you post the table structure and some of the more common queries that are run against it? Use profiler to get an idea of the queries that Access runs for various operations.

    Do you reload the entire 10.5 million rows every day, or are you doing incremental loads?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • OK, here's the structure.

    [Product Code] nvarchar(10) null
    [Maj Gen] nvarchar(5) null
    [Nett Price] decimal(18, 2) null
    [Disc 1] decimal(18, 2) null
    [Disc 2] decimal(18, 2) null
    [Date Last Amended] datetime null
    [Percentage Increase Date] datetime null
    [Price Type] nvarchar(1) not null
    [Contract/Quote Expiry Date] datetime null
    [Date Last Used in SOE] datetime null
    [Full Customer Code] nvarchar(10) null
    [Business Sector] nvarchar(1) null
    [Brand] nvarchar(5) null

    I've never used Profiler before, so I'm not sure what I'm doing, and I'm actually out of the office at the moment on holiday! :cool:

  • In general (there are exceptions), you should have a clustered index on each table. You only get one, and it affects how the data is stored, not just how it's retrieved. Further, the clustered index affects how all the other indexes are stored and accessed too. That means picking the clustered index is terribly important. The general rule of thumb that I follow is to put the clustered index on the most frequently used access path. For example, if the queries run against the table almost always use CustomerID for access, that's probably a good choice. If it's a child table that is most frequently accessed through a join, then the foreign key makes a good candidate for the clustered index. The problem is, there's no single right way to pick the cluster. Your system and the queries running on it will help you make that choice.
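    Purely as an illustration (the column comes from the structure you posted, the table name is a stand-in, and whether customer really is your main access path is for you to confirm), switching the customer index to clustered would look something like this:

      -- Replace the existing nonclustered customer index with a clustered one
      DROP INDEX idx_Cust ON dbo.PriceHistory;
      CREATE CLUSTERED INDEX idx_Cust ON dbo.PriceHistory ([Full Customer Code]);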

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Where's the primary key?

    Do you reload the entire 10.5 million rows every day, or are you doing incremental loads?

    With Profiler, the default template's pretty good; just make sure that you add the TextData column for the RPC:Completed event (second tab when creating a new trace). Let the trace run for an hour or so and then load the results into a table somewhere. Look for the most common queries against that table.
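    If you save the trace to a file, something along these lines will get it into a table you can query (the file path and table name here are just placeholders):

      -- Load a saved trace file into a table for analysis
      SELECT * INTO dbo.TraceResults
      FROM sys.fn_trace_gettable(N'C:\Traces\AccessQueries.trc', DEFAULT);

      -- Then look for the statements that run most often
      -- (the cast truncates very long statements, which is fine for counting)
      SELECT CAST(TextData AS nvarchar(4000)) AS QueryText, COUNT(*) AS Executions
      FROM dbo.TraceResults
      GROUP BY CAST(TextData AS nvarchar(4000))
      ORDER BY Executions DESC;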

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Just to be sure I read your original post correctly: users have a local copy of Access on their desktop systems and are linking tables in their local Access databases to a table on your SQL Server system, and this table has 10+ million rows. Is that correct?

    If so, YES, this is an issue as well. It sounds like local queries in the Access database are doing the selects and joins on the data. This means that the entire table is being transferred across your network to each of these databases.

    At my last employer we had one power user who did (and still does) use an Access database for data analysis, pulling data from our SQL Server. Just one, and we could tell when he was hammering the server with his queries. It didn't bring the server to its knees, but users would complain about their own queries, which usually ran in a few seconds, taking several minutes.

    I never got the chance while there to rewrite many of his queries to reduce the network load from our SQL Server system, but after looking at a few of them, I know I could have helped him quite a bit by moving much of that work to the server.

    😎

  • gavinparnaby (10/22/2008)


    We've just switched from multiple Access DB's to a SQL Server 2005 for our datawarehousing, but we're still using Access as a front end for our users.

    I think Lynn is on the right track. How did you make the changes to the Access application to use the new SQL Server 2005 database? Are you using regular Access queries against linked tables, or are your queries pass-through queries?

    Or did you use the Upsizing Wizard to create an Access Project? Access Projects will be using pass-through queries. If you went this route, there are a few caveats you might need to consider, such as what data types it chooses for SQL Server (the outdated Text data type, and Float and Real instead of Decimal or Numeric).

    If you want to understand why Access is performing the way it is, you might want to check out:

    http://msdn.microsoft.com/en-us/library/bb188204.aspx

    It gets rather technical, but it explains what's happening under the hood.

  • Hi again, and thanks all for your replies.

    First of all, a bit of background may make things easier to understand. We had 65+ Access databases, each with at least one table (sometimes more) which needed replicating on the SQL Server. As I was the only one doing the work, the easiest thing for me was to create each table and its update routine one at a time, not using any wizards. Most of the time I imported the table and then changed the field types and sizes via code. Then I set up the DTS package for the daily or monthly import.

    It was only after I'd replicated all these steps that I started looking at indexing and keys.

    Where's the primary key?

    Do you reload the entire 10.5 million rows every day, or are you doing incremental loads?

    There isn't a primary key. The records are potentially not unique (due to our crap legacy system). I drop the indexes, delete the data, reimport the entire file and then redo the indexes.
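    The DTS package effectively does the equivalent of the following (the table name, file path and the BULK INSERT are just illustrative of what the package does; the real thing is a DTS transform):

      -- Drop the indexes, clear the table, reload the CSV, then rebuild the indexes
      DROP INDEX idx_Prod ON dbo.PriceHistory;
      DROP INDEX idx_Cust ON dbo.PriceHistory;

      DELETE FROM dbo.PriceHistory;  -- a TRUNCATE would be cheaper if permissions allow

      BULK INSERT dbo.PriceHistory
      FROM 'D:\Imports\prices.csv'
      WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

      CREATE NONCLUSTERED INDEX idx_Prod ON dbo.PriceHistory ([Product Code]);
      CREATE NONCLUSTERED INDEX idx_Cust ON dbo.PriceHistory ([Full Customer Code]);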

    @Lynn: You're essentially correct with your synopsis of the situation. Unfortunately my team consists of two people plus me, while there are 20+ users, each with multiple databases. I'm well aware that their queries and processing are terrible, and I know I'd be able to rewrite them to be much more efficient, but as ever, time is a major factor and there are also political undercurrents which make everything so much more contentious.

    I'm presuming that everything except routine selects should be done on the server, and that Access won't have too big an issue with the occasional make-table query from a select.

    @Chris: As above, there isn't an "Access application" per se; unfortunately each user has their own database (a legacy of "the way things were done"), with some users having more than one, and one team in particular having a tendency to max databases out past 2 GB! Without knowing the ins and outs of all the queries, I suspect that they are not pass-through at all, and that is the crux of the problem.

    Cheers again for all your help, I'm reasonably new to the whole DBA thing so it's a fairly steep learning curve!

  • Even the routine selects should be filtered on the server to reduce the amount of data crossing the network.
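    For example, a pass-through query containing something like the following brings back only the rows a user actually needs, rather than dragging all 10.5 million rows across the wire for Access to filter locally (the table name and the literal values are only illustrative):

      SELECT [Product Code], [Full Customer Code], [Nett Price], [Date Last Amended]
      FROM dbo.PriceHistory
      WHERE [Full Customer Code] = 'ABC123'
        AND [Product Code] = 'XYZ99';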

    I understand about time/political undercurrents. That was part of the reason I never got around to improving our power user's Access database to make better use of the server's processing abilities before transferring data to Access for additional processing.

    😎

  • gavinparnaby (10/22/2008)


    @Lynn: You're essentially correct with your synopsis of the situation. Unfortunately my team consists of two people plus me, while there are 20+ users, each with multiple databases. I'm well aware that their queries and processing are terrible, and I know I'd be able to rewrite them to be much more efficient, but as ever, time is a major factor and there are also political undercurrents which make everything so much more contentious.

    Thing is, if Access is pulling entire tables over and filtering on the client, there are no indexes in existence that will make those queries faster.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
