optimize a population of a table

  • Hi

    I have had a couple of problems lately and every time I post my problem here it gets solved one way or the other, so I try again.

    I have a table that uses way to much time and resources to populate.

    I use profiler to do a trace and then use QA to analyze the result and recommend index. Got a bit better, but still that single table can use hours and hours to populate.

    How do I monitor this table, and are there any parameters that I shoul check for?

    Regards

    DJ

  • Can you give more information about the population process? Single INSERT or BULK? How many rows? What frequency? OLAP or OLTP? ....

    See if this helps: http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Hi

    Would it help if I provided some of the stored procedure?

     

     

    Regars

    DJ

  • That would also be a start, but to have some additional information about your system wouldn't be bad at all.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • The server runs with  intel xeon cpu 2.0 GHZ and 1,5 GB RAM. Windows XP, SQL 2000

    Stored Procedure:

    CREATE PROCEDURE [xFF__Sale]

    AS

    BEGIN

                    ---------------------------------------------------------

                    SET NOCOUNT ON

                    …….      

                    TRUNCATE TABLE [dbo].[Sales]

                    ………

    ----------------------------------------------------

                    --                INSERT new rows                 --

    ----------------------------------------------------

                    SET @TempTime = getdate()

     

                    INSERT [dbo].[Sales]

                                   (

                                   [Company],

                                   [Division],

                                   [Year],

                                   [Period],

                                   [Level],

                                   [Facility],

                                   [District],

                                   [Item group]

                                   ………   

                    SELECT

                                   T1.[Company],

                                   T1.[Division],

                                   T1.[Year],

                                   T1.[Period],

                                   T1.[Level],

                                   SUBSTRING( T1.[Facility] , 1 , 2 ),

                                   T1.[District],

                                   T1.[Item group],

                                   ……..

                    FROM

                                   [dbo].[Sales Template] AS T1

                                   INNER JOIN [dbo].[Item Group Template] AS T2 ON

                                                   T1.[Company] = T2.[Company]        AND

                                                   T1.[Item group] = T2.[Item group]  

     

    END

    GO

    Hope this helps.

     

    Regards

    DJ

  • How many indexes are on dbo.Sales ? Can you drop them and recreate them before & after the Insert ?

    Does the table being joined to ([Item Group Template] have an appropriate index on columns Company and [Item Group] so that the optimizer uses an index instead of a tablescan ?

  • Hi

    Item group template did not have any index.

    I will create one, but before I do so, I would like your opinon on the different settings one can use.

    I will use only field company and item group in my index. These are used to get unique hits.

    Should my index be clusterd? or/and unique?

    Fill factor is default set to: 80. Is this sufficent?

    The sales table have one index. I have deleted it and are ready to create an new one. Should this be clustered, unique? And does it make sense to have several indexes on one table?

    I have read somewhere that one shall not have more than one clustered index, is this correct? Have been reading about all these issues, but I can not get any clear anwser, and very often people have different points of view and not to for get exeperience.

    A lot of questions, but I rather ask then guess. I really need this population job to run faster.

     

    Regards

    DJ

  • Hello Dan

    clustered index means that the actual data are stored ordered by that index - that of course allows only one clustered index per table.

    Indexes are helpful when selecting data from tables; during insert indexes (on the table you're inserting into) work in the opposite way - they slow down the process. That's why PW is asking whether you could drop the index before insert and then re-create it when you're done. This can sometimes help, but it also depends on how many records there are in the table before you start inserting. If the table is more or less empty before the insert, dropping the index can help a lot.

    Whether to use clustered index... These are great on tables that accept a load of data, and then are used only for selecting. If records are added frequently, clustered index can be helpful if it is on a column which will only grow (like identity, or a datetime column holding the time of insert). Otherwise it could cause problems, because if you insert a record that fits somewhere in the middle of a full page, lots of data must be physically rearranged.

    Unique index - yes, if the combination of columns is used as a unique key, then I would surely go for it. Fill factor depends on what happens with the data. If you'll never ever insert anything into that table (or e.g. truncate table before inserting any new records), fill factor can be 100%. Generally, 80% should be fine in most cases where data are inserted into the table periodically.

    HTH, Vladan

  • Ok here is the hard and fast of the issue. You are truncatin the data which is fast but as each record inserts it causes inserts and reording of the indexes to occurr especially if the data isn't in the order of the index itself.

    To increase your performance script out the indexes and do the following steps in order suggested.

    1) Truncate Table

    2) Drop Non-Clustered Indexes

    3) Drop Clustered Index

    4) Populate data

    5) Create Clustered Index

    6) Create Non-Clustered Indexes

    The order is strict otherwise you will see other issues show up to slow you down. Be aware that when the clustered index is dropped or created any non-clustered indexes will be updated to reflect the change so order does make a difference.

    If you have no other inserts going on thru the day and it is full repopulated each time use a fill factor of 100% on all indexes as there will be no need to worry about page splits and you don't want to waste space. In addition the tighter the data can live the fatser it will perform as there are fewer pages and extents to scan thru.

  • Also, this and a whole lot more useful informations can be found in the link I've posted above.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • It looks like your Insert statement would benefit from a Where clause.  Since you only want records that are in T2 you should state that in the Where clause.  e.g.

    Where t1.company IN (Select company from  [dbo].[Item Group Template] )

    Otherwise you are probably doing a table scan on T1 which I guess is the larger of the two tables.

    This may shed some light: http://www.sqlservercentral.com/faq/viewfaqanswer.asp?categoryid=3&faqid=381 

  • Hi Ron K

    You are right, t1 is larger one, about 50 000 records, while t2 has only 188 records.

    I will try to add a where clause.

    Thank you

     

    Regards

     

    Dan-Ketil

  • Even a simple Where t1.company = t2.company would certainly cause the optimizer to figure out it should drive off of the 188 records in t2.

    You definetly need an index at least on t1.company to help find the matching records.

    If there are many records where t1.company = t2.company but t1.[item group] <> t2.[item group] then I would also add an index on t1.company and t1.[item group].  In this case I would also add the second conditon to the where clause.

  • Hi

    Maybe I should have explained how these two tables are used, not sure if I need to, but just for the fun of it.

    sales is a register of all sales done. While item group template is an aggregated table for each of the articles. Obviously we have 188 unique item groups. Thus the aggreagted table has one record of each group, where we can extract more info regarding a singel item group. In sales we only have the item group reference, so I need to join with [item group template] to get usefull data.

    For a couple item groups we have allmost 1300 records in sales. In average we find 300-400 records for each item group

    t1.company = t2.company can be found on all records, since we only have one company id.

    t1.[item group] = t2.[item group] for 131 records, and it is these I am interested in.

    I have one index on t2 where I use both company and index as fields.

    For t1 I have not decided how many indexes and what fields to include.

    I run DBCC SHOWCONTIG to see the effect.

     

    Thank you all for the nice input. Keep it comming.

     

    Best regards

    DJ

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply