Inserts into a fact table

  • Hi there,

    I have an ETL job that runs the same time everyday (6:30 am).

    The job trucates the table and then inserts about 900,000 records.

    On some days the job runs and updates the fact table in less than 7 minutes.

    Example from log file

    [PROGRESS - 06:51:14] Delivery 'Fact1': 937052 insert(s), 0 update(s) to '"dbo"."Fct_AAPR_Expiring_Price"'

    [DETAIL - 06:51:14] Data Stream row limits: sample rate 1, max rows 0

    [DETAIL - 06:51:14] Data Source 'DataSource1': 937052 physical (937052 logical) rows read, 937052 used

    [PROGRESS - 06:51:14] Acquisition: 937052 accepted, 0 rejected

    [PROGRESS - 06:51:14] Delivery : 937052 direct, 0 summary/merge, 937052 total

    [PROGRESS - 06:51:14] Done - 0 00:06:43 elapsed

    databuild -- completed (07-Nov-2010 06:51:14)

    And on other days ( running the same ETL build ) the job runs for almost an hour

    Example from log

    [INTERNAL - 07:41:01] Mem(M): 0.0 [Peak=0.0] (Ref=0.0 Domain=0.0 Pool=0.0)

    [PROGRESS - 07:41:02] Delivery 'Fact1': 940734 insert(s), 0 update(s) to '"dbo"."Fct_AAPR_Expiring_Price"'

    [DETAIL - 07:41:02] Data Stream row limits: sample rate 1, max rows 0

    [DETAIL - 07:41:02] Data Source 'DataSource1': 940734 physical (940734 logical) rows read, 940734 used

    [PROGRESS - 07:41:02] Acquisition: 940734 accepted, 0 rejected

    [PROGRESS - 07:41:02] Delivery : 940734 direct, 0 summary/merge, 940734 total

    [PROGRESS - 07:41:02] Done - 0 00:55:54 elapsed

    How do I determine what is causing this performance issue?

  • A couple of things...

    I would make sure there aren't any other processes running on or against that server at the same time your process is running.

    Or you could be trying to pull data from a server that experiences varying levels of performance.

    Or varying degrees of traffic connecting to your server. I would try to run your process at a different time of day if possible.

    Other problems could be that the data is following different data flow paths within your package that varies between the slow and fast loads.

    Lastly, look into indexes that might affect performance, however, that would be an "all the time" type of performance problem if that was the case. I'd look into the other areas first.

    Just some ideas!

  • I'll check the overall load in that timeframe - concurrency may be an issue but most probably I/O contention.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply