Quicker ETL import performance

  • I am currently testing a scenario where I import log files into a staging table and then into a fact table, which is then used to populate an Essbase cube for analysis.

    I use a DTS package to import the raw text files into the staging area. This runs in about 3 minutes and imports about 50 million rows.

    The structure of the tables and the SQL used to load the fact table is as follows:

    --Staging Table
    CREATE TABLE [stg_LogFiles] (
        [LineRead] [varchar] (80) NULL ,
        [DateLoaded] [datetime] NULL
    )

    -- Fact Table
    CREATE TABLE [fct_Logs] (
        [LogDate] [datetime] NULL ,
        [Application] [int] NULL ,
        [Database] [int] NULL ,
        [User] [int] NULL ,
        [Retrieval] [int] NULL
    ) ON [PRIMARY]

    --Stored procedure to populate the fact table
    CREATE PROCEDURE sp_LoadLogFact
    AS

    -- Temp table to hold the fields parsed out of each raw log line
    CREATE TABLE #F1 (
        [Date] VARCHAR (80) ,
        [Application] [varchar] (25),
        [Database] [varchar] (25),
        [User] [varchar] (25)
    )

    -- Split each '/'-delimited log line into date, application, database and user
    INSERT INTO #F1 ([Date], [Application], [Database], [User])
    SELECT
        SUBSTRING(LineRead, 2, 24),
        SUBSTRING(
            LineRead,
            CHARINDEX('/', LineRead) + 1,
            CHARINDEX('/', LineRead, CHARINDEX('/', LineRead) + 1) - 1 - CHARINDEX('/', LineRead)),
        SUBSTRING(
            LineRead,
            CHARINDEX('/', LineRead, CHARINDEX('/', LineRead) + 1) + 1,
            CHARINDEX('/', LineRead, CHARINDEX('/', LineRead, CHARINDEX('/', LineRead) + 1) + 1) - 1
                - CHARINDEX('/', LineRead, CHARINDEX('/', LineRead) + 1)),
        SUBSTRING(
            LineRead,
            CHARINDEX('/', LineRead, CHARINDEX('/', LineRead, CHARINDEX('/', LineRead) + 1) + 1) + 1,
            CHARINDEX('/', LineRead, CHARINDEX('/', LineRead, CHARINDEX('/', LineRead, CHARINDEX('/', LineRead) + 1) + 1) + 1) - 1
                - CHARINDEX('/', LineRead, CHARINDEX('/', LineRead, CHARINDEX('/', LineRead) + 1) + 1))
    FROM dbo.stg_LogFiles
    WHERE CHARINDEX('[', LineRead) = 1
        AND NOT SUBSTRING(LineRead, CHARINDEX('/', LineRead, CHARINDEX('/', LineRead) + 1), 2) = '//'
        AND SUBSTRING(LineRead, 2, 4) IN ('mon ', 'tue ', 'wed ', 'thu ', 'fri ', 'sat ', 'sun ') -- only select records where the user id is present

    -- Look up surrogate keys and load the fact table
    INSERT INTO [ESSBASE_TEST_DW].[dbo].[fct_Logs] ([LogDate], [Application], [Database], [User], [Retrieval])
    SELECT
        CONVERT(datetime, SUBSTRING([Date], 5, 7) + RIGHT([Date], 4) + SUBSTRING([Date], 11, 9), 108),
        A.ApplicationID,
        A.DatabaseID,
        U.UserID,
        1
    FROM #F1 B
    LEFT JOIN dbo.dimApplication A ON A.Application = B.Application
    LEFT JOIN dbo.dimUser U ON U.UserName = B.[User]

    DROP TABLE #F1

    Currently this process takes about 26 minutes to run, with the bulk of the time taken by the stored procedure. I would just like some ideas on how to improve the speed of this process. There are no indexes on the tables, and I am not sure the SQL used in the stored procedure is the most efficient.

    Any information would be very much appreciated.

  • I'm stabbing in the dark, but what about an index on the #F1 Application and User fields? They are used as join predicates. Also, have you looked at the actual execution plan? You might run this manually some morning and capture it; I think it would tell you more about what exactly is going on.
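
    For example, something along these lines as a rough sketch (the index names are made up, and I'm assuming the CREATE INDEX statements go inside the procedure once #F1 has been populated):

    -- Rough sketch: index the temp table on the join columns before the fact load
    CREATE INDEX IX_F1_Application ON #F1 ([Application])
    CREATE INDEX IX_F1_User ON #F1 ([User])

    -- When running the procedure manually, something like this shows the cost of each statement
    -- (turn on the actual execution plan in the session as well)
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    EXEC dbo.sp_LoadLogFact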

    CEWII

  • This is a double post; it makes it easier to help if you keep to one post.

    original

    http://www.sqlservercentral.com/Forums/Topic902981-360-1.aspx
