April 14, 2010 at 3:53 am
I am currently testing out scenario where import log files into a staging table then into a fact table which is then used to populate an Essbase cube for analysis.
I use a dts package to import the raw text files into the staging area. This runs in about 3 minutes and imports about 50 million rows.
The Structure of tables and sql used to import into the fact tables is as follows itself is as follows:
--Staging Table
CREATE TABLE [stg_LogFiles] (
[LineRead] [varchar] (80) NULL ,
[DateLoaded] [datetime] NULL
)
-- Fact Table
CREATE TABLE [fct_Logs] (
[LogDate] [datetime] NULL ,
[Application] [int] NULL ,
[Database] [int] NULL ,
[User] [int] NULL ,
[Retrieval] [int] NULL
) ON [PRIMARY]
--Stored procedure to populate fact table
CREATE PROCEDURE sp_LoadLogFact
AS
CREATE TABLE #F1 (
[Date] VARCHAR (80) ,
[Application] [varchar] (25),
[Database] [varchar] (25),
[User] [varchar] (25)
)
INSERT INTO #F1([Date],[Application],[Database],[User])
(
SELECT
SUBSTRING(LineRead,2,24)
,SUBSTRING(
LineRead,
CHARINDEX('/',LineRead)+1,
CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)-1 - CHARINDEX('/',LineRead))
,SUBSTRING(
LineRead,
CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1,
CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)-1
- CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1))
,SUBSTRING(
LineRead,
CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1) +1,
CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)+1)+1)- 1 -
CHARINDEX('/',LineRead,CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1) +1))
FROM dbo.stg_LogFiles
WHERE CHARINDEX('[',LineRead) = 1
AND NOT SUBSTRING(LineRead,
CHARINDEX('/',LineRead,CHARINDEX('/',LineRead)+ 1)
,2) = '//'
AND SUBSTRING(LineRead,2,4) IN ('mon ','tue ','wed ','thu ','fri ','sat ','sun ') -- ONLY SELECT records where user id is present
)
INSERT INTO [ESSBASE_TEST_DW].[dbo].[fct_Logs]([LogDate], [Application], [Database], [User], [Retrieval])
(select
convert(datetime,substring([Date],5,7)+ right([Date],4) + substring([Date],11,9),108),
A.ApplicationID,
A.DatabaseID,
U.UserID,
1
FROM #f1 B
LEFT JOIN dbo.dimApplication A ON A.Application = B.Application
LEFT JOIN dbo.dimUser U ON U.UserName = B.[User]
)
DROP TABLE #F1
Currently this process takes about 26 min to run. The bulk of the time being taken by the stored procedure. I would just like some ideas on how to improve the speed of this process. There are no indexes on the tables and I am not sure if the SQL used in the stored procedure is the most efficient.
Any information would be very much appreciated.
April 14, 2010 at 8:32 am
I'm stabbing in the dark but what about an index on #f1 Application and User fields, they are used as join predicates. Also have you looked at the actual execution plan, you might run this manually some morning and capture that. I think it might tell you better what exactly is going on..
CEWII
April 14, 2010 at 8:36 am
this is a double post, it makes it easier to help if you keep to one post
original
http://www.sqlservercentral.com/Forums/Topic902981-360-1.aspx"> http://www.sqlservercentral.com/Forums/Topic902981-360-1.aspx
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply