April 28, 2013 at 10:56 am
Hi All,
I'm using the T-SQL MERGE statement below to load data into one of the fact tables in the DWH database. The initial full load will be around 30 million records; going forward, each delta load will be under 50k records.
I'm dealing with an accumulating-snapshot style fact table that is insert-only: if a record is new, insert it into the table; if a record already exists but its fact values differ, insert a new record with a different load date.
To handle this requirement I created a temp table, in the same database and with the same structure as the target fact table, and then used the MERGE query below.
INSERT INTO dbo.[Fct_Prod_Mthly_Actuals] ([Well_Skey],[DateKey],[Cost_Center],[Wghtd_WI_Key],[Item_Cd],[Prd_Lease_Vent_Cd],[Rev_Lease_Vent_Cd],[PHA_Flg],Days_Prodcd,Days_Injctd,[Pressure_Base],[Metric_Desc],[Metric_Vol],[Load_Date],[Update_Date])
SELECT [Well_Skey],[DateKey],[Cost_Center],[Wghtd_WI_Key],[Item_Cd],[Prd_Lease_Vent_Cd],[Rev_Lease_Vent_Cd],[PHA_Flg],Days_Prodcd,Days_Injctd,[Pressure_Base],[Metric_Desc],[Metric_Vol],CAST(CONVERT(VARCHAR(8), GETDATE(), 112) AS DATE) AS [Load_Date],[Update_Date]
FROM
(
MERGE dbo.[Fct_Prod_Mthly_Actuals] AS DST
USING dbo.[Fct_Prod_Mthly_Actuals_Temp] AS SRC
ON (
    SRC.[Well_Skey]    = DST.[Well_Skey]
AND SRC.[DateKey]      = DST.[DateKey]
AND SRC.[Wghtd_WI_Key] = DST.[Wghtd_WI_Key]
AND SRC.[Item_Cd]      = DST.[Item_Cd]
AND SRC.[Metric_Desc]  = DST.[Metric_Desc]
)
WHEN NOT MATCHED THEN
INSERT ([Well_Skey],[DateKey],[Cost_Center],[Wghtd_WI_Key],[Item_Cd],[Prd_Lease_Vent_Cd],[Rev_Lease_Vent_Cd],[PHA_Flg],Days_Prodcd,Days_Injctd,[Pressure_Base],[Metric_Desc],[Metric_Vol],[Load_Date],[Update_Date])
VALUES (SRC.[Well_Skey],SRC.[DateKey],SRC.[Cost_Center],SRC.[Wghtd_WI_Key],SRC.[Item_Cd],SRC.[Prd_Lease_Vent_Cd],SRC.[Rev_Lease_Vent_Cd],SRC.[PHA_Flg],SRC.Days_Prodcd,SRC.Days_Injctd,SRC.[Pressure_Base],SRC.[Metric_Desc],SRC.[Metric_Vol],SRC.[Load_Date], SRC.[Update_Date])
WHEN MATCHED
AND (ISNULL(DST.[Metric_Vol],'') <> ISNULL(SRC.[Metric_Vol],'')
OR ISNULL(DST.Pressure_Base,'') <> ISNULL(SRC.Pressure_Base,'')
OR ISNULL(DST.Days_Injctd,'') <> ISNULL(SRC.Days_Injctd,'')
OR ISNULL(DST.Days_Prodcd,'') <> ISNULL(SRC.Days_Prodcd,'')
OR ISNULL(DST.Rev_Lease_Vent_Cd,'') <> ISNULL(SRC.Rev_Lease_Vent_Cd,'')
OR ISNULL(DST.Prd_Lease_Vent_Cd,'') <> ISNULL(SRC.Prd_Lease_Vent_Cd,'')
OR ISNULL(DST.[PHA_Flg],'') <> ISNULL(SRC.[PHA_Flg],'')
)
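-- Dummy update (LOAD_DATE set to itself): the row isn't really changed, but it
-- flows through the OUTPUT clause below so the outer INSERT can re-insert it
-- with a new LOAD_DATE.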
THEN UPDATE
SET
DST.[LOAD_DATE] = DST.[LOAD_DATE]
OUTPUT SRC.[Well_Skey],SRC.[DateKey],SRC.[Cost_Center],SRC.[Wghtd_WI_Key],SRC.[Item_Cd],SRC.[Prd_Lease_Vent_Cd],SRC.[Rev_Lease_Vent_Cd],SRC.[PHA_Flg],SRC.Days_Prodcd,SRC.Days_Injctd,SRC.[Pressure_Base],SRC.[Metric_Desc],SRC.[Metric_Vol],SRC.[Update_Date], $Action AS MergeAction
) AS MRG
WHERE MRG.MergeAction = 'UPDATE'
;
So my questions are:
1) Can the above query handle such a large volume of data?
2) Do I need to create any indexes on either the target fact table or the temp table, or consider partitioning the fact table?
Please suggest/advise.
Thanks in advance
Sam
April 29, 2013 at 10:01 am
I've never seen the two inserts done like that; it looks very clever.
Am I right in thinking that you are not actually updating anything in the fact table? DST.[LOAD_DATE] = DST.[LOAD_DATE] doesn't change anything.
This means you are doing two inserts:
When not matched then insert.
When matched, but different, then insert.
Is the LOAD_DATE column the only difference between the two inserts?
As for your questions: an index on the join columns, with the other columns in an INCLUDE, might help. I think the ISNULL calls will prevent the use of an index on the comparison columns.
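Something like this, for example (untested; key and INCLUDE columns taken from your MERGE, the index name is just a placeholder):
CREATE NONCLUSTERED INDEX IX_Fct_Prod_Mthly_Actuals_MergeKeys -- placeholder name
ON dbo.[Fct_Prod_Mthly_Actuals] ([Well_Skey],[DateKey],[Wghtd_WI_Key],[Item_Cd],[Metric_Desc])
INCLUDE ([Metric_Vol],[Pressure_Base],Days_Injctd,Days_Prodcd,[Rev_Lease_Vent_Cd],[Prd_Lease_Vent_Cd],[PHA_Flg]);
And if the ISNULLs do get in the way, the difference test can be written NULL-safe without them. Just a sketch of the pattern for one column (repeat for each of the others):
WHEN MATCHED
AND ( DST.[Metric_Vol] <> SRC.[Metric_Vol]
      OR (DST.[Metric_Vol] IS NULL AND SRC.[Metric_Vol] IS NOT NULL)
      OR (DST.[Metric_Vol] IS NOT NULL AND SRC.[Metric_Vol] IS NULL)
      -- ...same three-way test for the other compared columns
    )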
If you are always loading a finite date range, and it's not feasible to index everything, you might be able to index the DateKey column and filter the target using a common table expression.
30 million rows is pretty small, so I doubt partitioning is worthwhile unless you expect the table to grow to more than thirty times that size.
April 29, 2013 at 10:32 pm
If you are always loading a finite date range, and it's not feasible to index everything, you might be able to index the DateKey column and filter the target using a common table expression.
Ed,
Could you please explain more about "filter the target using a CTE"? Also, should I create a clustered index on the update date column (which comes from the source table) on both the target fact table and the temp fact table?
April 30, 2013 at 8:12 am
sam 55243 (4/29/2013)
If you are always loading a finite date range, and it's not feasible to index everything, you might be able to index the DateKey column and filter the target using a common table expression.
Ed,
Could you please explain more about "filter the target using a CTE"? Also, should I create a clustered index on the update date column (which comes from the source table) on both the target fact table and the temp fact table?
I don't know if it will work with the extra SELECT from the OUTPUT, but something like this can work to limit the scope of the merge target. It's also useful if you want to delete when not matched by source and you only have a subset of the source data.
I don't know about a clustered index on the update date. I usually put a clustered index on something unique and ever-increasing.
DECLARE @MinDateKey INT, @MaxDateKey INT;

WITH cte_Fct_Prod_Mthly_Actual
AS
(SELECT *
 FROM dbo.Fct_Prod_Mthly_Actuals
 WHERE DateKey >= @MinDateKey
   AND DateKey <= @MaxDateKey
)
MERGE cte_Fct_Prod_Mthly_Actual AS DST
USING dbo.[Fct_Prod_Mthly_Actuals_Temp] AS SRC
ON (
    SRC.[Well_Skey]    = DST.[Well_Skey]
AND SRC.[DateKey]      = DST.[DateKey]
AND SRC.[Wghtd_WI_Key] = DST.[Wghtd_WI_Key]
AND SRC.[Item_Cd]      = DST.[Item_Cd]
AND SRC.[Metric_Desc]  = DST.[Metric_Desc]
)
WHEN NOT MATCHED THEN
INSERT ([Well_Skey],[DateKey],[Cost_Center],[Wghtd_WI_Key],[Item_Cd],[Prd_Lease_Vent_Cd],[Rev_Lease_Vent_Cd],[PHA_Flg],Days_Prodcd,Days_Injctd,[Pressure_Base],[Metric_Desc],[Metric_Vol],[Load_Date],[Update_Date])
VALUES (SRC.[Well_Skey],SRC.[DateKey],SRC.[Cost_Center],SRC.[Wghtd_WI_Key],SRC.[Item_Cd],SRC.[Prd_Lease_Vent_Cd],SRC.[Rev_Lease_Vent_Cd],SRC.[PHA_Flg],SRC.Days_Prodcd,SRC.Days_Injctd,SRC.[Pressure_Base],SRC.[Metric_Desc],SRC.[Metric_Vol],SRC.[Load_Date], SRC.[Update_Date])
WHEN MATCHED
AND (ISNULL(DST.[Metric_Vol],'') <> ISNULL(SRC.[Metric_Vol],'')
OR ISNULL(DST.Pressure_Base,'') <> ISNULL(SRC.Pressure_Base,'')
OR ISNULL(DST.Days_Injctd,'') <> ISNULL(SRC.Days_Injctd,'')
OR ISNULL(DST.Days_Prodcd,'') <> ISNULL(SRC.Days_Prodcd,'')
OR ISNULL(DST.Rev_Lease_Vent_Cd,'') <> ISNULL(SRC.Rev_Lease_Vent_Cd,'')
OR ISNULL(DST.Prd_Lease_Vent_Cd,'') <> ISNULL(SRC.Prd_Lease_Vent_Cd,'')
OR ISNULL(DST.[PHA_Flg],'') <> ISNULL(SRC.[PHA_Flg],'')
)
THEN UPDATE
SET
DST.[LOAD_DATE] = DST.[LOAD_DATE]
OUTPUT SRC.[Well_Skey],SRC.[DateKey],SRC.[Cost_Center],SRC.[Wghtd_WI_Key],SRC.[Item_Cd],SRC.[Prd_Lease_Vent_Cd],SRC.[Rev_Lease_Vent_Cd],SRC.[PHA_Flg],SRC.Days_Prodcd,SRC.Days_Injctd,SRC.[Pressure_Base],SRC.[Metric_Desc],SRC.[Metric_Vol],SRC.[Update_Date], $Action AS MergeAction
--WHERE MRG.MergeAction = 'UPDATE' -- only applies if you keep the outer INSERT ... SELECT wrapper
;
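One way to populate the range variables, assuming the staging table always holds exactly the current batch, is to take them from the staging data before the MERGE (in the same batch as the DECLARE):
SELECT @MinDateKey = MIN(DateKey),
       @MaxDateKey = MAX(DateKey)
FROM dbo.[Fct_Prod_Mthly_Actuals_Temp];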
April 30, 2013 at 8:54 am
You might hit an issue with MERGE on a large record set.
This is a handy article: http://technet.microsoft.com/en-us/library/cc879317.aspx. It gives some advice on indexes and statistics to look at when tuning a MERGE.
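If it does become a problem, one common mitigation (not from the article, just a sketch) is to run the MERGE in smaller batches, e.g. one DateKey at a time:
-- Hypothetical batching loop: run the MERGE once per DateKey in the staging
-- table so each pass touches a smaller slice of the target.
DECLARE @Key INT = NULL;
WHILE 1 = 1
BEGIN
    SELECT @Key = MIN(DateKey)
    FROM dbo.[Fct_Prod_Mthly_Actuals_Temp]
    WHERE @Key IS NULL OR DateKey > @Key;
    IF @Key IS NULL BREAK;
    -- run the MERGE here, filtering both SRC and DST on DateKey = @Key
END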
April 30, 2013 at 10:36 am
Thanks Jason, I'll look into that.
May 1, 2013 at 9:22 am
We use a similar MERGE statement to insert into the target table, but with a three-way insert covering inserts, updates, and deletes.
We have a clustered index on the match columns, i.e. Well_Skey, DateKey, Wghtd_WI_Key, Item_Cd, Metric_Desc.
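Applied to your table, that would look something like this (index name is a placeholder; it can't be UNIQUE since your table keeps multiple rows per key with different load dates):
CREATE CLUSTERED INDEX CIX_Fct_Prod_Mthly_Actuals -- placeholder name
ON dbo.[Fct_Prod_Mthly_Actuals] ([Well_Skey],[DateKey],[Wghtd_WI_Key],[Item_Cd],[Metric_Desc]);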
It works, but performance is not great. Filtering the source using a date_created column will help if you can do it, like:
MERGE dbo.[Fct_Prod_Mthly_Actuals] AS DST
USING (SELECT * FROM dbo.[Fct_Prod_Mthly_Actuals_Temp] WHERE date_created > @load_from) AS SRC
As for loading the initial 30 million rows, you will be better off using a simple INSERT statement.
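Something like this (same column list as your MERGE; sketch only, assuming the temp table holds the full initial extract):
INSERT INTO dbo.[Fct_Prod_Mthly_Actuals] ([Well_Skey],[DateKey],[Cost_Center],[Wghtd_WI_Key],[Item_Cd],[Prd_Lease_Vent_Cd],[Rev_Lease_Vent_Cd],[PHA_Flg],Days_Prodcd,Days_Injctd,[Pressure_Base],[Metric_Desc],[Metric_Vol],[Load_Date],[Update_Date])
SELECT [Well_Skey],[DateKey],[Cost_Center],[Wghtd_WI_Key],[Item_Cd],[Prd_Lease_Vent_Cd],[Rev_Lease_Vent_Cd],[PHA_Flg],Days_Prodcd,Days_Injctd,[Pressure_Base],[Metric_Desc],[Metric_Vol],[Load_Date],[Update_Date]
FROM dbo.[Fct_Prod_Mthly_Actuals_Temp];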