November 14, 2009 at 6:30 pm
Within a small DWH project I'm performing lookups of dimension surrogates using T-SQL lookups
SK=surrogate, BK= business key
SK_Organization BK_Organization Rowstartdate Rowenddate
-1----------------NULL-------------01-01-2001----NULL
1-----------------5----------------01-01-2004----NULL
2-----------------6----------------01-01-2004----NULL
(the -1 represents the unknown value , when a BK value doesn't exist, for instance 102, the t-sql lookup simply uses coalesce(@result,-1) in order to assign an unknown. )
Well, within the dimension table, a unique index exists on the surrogate key (not on the business key because it holds a null instance).
Next to this I have a staging table which contains the business key.
Within the staging I assign the surrogate at SK_Organizationkey using a stored procedure, which calls the lookup function and is filled with the business key.
When handling large datasets, some rows, without a pattern, are associated a -1 value at the SK while having a valid business key.
When taking the business key of the invalid row, and executing it in the lookup function, without the procedure, a valid value occurs rather than the invalid -1.
So...function is correct.
I could give you all the programmation , but as having testing this about 10 times I can tell you there's something going on in the configuration, and I hope someone over can help me.
Or..I should use some hint in the T-SQL..? I've already put it serializable
Think of "large datasets", "update statement", "T-SQL lookups", "assign invalid values incidentally"
November 15, 2009 at 4:29 am
I don't want to sound offending but there still might be something wrong with the sp.
The way you describe it it sounds like you're using a cursor which might be the root cause for the wrong insert/update.
Without knowing the structure of your sp all I can say is: SQL Server is not really known for "assign invalid values incidentally". Usually, SQL Server does exactly what it's told to do. But that's not always the same as someone actually intended to do. 😉
November 15, 2009 at 8:59 am
Thank you IMU, i think incidental things might occur only when something in the configuration might be wrong unless code is wrong.
Hereby the anatomy of the stored procedure, would be helpful if you could take a look.
It's not cursor based
This is from my test bench which only assigns the organizationkey. And well..at the test bench also the odd -1 occurs
The etl.fn_Lookup_SK_OrganizationKey_Costcenter_Admunit is the function which is filled with the applicable fields from table staging.postedhourline
ALTER PROCEDURE [etl].[Usp_Up_Serviceledger_Surrogates]
/*
Author:
Creation date:
Modification date:
*/
AS
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
--SK_Organizationkey
BEGIN TRAN
UPDATE staging.[Postedhourline]
SETSK_Organizationkey=(
select etl.fn_Lookup_SK_OrganizationKey_Costcenter_Admunit(
[Business unit]
,[Employee department code]
,getdate()
,NULL
)
)
FROM staging.[Postedhourline]
COMMIT TRAN
November 15, 2009 at 10:40 am
Would you mind sharing the code for the function etl.fn_Lookup_SK_OrganizationKey_Costcenter_Admunit as well?
Maybe it's possible to change it into a normal join...
November 15, 2009 at 11:44 am
Hi,
Well, the function calls 2 other functions. Sadly it won't be possible to put it in one join because of the logic inside. The logic covers three tables, and if a certain value is true in one table for the underlying fact data, the handling is different.
The same things can happen to the cost center which has its surrogate looked up. In contrary to the organizationkey, the surrogate of the cost center is really simple. And maybe that one applies to be handled into a join
The function is much alike the other one but uses 3 instead of 4 fields.
etl.fn_lookup_costcenter ([global code 1]
,getdate()
,NULL)
ALTER FUNCTION [etl].[fn_Lookup_SK_Costcenter]
(
@costcenternvarchar(20)
,@Startdatedatetime
,@Enddatedatetime
)
RETURNS bigint
AS
BEGIN
-- Declare the return variable here
DECLARE @Result bigint
-- Add the T-SQL statements to compute the return value here
SELECT @Result =(
SELECTSK_Costcenter
FROM[DWH].[dbo].tbl_Dim_CostCenter
WHERErtrim(BK_Costcenter)=@Costcentercode AND
RowStartDate<= coalesce(@Startdate,getdate())And
coalesce(RowEndDate,getdate()) >=coalesce(@Enddate,getdate())
)
-- Return the result of the function
RETURN coalesce(@Result,-1)
END
November 15, 2009 at 12:49 pm
This one actually does look like it can be included without a function.
Regarding the other ones: I still don't know.
If you need to do a different handling the CASE statement might help.
Again: until now there's nothing that would make the server "assign invalid values incidentally".
But since you can't (or won't) show the other functions it's still guessing...
November 16, 2009 at 9:59 am
Hi Lutz,
The costcenter also makes mistakes so let's start of with this one.
It's easy yes to join the table.
But for clarity i've used the function as lookups might easily grow more complex, so for clarity that each distinct type of lookup is represented by 1 function I've kept it this way.
Otherwise my logic would still become divergent, which means if I have to change the logic of a lookup throughout the total solution I would have to walk through all procedures which inhabit for instance the costcenter key lookup.
Do you agree, using this function, that this shouldn't go wrong? So putting in a join is just a workaround without finding the cause
My latest test shows the following:
- at the end of the function I had coalesce(@Result,-1)
- I've changed this to coalesce(@result,-2)
When running the procedure, for the flawed records, it gives back -1.
So my conclusion is, for these records, the function receives a NULL value as business key instead of the value of the record. Because -1 is a surrogate in the dimension table which represents the NULL business key value.
Thanks for your help so far!
November 16, 2009 at 2:41 pm
Is there an order of operations issue?
I would assume that you would populate your dimension from your staging table not populate your staging table from your dimension. So, I assume that the dimension should be populated before you try to get the surrogate key that you are putting into the staging table? Perhaps you left out the order from your description, but I thought I'd check.
Also, not sure if it matters to you, but I would change the RowEndDate from a nullable column to a non-nullable column and use a terminus date instead of NULL (something like 9999-12-31).
November 17, 2009 at 4:08 am
Hi Lamprey,
The dimension table is always filled prior to loading the fact table.
The logic is handled by a SQL Server SSIS package.
In my testbase the dimension is always the same, so i don't run SSIS , I just use a static set.
I will test with the remark about the rowenddate.
What comes to mind to me, is that the update goes over the total table, each row has a lookup embedded, so wouldn't this be a too stressful situation for the server?
I would think the database engine splits up tasks intelligently to prevent server insanities so I can't imagine such.
It doesn't keep running endlessly , it's done in 1 minute to update about 200000 rows
regards
November 18, 2009 at 1:31 pm
Well, typical flaw in many database platforms is handling NULL.
Today I found out that I should place 31/12/9999 in my slowly changing dimension "current records".
Next I had to take out function calls which called the rowenddate range with NULL and had to remove coalesce() 's.
Now it's working sane
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply