November 11, 2010 at 4:27 pm
Hello folks,
I am faced with a challenging T-SQL problem for which I am unable to find a set based idea/solution that I am satisfied with. Your inputs are greatly appreciated.
Sample code:
DECLARE @T TABLE (
Id int identity(1, 1),
NKey int,
P1 int, IsP1Set bit,
P2 int, IsP2Set bit,
P3 int, IsP3Set bit,
LastModified datetime
)
INSERT INTO @T
VALUES
(1, 1, 1, null, 0, null, 0, '2010-11-11 01:00:00'),
(1, null, 0, 22, 1, null, 0, '2010-11-11 02:00:00'),
(1, 11, 1, 222, 1, null, 0, '2010-11-11 03:00:00'),
(1, null, 0, null, 0, 33, 1, '2010-11-11 04:00:00'),
(1, 111, 1, null, 0, null, 0, '2010-11-11 05:00:00'),
(1, null, 0, 2222, 1, null, 0, '2010-11-11 06:00:00'),
(1, 1111, 1, null, 0, null, 0, '2010-11-11 07:00:00')
select * from @T
/*
Expected result:
111111222213312010-11-11 07:00:00
--rows (7) (6) (4) MAX(LastModified)
*/
;with Result(NKey, P1, IsP1Set, P1Idx, P2, IsP2Set, P2Idx, P3, IsP3Set, P3Idx, LastModified) as (
select NKey,
P1, IsP1Set, ROW_NUMBER() OVER (PARTITION BY IsP1Set ORDER BY LastModified DESC) AS P1Idx,
P2, IsP2Set, ROW_NUMBER() OVER (PARTITION BY IsP2Set ORDER BY LastModified DESC) AS P2Idx,
P3, IsP3Set, ROW_NUMBER() OVER (PARTITION BY IsP3Set ORDER BY LastModified DESC) AS P3Idx,
LastModified
from @T
)
select *
from Result
where (IsP1Set = 1 and P1Idx = 1)
OR (IsP2Set = 1 and P2Idx = 1)
OR (IsP3Set = 1 and P3Idx = 1)
Here is the problem (sample code given below):
We are a datawarehouse team sourcing data from multiple systems in our org.
One of our sources is an auditing/logging SQL database.
The entities in the source have a property-bag design which look likes this:
TableA(NatuarlKey, Property1, IsProperty1Set, Property2, IsPropert2Set,....., LastModified)
where,
NaturakKey is typically a uniqueidentifier
Property1...n can be any datatype
IsProperty1Set...IsPropertynSet are bit fields which indicate if a value for this property has been set
n ranges from 1 to 100
And, for understandable obvious reasons, they have optimized this logging system for writes such that, for every change to a property (or a group of properties) they insert a new row with *only* the updated properties.
The datawarehouse problem is to get the "most recent" value for each of the properties into a single record.
I am able to write a query that filters out intermediate records, but I am puzzled with how to group the rows efficiently into a single record. For example, the code below generates 3 rows (1 for each of the 3 properties defined) with the most recent value of each of the properties.
Your help is deeply appreciated.
Thanks,
Manju.
November 11, 2010 at 4:46 pm
Manjeshwar,
I've used a form of the serial update here which should cure your pain on this specific instance. If you want the 'full rules' to keep you safe from harm in this instance, you'll want to look at the following two items:
The Quirky Update article: http://www.sqlservercentral.com/articles/T-SQL/68467/
The Safety Check wrapper: http://www.sqlservercentral.com/Forums/Topic802558-203-4.aspx#bm980118
Please note I do not follow the full rules in either of these articles, and you should understand both what I'm doing and a review of those to fully encapsulate this for safety if this is not a one-shot set of results.
However, this is successful across a series of NKeys and LastModifieds. Please be aware, I removed the 'ID' field from @T because it was unused in our scenario here, and you weren't accounting for it in the INSERT INTO statement.
DECLARE @T TABLE (
-- Id int identity(1, 1),
NKey int,
P1 int, IsP1Set bit,
P2 int, IsP2Set bit,
P3 int, IsP3Set bit,
LastModified datetime
)
INSERT INTO @T
SELECT 1, 1, 1, null, 0, null, 0, '2010-11-11 01:00:00' UNION ALL
SELECT 1, null, 0, 22, 1, null, 0, '2010-11-11 02:00:00' UNION ALL
SELECT 1, 11, 1, 222, 1, null, 0, '2010-11-11 03:00:00' UNION ALL
SELECT 1, null, 0, null, 0, 33, 1, '2010-11-11 04:00:00' UNION ALL
SELECT 1, 111, 1, null, 0, null, 0, '2010-11-11 05:00:00' UNION ALL
SELECT 1, null, 0, 2222, 1, null, 0, '2010-11-11 06:00:00' UNION ALL
SELECT 1, 1111, 1, null, 0, null, 0, '2010-11-11 07:00:00'
IF Object_ID('tempdb..#serialUpdate') IS NOT NULL
DROP TABLE #serialUpdate
CREATE TABLE #serialUpdate
(NKey int,
P1 int, IsP1Set bit,
P2 int, IsP2Set bit,
P3 int, IsP3Set bit,
LastModified datetime
)
CREATE CLUSTERED INDEX idx_serialUpdate ON #serialUpdate (NKey, LastModified)
DECLARE @NKeyINT,
@P1INT,
@P2INT,
@P3INT
INSERT INTO #serialUpdate
SELECT * FROM @T
UPDATE#serialUpdate
SET@p1 = p1 = CASE WHEN Nkey = @NKey
THEN CASEWHEN p1 IS NULL
THEN @p1
ELSE p1 END
ELSE p1 END,
@p2 = p2 = CASE WHEN Nkey = @NKey
THEN CASEWHEN p2 IS NULL
THEN @p2
ELSE p2 END
ELSE p2 END,
@p3 = p3 = CASE WHEN Nkey = @NKey
THEN CASEWHEN p3 IS NULL
THEN @p3
ELSE p3 END
ELSE p3 END,
@Nkey = NKey
OPTION (MAXDOP 1)
SELECT
su1.NKey,
su1.P1,
su1.P2,
su1.P3,
su1.LastModified
FROM
#SerialUpdate AS su1
JOIN
(SELECT
NKey, MAX( LastModified) AS MaxLM
FROM
#SerialUpdate AS su2
GROUP BY
NKey
) AS drv
ONsu1.NKey = drv.NKey
AND su1.LastModified = drv.MaxLM
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 11, 2010 at 5:01 pm
I doubt it's anywhere near as fast but...
;with Result(NKey, P1, IsP1Set, P1Idx, P2, IsP2Set, P2Idx, P3, IsP3Set, P3Idx, LastModified) as (
select NKey,
P1, IsP1Set, ROW_NUMBER() OVER (PARTITION BY IsP1Set ORDER BY LastModified DESC) AS P1Idx,
P2, IsP2Set, ROW_NUMBER() OVER (PARTITION BY IsP2Set ORDER BY LastModified DESC) AS P2Idx,
P3, IsP3Set, ROW_NUMBER() OVER (PARTITION BY IsP3Set ORDER BY LastModified DESC) AS P3Idx,
LastModified
from @T
)
select NKey,
max(ISNULL(P1,'')) as P1,
max(ISNULL(P2,'')) as P2,
max(ISNULL(P3,'')) as P3
from Result
where (IsP1Set = 1 and P1Idx = 1)
OR (IsP2Set = 1 and P2Idx = 1)
OR (IsP3Set = 1 and P3Idx = 1)
group by NKey
November 11, 2010 at 5:13 pm
Thanks for the response David.
But I don't really need the MAX value. Instead, what I am looking for is to get the most recent value of any of the properties.
"OOOOOPS"! I, now, got what you were suggesting; oh, well.
November 11, 2010 at 5:14 pm
Thanks Craig.
This looks interesting; will explore further more.
Thanks for the response.
November 11, 2010 at 5:23 pm
In 2008 you have a couple of options. You can cross apply a query to select the top 1 row for each ordered by date descending. This would give you the most recent row. Althernatively, you can use a cte to assign a Row_Number() to each row partioned by your grouping and then ordered by date descending. Then just use a where clause to filter out all rows from the cte where Row_Number() > 1.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
November 11, 2010 at 6:08 pm
Do you mean a solution similar to what David is suggeting above?
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply