September 26, 2011 at 4:22 pm
I have my records like this:
IDDateP_IDP_UUID CM
24842011-07-27 00:00:00.00036926349348432436
24842011-08-08 00:00:00.00036926349348613149
24842011-08-10 00:00:00.00036926349348613151
27262011-07-29 00:00:00.0003406026373248395973
36292011-07-26 00:00:00.0004982026462548392343
36292011-08-01 00:00:00.00049926462548419374
36292011-08-04 00:00:00.0004982026462548535510
36292011-08-15 00:00:00.0004982026462548590566
And I want results like this:
IDMin Date Max_DateP_IDP_UUID CM
24842011-07-272099-12-3136926349348613151 – Since the P_ID is same, though ON different day
27262011-07-292099-12-313406026373248395973
36292011-07-262011-07-314982026462548392343
36292011-08-012011-08-03 49926462548419374
36292011-08-042099-12-31 4982026462548590566 – Since P_ID is same, it is flattened out
I know, through cursor we can achieve this…Any other ways?
Thanks for your help and feedback..
-VG
September 26, 2011 at 4:46 pm
VGish (9/26/2011)
I have my records like this:IDDateP_IDP_UUID CM
24842011-07-27 00:00:00.00036926349348432436
24842011-08-08 00:00:00.00036926349348613149
24842011-08-10 00:00:00.00036926349348613151
27262011-07-29 00:00:00.0003406026373248395973
36292011-07-26 00:00:00.0004982026462548392343
36292011-08-01 00:00:00.00049926462548419374
36292011-08-04 00:00:00.0004982026462548535510
36292011-08-15 00:00:00.0004982026462548590566
And I want results like this:
IDMin Date Max_DateP_IDP_UUID CM
24842011-07-272099-12-3136926349348613151 – Since the P_ID is same, though ON different day
27262011-07-292099-12-313406026373248395973
36292011-07-262011-07-314982026462548392343
36292011-08-012011-08-03 49926462548419374
36292011-08-042099-12-31 4982026462548590566 – Since P_ID is same, it is flattened out
I know, through cursor we can achieve this…Any other ways?
Thanks for your help and feedback..
-VG
Yes, data islanding, date fill methodologies, and grouping techniques... and probably a few windows functions.
How many rows are you looking at needing to process?
So I understand your business rules:
1) Locate each continuous section of P_ID as per the DATE column as your ordering.
2) Collapse those rows into a begin/end. If it is the last record for the ID, use 12/31/2099 as the ending date. Otherwise, subtract one from the beginning date of the next section and use that as the tail date.
3) for each section, min/max the P_UUID and the CM fields.
That sound about right? Any chance we could get a consumable dataset in a temp table to play with?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
September 27, 2011 at 5:36 am
VG...
For future posts, please consider the methods for posting "readily consumable" data which are taught at the first link in my signature line below. Since you're relatively new here, I'll do the honors the first time. The following code uses a slightly different method for posting "readily consumable" data and I took the liberty of adding a UNIQUE CLUSTERED INDEX which might actually serve as a composite PK for this table...
--===== Create the test data. This is NOT a part of the solution
SELECT ID,Date = CAST(Date AS DATETIME),P_ID,P_UUID,CM
INTO dbo.TestTable
FROM (
SELECT 2484,'2011-07-27',369,263493,48432436 UNION ALL
SELECT 2484,'2011-08-08',369,263493,48613149 UNION ALL
SELECT 2484,'2011-08-10',369,263493,48613151 UNION ALL
SELECT 2726,'2011-07-29',34060,263732,48395973 UNION ALL
SELECT 3629,'2011-07-26',49820,264625,48392343 UNION ALL
SELECT 3629,'2011-08-01',499,264625,48419374 UNION ALL
SELECT 3629,'2011-08-04',49820,264625,48535510 UNION ALL
SELECT 3629,'2011-08-15',49820,264625,48590566
) d (ID, Date, P_ID, P_UUID, CM)
;
CREATE UNIQUE CLUSTERED INDEX IX_TestTable_Composite01
ON dbo.TestTable (ID, Date, P_ID)
;
Here's one possible solution for solving your problem. The key is to recognize that you have to create "temporal groups" based on the ID and P_ID according to date. The rest is pretty straight forward and is documented in the comments in the code below...
[font="Arial Black"]{EDIT} Although the code below works for this particular set of data, I found an "opening" where an incorrect sort could occur. Please see the corrected code in a separate post further down in this thread. Thanks.[/font]--===== This is one possible solution
WITH
cteEndDate AS
( --=== Calculate the "end date" according to the rules given
-- and calculate "temporal groups" of ID's
SELECT GroupNum = ROW_NUMBER() OVER (ORDER BY ID,Date)
- ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID),
EndDate = ISNULL(( --=== NextDate-1 or NULL if no next date
SELECT MIN(gr2.Date)-1
FROM dbo.TestTable gr2
WHERE gr2.ID = gr1.ID
AND gr2.Date > gr1.Date)
,'2099-12-31'),
ID, Date, P_ID, P_UUID, CM
FROM dbo.TestTable gr1
) --=== Then we just need to find the min and max dates according to the groups
SELECT ID = MAX(ID),
MinDate = MIN(Date),
MaxDate = MAX(EndDate),
P_ID = MAX(P_ID),
P_UUID = MAX(P_UUID),
CM = MAX(CM)
FROM cteEndDate
GROUP BY GroupNum
ORDER BY ID, MinDate
;
My apologies...I didn't have the time to make my normal "million row test" so the code might not be as optimized as it could be.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 27, 2011 at 2:49 pm
Hi Jeff,
This works perfectly like what I wanted...But, I didn’t understand this part …Why are we doing this?
ROW_NUMBER() OVER (ORDER BY ID,Date) - ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID)
THanks,
VG
September 27, 2011 at 7:50 pm
Actually, I made a small mistake. While the code works perfectly for the data given, I left an "opening" for an incorrect sort. Please use the following code, instead (test data creation code and solution code combined just to make life a little simpler :-))...
--===== Create the test data. This is NOT a part of the solution
SELECT ID,Date = CAST(Date AS DATETIME),P_ID,P_UUID,CM
INTO dbo.TestTable
FROM (
SELECT 2484,'2011-07-27',369,263493,48432436 UNION ALL
SELECT 2484,'2011-08-08',369,263493,48613149 UNION ALL
SELECT 2484,'2011-08-10',369,263493,48613151 UNION ALL
SELECT 2726,'2011-07-29',34060,263732,48395973 UNION ALL
SELECT 3629,'2011-07-26',49820,264625,48392343 UNION ALL
SELECT 3629,'2011-08-01',499,264625,48419374 UNION ALL
SELECT 3629,'2011-08-04',49820,264625,48535510 UNION ALL
SELECT 3629,'2011-08-15',49820,264625,48590566
) d (ID, Date, P_ID, P_UUID, CM)
;
CREATE UNIQUE CLUSTERED INDEX IX_TestTable_Composite01
ON dbo.TestTable (ID, Date, P_ID)
;
--===== This is one possible solution
WITH
cteEndDate AS
( --=== Calculate the "end date" according to the rules given
-- and calculate "temporal groups" of ID's
SELECT GroupNum = ROW_NUMBER() OVER (ORDER BY ID,Date) --DateNum
- ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID,Date), --PidNum
EndDate = ISNULL(( --=== NextDate-1 or NULL if no next date
SELECT MIN(gr2.Date)-1
FROM dbo.TestTable gr2
WHERE gr2.ID = gr1.ID
AND gr2.Date > gr1.Date)
,'2099-12-31'),
ID, Date, P_ID, P_UUID, CM
FROM dbo.TestTable gr1
) --=== Then we just need to find the min and max dates according to the groups
SELECT ID = MAX(ID),
MinDate = MIN(Date),
MaxDate = MAX(EndDate),
P_ID = MAX(P_ID),
P_UUID = MAX(P_UUID),
CM = MAX(CM)
FROM cteEndDate
GROUP BY ID,P_ID,GroupNum
ORDER BY ID, MinDate
;
I got home kind of late tonight... I'll try to find the time to explain how the "magic" row numbers work to solve this problem tomorrow night.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 27, 2011 at 7:57 pm
VGish (9/27/2011)
Hi Jeff,This works perfectly like what I wanted...But, I didn’t understand this part …Why are we doing this?
ROW_NUMBER() OVER (ORDER BY ID,Date) - ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID)
THanks,
VG
Take a look at the corrected code... Notice the name of the column becomes GroupNum.
Now, look at the GROUP BY in the code... I'll try to explain more tomorrow night.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 1, 2011 at 1:57 pm
Sorry folks... I've been burning some midnight oil at the job and the answer to the question of how this works is a bit of a long one. To be real brief, the first ROW_NUMBER creates a simple list of sequential numbers in order by date. The second ROW_NUMBER creates a slightly more complicated list of sequential numbers for each ID and P_ID ordered by Date.
The difference between the two sets of sequenctial numbers forms a group number for each "set" of contiguous ID's and P_ID's. The rest is simple... we find the min and max of items by ID, P_ID, and GroupNum.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply