Code help, may be can be acheived through CTE

Question

Code help, may be can be acheived through CTE

VGish

SSChasing Mays

Points: 643
More actions
September 26, 2011 at 4:22 pm

#245421

I have my records like this:
IDDateP_IDP_UUID CM
24842011-07-27 00:00:00.00036926349348432436
24842011-08-08 00:00:00.00036926349348613149
24842011-08-10 00:00:00.00036926349348613151
27262011-07-29 00:00:00.0003406026373248395973
36292011-07-26 00:00:00.0004982026462548392343
36292011-08-01 00:00:00.00049926462548419374
36292011-08-04 00:00:00.0004982026462548535510
36292011-08-15 00:00:00.0004982026462548590566
And I want results like this:
IDMin Date Max_DateP_IDP_UUID CM
24842011-07-272099-12-3136926349348613151 – Since the P_ID is same, though ON different day
27262011-07-292099-12-313406026373248395973
36292011-07-262011-07-314982026462548392343
36292011-08-012011-08-03 49926462548419374
36292011-08-042099-12-31 4982026462548590566 – Since P_ID is same, it is flattened out
I know, through cursor we can achieve this…Any other ways?
Thanks for your help and feedback..
-VG

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

VGish (9/26/2011)
I have my records like this:
IDDateP_IDP_UUID CM
24842011-07-27 00:00:00.00036926349348432436
24842011-08-08 00:00:00.00036926349348613149
24842011-08-10 00:00:00.00036926349348613151
27262011-07-29 00:00:00.0003406026373248395973
36292011-07-26 00:00:00.0004982026462548392343
36292011-08-01 00:00:00.00049926462548419374
36292011-08-04 00:00:00.0004982026462548535510
36292011-08-15 00:00:00.0004982026462548590566
And I want results like this:
IDMin Date Max_DateP_IDP_UUID CM
24842011-07-272099-12-3136926349348613151 – Since the P_ID is same, though ON different day
27262011-07-292099-12-313406026373248395973
36292011-07-262011-07-314982026462548392343
36292011-08-012011-08-03 49926462548419374
36292011-08-042099-12-31 4982026462548590566 – Since P_ID is same, it is flattened out
I know, through cursor we can achieve this…Any other ways?
Thanks for your help and feedback..
-VG

Yes, data islanding, date fill methodologies, and grouping techniques... and probably a few windows functions.

How many rows are you looking at needing to process?

So I understand your business rules:

1) Locate each continuous section of P_ID as per the DATE column as your ordering.

2) Collapse those rows into a begin/end. If it is the last record for the ID, use 12/31/2099 as the ending date. Otherwise, subtract one from the beginning date of the next section and use that as the tail date.

3) for each section, min/max the P_UUID and the CM fields.

That sound about right? Any chance we could get a consumable dataset in a temp table to play with?

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Jeff Moden SSC Guru Points: 1004313 More actions · Answer 2

VG...

For future posts, please consider the methods for posting "readily consumable" data which are taught at the first link in my signature line below. Since you're relatively new here, I'll do the honors the first time. The following code uses a slightly different method for posting "readily consumable" data and I took the liberty of adding a UNIQUE CLUSTERED INDEX which might actually serve as a composite PK for this table...

--===== Create the test data. This is NOT a part of the solution

SELECT ID,Date = CAST(Date AS DATETIME),P_ID,P_UUID,CM

INTO dbo.TestTable

FROM (

SELECT 2484,'2011-07-27',369,263493,48432436 UNION ALL

SELECT 2484,'2011-08-08',369,263493,48613149 UNION ALL

SELECT 2484,'2011-08-10',369,263493,48613151 UNION ALL

SELECT 2726,'2011-07-29',34060,263732,48395973 UNION ALL

SELECT 3629,'2011-07-26',49820,264625,48392343 UNION ALL

SELECT 3629,'2011-08-01',499,264625,48419374 UNION ALL

SELECT 3629,'2011-08-04',49820,264625,48535510 UNION ALL

SELECT 3629,'2011-08-15',49820,264625,48590566

) d (ID, Date, P_ID, P_UUID, CM)

;

CREATE UNIQUE CLUSTERED INDEX IX_TestTable_Composite01

ON dbo.TestTable (ID, Date, P_ID)

;

Here's one possible solution for solving your problem. The key is to recognize that you have to create "temporal groups" based on the ID and P_ID according to date. The rest is pretty straight forward and is documented in the comments in the code below...

[font="Arial Black"]{EDIT} Although the code below works for this particular set of data, I found an "opening" where an incorrect sort could occur. Please see the corrected code in a separate post further down in this thread. Thanks.[/font]--===== This is one possible solution

WITH

cteEndDate AS

( --=== Calculate the "end date" according to the rules given

-- and calculate "temporal groups" of ID's

SELECT GroupNum = ROW_NUMBER() OVER (ORDER BY ID,Date)

- ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID),

EndDate = ISNULL(( --=== NextDate-1 or NULL if no next date

SELECT MIN(gr2.Date)-1

FROM dbo.TestTable gr2

WHERE gr2.ID = gr1.ID

AND gr2.Date > gr1.Date)

,'2099-12-31'),

ID, Date, P_ID, P_UUID, CM

FROM dbo.TestTable gr1

) --=== Then we just need to find the min and max dates according to the groups

SELECT ID = MAX(ID),

MinDate = MIN(Date),

MaxDate = MAX(EndDate),

P_ID = MAX(P_ID),

P_UUID = MAX(P_UUID),

CM = MAX(CM)

FROM cteEndDate

GROUP BY GroupNum

ORDER BY ID, MinDate

;

My apologies...I didn't have the time to make my normal "million row test" so the code might not be as optimized as it could be.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

VGish SSChasing Mays Points: 643 More actions · Answer 3

Hi Jeff,

This works perfectly like what I wanted...But, I didn’t understand this part …Why are we doing this?

ROW_NUMBER() OVER (ORDER BY ID,Date) - ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID)

THanks,

VG

Jeff Moden SSC Guru Points: 1004313 More actions · Answer 4

Actually, I made a small mistake. While the code works perfectly for the data given, I left an "opening" for an incorrect sort. Please use the following code, instead (test data creation code and solution code combined just to make life a little simpler :-))...

--===== Create the test data. This is NOT a part of the solution

SELECT ID,Date = CAST(Date AS DATETIME),P_ID,P_UUID,CM

INTO dbo.TestTable

FROM (

SELECT 2484,'2011-07-27',369,263493,48432436 UNION ALL

SELECT 2484,'2011-08-08',369,263493,48613149 UNION ALL

SELECT 2484,'2011-08-10',369,263493,48613151 UNION ALL

SELECT 2726,'2011-07-29',34060,263732,48395973 UNION ALL

SELECT 3629,'2011-07-26',49820,264625,48392343 UNION ALL

SELECT 3629,'2011-08-01',499,264625,48419374 UNION ALL

SELECT 3629,'2011-08-04',49820,264625,48535510 UNION ALL

SELECT 3629,'2011-08-15',49820,264625,48590566

) d (ID, Date, P_ID, P_UUID, CM)

;

CREATE UNIQUE CLUSTERED INDEX IX_TestTable_Composite01

ON dbo.TestTable (ID, Date, P_ID)

;

--===== This is one possible solution

WITH

cteEndDate AS

( --=== Calculate the "end date" according to the rules given

-- and calculate "temporal groups" of ID's

SELECT GroupNum = ROW_NUMBER() OVER (ORDER BY ID,Date) --DateNum

- ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID,Date), --PidNum

EndDate = ISNULL(( --=== NextDate-1 or NULL if no next date

SELECT MIN(gr2.Date)-1

FROM dbo.TestTable gr2

WHERE gr2.ID = gr1.ID

AND gr2.Date > gr1.Date)

,'2099-12-31'),

ID, Date, P_ID, P_UUID, CM

FROM dbo.TestTable gr1

) --=== Then we just need to find the min and max dates according to the groups

SELECT ID = MAX(ID),

MinDate = MIN(Date),

MaxDate = MAX(EndDate),

P_ID = MAX(P_ID),

P_UUID = MAX(P_UUID),

CM = MAX(CM)

FROM cteEndDate

GROUP BY ID,P_ID,GroupNum

ORDER BY ID, MinDate

;

I got home kind of late tonight... I'll try to find the time to explain how the "magic" row numbers work to solve this problem tomorrow night.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004313 More actions · Answer 5

VGish (9/27/2011)
Hi Jeff,
This works perfectly like what I wanted...But, I didn’t understand this part …Why are we doing this?
ROW_NUMBER() OVER (ORDER BY ID,Date) - ROW_NUMBER() OVER (PARTITION BY ID,P_ID ORDER BY ID,P_ID)
THanks,
VG

Take a look at the corrected code... Notice the name of the column becomes GroupNum.

Now, look at the GROUP BY in the code... I'll try to explain more tomorrow night.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004313 More actions · Answer 6

Sorry folks... I've been burning some midnight oil at the job and the answer to the question of how this works is a bit of a long one. To be real brief, the first ROW_NUMBER creates a simple list of sequential numbers in order by date. The second ROW_NUMBER creates a slightly more complicated list of sequential numbers for each ID and P_ID ordered by Date.

The difference between the two sets of sequenctial numbers forms a group number for each "set" of contiguous ID's and P_ID's. The rest is simple... we find the min and max of items by ID, P_ID, and GroupNum.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)