self-joining derived tables?

Question

Post reply

self-joining derived tables?

Oblio Leitch

SSCrazy

Points: 2137
More actions
May 17, 2007 at 3:00 pm

#172513
Is it possible to self-join a derived table? I have a master query generating a recordset that I want to introspectively join to and extract another value. So, given these values:
```
id     date
07801  2004-12-12
15224  2006-07-06
19890  2006-09-11
19890  2006-10-06
19890  2006-10-23
```
I want to generate a recordset like this:
```
id     date        end_date
07801  2004-12-12  2006-07-06
15224  2006-07-06  2006-09-11
19890  2006-09-11  null
19890  2006-10-06  null
19890  2006-10-23  null
```
(the rule is, get MIN(date) when 1.id <> 2.id)
The trick is, the first set is created by a weighty query which I don't want to have to rerun. I considered pushing it into a table var and running updates, but that could be weighty, too.
Does this make sense? Is it possible?

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

As this is SQL 2005 (based on where you posted), try putting your hefty query as a CTE and using that in your query.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 2

Option b is to insert the results of this query into a table variable or temp table, then do the self join there.

Oblio Leitch SSCrazy Points: 2137 More actions · Answer 3

Yep, CTE's seem to do the job. I just wish the rest of it was smoother.

Interestingly, I couldn't run two consecutive selects off of it; I probably don't understand it that well.

Thanks!

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 4

Show us what you have so far. Perhaps we can see what changes are needed to get it to work.

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 5

CTE's are only available for the next SELECT statement. You can't define a CTE & then run multiple selects against it. If you need that kind of functionality, you'll need temp tables or table variables.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Oblio Leitch SSCrazy Points: 2137 More actions · Answer 6

Lynn, I would, except the CTE is 18 lines, and the principle use is 30 lines. The basis is a historical table, like this:

id-aid-buserdate
16251413761 2005-03-26 00:00:00
16251416241 2006-12-07 00:00:00
16601306739 2005-11-07 00:00:00
16932313695 2005-05-23 00:00:00
17266115115 2005-05-13 00:00:00
17266215115 2005-04-15 00:00:00
17266217975 2006-01-23 00:00:00
17266217585 2006-02-24 00:00:00
17266219487 2006-08-11 00:00:00

As I said, it's historical, so each line of (id-a & id-b) is superseded by the next largest date. For example, 17266:2 changes users between 4/15/05 and 1/23/06. My goal is to create an "end" date like this:

17266215115 2005-04-15 2006-01-23

I'm also generating a table of months (within user-requested range) on the fly to join against, so this user appears once a month between Apr 05 through Jan 06.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

Oblio,

Based on what you gave me above, how does this look:

declare @datatbl table (

ida int,

idb int,

userid int,

iddate datetime

)

insert into @datatbl values(16251,4,13761,'2005-03-26 00:00:00')

insert into @datatbl values(16251,4,16241,'2006-12-07 00:00:00');

insert into @datatbl values(16601,3,06739,'2005-11-07 00:00:00');

insert into @datatbl values(16932,3,13695,'2005-05-23 00:00:00');

insert into @datatbl values(17266,1,15115,'2005-05-13 00:00:00');

insert into @datatbl values(17266,2,15115,'2005-04-15 00:00:00');

insert into @datatbl values(17266,2,17975,'2006-01-23 00:00:00');

insert into @datatbl values(17266,2,17585,'2006-02-24 00:00:00');

insert into @datatbl values(17266,2,19487,'2006-08-11 00:00:00');

with MyCTE_cte (

RowNumber,

IdA,

IdB,

UserId,

IdDate

) as (

select

row_number() over(partition by ida, idb order by ida, idb) as RowNumber,

ida,

idb,

userid,

iddate

from

@datatbl

)

select

a.IdA,

a.IdB,

a.UserId,

a.IdDate as StartDate,

b.IdDate as EndDate

from

MyCTE_cte a

left outer join MyCTE_cte b

on (a.RowNumber = (b.RowNumber - 1)

and a.IdA = b.IdA

and a.IdB = b.IdB)

Oblio Leitch SSCrazy Points: 2137 More actions · Answer 8

What does the "a.RowNumber = (b.RowNumber - 1)" do? it doesn't look at the previous row, does it? If so, that would be bad, because you don't know if things are in the right order, right?

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 9

What I gave you may or may not work in your situation, but based on the data it seemed to work. Take it, play with it, read BOL to see what the different pieces are doing. What the a.RowNumber = (b.RowNumber - 1) is doing is linking row one with row two if it exists.

Best thing you can do, is play with it and make changes based on your data and your knowledge of it. I am just trying get you going in the right direction. If you still don't understand things, come back and post your question(s), there are manu of us who will try and help.

Oblio Leitch SSCrazy Points: 2137 More actions · Answer 10

Right, but I'm always looking for better ways to do things.

This is what I'm using to generate an initial working set:

SELECT a.[id1],a.[id2],a.,a.[assignDate]
FROM [sampleTbl] a
JOIN 
(/*fetches all the previous open records */
SELECT [id1],[id2],MAX([assignDate]) "cah_assign"
FROM [sampleTbl] 
WHERE [assignDate]@leftDate)
AND a.[cType] LIKE '_C'
UNION
/*fetches within the time-frame */
SELECT [id1],[id2],,[assignDate]
FROM [sampleTbl]
WHERE [assignDate] BETWEEN @leftDate AND @rightDate
AND [cType] LIKE '_C'

Afterwards, I'm using this to link back to the source to make other determinations (I want the earliest close date; "row two" may not be the earliest possible close).

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 11

I think we could use a little more info. Could you post the DDL for the source table, some sample data, and the expected output from the sample data. Be sure to include any exceptions you have already mentioned in the sample data so we can see that as well.

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 12

How quickly CTE's have made folks forget all about Temp Tables, Table Variables, and Views

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 13

CTE's haven't made me forget. I find them easier to use where I have normally used a derived table in a query. I don't mind using them, but also look at moving those to temp tables, table variables, or views when I find that I need the same data in numerous areas.

CTE's allow me to build complex queries incrementally and they make it easier to know what is going on.

Oblio Leitch SSCrazy Points: 2137 More actions · Answer 14

.note{font-style: italic;font-family: serif !important;color: red;Jeff,

I haven't forgotten them, but I thought this would yield more performance.

Lynn,

Obviously, I should have approached this with a bit more background. My ultimate goal is to generate a report showing a count of people who were responsible in a given time-frame. To do so, I need to find out, for each person, when they began oversight, to when it changed hands or was closed, or the type changed. All we care about is how many people+report pairs per month for a given range of months. The source table has historical data, and a person/report pair may have a couple entries while still remaining valid. The second letter of [type] must be a 'C', but we need to know if it changes.

My approach is to gather all the people/reports, then join to an on-the-fly table of months in the range to gather counts.

DECLARE @data TABLE(
id1 INT NOT NULL,
id2 TINYINT NOT NULL,
 INT,
[assignDate] SMALLDATETIME,
[cType] CHAR(2),
[closeDate]  SMALLDATETIME
)

INSERT @data VALUES (2101,5,15123,'2005-03-04','CC',NULL)
INSERT @data VALUES (2101,5,15123,'2006-02-07','CC','2006-07-01')
INSERT @data VALUES (2499,4,07195,'2006-01-04','DP',NULL)
INSERT @data VALUES (2499,4,10068,'2006-01-10','DP',NULL)
INSERT @data VALUES (2499,4,09573,'2006-01-17','DP',NULL)
INSERT @data VALUES (2499,4,09573,'2006-09-25','DC',NULL)
INSERT @data VALUES (3327,5,07957,'2005-09-08','CC','2006-10-31')
INSERT @data VALUES (3352,4,07801,'2005-09-20','DC',NULL)
INSERT @data VALUES (3352,4,07801,'2006-05-17','DC',NULL)
INSERT @data VALUES (3352,4,08328,'2006-07-01','DC',NULL)
INSERT @data VALUES (3352,4,17698,'2006-10-20','DC',NULL)
INSERT @data VALUES (3606,3,09573,'2005-05-17','UC','2006-05-15')
INSERT @data VALUES (3687,0,84085,'2006-07-17','CF','2006-08-07')
INSERT @data VALUES (3687,2,03327,'2005-12-23','DC',NULL)
INSERT @data VALUES (3687,2,03327,'2006-08-15','DC',NULL)
INSERT @data VALUES (3702,0,08327,'2006-03-17','CF','2007-05-17') should be skipped
INSERT @data VALUES (3702,4,89515,'2006-06-05','UY','2007-01-29') should be skipped
INSERT @data VALUES (3963,5,12241,'2006-06-16','DP','2007-01-02') should be skipped
INSERT @data VALUES (4000,2,05233,'2006-01-18','DC',NULL)
INSERT @data VALUES (4000,2,19408,'2006-07-03','DC',NULL)
INSERT @data VALUES (7772,2,15463,'2004-03-04','DC',NULL)
INSERT @data VALUES (7772,2,15463,'2006-02-28','DC',NULL)
INSERT @data VALUES (7772,2,10035,'2006-06-30','DC','2006-09-08')
INSERT @data VALUES (7772,5,15463,'2006-02-06','CC','2006-05-31')
INSERT @data VALUES (8477,0,17673,'2006-07-17','CF','2007-04-12') should be skipped
INSERT @data VALUES (8554,3,12461,'2006-10-18','UC','2006-10-19')
INSERT @data VALUES (8557,3,12338,'2006-01-08','DC','2006-02-17')
INSERT @data VALUES (8557,3,12338,'2006-02-17','DP',NULL)
INSERT @data VALUES (8557,3,12338,'2006-06-15','DC',NULL)
INSERT @data VALUES (11753,4,80325,'2005-10-31','UC','2006-05-10')
INSERT @data VALUES (11753,4,80325,'2006-05-10','US',NULL)
INSERT @data VALUES (11909,0,06290,'2006-01-01','CF','2006-01-24') should be skipped
INSERT @data VALUES (11909,5,06290,'2006-01-06','UC',NULL)
INSERT @data VALUES (11909,5,60447,'2006-01-24','UC',NULL)
INSERT @data VALUES (11909,5,46342,'2006-01-25','UC','2006-04-05')
INSERT @data VALUES (11979,2,03327,'2005-05-10','CC',NULL)
INSERT @data VALUES (11979,2,03327,'2006-05-09','CC','2006-07-03')

So, given a date range of Jan06 through Dec06, I want to generate one record per id-pair per user per consecutive time period. I need to generate a list like this:

id1id2userdateBegandateClose
2101515123'2005-03-04''2006-07-01'the first is overriden by the second; begins before range
2499409573'2006-09-25'nullprior records ignored; wrong type
3327507957'2005-09-08''2006-10-31'begins before range
3352407801'2005-09-20'nullbegins before range, several entries
3606309573'2005-05-17''2006-05-15'begins before range, has close date
4000205233'2006-01-18'null
4000219408'2006-07-03'null
7772215463'2004-03-04''2006-06-30'begins before range, changes user
7772210035'2006-06-30''2006-09-08'second user
7772515463'2006-02-06''2006-05-31'
8554312461'2006-10-18''2006-10-19'appears only in Oct
8557312338'2006-01-08''2006-02-17'second entry closed it
8557312338'2006-06-15'null
11753480325'2005-10-31''2006-05-10'
11909506290'2006-01-06''2006-01-24'begins before range, changes user
11909546342'2006-01-25''2006-04-05'
11979203327'2005-05-10''2006-07-03'begins before range, continues over a second row