self-joining derived tables?

  • Is it possible to self-join a derived table? I have a master query generating a recordset that I want to introspectively join to and extract another value. So, given these values:

    id     date
    07801  2004-12-12
    15224  2006-07-06
    19890  2006-09-11
    19890  2006-10-06
    19890  2006-10-23
    

    I want to generate a recordset like this:

    id     date        end_date
    07801  2004-12-12  2006-07-06
    15224  2006-07-06  2006-09-11
    19890  2006-09-11  null
    19890  2006-10-06  null
    19890  2006-10-23  null
    

    (the rule is, get MIN(date) when 1.id <> 2.id)

    The trick is, the first set is created by a weighty query which I don't want to have to rerun. I considered pushing it into a table var and running updates, but that could be weighty, too.

    Does this make sense? Is it possible?

  • As this is SQL 2005 (based on where you posted), try putting your hefty query as a CTE and using that in your query.

  • Option b is to insert the results of this query into a table variable or temp table, then do the self join there.

  • Yep, CTE's seem to do the job. I just wish the rest of it was smoother.

    Interestingly, I couldn't run two consecutive selects off of it; I probably don't understand it that well.

    Thanks!

  • Show us what you have so far.  Perhaps we can see what changes are needed to get it to work.

  • CTE's are only available for the next SELECT statement. You can't define a CTE & then run multiple selects against it. If you need that kind of functionality, you'll need temp tables or table variables.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Lynn, I would, except the CTE is 18 lines, and the principle use is 30 lines. The basis is a historical table, like this:

    id-aid-buserdate
    16251413761 2005-03-26 00:00:00
    16251416241 2006-12-07 00:00:00
    16601306739 2005-11-07 00:00:00
    16932313695 2005-05-23 00:00:00
    17266115115 2005-05-13 00:00:00
    17266215115 2005-04-15 00:00:00
    17266217975 2006-01-23 00:00:00
    17266217585 2006-02-24 00:00:00
    17266219487 2006-08-11 00:00:00
    

    As I said, it's historical, so each line of (id-a & id-b) is superseded by the next largest date. For example, 17266:2 changes users between 4/15/05 and 1/23/06. My goal is to create an "end" date like this:

    17266215115 2005-04-15 2006-01-23
    

    I'm also generating a table of months (within user-requested range) on the fly to join against, so this user appears once a month between Apr 05 through Jan 06.

  • Oblio,

    Based on what you gave me above, how does this look:

    declare @datatbl table (

        ida int,

        idb int,

        userid int,

        iddate datetime

        )

    insert into @datatbl values(16251,4,13761,'2005-03-26 00:00:00')

    insert into @datatbl values(16251,4,16241,'2006-12-07 00:00:00');

    insert into @datatbl values(16601,3,06739,'2005-11-07 00:00:00');

    insert into @datatbl values(16932,3,13695,'2005-05-23 00:00:00');

    insert into @datatbl values(17266,1,15115,'2005-05-13 00:00:00');

    insert into @datatbl values(17266,2,15115,'2005-04-15 00:00:00');

    insert into @datatbl values(17266,2,17975,'2006-01-23 00:00:00');

    insert into @datatbl values(17266,2,17585,'2006-02-24 00:00:00');

    insert into @datatbl values(17266,2,19487,'2006-08-11 00:00:00');

    with MyCTE_cte (

        RowNumber,

        IdA,

        IdB,

        UserId,

        IdDate

    ) as (

    select

        row_number() over(partition by ida, idb order by ida, idb) as RowNumber,

        ida,

        idb,

        userid,

        iddate

    from

        @datatbl

    )

    select

        a.IdA,

        a.IdB,

        a.UserId,

        a.IdDate as StartDate,

        b.IdDate as EndDate

    from

        MyCTE_cte a

        left outer join MyCTE_cte b

            on (a.RowNumber = (b.RowNumber - 1)

                and a.IdA = b.IdA

                and a.IdB = b.IdB)

  • What does the "a.RowNumber = (b.RowNumber - 1)" do? it doesn't look at the previous row, does it? If so, that would be bad, because you don't know if things are in the right order, right?

  • What I gave you may or may not work in your situation, but based on the data it seemed to work.  Take it, play with it, read BOL to see what the different pieces are doing.  What the a.RowNumber = (b.RowNumber - 1) is doing is linking row one with row two if it exists.

    Best thing you can do, is play with it and make changes based on your data and your knowledge of it.  I am just trying get you going in the right direction.  If you still don't understand things, come back and post your question(s), there are manu of us who will try and help.

  • Right, but I'm always looking for better ways to do things.

    This is what I'm using to generate an initial working set:

    SELECT a.[id1],a.[id2],a.,a.[assignDate]
    FROM [sampleTbl] a
    JOIN 
    (/*fetches all the previous open records */
    SELECT [id1],[id2],MAX([assignDate]) "cah_assign"
    FROM [sampleTbl] 
    WHERE [assignDate]@leftDate)
    AND a.[cType] LIKE '_C'
    UNION
    /*fetches within the time-frame */
    SELECT [id1],[id2],,[assignDate]
    FROM [sampleTbl]
    WHERE [assignDate] BETWEEN @leftDate AND @rightDate
    AND [cType] LIKE '_C'
    

    Afterwards, I'm using this to link back to the source to make other determinations (I want the earliest close date; "row two" may not be the earliest possible close).

  • I think we could use a little more info.  Could you post the DDL for the source table, some sample data, and the expected output from the sample data.  Be sure to include any exceptions you have already mentioned in the sample data so we can see that as well.

  • How quickly CTE's have made folks forget all about Temp Tables, Table Variables, and Views

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CTE's haven't made me forget.  I find them easier to use where I have normally used a derived table in a query.  I don't mind using them, but also look at moving those to temp tables, table variables, or views when I find that I need the same data in numerous areas.

    CTE's allow me to build complex queries incrementally and they make it easier to know what is going on.

  • .note{font-style: italic;font-family: serif !important;color: red;Jeff,

    I haven't forgotten them, but I thought this would yield more performance.

    Lynn,

    Obviously, I should have approached this with a bit more background. My ultimate goal is to generate a report showing a count of people who were responsible in a given time-frame. To do so, I need to find out, for each person, when they began oversight, to when it changed hands or was closed, or the type changed. All we care about is how many people+report pairs per month for a given range of months. The source table has historical data, and a person/report pair may have a couple entries while still remaining valid. The second letter of [type] must be a 'C', but we need to know if it changes.

    My approach is to gather all the people/reports, then join to an on-the-fly table of months in the range to gather counts.

    DECLARE @data TABLE(
    id1 INT NOT NULL,
    id2 TINYINT NOT NULL,
     INT,
    [assignDate] SMALLDATETIME,
    [cType] CHAR(2),
    [closeDate]  SMALLDATETIME
    )
    
    INSERT @data VALUES (2101,5,15123,'2005-03-04','CC',NULL)
    INSERT @data VALUES (2101,5,15123,'2006-02-07','CC','2006-07-01')
    INSERT @data VALUES (2499,4,07195,'2006-01-04','DP',NULL)
    INSERT @data VALUES (2499,4,10068,'2006-01-10','DP',NULL)
    INSERT @data VALUES (2499,4,09573,'2006-01-17','DP',NULL)
    INSERT @data VALUES (2499,4,09573,'2006-09-25','DC',NULL)
    INSERT @data VALUES (3327,5,07957,'2005-09-08','CC','2006-10-31')
    INSERT @data VALUES (3352,4,07801,'2005-09-20','DC',NULL)
    INSERT @data VALUES (3352,4,07801,'2006-05-17','DC',NULL)
    INSERT @data VALUES (3352,4,08328,'2006-07-01','DC',NULL)
    INSERT @data VALUES (3352,4,17698,'2006-10-20','DC',NULL)
    INSERT @data VALUES (3606,3,09573,'2005-05-17','UC','2006-05-15')
    INSERT @data VALUES (3687,0,84085,'2006-07-17','CF','2006-08-07')
    INSERT @data VALUES (3687,2,03327,'2005-12-23','DC',NULL)
    INSERT @data VALUES (3687,2,03327,'2006-08-15','DC',NULL)
    INSERT @data VALUES (3702,0,08327,'2006-03-17','CF','2007-05-17') should be skipped
    INSERT @data VALUES (3702,4,89515,'2006-06-05','UY','2007-01-29') should be skipped
    INSERT @data VALUES (3963,5,12241,'2006-06-16','DP','2007-01-02') should be skipped
    INSERT @data VALUES (4000,2,05233,'2006-01-18','DC',NULL)
    INSERT @data VALUES (4000,2,19408,'2006-07-03','DC',NULL)
    INSERT @data VALUES (7772,2,15463,'2004-03-04','DC',NULL)
    INSERT @data VALUES (7772,2,15463,'2006-02-28','DC',NULL)
    INSERT @data VALUES (7772,2,10035,'2006-06-30','DC','2006-09-08')
    INSERT @data VALUES (7772,5,15463,'2006-02-06','CC','2006-05-31')
    INSERT @data VALUES (8477,0,17673,'2006-07-17','CF','2007-04-12') should be skipped
    INSERT @data VALUES (8554,3,12461,'2006-10-18','UC','2006-10-19')
    INSERT @data VALUES (8557,3,12338,'2006-01-08','DC','2006-02-17')
    INSERT @data VALUES (8557,3,12338,'2006-02-17','DP',NULL)
    INSERT @data VALUES (8557,3,12338,'2006-06-15','DC',NULL)
    INSERT @data VALUES (11753,4,80325,'2005-10-31','UC','2006-05-10')
    INSERT @data VALUES (11753,4,80325,'2006-05-10','US',NULL)
    INSERT @data VALUES (11909,0,06290,'2006-01-01','CF','2006-01-24') should be skipped
    INSERT @data VALUES (11909,5,06290,'2006-01-06','UC',NULL)
    INSERT @data VALUES (11909,5,60447,'2006-01-24','UC',NULL)
    INSERT @data VALUES (11909,5,46342,'2006-01-25','UC','2006-04-05')
    INSERT @data VALUES (11979,2,03327,'2005-05-10','CC',NULL)
    INSERT @data VALUES (11979,2,03327,'2006-05-09','CC','2006-07-03')

    So, given a date range of Jan06 through Dec06, I want to generate one record per id-pair per user per consecutive time period. I need to generate a list like this:

    id1id2userdateBegandateClose
    2101515123'2005-03-04''2006-07-01'the first is overriden by the second; begins before range
    2499409573'2006-09-25'nullprior records ignored; wrong type
    3327507957'2005-09-08''2006-10-31'begins before range
    3352407801'2005-09-20'nullbegins before range, several entries
    3606309573'2005-05-17''2006-05-15'begins before range, has close date
    4000205233'2006-01-18'null
    4000219408'2006-07-03'null
    7772215463'2004-03-04''2006-06-30'begins before range, changes user
    7772210035'2006-06-30''2006-09-08'second user
    7772515463'2006-02-06''2006-05-31'
    8554312461'2006-10-18''2006-10-19'appears only in Oct
    8557312338'2006-01-08''2006-02-17'second entry closed it
    8557312338'2006-06-15'null
    11753480325'2005-10-31''2006-05-10'
    11909506290'2006-01-06''2006-01-24'begins before range, changes user
    11909546342'2006-01-25''2006-04-05'
    11979203327'2005-05-10''2006-07-03'begins before range, continues over a second row
    

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply