Rolling Up Data

  • I need to rollup data that looks like this:

    personnofromdatetodateacctstatcdmcatcd membgroup statgroup hqfieldcd changed_bychanged_datefieldcd tempfldcd

    0000061976-06-01 00:00:00.0001977-11-30 00:00:00.000ENROR0103USCONV1998-08-26 00:00:00INDNULL

    0000061977-12-01 00:00:00.0001981-03-31 00:00:00.000SERVR0104USCONV1998-08-26 00:00:00INDNULL

    0000061981-04-01 00:00:00.0001982-04-30 00:00:00.000FREGR0105USBCROS2008-10-21 00:00:00INDNULL

    0000061983-02-01 00:00:00.0001984-11-30 00:00:00.000ENROR0307USCONV1998-08-26 00:00:00INDNULL

    0000062006-06-01 00:00:00.0002006-07-26 00:00:00.000AWAIR0611USLHECK2006-05-22 00:00:00NULL NULL

    0000062006-07-27 00:00:00.0002008-04-16 00:00:00.000ENROR0612USDCURT2006-07-27 00:00:00INDNULL

    0000062008-04-17 00:00:00.0002009-04-13 00:00:00.000SERVR0613USKWHAT2009-06-22 00:00:00INDNULL

    0000062009-04-14 00:00:00.0002009-06-28 00:00:00.000SERVR0613USKWHAT2009-06-22 00:00:00INDNULL

    0000062009-06-29 00:00:00.0002009-07-29 00:00:00.000FPERR0614USSSMIT2009-06-29 00:00:00INDNULL

    0000062009-07-30 00:00:00.0002009-08-31 00:00:00.000FREGR0615USSSMIT2009-06-29 00:00:00INDNULL

    0000062009-09-01 00:00:00.0002010-06-30 00:00:00.000SERVR0616USSSMIT2009-06-29 00:00:00INDNULL

    0000062010-07-01 00:00:00.0002010-07-31 00:00:00.000FPERR0617USSSMIT2010-11-02 00:00:00NULL NULL

    0000062010-08-01 00:00:00.0002010-11-30 00:00:00.000FREGR0618USSSMIT2010-11-02 00:00:00NULL NULL

    0000062010-12-01 00:00:00.000NULL INTRR0619USSSMIT2010-11-02 00:00:00NULL NULL

    To this:

    personnomcatcdacctstatcdfromdatetodatefieldcd hqfieldcd

    000006RFREG1976-06-011982-04-30INDUS

    000006RENRO1983-02-011984-11-30INDUS

    000006RAWAI2006-06-012006-07-26USUS

    000006RSERV2006-07-272010-06-30INDUS

    000006RINTR2010-07-01NULLUSUS

    Here's the code to generate sample data:

    USE [tempdb]

    GO

    CREATE TABLE [pershist]

    (

    [personno] [varchar](6) NOT NULL,

    [fromdate] [datetime] NOT NULL,

    [todate] [datetime] NULL,

    [acctstatcd] [varchar](10) NOT NULL,

    [mcatcd] [varchar](10) NOT NULL,

    [membgroup] [varchar](2) NOT NULL,

    [statgroup] [varchar](2) NOT NULL,

    [hqfieldcd] [varchar](4) NOT NULL,

    [changed_by] [varchar](50) NULL,

    [changed_date] [smalldatetime] NULL,

    [fieldcd] [varchar](4) NULL,

    [tempfldcd] [varchar](4) NULL,

    CONSTRAINT [PK_pershist] PRIMARY KEY CLUSTERED ([personno] ASC, [fromdate] ASC, [acctstatcd] ASC, [mcatcd] ASC)

    )

    GO

    INSERT INTO

    pershist

    SELECT

    '000006' AS personsno,

    '1976-06-01 00:00:00.000' AS fromdate,

    '1977-11-30 00:00:00.000' AS todate,

    'ENRO' AS acctstatcd,

    'R' AS mcatcd,

    '01' AS membgroup,

    '03' AS statgroup,

    'US' AS hqfieldcd,

    'CONV',

    '1998-08-26 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '1977-12-01 00:00:00.000',

    '1981-03-31 00:00:00.000',

    'SERV',

    'R',

    '01',

    '04',

    'US',

    'CONV',

    '1998-08-26 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '1981-04-01 00:00:00.000',

    '1982-04-30 00:00:00.000',

    'FREG',

    'R',

    '01',

    '05',

    'US',

    'BCROS',

    '2008-10-21 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '1983-02-01 00:00:00.000',

    '1984-11-30 00:00:00.000',

    'ENRO',

    'R',

    '03',

    '07',

    'US',

    'CONV',

    '1998-08-26 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2006-06-01 00:00:00.000',

    '2006-07-26 00:00:00.000',

    'AWAI',

    'R',

    '06',

    '11',

    'US',

    'LHECK',

    '2006-05-22 00:00:00',

    NULL,

    NULL

    UNION ALL

    SELECT

    '000006',

    '2006-07-27 00:00:00.000',

    '2008-04-16 00:00:00.000',

    'ENRO',

    'R',

    '06',

    '12',

    'US',

    'DCURT',

    '2006-07-27 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2008-04-17 00:00:00.000',

    '2009-04-13 00:00:00.000',

    'SERV',

    'R',

    '06',

    '13',

    'US',

    'KWHAT',

    '2009-06-22 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2009-04-14 00:00:00.000',

    '2009-06-28 00:00:00.000',

    'SERV',

    'R',

    '06',

    '13',

    'US',

    'KWHAT',

    '2009-06-22 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2009-06-29 00:00:00.000',

    '2009-07-29 00:00:00.000',

    'FPER',

    'R',

    '06',

    '14',

    'US',

    'SSMIT',

    '2009-06-29 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2009-07-30 00:00:00.000',

    '2009-08-31 00:00:00.000',

    'FREG',

    'R',

    '06',

    '15',

    'US',

    'SSMIT',

    '2009-06-29 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2009-09-01 00:00:00.000',

    '2010-06-30 00:00:00.000',

    'SERV',

    'R',

    '06',

    '16',

    'US',

    'SSMIT',

    '2009-06-29 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2010-07-01 00:00:00.000',

    '2010-07-31 00:00:00.000',

    'FPER',

    'R',

    '06',

    '17',

    'US',

    'SSMIT',

    '2010-11-02 00:00:00',

    NULL,

    NULL

    UNION ALL

    SELECT

    '000006',

    '2010-08-01 00:00:00.000',

    '2010-11-30 00:00:00.000',

    'FREG',

    'R',

    '06',

    '18',

    'US',

    'SSMIT',

    '2010-11-02 00:00:00',

    NULL,

    NULL

    UNION ALL

    SELECT

    '000006',

    '2010-12-01 00:00:00.000',

    NULL,

    'INTR',

    'R',

    '06',

    '19',

    'US',

    'SSMIT',

    '2010-11-02 00:00:00',

    NULL,

    NULL

    GO

    IF OBJECT_ID('tempdb.dbo.pershist') IS NOT NULL

    BEGIN

    DROP TABLE pershist

    END

    Basically this data is a set of work assignment locations and statuses for an employee and I essentially need to create "header" rows for each assignment to a location that is contiguous. This is based on personno, membgroup, and fieldcd. Fieldcd is actually determined using COALESCE(tempfldcd, fieldcd, hqfieldcd).

    I thought I could get RANK() (or DENSE_RANK) to work but it with this specific set of data the row with data with a fromdate of 2006-06-01 ends up in the same group as the rows beginning with the row with the fromdate of 2010-07-01 and I need these to be separate rows in the rolled up data because they are not contiguous. I'm thinking a combination of RANK() and ROW_NUMBER() (to get contiguous rows), but thought someone here might have a better way.

  • Jack,

    How are you deciding which AcctStatCD to use here? Is it always the final one, can it be any, or is there a precidence order? For example, in the first rollup, there's ENRO, FREG, and SERV as options for the rollup.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Jack,

    This is close. I *believe* it should work for you, but I didn't have a chance to expand the dataset to make sure I didn't step on anything. Once you let us know how you want to decide the other columns, we can include those into this.

    R3 is superfluous, I left it in for your larger data set reviewing of the results.

    One thing I am getting though is that MAX() is ignoring your NULL, so you'll notice the final entry is a GETDATE() swap via an ISNULL().

    ;WITH cte AS

    (select

    personno,

    fromdate,

    todate,

    membgroup,

    ISNULL( fieldcd, 'US') AS FieldCD,

    Row_Number() OVER ( Partition BY PersonNo Order By FromDate ) AS RowNum,

    Row_Number() OVER ( Partition BY PersonNo, membgroup Order By FieldCD, FromDate ) AS RowNum2,

    Row_Number() OVER ( Partition BY PersonNo, membgroup, FieldCD Order By FromDate ) AS RowNum3

    FROM

    pershist

    )

    , cte2 AS

    (select

    *,

    RowNum - RowNum2 AS r2,

    RowNum - RowNum3 AS r3

    FROM

    cte

    )

    SELECT

    Personno,

    --MAX( mCatCD) AS mCatCD,

    --MAX( AcctStatCD) AS AcctStatCD,

    MIN( FromDate) AS MinFromDate,

    MAX( ISNULL( ToDate, GETDATE())) AS MaxToDate,

    FieldCD,

    'US' AS HQFieldCD

    FROM

    cte2

    GROUP BY

    PersonNo,

    FieldCD,

    r2

    ORDER BY

    PersonNo,

    MIN( FromDate)


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig,

    Looks like it might work. I'm still trying to understand how/why it works but it does look good.

    Here's what's needed for the rest of the columns:

    acctstatcd needs to be the most recent/last for each grouping

    fromdate should be the min fromdate for each group

    todate should be the max todate for each group with NULL being the max in the case where there is a NULL. I can't actually use getdate() for the nulls because it is possible to have a future todate. Don't ask me why because I don't know, I just know it happens.

  • Okay, here's what I've got now:

    USE [tempdb]

    GO

    CREATE TABLE [pershist]

    (

    [personno] [varchar](6) NOT NULL,

    [fromdate] [datetime] NOT NULL,

    [todate] [datetime] NULL,

    [acctstatcd] [varchar](10) NOT NULL,

    [mcatcd] [varchar](10) NOT NULL,

    [membgroup] [varchar](2) NOT NULL,

    [statgroup] [varchar](2) NOT NULL,

    [hqfieldcd] [varchar](4) NOT NULL,

    [changed_by] [varchar](50) NULL,

    [changed_date] [smalldatetime] NULL,

    [fieldcd] [varchar](4) NULL,

    [tempfldcd] [varchar](4) NULL,

    CONSTRAINT [PK_pershist] PRIMARY KEY CLUSTERED ([personno] ASC, [fromdate] ASC, [acctstatcd] ASC, [mcatcd] ASC)

    )

    GO

    INSERT INTO

    pershist

    SELECT

    '000006' AS personsno,

    '1976-06-01 00:00:00.000' AS fromdate,

    '1977-11-30 00:00:00.000' AS todate,

    'ENRO' AS acctstatcd,

    'R' AS mcatcd,

    '01' AS membgroup,

    '03' AS statgroup,

    'US' AS hqfieldcd,

    'CONV',

    '1998-08-26 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '1977-12-01 00:00:00.000',

    '1981-03-31 00:00:00.000',

    'SERV',

    'R',

    '01',

    '04',

    'US',

    'CONV',

    '1998-08-26 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '1981-04-01 00:00:00.000',

    '1982-04-30 00:00:00.000',

    'FREG',

    'R',

    '01',

    '05',

    'US',

    'BCROS',

    '2008-10-21 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '1983-02-01 00:00:00.000',

    '1984-11-30 00:00:00.000',

    'ENRO',

    'R',

    '03',

    '07',

    'US',

    'CONV',

    '1998-08-26 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2006-06-01 00:00:00.000',

    '2006-07-26 00:00:00.000',

    'AWAI',

    'R',

    '06',

    '11',

    'US',

    'LHECK',

    '2006-05-22 00:00:00',

    NULL,

    NULL

    UNION ALL

    SELECT

    '000006',

    '2006-07-27 00:00:00.000',

    '2008-04-16 00:00:00.000',

    'ENRO',

    'R',

    '06',

    '12',

    'US',

    'DCURT',

    '2006-07-27 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2008-04-17 00:00:00.000',

    '2009-04-13 00:00:00.000',

    'SERV',

    'R',

    '06',

    '13',

    'US',

    'KWHAT',

    '2009-06-22 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2009-04-14 00:00:00.000',

    '2009-06-28 00:00:00.000',

    'SERV',

    'R',

    '06',

    '13',

    'US',

    'KWHAT',

    '2009-06-22 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2009-06-29 00:00:00.000',

    '2009-07-29 00:00:00.000',

    'FPER',

    'R',

    '06',

    '14',

    'US',

    'SSMIT',

    '2009-06-29 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2009-07-30 00:00:00.000',

    '2009-08-31 00:00:00.000',

    'FREG',

    'R',

    '06',

    '15',

    'US',

    'SSMIT',

    '2009-06-29 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2009-09-01 00:00:00.000',

    '2010-06-30 00:00:00.000',

    'SERV',

    'R',

    '06',

    '16',

    'US',

    'SSMIT',

    '2009-06-29 00:00:00',

    'IND',

    NULL

    UNION ALL

    SELECT

    '000006',

    '2010-07-01 00:00:00.000',

    '2010-07-31 00:00:00.000',

    'FPER',

    'R',

    '06',

    '17',

    'US',

    'SSMIT',

    '2010-11-02 00:00:00',

    NULL,

    NULL

    UNION ALL

    SELECT

    '000006',

    '2010-08-01 00:00:00.000',

    '2010-11-30 00:00:00.000',

    'FREG',

    'R',

    '06',

    '18',

    'US',

    'SSMIT',

    '2010-11-02 00:00:00',

    NULL,

    NULL

    UNION ALL

    SELECT

    '000006',

    '2010-12-01 00:00:00.000',

    NULL,

    'INTR',

    'R',

    '06',

    '19',

    'US',

    'SSMIT',

    '2010-11-02 00:00:00',

    NULL,

    NULL

    GO

    ;WITH cte AS

    (select

    personno,

    fromdate,

    todate,

    membgroup,

    mcatcd,

    acctstatcd,

    COALESCE(tempfldcd, fieldcd, hqfieldcd) AS FieldCD,

    Row_Number() OVER ( Partition BY PersonNo Order By FromDate ) AS RowNum,

    Row_Number() OVER ( Partition BY PersonNo, membgroup Order By FieldCD, FromDate ) AS RowNum2,

    Row_Number() OVER ( Partition BY PersonNo, membgroup, FieldCD Order By FromDate ) AS RowNum3

    FROM

    pershist

    )

    , cte2 AS

    (select

    *,

    RowNum - RowNum2 AS r2,

    RowNum - RowNum3 AS r3

    FROM

    cte

    )

    , cte3 AS

    (

    SELECT

    Personno,

    mCatCD,

    acctstatcd,

    MIN( FromDate) AS MinFromDate,

    MAX( ISNULL( ToDate, GETDATE())) AS MaxToDate,

    FieldCD,

    'US' AS HQFieldCD,

    r2 AS assignment_group,

    ROW_NUMBER() OVER (PARTITION BY r2 ORDER BY MIN(Fromdate)) AS group_order,

    ROW_NUMBER() OVER (PARTITION BY r2 ORDER BY MIN(Fromdate) desc) AS group_order_desc

    FROM

    cte2 A

    GROUP BY

    PersonNo,

    FieldCD,

    mcatcd,

    acctstatcd,

    r2

    )

    SELECT

    A.personno,

    A.HQFieldCD,

    A.FieldCD,

    A.mcatcd,

    B.acctstatcd,

    A.MinFromDate,

    B.MaxToDate

    FROM

    cte3 A JOIN

    cte3 B

    ON A.assignment_group = B.assignment_group AND

    A.personno = B.personno AND

    A.HQFieldCD = B.HQFieldCD AND

    A.mcatcd = B.mcatcd

    WHERE

    A.group_order = 1 AND

    B.group_order_desc = 1

    IF OBJECT_ID('tempdb.dbo.pershist') IS NOT NULL

    BEGIN

    DROP TABLE pershist

    END

    This gives me the right results for this specific case. I can check it against the rest of the data in the morning.

  • Hey Jack, sorry I fell off the earth on this, kind of got distracted with my own local insanity.

    How'd this end up working out for you?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Seems to be working correctly. I did have to use r3 instead of r2 for later grouping because I found instances where r2 was not grouping correctly.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply