Summarising transaction data removing nulls

  • I have two tables one list changes of hospital ward and one lists changes of consultant doctor. These can change independently ie a ward change can occur without a consultant change and vice versa. I want to summarise these changes to give the status at each date_serial value.

    create table #temp_ward_stay(date_serial bigint,ward_id varchar(10))

    ;

    insert into #temp_ward_stay

    values

    (201501021200,'W1'),(201501031201,'W2'),(201501091200,'W3'),(201501161200,'W4'),(201501161201,'W5')(201501271200,'W3'),(201502101200,'W5')

    ;

    create table #temp_consultant_episode(

    date_serial bigint,

    consultant_id varchar(10))

    ;

    insert into #temp_consultant_episode values

    (201501021200,'C1'),(201501031200,'C2'),(201501031201,'C3'),(201501091201,'C4'),

    (201501121200,'C3'),(201501301200,'C6'),(201502111200,'C6'),(201502111201,'C7')

    ;

    If I use:

    select date_serial,consultant_id,null as ward_id from #temp_consultant_episode

    union

    select date_serial,null as consultant_id,ward_id from #temp_ward_stay

    I get:

    date_serial ward_id consultant_id

    -------------------- ---------- -------------

    201501021200 NULL C1

    201501021200 W1 NULL

    201501031200 NULL C2

    201501031201 NULL C3

    201501031201 W2 NULL

    201501091200 W3 NULL

    201501091201 NULL C4

    201501121200 NULL C3

    201501161200 W4 NULL

    201501161201 W5 NULL

    201501271200 W3 NULL

    201501301200 NULL C6

    201502101200 W5 NULL

    201502111200 NULL C6

    201502111201 NULL C7

    whereas what I actually want is:

    date_serialward_idconsultant_id

    201501021200W1C1

    201501031200W1C2

    201501031201W2C3

    201501091200W3C3

    201501091201W3C4

    201501121200W3C3

    201501161200W4C3

    201501161201W5C3

    201501271200W3C3

    201501301200W3C6

    201502101200W5C6

    201502111200W5C6

    201502111201W5C7

    I can see how I could remove the nulls where the date_serial values are the same:

    select ce.date_serial,ward_id,consultant_id from #temp_consultant_episode ce

    join #temp_ward_stay ws

    on ce.date_serial=ws.date_serial

    but I can't see how to move forward from here

    I be grateful for any suggestions (avoiding cursors etc if possible)...

  • David 76709 (10/27/2015)


    I have two tables one list changes of hospital ward and one lists changes of consultant doctor. These can change independently ie a ward change can occur without a consultant change and vice versa. I want to summarise these changes to give the status at each date_serial value.

    create table #temp_ward_stay(date_serial bigint,ward_id varchar(10))

    ;

    insert into #temp_ward_stay

    values

    (201501021200,'W1'),(201501031201,'W2'),(201501091200,'W3'),(201501161200,'W4'),(201501161201,'W5')(201501271200,'W3'),(201502101200,'W5')

    ;

    create table #temp_consultant_episode(

    date_serial bigint,

    consultant_id varchar(10))

    ;

    insert into #temp_consultant_episode values

    (201501021200,'C1'),(201501031200,'C2'),(201501031201,'C3'),(201501091201,'C4'),

    (201501121200,'C3'),(201501301200,'C6'),(201502111200,'C6'),(201502111201,'C7')

    ;

    If I use:

    select date_serial,consultant_id,null as ward_id from #temp_consultant_episode

    union

    select date_serial,null as consultant_id,ward_id from #temp_ward_stay

    I get:

    date_serial ward_id consultant_id

    -------------------- ---------- -------------

    201501021200 NULL C1

    201501021200 W1 NULL

    201501031200 NULL C2

    201501031201 NULL C3

    201501031201 W2 NULL

    201501091200 W3 NULL

    201501091201 NULL C4

    201501121200 NULL C3

    201501161200 W4 NULL

    201501161201 W5 NULL

    201501271200 W3 NULL

    201501301200 NULL C6

    201502101200 W5 NULL

    201502111200 NULL C6

    201502111201 NULL C7

    whereas what I actually want is:

    date_serialward_idconsultant_id

    201501021200W1C1

    201501031200W1C2

    201501031201W2C3

    201501091200W3C3

    201501091201W3C4

    201501121200W3C3

    201501161200W4C3

    201501161201W5C3

    201501271200W3C3

    201501301200W3C6

    201502101200W5C6

    201502111200W5C6

    201502111201W5C7

    I can see how I could remove the nulls where the date_serial values are the same:

    select ce.date_serial,ward_id,consultant_id from #temp_consultant_episode ce

    join #temp_ward_stay ws

    on ce.date_serial=ws.date_serial

    but I can't see how to move forward from here

    I be grateful for any suggestions (avoiding cursors etc if possible)...

    Hi and welcome to SSC. Thank you for posting ddl and sample data. I really wish more people would do that. It makes helping SO much easier!!!

    Here is one way you can accomplish this.

    select date_serial

    , MAX(consultant_id) as consultant_id

    , MAX(ward_id) as ward_id

    from

    (

    select date_serial,consultant_id,null as ward_id from #temp_consultant_episode

    union

    select date_serial,null as consultant_id,ward_id from #temp_ward_stay

    ) x

    group by x.date_serial

    order by x.date_serial

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thank you.

    That doesn't quite do it. I am still getting a NULL where the values of date_serial are not present in both tables:

    date_serial consultant_id ward_id

    -------------------- ------------- ----------

    201501021200 C1 W1

    201501031200 C2 NULL <- NULL should be W1

    201501031201 C3 W2

    201501091200 NULL W3 <- NULL should be C3

    201501091201 C4 NULL <- NULL should be W3

    201501121200 C3 NULL <- NULL should be W3

    201501161200 NULL W4 <- NULL should be C3

    201501161201 NULL W5 <- NULL should be C3

    201501271200 NULL W3 <- NULL should be C3

    201501301200 C6 NULL <- NULL should be W3

    201502101200 NULL W5 <- NULL should be C6

    201502111200 C6 NULL <-NULL should be W5

    201502111201 C7 NULL <-NULL should be W5

    Warning: Null value is eliminated by an aggregate or other SET operation. <- what does this warning mean?

  • David 76709 (10/27/2015)


    Thank you.

    That doesn't quite do it. I am still getting a NULL where the values of date_serial are not present in both tables:

    date_serial consultant_id ward_id

    -------------------- ------------- ----------

    201501021200 C1 W1

    201501031200 C2 NULL <- NULL should be W1

    201501031201 C3 W2

    201501091200 NULL W3 <- NULL should be C3

    201501091201 C4 NULL <- NULL should be W3

    201501121200 C3 NULL <- NULL should be W3

    201501161200 NULL W4 <- NULL should be C3

    201501161201 NULL W5 <- NULL should be C3

    201501271200 NULL W3 <- NULL should be C3

    201501301200 C6 NULL <- NULL should be W3

    201502101200 NULL W5 <- NULL should be C6

    201502111200 C6 NULL <-NULL should be W5

    201502111201 C7 NULL <-NULL should be W5

    Warning: Null value is eliminated by an aggregate or other SET operation. <- what does this warning mean?

    Are you wanting the most recent value for each date_serial present? Is it safe to use that column as the order? It sort of looks like a date of some sort but doesn't quite work.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (10/27/2015)


    David 76709 (10/27/2015)


    Thank you.

    That doesn't quite do it. I am still getting a NULL where the values of date_serial are not present in both tables:

    date_serial consultant_id ward_id

    -------------------- ------------- ----------

    201501021200 C1 W1

    201501031200 C2 NULL <- NULL should be W1

    201501031201 C3 W2

    201501091200 NULL W3 <- NULL should be C3

    201501091201 C4 NULL <- NULL should be W3

    201501121200 C3 NULL <- NULL should be W3

    201501161200 NULL W4 <- NULL should be C3

    201501161201 NULL W5 <- NULL should be C3

    201501271200 NULL W3 <- NULL should be C3

    201501301200 C6 NULL <- NULL should be W3

    201502101200 NULL W5 <- NULL should be C6

    201502111200 C6 NULL <-NULL should be W5

    201502111201 C7 NULL <-NULL should be W5

    Warning: Null value is eliminated by an aggregate or other SET operation. <- what does this warning mean?

    Are you wanting the most recent value for each date_serial present? Is it safe to use that column as the order? It sort of looks like a date of some sort but doesn't quite work.

    The two tables are lists of transactions and the date_serial can be relied upon for order. I want to show all the values from both the tables in date_serial order. When the tables are merged in the union query and a NULL is produced, I want the most recent preceding value (ordered by date serial) for that column to be "carried forward".

    eg when the query above gives:

    <snip>

    201501021200 C1 W1 (there was a row in both tables for this date_serial value)

    201501031200 C2 NULL (there was only a row in the consultant_episode table for this date_serial value)

    </snip>

    I want the NULL value to be replaced by W1 from the preceding row to give:

    201501021200 C1 W1

    201501031200 C2 W1

  • This looks very similar to a running total problem (except that there's no total). This can be managed by the quirky update, a cursor or a triangular join. Here's a further explanation on the advantages and disadvantages of each method. http://www.sqlservercentral.com/articles/T-SQL/68467/

    Here's an example on how you could handle your data.

    DECLARE @date_serial bigint,

    @ward_id varchar(10),

    @consultant_id varchar(10);

    CREATE TABLE #Result(

    date_serial bigint,

    ward_id varchar(10),

    consultant_id varchar(10));

    CREATE CLUSTERED INDEX IX_Result ON #Result( date_serial ASC);

    INSERT INTO #Result

    SELECT ISNULL( ws.date_serial, ce.date_serial) date_serial,

    ward_id,

    consultant_id

    FROM #temp_ward_stay ws

    FULL

    JOIN #temp_consultant_episode ce ON ws.date_serial = ce.date_serial;

    UPDATE r WITH(TABLOCKX)

    SET

    @ward_id = ward_id = ISNULL( ward_id, @ward_id),

    @consultant_id = consultant_id = ISNULL( consultant_id, @consultant_id),

    @date_serial = date_serial

    FROM #Result r

    OPTION (MAXDOP 1);

    SELECT *

    FROM #Result

    ORDER BY date_serial;

    DROP TABLE #Result;

    ATTENTION: If you simply copy and paste this code without understanding how and why it works, you're open to major problems.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (10/27/2015)


    This looks very similar to a running total problem (except that there's no total). This can be managed by the quirky update, a cursor or a triangular join. Here's a further explanation on the advantages and disadvantages of each method. http://www.sqlservercentral.com/articles/T-SQL/68467/

    Here's an example on how you could handle your data.

    DECLARE @date_serial bigint,

    @ward_id varchar(10),

    @consultant_id varchar(10);

    CREATE TABLE #Result(

    date_serial bigint,

    ward_id varchar(10),

    consultant_id varchar(10));

    CREATE CLUSTERED INDEX IX_Result ON #Result( date_serial ASC);

    INSERT INTO #Result

    SELECT ISNULL( ws.date_serial, ce.date_serial) date_serial,

    ward_id,

    consultant_id

    FROM #temp_ward_stay ws

    FULL

    JOIN #temp_consultant_episode ce ON ws.date_serial = ce.date_serial;

    UPDATE r WITH(TABLOCKX)

    SET

    @ward_id = ward_id = ISNULL( ward_id, @ward_id),

    @consultant_id = consultant_id = ISNULL( consultant_id, @consultant_id),

    @date_serial = date_serial

    FROM #Result r

    OPTION (MAXDOP 1);

    SELECT *

    FROM #Result

    ORDER BY date_serial;

    DROP TABLE #Result;

    ATTENTION: If you simply copy and paste this code without understanding how and why it works, you're open to major problems.

    Awesome Luis. I was thinking we could use the quirky update but hadn't yet had time to really look at it. Thanks for the cleanup!!

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Luis Cazares (10/27/2015)


    This looks very similar to a running total problem (except that there's no total). This can be managed by the quirky update, a cursor or a triangular join. Here's a further explanation on the advantages and disadvantages of each method. http://www.sqlservercentral.com/articles/T-SQL/68467/

    Here's an example on how you could handle your data.

    DECLARE @date_serial bigint,

    @ward_id varchar(10),

    @consultant_id varchar(10);

    CREATE TABLE #Result(

    date_serial bigint,

    ward_id varchar(10),

    consultant_id varchar(10));

    CREATE CLUSTERED INDEX IX_Result ON #Result( date_serial ASC);

    INSERT INTO #Result

    SELECT ISNULL( ws.date_serial, ce.date_serial) date_serial,

    ward_id,

    consultant_id

    FROM #temp_ward_stay ws

    FULL

    JOIN #temp_consultant_episode ce ON ws.date_serial = ce.date_serial;

    UPDATE r WITH(TABLOCKX)

    SET

    @ward_id = ward_id = ISNULL( ward_id, @ward_id),

    @consultant_id = consultant_id = ISNULL( consultant_id, @consultant_id),

    @date_serial = date_serial

    FROM #Result r

    OPTION (MAXDOP 1);

    SELECT *

    FROM #Result

    ORDER BY date_serial;

    DROP TABLE #Result;

    ATTENTION: If you simply copy and paste this code without understanding how and why it works, you're open to major problems.

    Awesome solution Luis, I'm going to keep this one for a rainy day. Just curious why you created a @date_serial variable. Is it needed?


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • yb751 (10/27/2015)


    Awesome solution Luis, I'm going to keep this one for a rainy day. Just curious why you created a @date_serial variable. Is it needed?

    Yes, it's needed to keep the code safe. It's part of the rules to implement the quirky update. I strongly suggest that you keep the article instead of my post.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (10/27/2015)


    This looks very similar to a running total problem (except that there's no total). This can be managed by the quirky update, a cursor or a triangular join. Here's a further explanation on the advantages and disadvantages of each method. http://www.sqlservercentral.com/articles/T-SQL/68467/

    Here's an example on how you could handle your data.

    DECLARE @date_serial bigint,

    @ward_id varchar(10),

    @consultant_id varchar(10);

    CREATE TABLE #Result(

    date_serial bigint,

    ward_id varchar(10),

    consultant_id varchar(10));

    CREATE CLUSTERED INDEX IX_Result ON #Result( date_serial ASC);

    INSERT INTO #Result

    SELECT ISNULL( ws.date_serial, ce.date_serial) date_serial,

    ward_id,

    consultant_id

    FROM #temp_ward_stay ws

    FULL

    JOIN #temp_consultant_episode ce ON ws.date_serial = ce.date_serial;

    UPDATE r WITH(TABLOCKX)

    SET

    @ward_id = ward_id = ISNULL( ward_id, @ward_id),

    @consultant_id = consultant_id = ISNULL( consultant_id, @consultant_id),

    @date_serial = date_serial

    FROM #Result r

    OPTION (MAXDOP 1);

    SELECT *

    FROM #Result

    ORDER BY date_serial;

    DROP TABLE #Result;

    ATTENTION: If you simply copy and paste this code without understanding how and why it works, you're open to major problems.

    Many Thanks Luis. As you can probably guess I am still reading the article....:) I have some experience with SQL but not enough to be confident that there was not a set based solution to this problem. I will try and master the quirky update.

  • Luis Cazares (10/27/2015)


    yb751 (10/27/2015)


    Awesome solution Luis, I'm going to keep this one for a rainy day. Just curious why you created a @date_serial variable. Is it needed?

    Yes, it's needed to keep the code safe. It's part of the rules to implement the quirky update. I strongly suggest that you keep the article instead of my post.

    And the one rule that so many people forget.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Luis Cazares (10/27/2015)


    yb751 (10/27/2015)


    Awesome solution Luis, I'm going to keep this one for a rainy day. Just curious why you created a @date_serial variable. Is it needed?

    Yes, it's needed to keep the code safe. It's part of the rules to implement the quirky update. I strongly suggest that you keep the article instead of my post.

    Ahh right, read about using an anchor in the article. Thanks.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply