Summarising transaction data removing nulls

Question

Summarising transaction data removing nulls

David 76709

SSC Rookie

Points: 33
More actions
October 27, 2015 at 7:56 am

#309235

I have two tables one list changes of hospital ward and one lists changes of consultant doctor. These can change independently ie a ward change can occur without a consultant change and vice versa. I want to summarise these changes to give the status at each date_serial value.
create table #temp_ward_stay(date_serial bigint,ward_id varchar(10))
;
insert into #temp_ward_stay
values
(201501021200,'W1'),(201501031201,'W2'),(201501091200,'W3'),(201501161200,'W4'),(201501161201,'W5')(201501271200,'W3'),(201502101200,'W5')
;
create table #temp_consultant_episode(
date_serial bigint,
consultant_id varchar(10))
;
insert into #temp_consultant_episode values
(201501021200,'C1'),(201501031200,'C2'),(201501031201,'C3'),(201501091201,'C4'),
(201501121200,'C3'),(201501301200,'C6'),(201502111200,'C6'),(201502111201,'C7')
;
If I use:
select date_serial,consultant_id,null as ward_id from #temp_consultant_episode
union
select date_serial,null as consultant_id,ward_id from #temp_ward_stay
I get:
date_serial ward_id consultant_id
-------------------- ---------- -------------
201501021200 NULL C1
201501021200 W1 NULL
201501031200 NULL C2
201501031201 NULL C3
201501031201 W2 NULL
201501091200 W3 NULL
201501091201 NULL C4
201501121200 NULL C3
201501161200 W4 NULL
201501161201 W5 NULL
201501271200 W3 NULL
201501301200 NULL C6
201502101200 W5 NULL
201502111200 NULL C6
201502111201 NULL C7
whereas what I actually want is:
date_serialward_idconsultant_id
201501021200W1C1
201501031200W1C2
201501031201W2C3
201501091200W3C3
201501091201W3C4
201501121200W3C3
201501161200W4C3
201501161201W5C3
201501271200W3C3
201501301200W3C6
201502101200W5C6
201502111200W5C6
201502111201W5C7
I can see how I could remove the nulls where the date_serial values are the same:
select ce.date_serial,ward_id,consultant_id from #temp_consultant_episode ce
join #temp_ward_stay ws
on ce.date_serial=ws.date_serial
but I can't see how to move forward from here
I be grateful for any suggestions (avoiding cursors etc if possible)...

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286609 More actions · Answer 1

David 76709 (10/27/2015)
I have two tables one list changes of hospital ward and one lists changes of consultant doctor. These can change independently ie a ward change can occur without a consultant change and vice versa. I want to summarise these changes to give the status at each date_serial value.
create table #temp_ward_stay(date_serial bigint,ward_id varchar(10))
;
insert into #temp_ward_stay
values
(201501021200,'W1'),(201501031201,'W2'),(201501091200,'W3'),(201501161200,'W4'),(201501161201,'W5')(201501271200,'W3'),(201502101200,'W5')
;
create table #temp_consultant_episode(
date_serial bigint,
consultant_id varchar(10))
;
insert into #temp_consultant_episode values
(201501021200,'C1'),(201501031200,'C2'),(201501031201,'C3'),(201501091201,'C4'),
(201501121200,'C3'),(201501301200,'C6'),(201502111200,'C6'),(201502111201,'C7')
;
If I use:
select date_serial,consultant_id,null as ward_id from #temp_consultant_episode
union
select date_serial,null as consultant_id,ward_id from #temp_ward_stay
I get:
date_serial ward_id consultant_id
-------------------- ---------- -------------
201501021200 NULL C1
201501021200 W1 NULL
201501031200 NULL C2
201501031201 NULL C3
201501031201 W2 NULL
201501091200 W3 NULL
201501091201 NULL C4
201501121200 NULL C3
201501161200 W4 NULL
201501161201 W5 NULL
201501271200 W3 NULL
201501301200 NULL C6
201502101200 W5 NULL
201502111200 NULL C6
201502111201 NULL C7
whereas what I actually want is:
date_serialward_idconsultant_id
201501021200W1C1
201501031200W1C2
201501031201W2C3
201501091200W3C3
201501091201W3C4
201501121200W3C3
201501161200W4C3
201501161201W5C3
201501271200W3C3
201501301200W3C6
201502101200W5C6
201502111200W5C6
201502111201W5C7
I can see how I could remove the nulls where the date_serial values are the same:
select ce.date_serial,ward_id,consultant_id from #temp_consultant_episode ce
join #temp_ward_stay ws
on ce.date_serial=ws.date_serial
but I can't see how to move forward from here
I be grateful for any suggestions (avoiding cursors etc if possible)...

Hi and welcome to SSC. Thank you for posting ddl and sample data. I really wish more people would do that. It makes helping SO much easier!!!

Here is one way you can accomplish this.

select date_serial

, MAX(consultant_id) as consultant_id

, MAX(ward_id) as ward_id

from

(

select date_serial,consultant_id,null as ward_id from #temp_consultant_episode

union

select date_serial,null as consultant_id,ward_id from #temp_ward_stay

) x

group by x.date_serial

order by x.date_serial

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

David 76709 SSC Rookie Points: 33 More actions · Answer 2

Thank you.

That doesn't quite do it. I am still getting a NULL where the values of date_serial are not present in both tables:

date_serial consultant_id ward_id

-------------------- ------------- ----------

201501021200 C1 W1

201501031200 C2 NULL <- NULL should be W1

201501031201 C3 W2

201501091200 NULL W3 <- NULL should be C3

201501091201 C4 NULL <- NULL should be W3

201501121200 C3 NULL <- NULL should be W3

201501161200 NULL W4 <- NULL should be C3

201501161201 NULL W5 <- NULL should be C3

201501271200 NULL W3 <- NULL should be C3

201501301200 C6 NULL <- NULL should be W3

201502101200 NULL W5 <- NULL should be C6

201502111200 C6 NULL <-NULL should be W5

201502111201 C7 NULL <-NULL should be W5

Warning: Null value is eliminated by an aggregate or other SET operation. <- what does this warning mean?

Sean Lange SSC Guru Points: 286609 More actions · Answer 3

David 76709 (10/27/2015)
Thank you.
That doesn't quite do it. I am still getting a NULL where the values of date_serial are not present in both tables:
date_serial consultant_id ward_id
-------------------- ------------- ----------
201501021200 C1 W1
201501031200 C2 NULL <- NULL should be W1
201501031201 C3 W2
201501091200 NULL W3 <- NULL should be C3
201501091201 C4 NULL <- NULL should be W3
201501121200 C3 NULL <- NULL should be W3
201501161200 NULL W4 <- NULL should be C3
201501161201 NULL W5 <- NULL should be C3
201501271200 NULL W3 <- NULL should be C3
201501301200 C6 NULL <- NULL should be W3
201502101200 NULL W5 <- NULL should be C6
201502111200 C6 NULL <-NULL should be W5
201502111201 C7 NULL <-NULL should be W5
Warning: Null value is eliminated by an aggregate or other SET operation. <- what does this warning mean?

Are you wanting the most recent value for each date_serial present? Is it safe to use that column as the order? It sort of looks like a date of some sort but doesn't quite work.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

David 76709 SSC Rookie Points: 33 More actions · Answer 4

Sean Lange (10/27/2015)
David 76709 (10/27/2015)
Thank you.
That doesn't quite do it. I am still getting a NULL where the values of date_serial are not present in both tables:
date_serial consultant_id ward_id
-------------------- ------------- ----------
201501021200 C1 W1
201501031200 C2 NULL <- NULL should be W1
201501031201 C3 W2
201501091200 NULL W3 <- NULL should be C3
201501091201 C4 NULL <- NULL should be W3
201501121200 C3 NULL <- NULL should be W3
201501161200 NULL W4 <- NULL should be C3
201501161201 NULL W5 <- NULL should be C3
201501271200 NULL W3 <- NULL should be C3
201501301200 C6 NULL <- NULL should be W3
201502101200 NULL W5 <- NULL should be C6
201502111200 C6 NULL <-NULL should be W5
201502111201 C7 NULL <-NULL should be W5
Warning: Null value is eliminated by an aggregate or other SET operation. <- what does this warning mean?
Are you wanting the most recent value for each date_serial present? Is it safe to use that column as the order? It sort of looks like a date of some sort but doesn't quite work.

The two tables are lists of transactions and the date_serial can be relied upon for order. I want to show all the values from both the tables in date_serial order. When the tables are merged in the union query and a NULL is produced, I want the most recent preceding value (ordered by date serial) for that column to be "carried forward".

eg when the query above gives:

<snip>

201501021200 C1 W1 (there was a row in both tables for this date_serial value)

201501031200 C2 NULL (there was only a row in the consultant_episode table for this date_serial value)

</snip>

I want the NULL value to be replaced by W1 from the preceding row to give:

201501021200 C1 W1

201501031200 C2 W1

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

This looks very similar to a running total problem (except that there's no total). This can be managed by the quirky update, a cursor or a triangular join. Here's a further explanation on the advantages and disadvantages of each method. http://www.sqlservercentral.com/articles/T-SQL/68467/

Here's an example on how you could handle your data.

DECLARE @date_serial bigint,

@ward_id varchar(10),

@consultant_id varchar(10);

CREATE TABLE #Result(

date_serial bigint,

ward_id varchar(10),

consultant_id varchar(10));

CREATE CLUSTERED INDEX IX_Result ON #Result( date_serial ASC);

INSERT INTO #Result

SELECT ISNULL( ws.date_serial, ce.date_serial) date_serial,

ward_id,

consultant_id

FROM #temp_ward_stay ws

FULL

JOIN #temp_consultant_episode ce ON ws.date_serial = ce.date_serial;

UPDATE r WITH(TABLOCKX)

SET

@ward_id = ward_id = ISNULL( ward_id, @ward_id),

@consultant_id = consultant_id = ISNULL( consultant_id, @consultant_id),

@date_serial = date_serial

FROM #Result r

OPTION (MAXDOP 1);

SELECT *

FROM #Result

ORDER BY date_serial;

DROP TABLE #Result;

ATTENTION: If you simply copy and paste this code without understanding how and why it works, you're open to major problems.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Sean Lange SSC Guru Points: 286609 More actions · Answer 6

Luis Cazares (10/27/2015)
This looks very similar to a running total problem (except that there's no total). This can be managed by the quirky update, a cursor or a triangular join. Here's a further explanation on the advantages and disadvantages of each method. http://www.sqlservercentral.com/articles/T-SQL/68467/
Here's an example on how you could handle your data.
DECLARE @date_serial bigint,
@ward_id varchar(10),
@consultant_id varchar(10);
CREATE TABLE #Result(
date_serial bigint,
ward_id varchar(10),
consultant_id varchar(10));
CREATE CLUSTERED INDEX IX_Result ON #Result( date_serial ASC);
INSERT INTO #Result
SELECT ISNULL( ws.date_serial, ce.date_serial) date_serial,
ward_id,
consultant_id
FROM #temp_ward_stay ws
FULL
JOIN #temp_consultant_episode ce ON ws.date_serial = ce.date_serial;
UPDATE r WITH(TABLOCKX)
SET
@ward_id = ward_id = ISNULL( ward_id, @ward_id),
@consultant_id = consultant_id = ISNULL( consultant_id, @consultant_id),
@date_serial = date_serial
FROM #Result r
OPTION (MAXDOP 1);
SELECT *
FROM #Result
ORDER BY date_serial;
DROP TABLE #Result;
ATTENTION: If you simply copy and paste this code without understanding how and why it works, you're open to major problems.

Awesome Luis. I was thinking we could use the quirky update but hadn't yet had time to really look at it. Thanks for the cleanup!!

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Y.B. SSChampion Points: 11652 More actions · Answer 7

Luis Cazares (10/27/2015)
This looks very similar to a running total problem (except that there's no total). This can be managed by the quirky update, a cursor or a triangular join. Here's a further explanation on the advantages and disadvantages of each method. http://www.sqlservercentral.com/articles/T-SQL/68467/
Here's an example on how you could handle your data.
DECLARE @date_serial bigint,
@ward_id varchar(10),
@consultant_id varchar(10);
CREATE TABLE #Result(
date_serial bigint,
ward_id varchar(10),
consultant_id varchar(10));
CREATE CLUSTERED INDEX IX_Result ON #Result( date_serial ASC);
INSERT INTO #Result
SELECT ISNULL( ws.date_serial, ce.date_serial) date_serial,
ward_id,
consultant_id
FROM #temp_ward_stay ws
FULL
JOIN #temp_consultant_episode ce ON ws.date_serial = ce.date_serial;
UPDATE r WITH(TABLOCKX)
SET
@ward_id = ward_id = ISNULL( ward_id, @ward_id),
@consultant_id = consultant_id = ISNULL( consultant_id, @consultant_id),
@date_serial = date_serial
FROM #Result r
OPTION (MAXDOP 1);
SELECT *
FROM #Result
ORDER BY date_serial;
DROP TABLE #Result;
ATTENTION: If you simply copy and paste this code without understanding how and why it works, you're open to major problems.

Awesome solution Luis, I'm going to keep this one for a rainy day. Just curious why you created a @date_serial variable. Is it needed?

SELECT quote FROM brain WHERE original = 1
0 rows returned

Luis Cazares SSC Guru Points: 183706 More actions · Answer 8

yb751 (10/27/2015)
Awesome solution Luis, I'm going to keep this one for a rainy day. Just curious why you created a @date_serial variable. Is it needed?

Yes, it's needed to keep the code safe. It's part of the rules to implement the quirky update. I strongly suggest that you keep the article instead of my post.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

David 76709 SSC Rookie Points: 33 More actions · Answer 9

Luis Cazares (10/27/2015)
This looks very similar to a running total problem (except that there's no total). This can be managed by the quirky update, a cursor or a triangular join. Here's a further explanation on the advantages and disadvantages of each method. http://www.sqlservercentral.com/articles/T-SQL/68467/
Here's an example on how you could handle your data.
DECLARE @date_serial bigint,
@ward_id varchar(10),
@consultant_id varchar(10);
CREATE TABLE #Result(
date_serial bigint,
ward_id varchar(10),
consultant_id varchar(10));
CREATE CLUSTERED INDEX IX_Result ON #Result( date_serial ASC);
INSERT INTO #Result
SELECT ISNULL( ws.date_serial, ce.date_serial) date_serial,
ward_id,
consultant_id
FROM #temp_ward_stay ws
FULL
JOIN #temp_consultant_episode ce ON ws.date_serial = ce.date_serial;
UPDATE r WITH(TABLOCKX)
SET
@ward_id = ward_id = ISNULL( ward_id, @ward_id),
@consultant_id = consultant_id = ISNULL( consultant_id, @consultant_id),
@date_serial = date_serial
FROM #Result r
OPTION (MAXDOP 1);
SELECT *
FROM #Result
ORDER BY date_serial;
DROP TABLE #Result;
ATTENTION: If you simply copy and paste this code without understanding how and why it works, you're open to major problems.

Many Thanks Luis. As you can probably guess I am still reading the article....:) I have some experience with SQL but not enough to be confident that there was not a set based solution to this problem. I will try and master the quirky update.

Sean Lange SSC Guru Points: 286609 More actions · Answer 10

Luis Cazares (10/27/2015)
yb751 (10/27/2015)
Awesome solution Luis, I'm going to keep this one for a rainy day. Just curious why you created a @date_serial variable. Is it needed?
Yes, it's needed to keep the code safe. It's part of the rules to implement the quirky update. I strongly suggest that you keep the article instead of my post.

And the one rule that so many people forget.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Y.B. SSChampion Points: 11652 More actions · Answer 11

Luis Cazares (10/27/2015)
yb751 (10/27/2015)
Awesome solution Luis, I'm going to keep this one for a rainy day. Just curious why you created a @date_serial variable. Is it needed?
Yes, it's needed to keep the code safe. It's part of the rules to implement the quirky update. I strongly suggest that you keep the article instead of my post.

Ahh right, read about using an anchor in the article. Thanks.

SELECT quote FROM brain WHERE original = 1
0 rows returned