Need help with writing TSQL

Question

Need help with writing TSQL

rajg

Say Hey Kid

Points: 691
More actions
November 9, 2010 at 12:54 am

#228826

Hi,
I have a table with two columns i.e. ChangeDate and StepId. I need a third column 'status' at run time based on the data in this table.
Records can be grouped based on the stepId. But there can be multiple groups for same stepId. For any group, if the last ChangeDate value is null, then all the rows from that group will have status 'current' . If last ChangeDate value is not null, then all the rows from that group will have status 'completed'
Please check attached screenshot for more details. Can somebody please tell what could be the optimized and efficient TSQL to get the output ?
Thanks,
Bhimraj

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

irobertson Hall of Fame Points: 3695 More actions · Answer 1

Hi Bhimraj,

Do you have any other fields in the table to do the grouping on? If not, it makes things trickier. If you could post a SQL statement to create your data table with some sample data in the form of insert statements that would be a big help.

Cheers, Iain

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 2

Can there only ever be be one row with a NULL ChangeDate, with this row belonging to the most recent group?

If this is not the case, then how can you determine which group a row with a NULL date belongs to, since you state that the StepId does not uniquely identify a group?

Or maybe, if there are multiple rows with a NULL ChangeDate, should each "NULL" row be assigned to the last group with the same StepId? - please clarify.

parthi-1705 SSCrazy Eights Points: 9277 More actions · Answer 3

Hi,

Find the below way i have tried

declare @Temp table (Changedate datetime, stepid int)

declare @Temp1 table (Changedate datetime, stepid int,Types bit)

insert into @Temp

Select '2007-06-03 20:39:00',1

union all Select '2007-06-26 12:47:00',1

union all Select '2007-07-09 15:17:00',2

union all Select '2007-08-14 14:42:00',2

union all Select '2007-08-20 10:42:00',2

union all Select '2007-10-10 15:36:00',3

union all Select '2007-12-14 09:53:00',3

union all Select '2008-01-28 14:45:00',2

union all Select '2008-07-02 15:32:00',2

union all Select '2008-12-16 13:38:00',2

union all Select '2009-02-27 14:04:00',2

union all Select NULL,2 /*I have included one more Null to test*/

union all Select '2009-03-06 15:14:00',1

union all Select '2009-03-26 09:41:00',1

union all Select '2009-12-11 20:27:00',1

union all Select '2010-04-14 14:44:00',1

union all Select NULL,1

-- Considering @temp as your table

/*Here you are adding one more column with default as 0 */

insert into @Temp1

Select *,0 from @Temp

Select * from @Temp1

/*Here you are updating the default value to 1 where it is NULL */

Update @Temp1 set Types=1 where Changedate is null

/*Final Statement*/

Select Changedate,stepid,case when Types=1 then 'Current' else 'Copmleted' end as types from @Temp1

Thanks

Parthiban .L

Thanks
Parthi

irobertson Hall of Fame Points: 3695 More actions · Answer 4

Edit: supplies solution to different problem *facepalm*

Sachin Nandanwar SSCertifiable Points: 7699 More actions · Answer 5

declare @t table(id int identity,dt date NULL,stepid int)

insert @t

select '2008-03-03',1 union all

select '2008-03-04',1 union all

select '2008-03-14',5 union all

select null,5 union all

select '2008-03-20',6 union all

select '2008-03-21',6 union all

select '2008-04-04',20 union all

select '2008-04-08',20 union all

select null,20 union all

select '2008-04-15',1 union all

select '2008-04-18',1 union all

select '2008-06-20',1 union all

select '2008-06-22',1 union all

select '2008-06-25',1 union all

select null,1

;with cte

as

(

select id,stepid,(dense_rank()over(order by stepid,id)-id)+stepid rid from @t

)

,cte1

as

(

select c.stepid,t.dt,c.id,c.rid from cte c

inner join @t t on t.id=c.id

where t.dt is null

)

select t.id,t.dt,t.stepid,

(case when c.rid IS null then 'complete'else 'current'end) status

from cte1 c1

inner join cte c on c.rid=c1.rid

right join @t t on t.id=c.id

order by c1.id,c1.stepid

--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server

irobertson Hall of Fame Points: 3695 More actions · Answer 6

Parthiban, I'm not sure your solution does what is required. As far as I can tell, the OP wants *all* records in a group to show as 'current' where the last record in the group has a null date.

The following should do what you need, but comes with a warning that it creates triangular joins, which are a hidden form of RBAR (search for Jeff Moden's article on the subject on this site to understand this if you don't already).

-- create some test data

create table #tmp (

Date_Field datetime

, Int_Field int

)

-- insert some values

-- assumption1: the null row is always the last in a set

-- assumption2: dates are sequential, i.e. all dates in the second group

-- will be after the latest date in the first group, etc

insert #tmp values ('2010-11-09',1)

insert #tmp values ('2010-11-14',1)

insert #tmp values ('2010-11-19',1)

insert #tmp values ('2010-11-24',1)

insert #tmp values (NULL,1)

insert #tmp values ('2010-12-04',2)

insert #tmp values ('2010-12-09',2)

insert #tmp values ('2010-12-14',2)

insert #tmp values (NULL,2)

insert #tmp values ('2010-12-24',3)

insert #tmp values ('2010-12-29',3)

insert #tmp values ('2011-01-03',3)

insert #tmp values ('2011-01-08',3)

insert #tmp values ('2011-01-13',1)

insert #tmp values ('2011-01-18',1)

insert #tmp values ('2011-01-23',1)

insert #tmp values ('2011-01-28',1)

insert #tmp values (NULL,2)

-- add a sequential column and index

alter table #tmp add Order_Key int identity

create clustered index ix_tmp on #tmp(Order_Key)

-- offset join the table to itself to identify

-- where the group for each row starts and finishes

-- warning: these create triangular joins

; with cte as (

select a.Order_Key

, a.Date_Field

, a.Int_Field

, (select isnull(max(Order_Key)+1,1) from #tmp d

where d.Order_Key <= a.Order_Key

and d.Int_Field <> a.Int_Field

) as Bottom_Range

, (select isnull(min(Order_Key)-1,(select max(Order_Key) from #tmp)) from #tmp d

where d.Order_Key >= a.Order_Key

and d.Int_Field <> a.Int_Field

) as Top_Range

from #tmp a

left join #tmp b

on a.Order_Key = b.Order_Key + 1

)

-- pull the final extract together

select *

, case when exists (select 'Null Value Found' from cte e

where e.Order_Key between cte.Bottom_Range and cte.Top_Range

and e.Date_Field is null

)

then 'Current'

else 'Completed'

end as [Type]

from cte

drop table #tmp

I couldn't come up with a way to avoids this, perhaps someone else might be able to?

Regards, Iain

irobertson Hall of Fame Points: 3695 More actions · Answer 7

Nice solution Sachin. I've run some comparative testing, which throws up some interesting results. Using the following test script, your solution runs quicker and has much lower read counts. But, when you generate a comparative execution plan, SQL estimates the rows returned by yours really badly, making it look like my solution is better (by 0% to 100%), even though it isn't. I wonder why this is?

Also, there is an issue with your solution producing too many rows, not sure why this might be? The row count in the test table is 73728, your solution returns 75828.

Test script:

/**** uses

dbcc freeproccache

dbcc dropcleanbuffers

do not run on production server

*****/

-- create some test data

create table #tmp (

Date_Field datetime

, Int_Field int

)

-- insert some values

-- assumption: the null row is always the last in a set

insert #tmp values ('2010-11-09',1)

insert #tmp values ('2010-11-14',1)

insert #tmp values ('2010-11-19',1)

insert #tmp values ('2010-11-24',1)

insert #tmp values (NULL,1)

insert #tmp values ('2010-12-04',2)

insert #tmp values ('2010-12-09',2)

insert #tmp values ('2010-12-14',2)

insert #tmp values (NULL,2)

insert #tmp values ('2010-12-24',3)

insert #tmp values ('2010-12-29',3)

insert #tmp values ('2011-01-03',3)

insert #tmp values ('2011-01-08',3)

insert #tmp values ('2011-01-13',1)

insert #tmp values ('2011-01-18',1)

insert #tmp values ('2011-01-23',1)

insert #tmp values ('2011-01-28',1)

insert #tmp values (NULL,2)

-- bump up the data counts

declare @i int = 1

while @i <=12

begin

insert #tmp

select dateadd(dd,90*@i,Date_Field)

, Int_Field

from #tmp

set @i = @i+1

end

select count(*) from #tmp

-- 73728

-- add a sequential column and index

alter table #tmp add Order_Key int identity

create clustered index ix_tmp on #tmp(Order_Key)

dbcc freeproccache

dbcc dropcleanbuffers

set statistics io on

set statistics time on

-- offset join the table to itself to identify

-- where the group for each row starts and finishes

-- warning: these create triangular joins

; with cte as (

select a.Order_Key

, a.Date_Field

, a.Int_Field

, (select isnull(max(Order_Key)+1,1) from #tmp d

where d.Order_Key <= a.Order_Key

and d.Int_Field <> a.Int_Field

) as Bottom_Range

, (select isnull(min(Order_Key)-1,(select max(Order_Key) from #tmp)) from #tmp d

where d.Order_Key >= a.Order_Key

and d.Int_Field <> a.Int_Field

) as Top_Range

from #tmp a

left join #tmp b

on a.Order_Key = b.Order_Key + 1

)

-- pull the final extract together

select *

, case when exists (select 'Null Value Found' from cte e

where e.Order_Key between cte.Bottom_Range and cte.Top_Range

and e.Date_Field is null

)

then 'Current'

else 'Completed'

end as [Type]

from cte

go

dbcc freeproccache

dbcc dropcleanbuffers

;with cte

as

(

select Order_Key,Int_Field,(dense_rank()over(order by Int_Field,Order_Key)-Order_Key)+Int_Field rid from #tmp

)

,cte1

as

(

select c.Int_Field,t.Date_Field,c.Order_Key,c.rid from cte c

inner join #tmp t on t.Order_Key=c.Order_Key

where t.Date_Field is null

)

select t.Order_Key,t.Date_Field,t.Int_Field,

(case when c.rid IS null then 'complete'else 'current'end) status

from cte1 c1

inner join cte c on c.rid=c1.rid

right join #tmp t on t.Order_Key=c.Order_Key

order by t.Order_Key,c1.Int_Field

-- 75828 rows

drop table #tmp

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 8

Sachin's solution does rely on the added IDENTITY column to define the order of all the rows in the table, including those rows with a NULL ChangeDate. If there is some other column (that the OP hasn't yet mentioned) that could be used to define this ordering then Sachin's approach could made to work, but without such a column and without further clarification from the OP we are forced to make arbitrary assumptions.

If there is a single row with a NULL ChangeDate that has the same StepId as the row with the maximum non-null ChangeDate value, then the situation is unambiguous.

ChangeDate StepId Status

----------- ------ --------

2010-05-01 1 Complete

2010-05-03 1 Complete

2010-05-08 1 Complete

2010-06-02 2 Current

2010-06-07 2 Current

NULL 2 Current

Now suppose we add another row with StepId = 1 and a NULL ChangeDate.

ChangeDate StepId

----------- --------

NULL 1

Since the ChangeDate column is NULL, there is no unambiguous ordering for this row. Is it part of the original group with StepId = 1 like this...?

ChangeDate StepId Status

----------- ------ --------

2010-05-01 1 Current

2010-05-03 1 Current

2010-05-08 1 Current

NULL 1 Current

2010-06-02 2 Current

2010-06-07 2 Current

NULL 2 Current

Or is it the first row of a new group, like this...?

ChangeDate StepId Status

----------- ------ --------

2010-05-01 1 Complete

2010-05-03 1 Complete

2010-05-08 1 Complete

2010-06-02 2 Current

2010-06-07 2 Current

NULL 2 Current

NULL 1 Current

rajg Say Hey Kid Points: 691 More actions · Answer 9

Sorry for delay in reply and THANK YOU all for the solutions you provided here. Your help is very much appreciated.

Solutions provided by parthi-1705, sachin-355820, irobertson - is what I was looking for. I have some other columns in this table such as identity column and few other. But none of these column can be used for grouping.

@andrewd.smith : There are two columns that can be used for ordering here - identity column and NextChangeDate. Sorry, I didnt mention this earlier. I didnt thought solution will require this information. But the solution provided here works for me.

I am going to check performance with actual data. Thank you again !! This is why I like this forum - people are ready with differnt solutions much before you clarify the problem.

irobertson Hall of Fame Points: 3695 More actions · Answer 10

rajg (11/9/2010)
There are two columns that can be used for ordering here - identity column and NextChangeDate. Sorry, I didnt mention this earlier. I didnt thought solution will require this information.

This is one of the reasons it's best to post DDL and sample data that looks as much like your problem as possible. Often, it's the least obvious thing that is key to solving the problem 😉

Glad we could be of help, I'd be interested in how the solutions work out in your environment. Please let us know...

Regards, Iain

Edit: typo...

WayneS SSC Guru Points: 95461 More actions · Answer 11

rajg (11/9/2010)
Hi,
I have a table with two columns i.e. ChangeDate and StepId. I need a third column 'status' at run time based on the data in this table.
Records can be grouped based on the stepId. But there can be multiple groups for same stepId. For any group, if the last ChangeDate value is null, then all the rows from that group will have status 'current' . If last ChangeDate value is not null, then all the rows from that group will have status 'completed'
Please check attached screenshot for more details. Can somebody please tell what could be the optimized and efficient TSQL to get the output ?
Thanks,
Bhimraj

Do you have anything besides the StepID to put this data into "groups"? As it is, the StepID can repeat, so the definition of a group is that the StepID changes from it's previous value. If there are some other fields that will provide this grouping, I'm sure that there is an extremely efficient, fast method to solve this issue.

Did you see how Iain provided sample data in a readily consumable format? If you would do this, it would make it A LOT easier on all of the volunteers on this site. Please read the first link in my signature for how to do this. You should have (at the minimum), the CREATE TABLE statement(s) for the applicable table(s), and INSERT statement(s) for each table to put in some sample data, and expected results based upon the sample data provided. Indexes and constraints frequently prove to be very useful also!

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 12

irobertson (11/9/2010)
rajg (11/9/2010)
There are two columns that can be used for ordering here - identity column and NextChangeDate. Sorry, I didnt mention this earlier. I didnt thought solution will require this information.
This is one of the reasons it's best to post DDL and sample data that looks as much like your problem as possible. Often, it's the least obvious thing that is key to solving the problem 😉
Glad we could be of help, I'd be interested in how the solutions work out in your environment. Please let us know...
Regards, Iain
Edit: typo...

I'd be interetested in knowing, as well. One "out of place" date (ie: parallel steps) and BOOM!... all the solutions shatter like glass because they rely only on a sequential date or the supposed implicit order of data. Not good. Now, if that's all there is, then all is well. But, like has been asked many times, I'd really be interested if there are any other columns of data associated with this problem so we can make the solutions a bit more shatter-proof.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 13

rajg (11/9/2010)
Hi,
I have a table with two columns i.e. ChangeDate and StepId. I need a third column 'status' at run time based on the data in this table.
Records can be grouped based on the stepId. But there can be multiple groups for same stepId. For any group, if the last ChangeDate value is null, then all the rows from that group will have status 'current' . If last ChangeDate value is not null, then all the rows from that group will have status 'completed'
Please check attached screenshot for more details. Can somebody please tell what could be the optimized and efficient TSQL to get the output ?
Thanks,
Bhimraj

Pictures are pretty but they don't do a whole lot for helping people help you. Please take a look at the article located in my signature line below. THAT's the way to post data to get the best help.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 14

I take that back. Sachin's looks like it'll work just fine (classic double rownumber) but is guaranteed only if the original insert were to have an ORDER BY in this case.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)