November 9, 2010 at 12:54 am
Hi,
I have a table with two columns i.e. ChangeDate and StepId. I need a third column 'status' at run time based on the data in this table.
Records can be grouped based on the stepId. But there can be multiple groups for same stepId. For any group, if the last ChangeDate value is null, then all the rows from that group will have status 'current' . If last ChangeDate value is not null, then all the rows from that group will have status 'completed'
Please check attached screenshot for more details. Can somebody please tell what could be the optimized and efficient TSQL to get the output ?
Thanks,
Bhimraj
November 9, 2010 at 2:15 am
Hi Bhimraj,
Do you have any other fields in the table to do the grouping on? If not, it makes things trickier. If you could post a SQL statement to create your data table with some sample data in the form of insert statements that would be a big help.
Cheers, Iain
November 9, 2010 at 3:02 am
Can there only ever be be one row with a NULL ChangeDate, with this row belonging to the most recent group?
If this is not the case, then how can you determine which group a row with a NULL date belongs to, since you state that the StepId does not uniquely identify a group?
Or maybe, if there are multiple rows with a NULL ChangeDate, should each "NULL" row be assigned to the last group with the same StepId? - please clarify.
November 9, 2010 at 5:44 am
Hi,
Find the below way i have tried
declare @Temp table (Changedate datetime, stepid int)
declare @Temp1 table (Changedate datetime, stepid int,Types bit)
insert into @Temp
Select '2007-06-03 20:39:00',1
union all Select '2007-06-26 12:47:00',1
union all Select '2007-07-09 15:17:00',2
union all Select '2007-08-14 14:42:00',2
union all Select '2007-08-20 10:42:00',2
union all Select '2007-10-10 15:36:00',3
union all Select '2007-12-14 09:53:00',3
union all Select '2008-01-28 14:45:00',2
union all Select '2008-07-02 15:32:00',2
union all Select '2008-12-16 13:38:00',2
union all Select '2009-02-27 14:04:00',2
union all Select NULL,2 /*I have included one more Null to test*/
union all Select '2009-03-06 15:14:00',1
union all Select '2009-03-26 09:41:00',1
union all Select '2009-12-11 20:27:00',1
union all Select '2010-04-14 14:44:00',1
union all Select NULL,1
-- Considering @temp as your table
/*Here you are adding one more column with default as 0 */
insert into @Temp1
Select *,0 from @Temp
Select * from @Temp1
/*Here you are updating the default value to 1 where it is NULL */
Update @Temp1 set Types=1 where Changedate is null
/*Final Statement*/
Select Changedate,stepid,case when Types=1 then 'Current' else 'Copmleted' end as types from @Temp1
Thanks
Parthiban .L
Thanks
Parthi
November 9, 2010 at 5:55 am
Edit: supplies solution to different problem *facepalm*
November 9, 2010 at 6:13 am
declare @t table(id int identity,dt date NULL,stepid int)
insert @t
select '2008-03-03',1 union all
select '2008-03-04',1 union all
select '2008-03-14',5 union all
select null,5 union all
select '2008-03-20',6 union all
select '2008-03-21',6 union all
select '2008-04-04',20 union all
select '2008-04-08',20 union all
select null,20 union all
select '2008-04-15',1 union all
select '2008-04-18',1 union all
select '2008-06-20',1 union all
select '2008-06-22',1 union all
select '2008-06-25',1 union all
select null,1
;with cte
as
(
select id,stepid,(dense_rank()over(order by stepid,id)-id)+stepid rid from @t
)
,cte1
as
(
select c.stepid,t.dt,c.id,c.rid from cte c
inner join @t t on t.id=c.id
where t.dt is null
)
select t.id,t.dt,t.stepid,
(case when c.rid IS null then 'complete'else 'current'end) status
from cte1 c1
inner join cte c on c.rid=c1.rid
right join @t t on t.id=c.id
order by c1.id,c1.stepid
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
November 9, 2010 at 6:21 am
Parthiban, I'm not sure your solution does what is required. As far as I can tell, the OP wants *all* records in a group to show as 'current' where the last record in the group has a null date.
The following should do what you need, but comes with a warning that it creates triangular joins, which are a hidden form of RBAR (search for Jeff Moden's article on the subject on this site to understand this if you don't already).
-- create some test data
create table #tmp (
Date_Field datetime
, Int_Field int
)
-- insert some values
-- assumption1: the null row is always the last in a set
-- assumption2: dates are sequential, i.e. all dates in the second group
-- will be after the latest date in the first group, etc
insert #tmp values ('2010-11-09',1)
insert #tmp values ('2010-11-14',1)
insert #tmp values ('2010-11-19',1)
insert #tmp values ('2010-11-24',1)
insert #tmp values (NULL,1)
insert #tmp values ('2010-12-04',2)
insert #tmp values ('2010-12-09',2)
insert #tmp values ('2010-12-14',2)
insert #tmp values (NULL,2)
insert #tmp values ('2010-12-24',3)
insert #tmp values ('2010-12-29',3)
insert #tmp values ('2011-01-03',3)
insert #tmp values ('2011-01-08',3)
insert #tmp values ('2011-01-13',1)
insert #tmp values ('2011-01-18',1)
insert #tmp values ('2011-01-23',1)
insert #tmp values ('2011-01-28',1)
insert #tmp values (NULL,2)
-- add a sequential column and index
alter table #tmp add Order_Key int identity
create clustered index ix_tmp on #tmp(Order_Key)
-- offset join the table to itself to identify
-- where the group for each row starts and finishes
-- warning: these create triangular joins
; with cte as (
select a.Order_Key
, a.Date_Field
, a.Int_Field
, (select isnull(max(Order_Key)+1,1) from #tmp d
where d.Order_Key <= a.Order_Key
and d.Int_Field <> a.Int_Field
) as Bottom_Range
, (select isnull(min(Order_Key)-1,(select max(Order_Key) from #tmp)) from #tmp d
where d.Order_Key >= a.Order_Key
and d.Int_Field <> a.Int_Field
) as Top_Range
from #tmp a
left join #tmp b
on a.Order_Key = b.Order_Key + 1
)
-- pull the final extract together
select *
, case when exists (select 'Null Value Found' from cte e
where e.Order_Key between cte.Bottom_Range and cte.Top_Range
and e.Date_Field is null
)
then 'Current'
else 'Completed'
end as [Type]
from cte
drop table #tmp
I couldn't come up with a way to avoids this, perhaps someone else might be able to?
Regards, Iain
November 9, 2010 at 7:11 am
Nice solution Sachin. I've run some comparative testing, which throws up some interesting results. Using the following test script, your solution runs quicker and has much lower read counts. But, when you generate a comparative execution plan, SQL estimates the rows returned by yours really badly, making it look like my solution is better (by 0% to 100%), even though it isn't. I wonder why this is?
Also, there is an issue with your solution producing too many rows, not sure why this might be? The row count in the test table is 73728, your solution returns 75828.
Test script:
/**** uses
dbcc freeproccache
dbcc dropcleanbuffers
do not run on production server
*****/
-- create some test data
create table #tmp (
Date_Field datetime
, Int_Field int
)
-- insert some values
-- assumption: the null row is always the last in a set
insert #tmp values ('2010-11-09',1)
insert #tmp values ('2010-11-14',1)
insert #tmp values ('2010-11-19',1)
insert #tmp values ('2010-11-24',1)
insert #tmp values (NULL,1)
insert #tmp values ('2010-12-04',2)
insert #tmp values ('2010-12-09',2)
insert #tmp values ('2010-12-14',2)
insert #tmp values (NULL,2)
insert #tmp values ('2010-12-24',3)
insert #tmp values ('2010-12-29',3)
insert #tmp values ('2011-01-03',3)
insert #tmp values ('2011-01-08',3)
insert #tmp values ('2011-01-13',1)
insert #tmp values ('2011-01-18',1)
insert #tmp values ('2011-01-23',1)
insert #tmp values ('2011-01-28',1)
insert #tmp values (NULL,2)
-- bump up the data counts
declare @i int = 1
while @i <=12
begin
insert #tmp
select dateadd(dd,90*@i,Date_Field)
, Int_Field
from #tmp
set @i = @i+1
end
select count(*) from #tmp
-- 73728
-- add a sequential column and index
alter table #tmp add Order_Key int identity
create clustered index ix_tmp on #tmp(Order_Key)
dbcc freeproccache
dbcc dropcleanbuffers
set statistics io on
set statistics time on
-- offset join the table to itself to identify
-- where the group for each row starts and finishes
-- warning: these create triangular joins
; with cte as (
select a.Order_Key
, a.Date_Field
, a.Int_Field
, (select isnull(max(Order_Key)+1,1) from #tmp d
where d.Order_Key <= a.Order_Key
and d.Int_Field <> a.Int_Field
) as Bottom_Range
, (select isnull(min(Order_Key)-1,(select max(Order_Key) from #tmp)) from #tmp d
where d.Order_Key >= a.Order_Key
and d.Int_Field <> a.Int_Field
) as Top_Range
from #tmp a
left join #tmp b
on a.Order_Key = b.Order_Key + 1
)
-- pull the final extract together
select *
, case when exists (select 'Null Value Found' from cte e
where e.Order_Key between cte.Bottom_Range and cte.Top_Range
and e.Date_Field is null
)
then 'Current'
else 'Completed'
end as [Type]
from cte
go
dbcc freeproccache
dbcc dropcleanbuffers
;with cte
as
(
select Order_Key,Int_Field,(dense_rank()over(order by Int_Field,Order_Key)-Order_Key)+Int_Field rid from #tmp
)
,cte1
as
(
select c.Int_Field,t.Date_Field,c.Order_Key,c.rid from cte c
inner join #tmp t on t.Order_Key=c.Order_Key
where t.Date_Field is null
)
select t.Order_Key,t.Date_Field,t.Int_Field,
(case when c.rid IS null then 'complete'else 'current'end) status
from cte1 c1
inner join cte c on c.rid=c1.rid
right join #tmp t on t.Order_Key=c.Order_Key
order by t.Order_Key,c1.Int_Field
-- 75828 rows
drop table #tmp
November 9, 2010 at 8:37 am
Sachin's solution does rely on the added IDENTITY column to define the order of all the rows in the table, including those rows with a NULL ChangeDate. If there is some other column (that the OP hasn't yet mentioned) that could be used to define this ordering then Sachin's approach could made to work, but without such a column and without further clarification from the OP we are forced to make arbitrary assumptions.
If there is a single row with a NULL ChangeDate that has the same StepId as the row with the maximum non-null ChangeDate value, then the situation is unambiguous.
ChangeDate StepId Status
----------- ------ --------
2010-05-01 1 Complete
2010-05-03 1 Complete
2010-05-08 1 Complete
2010-06-02 2 Current
2010-06-07 2 Current
NULL 2 Current
Now suppose we add another row with StepId = 1 and a NULL ChangeDate.
ChangeDate StepId
----------- --------
NULL 1
Since the ChangeDate column is NULL, there is no unambiguous ordering for this row. Is it part of the original group with StepId = 1 like this...?
ChangeDate StepId Status
----------- ------ --------
2010-05-01 1 Current
2010-05-03 1 Current
2010-05-08 1 Current
NULL 1 Current
2010-06-02 2 Current
2010-06-07 2 Current
NULL 2 Current
Or is it the first row of a new group, like this...?
ChangeDate StepId Status
----------- ------ --------
2010-05-01 1 Complete
2010-05-03 1 Complete
2010-05-08 1 Complete
2010-06-02 2 Current
2010-06-07 2 Current
NULL 2 Current
NULL 1 Current
November 9, 2010 at 9:08 am
Sorry for delay in reply and THANK YOU all for the solutions you provided here. Your help is very much appreciated.
Solutions provided by parthi-1705, sachin-355820, irobertson - is what I was looking for. I have some other columns in this table such as identity column and few other. But none of these column can be used for grouping.
@andrewd.smith : There are two columns that can be used for ordering here - identity column and NextChangeDate. Sorry, I didnt mention this earlier. I didnt thought solution will require this information. But the solution provided here works for me.
I am going to check performance with actual data. Thank you again !! This is why I like this forum - people are ready with differnt solutions much before you clarify the problem.
November 9, 2010 at 9:35 am
rajg (11/9/2010)
There are two columns that can be used for ordering here - identity column and NextChangeDate. Sorry, I didnt mention this earlier. I didnt thought solution will require this information.
This is one of the reasons it's best to post DDL and sample data that looks as much like your problem as possible. Often, it's the least obvious thing that is key to solving the problem 😉
Glad we could be of help, I'd be interested in how the solutions work out in your environment. Please let us know...
Regards, Iain
Edit: typo...
November 9, 2010 at 11:10 am
rajg (11/9/2010)
Hi,I have a table with two columns i.e. ChangeDate and StepId. I need a third column 'status' at run time based on the data in this table.
Records can be grouped based on the stepId. But there can be multiple groups for same stepId. For any group, if the last ChangeDate value is null, then all the rows from that group will have status 'current' . If last ChangeDate value is not null, then all the rows from that group will have status 'completed'
Please check attached screenshot for more details. Can somebody please tell what could be the optimized and efficient TSQL to get the output ?
Thanks,
Bhimraj
Do you have anything besides the StepID to put this data into "groups"? As it is, the StepID can repeat, so the definition of a group is that the StepID changes from it's previous value. If there are some other fields that will provide this grouping, I'm sure that there is an extremely efficient, fast method to solve this issue.
Did you see how Iain provided sample data in a readily consumable format? If you would do this, it would make it A LOT easier on all of the volunteers on this site. Please read the first link in my signature for how to do this. You should have (at the minimum), the CREATE TABLE statement(s) for the applicable table(s), and INSERT statement(s) for each table to put in some sample data, and expected results based upon the sample data provided. Indexes and constraints frequently prove to be very useful also!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 15, 2010 at 1:14 am
irobertson (11/9/2010)
rajg (11/9/2010)
There are two columns that can be used for ordering here - identity column and NextChangeDate. Sorry, I didnt mention this earlier. I didnt thought solution will require this information.This is one of the reasons it's best to post DDL and sample data that looks as much like your problem as possible. Often, it's the least obvious thing that is key to solving the problem 😉
Glad we could be of help, I'd be interested in how the solutions work out in your environment. Please let us know...
Regards, Iain
Edit: typo...
I'd be interetested in knowing, as well. One "out of place" date (ie: parallel steps) and BOOM!... all the solutions shatter like glass because they rely only on a sequential date or the supposed implicit order of data. Not good. Now, if that's all there is, then all is well. But, like has been asked many times, I'd really be interested if there are any other columns of data associated with this problem so we can make the solutions a bit more shatter-proof.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 15, 2010 at 1:16 am
rajg (11/9/2010)
Hi,I have a table with two columns i.e. ChangeDate and StepId. I need a third column 'status' at run time based on the data in this table.
Records can be grouped based on the stepId. But there can be multiple groups for same stepId. For any group, if the last ChangeDate value is null, then all the rows from that group will have status 'current' . If last ChangeDate value is not null, then all the rows from that group will have status 'completed'
Please check attached screenshot for more details. Can somebody please tell what could be the optimized and efficient TSQL to get the output ?
Thanks,
Bhimraj
Pictures are pretty but they don't do a whole lot for helping people help you. Please take a look at the article located in my signature line below. THAT's the way to post data to get the best help.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 15, 2010 at 1:23 am
I take that back. Sachin's looks like it'll work just fine (classic double rownumber) but is guaranteed only if the original insert were to have an ORDER BY in this case.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply