January 30, 2014 at 6:04 am
Hi,
A while back I recall reading a post on here where a user needed to update a column with values until there was a new value, then update with that new value until it hit the next new value, and so on.
There were some really excellent solutions, and now that I'm facing a similar problem, I'd really like to review them. But of course I can't find it.
I know it's sort of a gaps and islands problem generally.
Thanks
January 30, 2014 at 6:18 pm
I'll take that as a no :hehe:
Anyway, here's what I'm generally trying to do:
CREATE TABLE testsql (
cid integer,
ctime datetime,
cgm integer);
INSERT INTO testsql VALUES
('1', '2012-01-01 07:00', '1'),
('1', '2012-01-01 07:01', '1'),
('1', '2012-01-01 07:02', '1'),
('1', '2012-01-01 07:03', NULL),
('1', '2012-01-01 07:04', NULL),
('1', '2012-01-01 07:05', '5'),
('1', '2012-01-01 07:06', '5'),
('1', '2012-01-01 07:07', '5'),
('1', '2012-01-01 07:08', NULL),
('1', '2012-01-01 07:09', NULL),
('1', '2012-01-01 07:10', NULL),
('1', '2012-01-01 07:11', NULL),
('2', '2013-05-02 07:12', '24'),
('2', '2013-05-02 07:13', '24'),
('2', '2013-05-02 07:14', '24'),
('2', '2013-05-02 07:15', NULL),
('2', '2013-05-02 07:16', NULL),
('2', '2013-05-02 07:17', '44'),
('2', '2013-05-02 07:18', '44'),
('2', '2013-05-02 07:19', NULL),
('2', '2013-05-02 07:20', '4'),
('2', '2013-05-02 07:21', '4'),
('2', '2013-05-02 07:22', '4');
And here's the query
WITH C (id, pt, gm, dr) as (
SELECT cid, ctime, cgm,
dense_rank() over (partition by cid, cgm order by cid, ctime) as dr
FROM testsql
)
SELECT id, pt, gm,
CASE WHEN gm IS NULL THEN
LAG(gm, cast(c.DR as int)) OVER (PARTITION BY id ORDER BY id, pt)
ELSE gm END as gm2
FROM C
ORDER BY id, pt, gm
There's one result towards the end that is still NULL, rather than being filled in with a LAG value, like the rest in the CASE column are.
Any suggestions?
January 30, 2014 at 8:20 pm
Could this be the article:
Group Islands of Contiguous Dates (SQL Spackle)
By Jeff Moden, 2013/10/04 (first published: 2010/12/07)
January 31, 2014 at 9:21 am
sqldriver,
Could your data also contain a sequence like the following, i.e. cgm changes without one or more rows having cgm null in between:
('1', '2012-01-01 07:05', '5'),
('1', '2012-01-01 07:06', '5'),
('1', '2012-01-01 07:07', '5'),
('2', '2013-05-02 07:10', '24'),
('2', '2013-05-02 07:20', '24'),
('2', '2013-05-02 07:30', '24')
Could there be gaps in the ctime values (as shown above)?
What version of SQL Server must this run under?
January 31, 2014 at 9:24 am
bitbucket-25253 (1/30/2014)
Could this be the article:Group Islands of Contiguous Dates (SQL Spackle)
By Jeff Moden, 2013/10/04 (first published: 2010/12/07)
That's a fine article, but this was a forum post. There were a number of pages of replies, but maybe it only seemed memorable to me!
January 31, 2014 at 9:27 am
Michael Meierruth (1/31/2014)
sqldriver,Could your data also contain a sequence like the following, i.e. cgm changes without one or more rows having cgm null in between:
('1', '2012-01-01 07:05', '5'),
('1', '2012-01-01 07:06', '5'),
('1', '2012-01-01 07:07', '5'),
('2', '2013-05-02 07:10', '24'),
('2', '2013-05-02 07:20', '24'),
('2', '2013-05-02 07:30', '24')
Could there be gaps in the ctime values (as shown above)?
What version of SQL Server must this run under?
Hi,
No, the issue with the data is that there are NULL gaps in the 'check in' column all the way through. I haven't seen a contiguous block. It will be running on 2012.
Thanks
January 31, 2014 at 10:44 am
OK, just saw your use of LAG which means SS20012.
But you're making completely wrong use of it.
If you delete one of the rows with cgm='5' from your data and run your script the results become a lot worse.
In your final select display the dr column and you'll see this right away.
Also, no dense_rank needed here. Rank is sufficient.
January 31, 2014 at 12:37 pm
Michael Meierruth (1/31/2014)
OK, just saw your use of LAG which means SS20012.But you're making completely wrong use of it.
If you delete one of the rows with cgm='5' from your data and run your script the results become a lot worse.
In your final select display the dr column and you'll see this right away.
Also, no dense_rank needed here. Rank is sufficient.
It's not completely wrong - it's getting me all the test data except one row how I want it. Do you have another suggestion?
Switching from dense rank to rank doesn't change output, so I'm not sure what you mean.
Thanks
February 1, 2014 at 5:35 am
sqldriver,
You didn't understand my remark regarding removing one of the rows with cgm='5'.
If you do that your solution will now show even more incorrect results.
I understand what you are trying to do but unfortunately the rank values being generated make your approach invalid.
I can't see a way a to generate the rank values the way you would like them.
Thus here is my solution which takes a more classic approach in dealing with arbitrarily embedded groups.
It also will handle some strange cases.
Thus a more generic problem definition is:
within every cid group, update every group of nulls with the cgm value of the row immediately before the null group (if it exists)
Here is the script:
with
cte1 as -- change cgm nulls to 0 to avoid confusion with other types of nulls later on
(
select cid,ctime,isnull(cgm,0) cgm
from testsql
),
cte2 as -- find cid,cgm values before and after each row (-1 is returned for the first and last row)
(
select cid,ctime,cgm,
lag(cgm,1,-1) over (order by ctime) cgmbef,lead(cgm,1,-1) over (order by ctime) cgmaft,
lag(cid,1,-1) over (order by ctime) cidbef,lead(cid,1,-1) over (order by ctime) cidaft
from cte1
),
cte3 as -- find first and last element of each null group
(
select cid,ctime,cgmbef,cgmaft,
case when cid<>cidbef or cgm<>cgmbef then 1 else 0 end isfirst,
case when cid<>cidaft or cgm<>cgmaft then 1 else 0 end islast
from cte2
where cgm=0
),
cte4 as -- extract only the beginning of each group
(
select cid,ctime,cgmbef,rank() over(partition by cid order by ctime) nullgroupid
from cte3
where isfirst=1
),
cte5 as -- extract only then end of each group
(
select cid,ctime,cgmbef,rank() over(partition by cid order by ctime) nullgroupid
from cte3
where islast=1
)
-- merge the two
select a.cid,a.ctime starttime,b.ctime endtime,a.cgmbef
from cte4 a
join cte5 b on a.cid=b.cid and a.nullgroupid=b.nullgroupid
where a.cgmbef>0
And instead of the final select you could do an update (which accepts the preceeding with block):
update t
set t.cgm=u.cgmbef
from testsql t
join
(
select a.cid,a.ctime starttime,b.ctime endtime,a.cgmbef
from cte4 a
join cte5 b on a.cid=b.cid and a.nullgroupid=b.nullgroupid
where a.cgmbef>0
) u on t.cid=u.cid and t.ctime>=starttime and t.ctime<=endtime
February 1, 2014 at 11:46 am
sqldriver (1/30/2014)
Hi,A while back I recall reading a post on here where a user needed to update a column with values until there was a new value, then update with that new value until it hit the next new value, and so on.
There were some really excellent solutions, and now that I'm facing a similar problem, I'd really like to review them. But of course I can't find it.
I know it's sort of a gaps and islands problem generally.
Thanks
Was this it?
http://www.sqlservercentral.com/Forums/Topic1529984-8-1.aspx
--Jeff Moden
Change is inevitable... Change for the better is not.
February 1, 2014 at 2:30 pm
Jeff Moden (2/1/2014)
sqldriver (1/30/2014)
Hi,A while back I recall reading a post on here where a user needed to update a column with values until there was a new value, then update with that new value until it hit the next new value, and so on.
There were some really excellent solutions, and now that I'm facing a similar problem, I'd really like to review them. But of course I can't find it.
I know it's sort of a gaps and islands problem generally.
Thanks
Was this it?
http://www.sqlservercentral.com/Forums/Topic1529984-8-1.aspx
Hi Jeff,
No, thanks though. The one I have in mind had solutions from all the big guns - I remember staring at some of the code and going cross-eyed. I'm cursing myself for not bookmarking it.
Also, Michael, I'll see how your solution adapts when I have some more time.
Thanks! :Wow:
February 1, 2014 at 3:20 pm
sqldriver (2/1/2014)
The one I have in mind had solutions from all the big guns.
BWAA-HAAA!!!! Was I there? 😛
Also, Michael, I'll see how your solution adapts when I have some more time.
Thanks! :Wow:
I strongly recommend that you take a look at the simplicity of the solution I posted on the URL I gave you.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 1, 2014 at 3:33 pm
I'll save you some time...
I don't currently have a 2K12 machine available so I can't demonstrate the LAG method (I don't like to post code I haven't tested) but the following will work quite nicely in all versions of SQL Server from 2005 and up and it takes care of those pesky edge cases.
SELECT t.cID
,t.cTime
,cGM = ca.cGM
FROM dbo.TestSQL t
CROSS APPLY (SELECT TOP 1 ta.cGM
FROM dbo.TestSQL ta
WHERE ta.cID = t.cID
AND ta.cTime <= t.cTime
AND ta.cGM IS NOT NULL
ORDER BY ta.cTime DESC) ca (cGM)
ORDER BY t.cID,t.cTime
;
In the presence of correct indexes, it should be comparable to LAG in performance.
There is a method that'll blow the doors off of even LAG but a lot of people are afraid of it.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 1, 2014 at 4:34 pm
Jeff Moden (2/1/2014)
sqldriver (2/1/2014)
The one I have in mind had solutions from all the big guns.BWAA-HAAA!!!! Was I there? 😛
Also, Michael, I'll see how your solution adapts when I have some more time.
Thanks! :Wow:
I strongly recommend that you take a look at the simplicity of the solution I posted on the URL I gave you.
Man, it was like you, Dwain, Sean, Lowell, Cadavre, and then Chris M posted something and Dwain was like "I'm packing up my test harness and going home."
February 1, 2014 at 4:40 pm
Jeff Moden (2/1/2014)
I'll save you some time...I don't currently have a 2K12 machine available so I can't demonstrate the LAG method (I don't like to post code I haven't tested) but the following will work quite nicely in all versions of SQL Server from 2005 and up and it takes care of those pesky edge cases.
SELECT t.cID
,t.cTime
,cGM = ca.cGM
FROM dbo.TestSQL t
CROSS APPLY (SELECT TOP 1 ta.cGM
FROM dbo.TestSQL ta
WHERE ta.cID = t.cID
AND ta.cTime <= t.cTime
AND ta.cGM IS NOT NULL
ORDER BY ta.cTime DESC) ca (cGM)
ORDER BY t.cID,t.cTime
;
In the presence of correct indexes, it should be comparable to LAG in performance.
There is a method that'll blow the doors off of even LAG but a lot of people are afraid of it.
I was messing with LAG mostly because I've been re-re-re-reading Itzik's book on 2K12 window functions and it seemed to get me mostly where I was going. Cross apply is like that girl I always see at the same bar but never talk to. I can't figure out what my mental block is when considering it in code. I've read every article on it possible.
The world really needs a Jeff Moden book.
(And do you really not have a 2K12 dev or express install to mess with?!)
EDIT! Forgot to say thanks! :blush:
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply