December 6, 2012 at 5:36 pm
I'm trying to update gaps in a table using a set based approach, but have been struggling with the solution. I've read a few gaps and islands posts in this forum, and was hopeful that I could figure it out, but I'm at a loss. Here's the sql to create the table and populate it with data.
CREATE TABLE dbo.UpdateGaps(
MachineID int NOT NULL,
RecordedDate datetime NOT NULL,
Pressure float NULL,
CONSTRAINT PK_UpdateGaps PRIMARY KEY CLUSTERED (MachineID,RecordedDate)
)
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-02', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-03', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-04', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-05', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-06', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-07', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-08', 330);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-09', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-10', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-02', 380);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-03', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-04', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-05', 350);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-06', 350);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-07', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-08', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-09', 400);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-10', 0);
GO
What I need to do is fill in the zero values with the value that precedes it (grouping by MachineID). Non-zero values do not change.
MachineIDRecordedDatePressure
-----------------------------
60 2012-10-020<-- no change (nothing precedes this record)
60 2012-10-03300<-- no change
60 2012-10-04300<-- no change
60 2012-10-050<-- update to 300
60 2012-10-060<-- update to 300
60 2012-10-070<-- update to 300
60 2012-10-08330<-- no change
60 2012-10-090<-- update to 330
60 2012-10-10300<-- no change
97 2012-10-02380<-- no change (new MachineID)
97 2012-10-030<-- update to 380
97 2012-10-040<-- update to 380
97 2012-10-05350<-- no change
97 2012-10-06350<-- no change
97 2012-10-070<-- update to 350
97 2012-10-080<-- update to 350
97 2012-10-09400<-- no change
97 2012-10-100<-- update to 400
I read this yesterday, http://www.manning.com/nielsen/nielsenMEAP_freeCh5.pdf, which gave me hope, but I still can't figure out how to update the table without using a cursor. Using the information in the PDF, I created the query below which gives a date range for the gaps. This is where I was hopeful that I could figure it out, but I still keep falling back to a cursor based approach.
WITH C AS
(
SELECT MachineID, RecordedDate, Pressure,
ROW_NUMBER() OVER(ORDER BY MachineID, RecordedDate)
- ROW_NUMBER() OVER(ORDER BY Pressure,MachineID,RecordedDate) AS grp
FROM dbo.UpdateGaps
)
SELECT MachineID, MIN(RecordedDate) AS mn, MAX(RecordedDate) AS mx, Pressure
FROM C
GROUP BY MachineID, Pressure, grp
ORDER BY MachineID, mn;
If anyone has done this sort of thing before, I would greatly appreciate any help you have to offer. And thank you, capnhector, for linking to the PDF in the thread, "Solve Problems Using Recursive CTE". That's a great source of information on gaps and islands. Thanks to Itzik Ben-Gan for writing it.
December 6, 2012 at 5:50 pm
You can try a quirky update (QU).
DECLARE @MachineID INT = 0
,@Pressure FLOAT = 0
UPDATE u WITH(TABLOCKX)
SET Pressure = CASE WHEN Pressure = 0
THEN @Pressure ELSE Pressure END
,@Pressure = CASE WHEN Pressure = 0 AND @MachineID = MachineID
THEN @Pressure ELSE Pressure END
,@MachineID = MachineID
FROM dbo.UpdateGaps u
OPTION(MAXDOP 1)
Edit: Ooops! Initially forgot to account for change in Machine ID.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 6, 2012 at 6:34 pm
I don't know what others call this type of problem but I call them "data smears" because it's like you smearing strips of color down from one wet paint spot to another.
I don't believe the following will be quite as fast as a Quirky Update, but it'll blow the doors off a cursor especially in the face of the correct index. And thank you VERY much for posting such clear requirements and readily consumable data!
UPDATE ug
SET Pressure = ca.Pressure
FROM dbo.UpdateGaps ug
CROSS APPLY (SELECT TOP 1 Pressure
FROM dbo.UpdateGaps ugca
WHERE ugca.MachineID = ug.MachineID
AND ugca.RecordedDate <= ug.RecordedDate
AND ugca.Pressure > 0
ORDER BY ugca.MachineID,ugca.RecordedDate DESC) ca
;
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2012 at 6:44 pm
Jeff Moden (12/6/2012)
I don't believe the following won't be quite as fast as a Quirky Update...
Easy for you to say...
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 6, 2012 at 6:45 pm
with updatebase_cte (MachineID,RecordedDate,Pressure,ind) as (
select s.MachineID,s.RecordedDate,s.Pressure, 0 as ind
from yourtable s
where s.RecordedDate='2012-10-02'
union all
select s.MachineID,s.RecordedDate,
case when s.Pressure = 0 and sc.Pressure <> 0 then sc.Pressure else s.Pressure end as Pressure,
case when s.Pressure = 0 and sc.Pressure <> 0 then 1 else 0 end as ind
from yourtable s
inner join updatebase_cte sc
on (s.RecordedDate = dateadd(day,1,sc.RecordedDate) and s.MachineID=sc.MachineID)
)
-- this will return data need be updated part, you can use it to update by a join.
select * from updatebase_cte where ind=1
Just a quick coding, have not got time to test. Does this work?:-)
December 6, 2012 at 8:33 pm
dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!
Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.
You two were quick too!
Thanks again!
December 6, 2012 at 8:45 pm
You're welcome!
I note that I neglected to mention mine only works because you happened to already have the clustered index needed to order the QU.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 6, 2012 at 9:56 pm
Here is another possibility. Let me say before hand that Jeff Moden's solution is better than this for the specific task you asked for; (on my laptop, my solution is 85% of the cost, whereas jeff's solution is 15% of the cost)
But in case you want to assign sequential ids to each group and sequential ids to rows within a group, you can consider this approach; With this approach, you can answer, for example, what is the average pressure of the 15th group after 10/24/2012.
; WITH R (MachineId, Pressure, RecordedDate, Rid) AS
(
SELECT MachineId, Pressure, RecordedDate,
ROW_NUMBER() OVER(PARTITION BY MachineId ORDER BY MachineId, RecordedDate) AS Rid
FROM dbo.UpdateGaps
),
GRPROW(MachineId, Pressure, RecordedDate, GroupId, RowId) AS
(
SELECT G1.MachineId, G1.Pressure, G1.RecordedDate,
DENSE_RANK() OVER (PARTITION BY G1.MachineId ORDER BY G1.MachineId, ISNULL(MIN(G2.Rid) - 1, G1.Rid)) AS GroupId,
ROW_NUMBER() OVER (PARTITION BY G1.MachineId, ISNULL(MIN(G2.Rid) - 1, G1.Rid) ORDER BY G1.MachineId, ISNULL(MIN(G2.Rid) - 1, G1.Rid), G1.RecordedDate DESC) AS RowId
FROM R G1
LEFT JOIN R G2 ON (G1.MachineID = G2.MachineID
AND G2.Rid > G1.Rid
AND G1.Pressure <> G2.Pressure)
GROUP BY G1.MachineId, G1.Pressure, G1.RecordedDate, G1.Rid
)
SELECT G1.MachineId, G1.Pressure, G1.RecordedDate, G1.GroupId, G1.RowId AS DescendingRowId,
CASE WHEN ISNULL(G2.Pressure, 0) = 0 THEN G1.Pressure ELSE G2.Pressure END AS NewPressure
FROM GRPROW G1
LEFT JOIN GRPROW G2 ON (G1.MachineId = G2.MachineId
AND G1.GroupId - 1 = G2.GroupId
AND G2.RowId = 1)
ORDER BY G1.MachineId, G1.RecordedDate
December 6, 2012 at 10:44 pm
Here is another solution to the problem:
CREATE TABLE dbo.UpdateGaps(
MachineID int NOT NULL,
RecordedDate datetime NOT NULL,
Pressure float NULL,
CONSTRAINT PK_UpdateGaps PRIMARY KEY CLUSTERED (MachineID,RecordedDate)
)
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-02', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-03', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-04', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-05', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-06', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-07', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-08', 330);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-09', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-10', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-02', 380);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-03', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-04', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-05', 350);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-06', 350);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-07', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-08', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-09', 400);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-10', 0);
GO
with Base0Data as (
select
MachineID,
RecordedDate,
Pressure,
rn = row_number() over (partition by MachineID, case when Pressure = 0 then 0 else 1 end order by RecordedDate)
from
dbo.UpdateGaps
)
,BaseData as (
select
MachineID,
RecordedDate,
Pressure,
GrpDate = dateadd(dd,-rn,RecordedDate),
rn,
rn1 = row_number() over (partition by MachineID, dateadd(dd,-rn,RecordedDate) order by dateadd(dd,-rn,RecordedDate))
from
Base0Data
)
--select * from BaseData where Pressure = 0
select
bd1.MachineID,
bd1.RecordedDate,
Pressure = case when bd1.Pressure = 0 then coalesce(bd2.Pressure, bd1.Pressure) else bd1.Pressure end
from
BaseData bd1
left outer join BaseData bd2
on (bd1.MachineID = bd2.MachineID
and dateadd(dd, -bd1.rn1, bd1.RecordedDate) = bd2.RecordedDate)
order by
bd1.MachineID,
bd1.RecordedDate;
go
drop table dbo.UpdateGaps;
go
December 6, 2012 at 11:23 pm
dwain.c (12/6/2012)
Jeff Moden (12/6/2012)
I don't believe the following won't be quite as fast as a Quirky Update...Easy for you to say...
Heh... suffering from a NEC (Not Enough Coffee) problem on that one, for sure. 😛 I meant to say that "I don't believe that the following will be quite as fast as a Quirky Update" but got my tangle all toungled up. :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2012 at 11:27 pm
oralinque (12/6/2012)
dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.
You two were quick too!
Thanks again!
The "quiick" comes from you making it so easy. Thank you again for your comments. I wish everyone would figure that out.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2012 at 12:29 am
Jeff Moden (12/6/2012)
oralinque (12/6/2012)
dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.
You two were quick too!
Thanks again!
The "quiick" comes from you making it so easy. Thank you again for your comments. I wish everyone would figure that out.
+1 to that!
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 7, 2012 at 2:12 am
Jeff Moden (12/6/2012)
I don't know what others call this type of problem but I call them "data smears" because it's like you smearing strips of color down from one wet paint spot to another.
I thought I was the only one that referred to this as a smear or smudge (I've used both to describe this type of solution in the past 🙂 ). Now you have me wondering if I picked up the term from you!
December 7, 2012 at 8:42 am
Jeff Moden (12/6/2012)
dwain.c (12/6/2012)
Jeff Moden (12/6/2012)
I don't believe the following won't be quite as fast as a Quirky Update...Easy for you to say...
Heh... suffering from a NEC (Not Enough Coffee) problem on that one, for sure. 😛 I meant to say that "I don't believe that the following will be quite as fast as a Quirky Update" but got my tangle all toungled up. :hehe:
Ha! I don't care how you speak as long as you're fluent in T-SQL. :laugh:
Thanks to everyone who replied. I've added a few new tools to my toolbox.
December 7, 2012 at 12:14 pm
Cadavre (12/7/2012)
Jeff Moden (12/6/2012)
I don't know what others call this type of problem but I call them "data smears" because it's like you smearing strips of color down from one wet paint spot to another.I thought I was the only one that referred to this as a smear or smudge (I've used both to describe this type of solution in the past 🙂 ). Now you have me wondering if I picked up the term from you!
Dunno... I've been using the term on these forums for a long time. Might simply be that great minds think alike. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply