October 6, 2012 at 10:22 pm
Hi experts,
I have a scenario to fill in the GAPS between the dates with previousdate+1 day.
here is the table DDL ,sample data and expected output
CREATE TABLE #SAMPLETABLE
(
DATECOL DATETIME,
WEIGHTS float
)
INSERT INTO #SAMPLETABLE
SELECT '08/09/2012',8.2 UNION ALL
SELECT '08/10/2012',9.4 UNION ALL
SELECT '08/14/2012',10 UNION ALL
SELECT '08/15/2012',9.6 UNION ALL
SELECT '08/16/2012',9.3 UNION ALL
SELECT '08/19/2012',9.7
SELECT *
FROM #SAMPLETABLE
ORDER BY DATECOL
DATECOLWEIGHTS
2012-08-09 00:00:00.0008.2
2012-08-10 00:00:00.0009.4
2012-08-14 00:00:00.00010
2012-08-15 00:00:00.0009.6
2012-08-16 00:00:00.0009.3
2012-08-19 00:00:00.0009.7
What i need is to fill in the GAPS between the dates with previousdate+1 day and weights is same value as previous record values.
-- Expected OutPut
2012-08-09 00:00:00.0008.2
2012-08-10 00:00:00.0009.4
2012-08-11 00:00:00.0009.4
2012-08-12 00:00:00.0009.4
2012-08-13 00:00:00.0009.4
2012-08-14 00:00:00.00010
2012-08-15 00:00:00.0009.6
2012-08-16 00:00:00.0009.3
2012-08-17 00:00:00.0009.3
2012-08-18 00:00:00.0009.3
2012-08-19 00:00:00.0009.7
Please help me.
Thanks,
October 7, 2012 at 3:05 am
Try this
WITH Range(MINDATE,TotalDays) AS (
SELECT MIN(DATECOL),
DATEDIFF(Day,MIN(DATECOL),MAX(DATECOL))
FROM #SAMPLETABLE),
CTE AS (
SELECT n.number+1 AS number,
ROW_NUMBER() OVER(PARTITION BY CASE WHEN s.DATECOL IS NULL THEN 1 END ORDER BY n.number) AS rn,
DATEADD(Day,n.number,r.MINDATE) AS DATECOL,
s.WEIGHTS
FROM master.dbo.spt_values n
INNER JOIN Range r ON n.number BETWEEN 0 AND r.TotalDays
LEFT OUTER JOIN #SAMPLETABLE s ON s.DATECOL = DATEADD(Day,n.number,r.MINDATE)
WHERE n.type='p')
SELECT a.DATECOL,
COALESCE(b.WEIGHTS,a.WEIGHTS) AS WEIGHTS
FROM CTE a
LEFT OUTER JOIN CTE b ON a.WEIGHTS IS NULL
AND b.WEIGHTS IS NOT NULL
AND b.rn = a.number - a.rn
ORDER BY a.number;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537October 7, 2012 at 8:32 am
Careful, Mark. I don't know if the optimizer would make better use of a different plan for a larger number of rows but, right now, the execution plan has not one but two accidental cross-joins in it that produce 66 rows each (6 from the original data * 11 for all the desired dates).
--Jeff Moden
Change is inevitable... Change for the better is not.
October 7, 2012 at 7:29 pm
I don't think this has the same issue that Jeff is reporting about Mark's but it's just another option:
;WITH Tally AS (
SELECT n=number
FROM [master].dbo.spt_values Tally
WHERE [Type] = 'P' AND Number BETWEEN 1 AND 100),
MyData AS (
SELECT DATECOL, WEIGHTS
,rn=ROW_NUMBER() OVER (ORDER BY DATECOL)
FROM #SAMPLETABLE
)
SELECT DATECOL=CASE WHEN c.DATECOL IS NULL THEN a.DATECOL ELSE c.DATECOL END
, a.WEIGHTS
FROM MyData a
OUTER APPLY (
SELECT TOP 1 DATECOL, WEIGHTS
FROM MyData b
WHERE b.rn = 1 + a.rn) b
OUTER APPLY (
SELECT DATEADD(day, n-1, a.DATECOL)
FROM Tally
WHERE DATEADD(day, n, a.DATECOL) BETWEEN a.DATECOL AND b.DATECOL) c(DATECOL)
Note: Should work for any gaps of 100 days or less.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 7, 2012 at 8:07 pm
dwain.c (10/7/2012)
I don't think this has the same issue that Jeff is reporting about Mark's but it's just another option:
Gosh. Good bit of code, Dwain, but it also has a full cross join in it according to the actual execution plan. Again, I don't know if these things will "convert" to something else if the row count gets bigger or not. Guess I'll have to give it a try.
I'm really interested in this problem because I normally use the Quirky Update for "data smears" like this and I'd love to see a different method that didn't end up with a cross join in it.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 7, 2012 at 8:18 pm
Jeff Moden (10/7/2012)
dwain.c (10/7/2012)
I don't think this has the same issue that Jeff is reporting about Mark's but it's just another option:Gosh. Good bit of code, Dwain, but it also has a full cross join in it according to the actual execution plan. Again, I don't know if these things will "convert" to something else if the row count gets bigger or not. Guess I'll have to give it a try.
I'm really interested in this problem because I normally use the Quirky Update for "data smears" like this and I'd love to see a different method that didn't end up with a cross join in it.
Hmmm. I looked for that but it must have escaped my notice.
I did have a recursive CTE solution (gone now) but I expected it would be a dog so didn't post it.
I thought about a QU but didn't quite figure how it could be used to create additional records. Perhaps a Quirky Merge?
In any event, I am most curious what you come up with...
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 7, 2012 at 8:34 pm
I haven't come up with anything yet. I've been watchig you guys with great interest because even the previous row stuff in 2012 won't solve this problem. I've also confirmed that even larger number of rows still maintain the full cross join. If you add a unique clustered index to DATECOL, it cuts it down to triangular joins but 1000 dates still creates more tha 500,000 internal rows spinning off of the source table.
I believe even a "counting" rCTE would beat the cross joins that have occurred so far.
Here's the data generator that I've been using for this problem if you're interested.
CREATE TABLE #SAMPLETABLE
(
DATECOL DATETIME,
WEIGHTS float
)
INSERT INTO #SAMPLETABLE
SELECT TOP 1000
DateCol = DATEADD(mm,(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1),'1900'),
Weights = RAND(CHECKSUM(NEWID()))*10
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
CREATE UNIQUE CLUSTERED INDEX ByDate ON #SampleTable (DateCol)
--Jeff Moden
Change is inevitable... Change for the better is not.
October 7, 2012 at 8:41 pm
Here is the another solution ;
;WITH DigitsCTE AS
(
SELECT digit
FROM (VALUES(0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS D(digit)
)
, AllDatesCTE AS
(
SELECT DATEADD(DAY, N.number - 1, T.min_date) AS date
FROM (SELECT MIN(T.DATECOL) AS min_date, MAX(T.DATECOL) AS max_date
FROM #SAMPLETABLE AS T) AS T
CROSS APPLY
(SELECT TOP(DATEDIFF(DAY, T.min_date, T.max_date) + 1)
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
FROM DigitsCTE AS D0, DigitsCTE AS D1, DigitsCTE AS D2,
DigitsCTE AS D3, DigitsCTE AS D4, DigitsCTE AS D5) AS N
)
SELECT CONVERT(varchar(20), N.date , 101) AS DATECOL, T.WEIGHTS
FROM AllDatesCTE AS N
CROSS APPLY
(SELECT TOP(1) DATECOL, WEIGHTS
FROM #SAMPLETABLE AS T
WHERE T.DATECOL <= N.date
ORDER BY DATECOL DESC) AS T
ORDER BY 1 ASC
October 7, 2012 at 8:54 pm
Nagaram (10/7/2012)
Here is the another solution ;
;WITH DigitsCTE AS
(
SELECT digit
FROM (VALUES(0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS D(digit)
)
, AllDatesCTE AS
(
SELECT DATEADD(DAY, N.number - 1, T.min_date) AS date
FROM (SELECT MIN(T.DATECOL) AS min_date, MAX(T.DATECOL) AS max_date
FROM #SAMPLETABLE AS T) AS T
CROSS APPLY
(SELECT TOP(DATEDIFF(DAY, T.min_date, T.max_date) + 1)
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
FROM DigitsCTE AS D0, DigitsCTE AS D1, DigitsCTE AS D2,
DigitsCTE AS D3, DigitsCTE AS D4, DigitsCTE AS D5) AS N
)
SELECT CONVERT(varchar(20), N.date , 101) AS DATECOL, T.WEIGHTS
FROM AllDatesCTE AS N
CROSS APPLY
(SELECT TOP(1) DATECOL, WEIGHTS
FROM #SAMPLETABLE AS T
WHERE T.DATECOL <= N.date
ORDER BY DATECOL DESC) AS T
ORDER BY 1 ASC
By jove, you've got it. All I had to do on the larger example to make it work was to remove the convert on N.date so that it would sort correctly.
Also be advised that ORDER BY ORDINAL has been deprecated.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 7, 2012 at 9:00 pm
thanks jeff for your valuable advice .
I should change the order by n.date final select as below :
SELECT CONVERT(varchar(20), N.date , 101) AS DATECOL, T.WEIGHTS
FROM AllDatesCTE AS N
CROSS APPLY
(SELECT TOP(1) DATECOL, WEIGHTS
FROM #SAMPLETABLE AS T
WHERE T.DATECOL <= N.date
ORDER BY DATECOL DESC) AS T
ORDER BY N.date ASC
October 7, 2012 at 9:34 pm
I find it extremely annoying that this does not work:
DECLARE @Weight FLOAT = 0
,@STDate DATETIME
,@EDate DATETIME
SELECT @STDate=MIN(DATECOL), @EDate=MAX(DATECOL) FROM #SAMPLETABLE
;WITH Tally AS (
SELECT TOP (1+DATEDIFF(day, @STDate, @EDate)) n=number-1
FROM [master].dbo.spt_values Tally
WHERE [Type] = 'P' AND number BETWEEN 1 AND 100)
MERGE #SAMPLETABLE t
USING Tally s
ON t.DATECOL = DATEADD(day, n, @STDate)
WHEN MATCHED THEN
UPDATE SET @Weight = WEIGHTS
WHEN NOT MATCHED THEN
INSERT (DATECOL, WEIGHTS)
VALUES (DATEADD(day, n, @STDate), @Weight);
SELECT *
FROM #SAMPLETABLE
ORDER BY DATECOL
DROP TABLE #SAMPLETABLE
When BOL (http://technet.microsoft.com/en-us/library/bb510625.aspx) says that you should be able to SET assign to a local variable.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 7, 2012 at 10:05 pm
Jeff Moden (10/7/2012)
Nagaram (10/7/2012)
Here is the another solution ;
;WITH DigitsCTE AS
(
SELECT digit
FROM (VALUES(0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS D(digit)
)
, AllDatesCTE AS
(
SELECT DATEADD(DAY, N.number - 1, T.min_date) AS date
FROM (SELECT MIN(T.DATECOL) AS min_date, MAX(T.DATECOL) AS max_date
FROM #SAMPLETABLE AS T) AS T
CROSS APPLY
(SELECT TOP(DATEDIFF(DAY, T.min_date, T.max_date) + 1)
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
FROM DigitsCTE AS D0, DigitsCTE AS D1, DigitsCTE AS D2,
DigitsCTE AS D3, DigitsCTE AS D4, DigitsCTE AS D5) AS N
)
SELECT CONVERT(varchar(20), N.date , 101) AS DATECOL, T.WEIGHTS
FROM AllDatesCTE AS N
CROSS APPLY
(SELECT TOP(1) DATECOL, WEIGHTS
FROM #SAMPLETABLE AS T
WHERE T.DATECOL <= N.date
ORDER BY DATECOL DESC) AS T
ORDER BY 1 ASC
By jove, you've got it. All I had to do on the larger example to make it work was to remove the convert on N.date so that it would sort correctly.
Also be advised that ORDER BY ORDINAL has been deprecated.
Jeff - I don't think this solution works with your test harness.
But I think this one does:
CREATE TABLE #SAMPLETABLE
(
DATECOL DATETIME,
WEIGHTS float
)
INSERT INTO #SAMPLETABLE
SELECT TOP 1000
DateCol = DATEADD(mm,(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1),'1900'),
Weights = RAND(CHECKSUM(NEWID()))*10
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
CREATE UNIQUE CLUSTERED INDEX ByDate ON #SampleTable (DateCol)
DECLARE @StartDT DATETIME = GETDATE()
CREATE TABLE #Temp
(
DATECOL DATETIME,
WEIGHTS float
)
CREATE UNIQUE CLUSTERED INDEX ByDate ON #Temp (DateCol)
;WITH DigitsCTE AS
(
SELECT digit
FROM (VALUES(0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS D(digit)
)
, AllDatesCTE AS
(
SELECT DATEADD(DAY, N.number - 1, T.min_date) AS date
FROM (SELECT MIN(T.DATECOL) AS min_date, MAX(T.DATECOL) AS max_date
FROM #SAMPLETABLE AS T) AS T
CROSS APPLY
(SELECT TOP(DATEDIFF(DAY, T.min_date, T.max_date) + 1)
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
FROM DigitsCTE AS D0, DigitsCTE AS D1, DigitsCTE AS D2,
DigitsCTE AS D3, DigitsCTE AS D4, DigitsCTE AS D5) AS N
)
SELECT CONVERT(varchar(20), N.date , 101) AS DATECOL, T.WEIGHTS
FROM AllDatesCTE AS N
CROSS APPLY
(SELECT TOP(1) DATECOL, WEIGHTS
FROM #SAMPLETABLE AS T
WHERE T.DATECOL <= N.date
ORDER BY DATECOL DESC) AS T
ORDER BY 1 ASC
SELECT StartDate=@StartDT, EndDate=GETDATE(), ElapsedMS=DATEDIFF(ms, @StartDT, GETDATE())
SELECT @StartDT = GETDATE()
DECLARE @Weight FLOAT = 0
,@STDate DATETIME
,@EDate DATETIME
SELECT @STDate=MIN(DATECOL), @EDate=MAX(DATECOL) FROM #SAMPLETABLE
;WITH Tally (n) AS (
SELECT TOP (1+DATEDIFF(day, @STDate, @EDate))
DATEADD(day, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1, @STDate)
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
INSERT INTO #Temp
SELECT DATECOL=n, WEIGHTS
FROM Tally
LEFT OUTER JOIN #SAMPLETABLE ON n = DATECOL
UPDATE t
SET @Weight = WEIGHTS = CASE WHEN WEIGHTS IS NULL THEN @Weight ELSE WEIGHTS END
FROM #Temp t
SELECT * FROM #Temp
SELECT StartDate=@StartDT, EndDate=GETDATE(), ElapsedMS=DATEDIFF(ms, @StartDT, GETDATE())
DROP TABLE #SAMPLETABLE, #Temp
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 8, 2012 at 12:41 am
Hi Dwain, I came up with this last night but ran out of time to post;
;WITH WholeRange AS (
SELECT datecol = DATEADD(day,n,d.Startdate)
FROM (SELECT Startdate = MIN(datecol), daysSpan = 1+DATEDIFF(day,MIN(datecol), MAX(datecol)) FROM #SAMPLETABLE) d
CROSS APPLY (SELECT TOP(daysSpan) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 FROM sys.columns a, sys.columns b) x
)
SELECT * --left like this for testing
FROM WholeRange w
CROSS APPLY (
SELECT TOP 1 s.*
FROM #SAMPLETABLE s
WHERE s.datecol <= w.datecol
ORDER BY s.datecol DESC
) x
You'll notice it's exactly the same as Nagaram's query - except for the number generator.
I can't find anything wrong with it?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 8, 2012 at 1:10 am
ChrisM@Work (10/8/2012)
Hi Dwain, I came up with this last night but ran out of time to post;
;WITH WholeRange AS (
SELECT datecol = DATEADD(day,n,d.Startdate)
FROM (SELECT Startdate = MIN(datecol), daysSpan = 1+DATEDIFF(day,MIN(datecol), MAX(datecol)) FROM #SAMPLETABLE) d
CROSS APPLY (SELECT TOP(daysSpan) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 FROM sys.columns a, sys.columns b) x
)
SELECT * --left like this for testing
FROM WholeRange w
CROSS APPLY (
SELECT TOP 1 s.*
FROM #SAMPLETABLE s
WHERE s.datecol <= w.datecol
ORDER BY s.datecol DESC
) x
You'll notice it's exactly the same as Nagaram's query - except for the number generator.
I can't find anything wrong with it?
My mistake! I didn't take into account that row ordering of the OP's query was different. Correct number of rows is returned by his, yours and my Quirky Update.
As usual, yours seems to be the fastest. :angry:
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 8, 2012 at 1:29 am
dwain.c (10/8/2012)
ChrisM@Work (10/8/2012)
Hi Dwain, I came up with this last night but ran out of time to post;
;WITH WholeRange AS (
SELECT datecol = DATEADD(day,n,d.Startdate)
FROM (SELECT Startdate = MIN(datecol), daysSpan = 1+DATEDIFF(day,MIN(datecol), MAX(datecol)) FROM #SAMPLETABLE) d
CROSS APPLY (SELECT TOP(daysSpan) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 FROM sys.columns a, sys.columns b) x
)
SELECT * --left like this for testing
FROM WholeRange w
CROSS APPLY (
SELECT TOP 1 s.*
FROM #SAMPLETABLE s
WHERE s.datecol <= w.datecol
ORDER BY s.datecol DESC
) x
You'll notice it's exactly the same as Nagaram's query - except for the number generator.
I can't find anything wrong with it?
My mistake! I didn't take into account that row ordering of the OP's query was different. Correct number of rows is returned by his, yours and my Quirky Update.
As usual, yours seems to be the fastest. :angry:
Nagaram's is the fastest so far - it's a good 20% faster than mine. I'm having a look to find out why - probably the tally-table generation. Yours does an update (and very quickly too) so it's a bit "oranges and apples".
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 1 through 15 (of 41 total)
You must be logged in to reply to this topic. Login to reply