October 4, 2011 at 1:16 pm
Hi all,
Kinda stuck on finding a way to optimise this solution.
Let's say I have a table which contains a bunch of dates, as well as another table which has a bunch of dates and some values:
CREATE TABLE #Records
(
[Date] DATETIME,
[JoinValue] INT
)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-01', 1)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-02', 1)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-03', 1)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-04', 1)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-05', 1)
CREATE TABLE #MoreRecords
(
[Date] DATETIME,
[JoinValue] INT,
[Value] INT
)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-01', 1, NULL)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-02', 1, NULL)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-03', 1, 1)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-04', 1, 2)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-05', 1, 3)
What I want to do is, for each date in #Records, find the most recent value from #MoreRecords which is not null, but is not more recent than the date in #Records.
Bit of a convoluted question I know, so I'll post what I have so far:
SELECT *,
(
SELECT TOP 1 [Value]
FROM #MoreRecords
WHERE#MoreRecords.JoinValue = #Records.JoinValue
AND #MoreRecords.Date <= #Records.Date
AND #MoreRecords.[Value] IS NOT NULL
ORDER BY #MoreRecords.Date DESC
) AS [Value]
FROM #Records
DateJoinValueValue
2011-10-01 00:00:00.0001NULL
2011-10-02 00:00:00.0001NULL
2011-10-03 00:00:00.00011
2011-10-04 00:00:00.00012
2011-10-05 00:00:00.00013
So as you can see from the data, until the 3rd of october, there are no non-null values in #MoreRecords. After the 3rd, they begin to show up.
The problem with this query is just that it is *really* slow. I have a table of about 500000 records. Now, if I have to, I'll just brute force it - leave it running overnight and hopefully itt'l be done when I get in the next day. But I'd like a cleaner solution - at worst case itt'l give me a solution if I run in to the problem again.
Anyone have any suggestions?
October 4, 2011 at 1:54 pm
Another possible solution:
WITH cte AS
(
SELECT
#Records.Date,
#MoreRecords.[Value],
ROW_NUMBER() OVER (PARTITION BY #Records.JoinValue, #Records.Date ORDER BY #MoreRecords.[Value] DESC) AS rowNum
FROM #Records
LEFT JOIN #MoreRecords ON#MoreRecords.JoinValue = #Records.JoinValue
AND #MoreRecords.Date <= #Records.Date
AND #MoreRecords.[Value] IS NOT NULL
)
SELECT [Date], [Value] FROM cte WHERE rowNum = 1
Haven't been able to test it though, on real data, since there's other queries running ... gonna try tomorrow and see what I can get.
October 4, 2011 at 3:36 pm
For performance issues, it really helps to have the actual and/or estimated execution plans.
One thing you can try on your own is turning on XML statistics before the rest of your query
SET STATISTICS XML ON
and reviewing that for "missingindexes".
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 5, 2011 at 7:45 am
I'd already checked that part, but I can post the execution logs for you to see if you can find an improvement. The first version took 45 minutes to run. Going to run the second one now and see how long it takes.
October 5, 2011 at 10:35 am
In the end I ended up solving my problem by just investigating it a bit more and discovering it wasn't quite as extreme as the question that I'm asking in this thread. However, I'm still curious if there is a better solution than either of the two I've proposed here, 'cause neither of those is very elegant.
October 5, 2011 at 11:23 am
I did look at using an APPLY rather than a subquery, but the execution plans for both were essentially the same. The only difference I found was that the subquery had a "Compute Scalar" that was missing from the APPLY.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 13, 2011 at 10:02 am
How about using RANK() OVER?
See how this performs:
WITH FullResults (RecordsDate, RecordsJoinValue, MoreRecordsDate, MoreRecordsJoinValue,MoreRecordsValue, MoreRecordsRank)
AS
(
SELECT #Records.[Date] AS RecordsDate
, #Records.JoinValue AS RecordsJoinValue
, #MoreRecords.[Date] AS MoreRecordsDate
, #MoreRecords.JoinValue AS MoreRecordsJoinValue
, #MoreRecords.Value AS MoreRecordsValue
, RANK() OVER (PARTITION BY #Records.[Date], #Records.JoinValue ORDER BY #MoreRecords.[Date] DESC) AS MoreRecordsRank
FROM #Records
INNER JOIN #MoreRecords ON #MoreRecords.JoinValue = #Records.JoinValue
AND #MoreRecords.[Date] <= #Records.[Date]
AND #MoreRecords.[Value] IS NOT NULL
)
SELECT * FROM FullResults WHERE MoreRecordsRank = 1
October 13, 2011 at 10:06 am
drew.allen (10/5/2011)
I did look at using an APPLY rather than a subquery, but the execution plans for both were essentially the same. The only difference I found was that the subquery had a "Compute Scalar" that was missing from the APPLY.Drew
Post your code? You may find this solution works better. Always worth investigating.
October 13, 2011 at 11:32 am
Lynn Pettis (10/13/2011)
Post your code? You may find this solution works better. Always worth investigating.
I didn't save the code.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 14, 2011 at 7:14 pm
select *
from #Records A
left outer join
(select join_value, max(value)
from #Records M
left outer join #MoreRecords N on M.join_value = N.join_value
where M.date >= N.date
group by join_value
) R on A.join_value = R.join_value
Hope this helps
October 14, 2011 at 9:07 pm
I had posted the incomplete query by mistake.
sorry for the confusion, here is the query that I wanted to post.
select A.*, B.Value
from #Records A
left outer join (select M.joinvalue, M.date, Max(N.date) as Recent_date
from #Records M
left outer join #MoreRecords N on M.joinvalue = N.joinvalue
where M.date >= N.date
and N.value is not null
group by M.joinvalue
,M.date
) R on A.joinvalue = R.joinvalue and A.date = R.date
left outer join #MoreRecords B on R.joinvalue = B.joinvalue and R.Recent_date = B.date
Hope this helps
October 14, 2011 at 11:21 pm
Next time a mistake like that happens, it's OK to just edit the post. 🙂
Also, it's much easier to read a query when it's posted inside a sql code block. Like this, without the space between the first bracket and the word "code"
[ code="sql"]SELECT 'Hello World' [/code].
October 15, 2011 at 1:09 pm
I am new to this forum. I didn't know the post can be edited. thanks for the tip.
There was indentation in the query before I posted. after it was posted, all the indentation is gone. the block, would that preserve the indentation of the original query after posting.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply