Finding "Max" Value from subquery

Question

Post reply

Finding "Max" Value from subquery

kramaswamy

SSCoach

Points: 18181
More actions
October 4, 2011 at 1:16 pm

#258485

Hi all,
Kinda stuck on finding a way to optimise this solution.
Let's say I have a table which contains a bunch of dates, as well as another table which has a bunch of dates and some values:
CREATE TABLE #Records
(
[Date] DATETIME,
[JoinValue] INT
)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-01', 1)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-02', 1)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-03', 1)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-04', 1)
INSERT INTO #Records ([Date], [JoinValue])
VALUES('2011-10-05', 1)
CREATE TABLE #MoreRecords
(
[Date] DATETIME,
[JoinValue] INT,
[Value] INT
)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-01', 1, NULL)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-02', 1, NULL)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-03', 1, 1)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-04', 1, 2)
INSERT INTO #MoreRecords ([Date], [JoinValue], [Value])
VALUES ('2011-10-05', 1, 3)
What I want to do is, for each date in #Records, find the most recent value from #MoreRecords which is not null, but is not more recent than the date in #Records.
Bit of a convoluted question I know, so I'll post what I have so far:
SELECT *,
(
SELECT TOP 1 [Value]
FROM #MoreRecords
WHERE#MoreRecords.JoinValue = #Records.JoinValue
AND #MoreRecords.Date <= #Records.Date
AND #MoreRecords.[Value] IS NOT NULL
ORDER BY #MoreRecords.Date DESC
) AS [Value]
FROM #Records
DateJoinValueValue
2011-10-01 00:00:00.0001NULL
2011-10-02 00:00:00.0001NULL
2011-10-03 00:00:00.00011
2011-10-04 00:00:00.00012
2011-10-05 00:00:00.00013
So as you can see from the data, until the 3rd of october, there are no non-null values in #MoreRecords. After the 3rd, they begin to show up.
The problem with this query is just that it is *really* slow. I have a table of about 500000 records. Now, if I have to, I'll just brute force it - leave it running overnight and hopefully itt'l be done when I get in the next day. But I'd like a cleaner solution - at worst case itt'l give me a solution if I run in to the problem again.
Anyone have any suggestions?

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

kramaswamy SSCoach Points: 18181 More actions · Answer 1

Another possible solution:

WITH cte AS

(

SELECT

#Records.Date,

#MoreRecords.[Value],

ROW_NUMBER() OVER (PARTITION BY #Records.JoinValue, #Records.Date ORDER BY #MoreRecords.[Value] DESC) AS rowNum

FROM #Records

LEFT JOIN #MoreRecords ON#MoreRecords.JoinValue = #Records.JoinValue

AND #MoreRecords.Date <= #Records.Date

AND #MoreRecords.[Value] IS NOT NULL

)

SELECT [Date], [Value] FROM cte WHERE rowNum = 1

Haven't been able to test it though, on real data, since there's other queries running ... gonna try tomorrow and see what I can get.

drew.allen SSC Guru Points: 76998 More actions · Answer 2

For performance issues, it really helps to have the actual and/or estimated execution plans.

One thing you can try on your own is turning on XML statistics before the rest of your query

SET STATISTICS XML ON

and reviewing that for "missingindexes".

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

kramaswamy SSCoach Points: 18181 More actions · Answer 3

I'd already checked that part, but I can post the execution logs for you to see if you can find an improvement. The first version took 45 minutes to run. Going to run the second one now and see how long it takes.

kramaswamy SSCoach Points: 18181 More actions · Answer 4

In the end I ended up solving my problem by just investigating it a bit more and discovering it wasn't quite as extreme as the question that I'm asking in this thread. However, I'm still curious if there is a better solution than either of the two I've proposed here, 'cause neither of those is very elegant.

drew.allen SSC Guru Points: 76998 More actions · Answer 5

I did look at using an APPLY rather than a subquery, but the execution plans for both were essentially the same. The only difference I found was that the subquery had a "Compute Scalar" that was missing from the APPLY.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

clubbavich Default port Points: 1449 More actions · Answer 6

How about using RANK() OVER?

See how this performs:

WITH FullResults (RecordsDate, RecordsJoinValue, MoreRecordsDate, MoreRecordsJoinValue,MoreRecordsValue, MoreRecordsRank)

AS

(

SELECT #Records.[Date] AS RecordsDate

, #Records.JoinValue AS RecordsJoinValue

, #MoreRecords.[Date] AS MoreRecordsDate

, #MoreRecords.JoinValue AS MoreRecordsJoinValue

, #MoreRecords.Value AS MoreRecordsValue

, RANK() OVER (PARTITION BY #Records.[Date], #Records.JoinValue ORDER BY #MoreRecords.[Date] DESC) AS MoreRecordsRank

FROM #Records

INNER JOIN #MoreRecords ON #MoreRecords.JoinValue = #Records.JoinValue

AND #MoreRecords.[Date] <= #Records.[Date]

AND #MoreRecords.[Value] IS NOT NULL

)

SELECT * FROM FullResults WHERE MoreRecordsRank = 1

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

drew.allen (10/5/2011)
I did look at using an APPLY rather than a subquery, but the execution plans for both were essentially the same. The only difference I found was that the subquery had a "Compute Scalar" that was missing from the APPLY.
Drew

Post your code? You may find this solution works better. Always worth investigating.

drew.allen SSC Guru Points: 76998 More actions · Answer 8

Lynn Pettis (10/13/2011)
Post your code? You may find this solution works better. Always worth investigating.

I didn't save the code.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

VSP SSC-Addicted Points: 467 More actions · Answer 9

select *

from #Records A

left outer join

(select join_value, max(value)

from #Records M

left outer join #MoreRecords N on M.join_value = N.join_value

where M.date >= N.date

group by join_value

) R on A.join_value = R.join_value

Hope this helps

VSP SSC-Addicted Points: 467 More actions · Answer 10

I had posted the incomplete query by mistake.

sorry for the confusion, here is the query that I wanted to post.

select A.*, B.Value

from #Records A

left outer join (select M.joinvalue, M.date, Max(N.date) as Recent_date

from #Records M

left outer join #MoreRecords N on M.joinvalue = N.joinvalue

where M.date >= N.date

and N.value is not null

group by M.joinvalue

,M.date

) R on A.joinvalue = R.joinvalue and A.date = R.date

left outer join #MoreRecords B on R.joinvalue = B.joinvalue and R.Recent_date = B.date

Hope this helps

clubbavich Default port Points: 1449 More actions · Answer 11

Next time a mistake like that happens, it's OK to just edit the post. 🙂

Also, it's much easier to read a query when it's posted inside a sql code block. Like this, without the space between the first bracket and the word "code"

[ code="sql"]SELECT 'Hello World' [/code].

VSP SSC-Addicted Points: 467 More actions · Answer 12

I am new to this forum. I didn't know the post can be edited. thanks for the tip.

There was indentation in the query before I posted. after it was posted, all the indentation is gone. the block, would that preserve the indentation of the original query after posting.