May 9, 2008 at 1:15 pm
I have two tables in a SQL 2005 database. One has a row for each data item and day in a period of time, and the other has rows for only selected days. I've written a query which joins the two so that each row from the first table is matched to the row in the second table whose date is the most recent one at or before the first row's date.
I know that sounds confusing but it's really pretty straightforward. See the sample queries below. I think they'll illustrate what I am trying to say much better than I can.
Anyway, the query returns the exact result for which I'm looking but it performs horribly. There are about five million rows in one table and several hundred thousand in the sparse table. I believe the bottleneck is the correlated sub-query. I think it is creating a form of RBAR. Is there any way to write a similar query without the correlated sub-query? I think the answer lies in windowing functions but I cannot seem to get them to work the way I want.
Any help would be greatly appreciated.
if object_id(N'tempdb..#sparseData') is not null
drop table #sparseData
if object_id(N'tempdb..#drivingTable') is not null
drop table #drivingTable
select 1 as keyid, convert(datetime,'2008-01-01') as dateId
into #drivingTable
union
select 2, '2008-01-02'
union
select 3, '2008-01-03'
union
select 2, '2008-01-04'
union
select 4, '2008-01-04'
union
select 5, '2008-01-05'
union
select 6, '2008-01-06'
union
select 7, '2008-01-07'
union
select 8, '2008-01-08'
union
select 9, '2008-01-09'
union
select 10, '2008-01-10'
select 3 as testVal, convert(datetime,'2008-01-01') as dateId
into #sparseData
union
select 10, '2008-01-05'
union
select 7, '2008-01-08'
select dt.*, sd.dateId, sd.testVal
from #drivingTable dt
inner join #sparseData sd
    on sd.dateId = (
        select max(dateId)
        from #sparseData
        where dateId <= dt.dateId
    )
May 9, 2008 at 2:12 pm
The only way I can think of to do this more efficiently would be to select the max(sd.DateID) for each dt.DateID in one query, then join that to the sparse data table for the test value.
Something like:
create table #DT (
    DTDateID datetime,
    SDDateID datetime)

insert into #DT (DTDateID, SDDateID)
select dt.dateId, max(sd.dateId)
from #drivingTable dt
inner join #sparseData sd
    on dt.dateId >= sd.dateId
group by dt.dateId

select dt.*, sd.testVal
from #DT dt
inner join #sparseData sd
    on dt.SDDateID = sd.dateId
You could play around with adding indexes to the #DT temp table to get better performance out of it, depending on the number of rows that are going into it. You might also need to include the "KeyID" from the driving table in it, depending on how that works with your data.
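For example, something along these lines (just a sketch; the index name is made up, and clustered vs. nonclustered is worth testing against your row counts):

-- index the column used to join back to the sparse data table
create clustered index ix_DT_SDDateID on #DT (SDDateID)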
Try that, see if it improves things for you.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 9, 2008 at 2:17 pm
The correlated sub-query is definitely the culprit, and there may not be an alternative, but try the last SQL statement below.
if object_id(N'sparseData') is not null drop table sparseData
if object_id(N'drivingTable') is not null drop table drivingTable

create table drivingTable
    (KeyId integer not null
    ,drivingDate datetime not null
    ,constraint drivingTable_P primary key (KeyId, drivingDate)
    )

create table sparseData
    (testVal integer not null
    ,sparseDate datetime not null
    ,constraint sparseData_P primary key (sparseDate)
    )
go

select drivingTable.*, sparseData.*
from drivingTable
join sparseData
    on sparseData.sparseDate = (
        select max(sd.sparseDate)
        from sparseData sd
        where sd.sparseDate <= drivingTable.drivingDate
    )

select drivingTable.*
    , sparseData.*
from drivingTable
join (select drivingTable.KeyId
        , drivingTable.drivingDate
        , MAX(sparseData.sparseDate) as sparseDate
      from drivingTable
      join sparseData
        on sparseData.sparseDate <= drivingTable.drivingDate
      group by drivingTable.KeyId
        , drivingTable.drivingDate
     ) as DSL
    on DSL.KeyId = drivingTable.KeyId
    and DSL.drivingDate = drivingTable.drivingDate
join sparseData
    on sparseData.sparseDate = DSL.sparseDate
SQL = Scarcely Qualifies as a Language
May 9, 2008 at 3:04 pm
Thanks! Both of these suggestions do what I need. I look forward to playing around with them and examining their performance relative to the correlated sub-query. I'm sure they are much faster.
May 9, 2008 at 11:41 pm
rhandloff (5/9/2008)
Thanks! Both of these suggestions do what I need. I look forward to playing around with them and examining their performance relative to the correlated sub-query. I'm sure they are much faster.
The simple solution is to use the row_number function. See below.
select keyid,
    dateid1,
    dateid2,
    testVal
from (
    select dt.keyid,
        dt.dateId as dateid1,
        sd.dateId as dateid2,
        sd.testVal,
        row_number()
            over (partition by dt.keyid, dt.dateId
                  order by sd.dateId desc) as row
    from #drivingTable dt
    inner join #sparseData sd
        on sd.dateId <= dt.dateId
) as a
where row = 1
May 12, 2008 at 2:10 am
Another option:
select * from #drivingTable a cross apply
(select top 1 * from #sparseData where dateId <= a.dateid order by dateId desc) b
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
May 12, 2008 at 2:26 am
If you add an index (clustered or not) to #sparseData...
create clustered index ix_sd1 on #sparseData (dateId)
...the cross apply has the best performance for the small dataset you give (at least on my machine). Please let us know how the various options perform on your full data (with/without indexes)... 🙂
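In case it helps with the comparison, here's one way to measure each option (just a sketch, assuming the temp tables and queries from the earlier posts are still around):

set statistics io on
set statistics time on

-- run each candidate query between the on/off statements, e.g. the cross apply version
select * from #drivingTable a cross apply
(select top 1 * from #sparseData where dateId <= a.dateid order by dateId desc) b

set statistics io off
set statistics time off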
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
May 12, 2008 at 5:16 am
Isn't CROSS APPLY just another name for "correlated sub-query"?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2008 at 5:27 am
Jeff Moden (5/12/2008)
Isn't CROSS APPLY just another name for "correlated sub-query"?
😛
It sure is, but if it's the fastest method, who cares? 🙂
It's something else to try, anyway, and I'm interested how it fares in this case.
More info here, although I'm sure you (Jeff) know about this...
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
May 12, 2008 at 6:18 am
Thanks for all the solutions! I don't know if I will have time to adequately test them all, as I'm under a pretty intense deadline, but just from very cursory testing, each of these solutions performs much better than my original query. If I have time I will report back which one worked best.
May 12, 2008 at 6:22 am
rhandloff (5/12/2008)
Thanks for all the solutions! I don't know if I will have time to adequately test them all, as I'm under a pretty intense deadline, but just from very cursory testing, each of these solutions performs much better than my original query. If I have time I will report back which one worked best.
Thanks rhandloff - and good luck! 🙂
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
May 12, 2008 at 6:23 am
RyanRandall (5/12/2008)
Jeff Moden (5/12/2008)
Isn't CROSS APPLY just another name for "correlated sub-query"?😛
It sure is, but if it's the fastest method, who cares? 🙂
It's something else to try, anyway, and I'm interested how it fares in this case.
More info here, although I'm sure you (Jeff) know about this...
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Heh... not my blog, Ryan. I think that's Jeff Smith's, but I'm not sure.
I agree... if Cross Apply is the most performant, then use it. But Cross Apply typically has the same performance as a correlated subquery; if the correlated subquery is non-performant, the Cross Apply won't be any better. Both work seemingly well in the face of small rowcounts, and both can work very well in the face of large rowcounts... sometimes. It may be so in this case... the triangular join in both is severely constrained by the TOP 1... dunno if there may be something faster in this case because I've not studied the problem yet.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2008 at 6:31 am
Jeff Moden (5/12/2008)
Heh... not my blog, Ryan. I think that's Jeff Smith's, but I'm not sure.
LOL - yes I know it's not your blog. I guess you missed the "Welcome to my weblog. My name is Jeff Smith, and I am a software developer in Boston, MA." bit on the left 😀
We agree on the rest of it... 🙂
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
May 12, 2008 at 6:37 am
RyanRandall (5/12/2008)
Jeff Moden (5/12/2008)
Heh... not my blog, Ryan. I think that's Jeff Smith's, but I'm not sure.
LOL - yes I know it's not your blog. I guess you missed the "Welcome to my weblog. My name is Jeff Smith, and I am a software developer in Boston, MA." bit on the left 😀
We agree on the rest of it... 🙂
Yep... not enough coffee.......... 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2008 at 7:49 am
Jeff Moden (5/12/2008)
Isn't CROSS APPLY just another name for "correlated sub-query"?
Actually, in the right (wrong) circumstances, Cross Apply can be worse than a regular correlated sub-query. It depends on the function being called. A correlated sub-query (in most cases) pretty much has to be at least set-based in itself, while a multi-statement table-valued function can have pretty much unlimited inefficiency.
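For illustration only (the function name is made up, using the permanent tables from the earlier post), this is the kind of pattern I mean - a multi-statement function that does its own work on every call, and gets called once per outer row by the cross apply:

create function dbo.fn_LatestSparse (@asOf datetime)
returns @result table (sparseDate datetime, testVal integer)
as
begin
    -- each call runs its own lookup; nothing stops this body from being far more expensive
    insert into @result (sparseDate, testVal)
    select top 1 sparseDate, testVal
    from sparseData
    where sparseDate <= @asOf
    order by sparseDate desc
    return
end
go

select dt.*, f.sparseDate, f.testVal
from drivingTable dt
cross apply dbo.fn_LatestSparse(dt.drivingDate) as f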
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon