April 17, 2008 at 12:07 pm
since I have so much time on my hands while this monster crunches data....
I am still working to see if I can find a set-based solution...
enter the correlated subquery....
Is it possible to have a Correlated Nested Sub Query???
Make sense? I'm confused just writing the question...
I need to reference a.DateFilled...inside the nested query for table y...I've marked the place I need to do it with a comment in the query below...
I'm not sure it's even possible...But I thought I'd at least pose the question.
example query
=======================================
SELECT a.DateFilled, *
FROM ClaimsData a
INNER JOIN
(-- joining the transaction table to itself...
-- to get the latest transaction record...for the current claim record (table a)
SELECT x.*
FROM TransactionTable x
INNER JOIN (
SELECT ClientID, GroupID, MAX(timeStamp) AS timeStamp
FROM TransactionTable
WHERE timeStamp < a.DateFilled -- <<< here: the outer reference I need to make
GROUP BY ClientID, GroupID
) y
ON x.ClientID = y.ClientID
AND x.GroupID = y.GroupID
AND x.timeStamp = y.timeStamp
) b
ON b.timeStamp = a.DateFilled
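For what it's worth, a plain derived table can't see an outer alias like that, but on SQL Server 2005 and up, CROSS APPLY can. Here's a minimal sketch of that shape - not the actual solution from this thread, and the ClientID/GroupID linking columns on ClaimsData are my assumption:

SELECT a.DateFilled, t.*
FROM ClaimsData a
CROSS APPLY
( -- the apply is re-evaluated per outer row, so it may legally reference a.DateFilled
SELECT TOP 1 x.*
FROM TransactionTable x
WHERE x.ClientID = a.ClientID -- assumed linking columns
AND x.GroupID = a.GroupID
AND x.timeStamp < a.DateFilled -- the outer reference in question
ORDER BY x.timeStamp DESC -- latest transaction before the claim
) t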
April 17, 2008 at 12:54 pm
Well...you CAN, meaning it's a query that is "legal" and will get past the parser...I'm just not sure you should.
Inequality joins (otherwise known by some on this board as "triangular joins") like that tend to create a REALLY large temporary set of data to look at. Considering I happen to know that you're working with 18 million rows... it looks to me like that would KILL your server, since it would need to process something to the tune of (18M)^2, or 324 trillion rows (give or take a few million). Let's just say I'd think your TempDB will be about the size of Texas before you're done (if it ever finishes).... Even if you manage to put in some better linking to cut that number down - it's STILL going to turn into something huge....
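To make that concrete, here's the shape being warned about - an inequality self-join with nothing else to limit it (the column names are just illustrative):

SELECT a.ClientID, COUNT(*) AS RowsBefore
FROM TransactionTable a
INNER JOIN TransactionTable b
ON b.timeStamp < a.timeStamp -- no equality predicate to limit the fan-out
GROUP BY a.ClientID

Against 18M rows, the intermediate set is on the order of N^2/2 row pairs - still hundreds of trillions - and that's what swells TempDB.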
You may care to read through a few techniques outlined over here. Should be a lot less hazardous, IMO.
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 17, 2008 at 12:58 pm
Thanks again.
I'm trying to see if I can get the correct transaction record without a nested query...then a standard correlated subquery will work...but I know there has to be a better way.
Thanks for the link to the article... I'll do some more homework!
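For reference, the "standard correlated subquery" shape being described would look something like this - a sketch only, with the linking columns assumed from the query earlier in the thread:

SELECT a.DateFilled, x.*
FROM ClaimsData a
INNER JOIN TransactionTable x
ON x.ClientID = a.ClientID -- assumed linking columns
AND x.GroupID = a.GroupID
AND x.timeStamp = ( -- latest transaction before the claim was filled
SELECT MAX(t.timeStamp)
FROM TransactionTable t
WHERE t.ClientID = x.ClientID
AND t.GroupID = x.GroupID
AND t.timeStamp < a.DateFilled
)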
April 17, 2008 at 6:15 pm
Keep in mind what I said in the conclusion of the article on triangular joins...
Not all Triangular Joins are bad. With some restraint and the right criteria, Triangular Joins can be used for some pretty remarkable things... make finite schedules... do high speed dupe checks... etc. But, you've really got to be careful. Improperly written Triangular Joins are worse than even Cursors or While Loops and can bring a CPU and Disk System right to its knees.
It looks to me like this particular triangular join may be correctly constrained.
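To illustrate the "correctly constrained" case, here's the classic high-speed dupe check mentioned above - the table and columns are hypothetical, but the point is that the equality predicate keeps the inequality's fan-out inside each small group:

SELECT DISTINCT b.ID AS DupeID
FROM SomeTable a
INNER JOIN SomeTable b
ON b.KeyValue = a.KeyValue -- equality constraint partitions the data
AND b.ID > a.ID -- inequality flags only the later duplicates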
--Jeff Moden
Change is inevitable... Change for the better is not.
April 17, 2008 at 10:36 pm
Jeff Moden (4/17/2008)
Keep in mind what I said in the conclusion of the article on triangular joins...Not all Triangular Joins are bad. With some restraint and the right criteria, Triangular Joins can be used for some pretty remarkable things... make finite schedules... do high speed dupe checks... etc. But, you've really got to be careful. Improperly written Triangular Joins are worse than even Cursors or While Loops and can bring a CPU and Disk System right to its knees.
It looks to me like this particular triangular join may be correctly constrained.
Perhaps - but we're still doing 18M self-joined to 18M, aren't we? So - you're right - my cardinality estimate from earlier was high, but we're still likely looking at several billion rows in the work table, even with the grouping and the relations in place...
Just bringing up yet another option for using the running totals logic....
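For anyone following along, the "running totals logic" is presumably the variable-assignment UPDATE technique covered in the article linked above. A minimal sketch, with hypothetical table and column names - it depends on a single serial pass in clustered-index order, so read the article's caveats before trusting it:

DECLARE @RunningTotal money
SET @RunningTotal = 0

UPDATE #Work
SET @RunningTotal = RunningTotal = @RunningTotal + Amount
FROM #Work WITH (TABLOCKX) -- lock the table and force one serial scan
OPTION (MAXDOP 1) -- no parallelism, or the running order breaks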
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 17, 2008 at 10:41 pm
I was able to rewrite my joins...so they didn't have to be nested...and now the correlated subquery is working much faster.....
Thank you again to everyone!!!!
April 18, 2008 at 12:28 am
Highwayman (4/15/2008)
isn't that the sad truth....damn them once for creating the whole mess...
damn me for not jumping ship as soon as I realized what a steaming pile I'd jumped head first into....
My days are numbered...
Can you show your update statement? I have one method that will replace the cursor concept.
April 18, 2008 at 5:54 am
Highwayman (4/17/2008)
I was able to rewrite my joins...so they didn't have to be nested...and now the correlated subquery is working much faster.....Thank you again to everyone!!!!
Would you mind posting your solution? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 18, 2008 at 6:12 am
Please post your solution. It would be a nice reference for a lot of us. :)
April 18, 2008 at 10:29 am
Here is the SQL for my solution...I'm sure there are ways to further optimize it...but it's getting things done...10 times faster than my earlier attempts...
The first key lesson learned was that breaking my 18M rows into smaller chunks (i.e. by cycle date) and processing them in pieces sped things up...
The second was finally figuring out how to make this thing set-based...instead of having to do my joins 18M times, now I only have to do them 200 times...
Hope this rat's nest can help somebody else. If anyone wants or needs further explanation...I'm always glad to discuss. 🙂
/*
Below is the select statement that I am using inside a loop...that loop is pulling 2 weeks' worth of claims for each iteration... (a rough sketch of the loop itself follows the query below)
*/
select
-- existing claims data
a.*
-- CoverageStrategyID...it's a PK...will be added to the Claims data table as an FK
, CSID = bh.Id
-- this is just the actual coverage code...that the CSID represents
, CoverageStrategyCode = coalesce(nullif(m.coveragestrategyid, ''), g.coveragestrategyid)
from ClaimsData a
/*
the first 2 left joins below were the statements that replaced the joining of the transaction tables on themselves...
they allow me to return the appropriate coverage code for the claim record at the time of service.
*/
left join (
select maxtxts = max(c.txts), c.clientid, c.groupid, c.coveragestrategyid
from tpagrouptx2 c
group by c.clientid, c.groupid, c.coveragestrategyid
) g
on a.clientid = g.clientid
and a.groupnumber = g.groupid
left join (
select mtx.clientid, mtx.groupid, mtx.ssnid, mtx.personcode, mtx.coveragestrategyid, mtx.txts
from tpamembertx2 mtx
group by mtx.clientid, mtx.groupid, mtx.ssnid, mtx.personcode, mtx.coveragestrategyid, mtx.txts
) m
on m.clientid = a.clientid
and m.groupid = a.groupnumber
and m.ssnid = a.ssnid
and m.personcode = a.personcode
and m.txts < a.datefilledfull
/*
this is a join to a table that contains my historical coverage codes...
so I can return the correct CoverageCodeID...this is the field that I am adding to the 18M row table.
the current table doesn't contain a term date...so I had to derive it based on the next record's effective date...
I just did this once and dumped it into the temp table.
*/
left join (
select h.id, h.CoverageCode, h.DateLastUpdated, t.TermDate
from ##BenefitHistory h join ##BenefitHistoryTerm t on h.id = t.id
) bh
on bh.CoverageCode = coalesce(nullif(m.coveragestrategyid, ''), g.coveragestrategyid)
and a.DateFilledFull between bh.DateLastUpdated and bh.TermDate
where g.maxtxts <= a.datefilledfull
and m.txts <= a.datefilledfull
and a.financialcutoffdate = @CutOff -- @CutOff is the current cycle date for the loop that this whole thing is being executed in
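Since only the inner SELECT is posted, here's a hedged sketch of the kind of cycle-date driver loop described above - walking the distinct cutoff dates this way is my assumption, not necessarily the original loop:

DECLARE @CutOff datetime

SELECT @CutOff = MIN(financialcutoffdate)
FROM ClaimsData

WHILE @CutOff IS NOT NULL
BEGIN
-- run the big SELECT above for this cycle date,
-- e.g. inserting its results into the work/target table

SELECT @CutOff = MIN(financialcutoffdate)
FROM ClaimsData
WHERE financialcutoffdate > @CutOff -- advance to the next two-week cycle
END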
April 18, 2008 at 2:35 pm
Thanks ........:)