April 21, 2008 at 9:19 am
I have a table with records which are flagged as passing or failing an initial criteria test.
The records look like the attached screen shot.
For any given well I need to test each record with an OreFlag of N which is bounded by records with an OreFlag of Y, and aggregate it with its neighbours if:
- The length of the N record is less than 1 (meter). This threshold may be a variable.
- The length of the N record is less than the smaller of the two Y records.
- The length-weighted average of the MBit column across the three records would be greater than 0.06 (this is the initial, variable, criterion which determined whether a record was originally set to Y or N).
If you look at records 10 through 12 of the screen shot you will see that these should be aggregated. This, of course, leads to the next issue: recursion. Conditions exist (see wellNum4) where, once the first aggregation is completed, it creates new records which should then also be aggregated, until, at some point, no more aggregation can be done.
Aggregation involves taking the minimum TopDepth, the maximum BaseDepth, the length-weighted average of MBit, PHIE, Vsh and SwE (if you're a geologist you'll know what those attributes are ;-)), and the sum of Length.
The data set I'm working with has 367,000 rows. If you're interested in helping me solve this problem and want the data set, I can upload it (it's about 4 MB zipped).
April 21, 2008 at 9:58 am
April 21, 2008 at 10:42 am
April 21, 2008 at 10:44 am
Ah. I should just mention that in the assembled intervals PNG the 2nd example is a case where there would be recursion. The first three records would be aggregated (shown as the row in red), and then that newly aggregated row should be combined with the two above it to produce the row shown in blue.
April 21, 2008 at 10:55 am
April 21, 2008 at 10:56 am
The table I'm working with is actually a temporary table I've created from the source data, which has already been through quite a bit of processing to get to this point.
The table definition below will give you a table which matches the data I posted and has the appropriate PK.
The column called Seq is one I've added for convenience of processing (it makes previous-row/next-row processing possible).
CREATE TABLE [EUB\CB238].[TEMP](
[HOLEID] [varchar](13) NOT NULL,
[EvalNum] [int] NOT NULL,
[Seq] [int] NOT NULL,
[TopDepth] [numeric](6, 2) NOT NULL,
[BaseDepth] [numeric](6, 2) NOT NULL,
[MBit] NUMERIC(6,2) NULL,
[PHIE] NUMERIC(6,2) NULL,
[SwE] NUMERIC(6,2) NULL,
[Vsh] NUMERIC(6,2) NULL,
[Length] [numeric](6, 2) NULL,
[OreFlag] [varchar](1)
)
CREATE UNIQUE CLUSTERED INDEX TEMP_PK ON TEMP(HoleId, EvalNum, TopDepth)
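Seq just numbers the intervals down the hole within each HoleId/EvalNum. A minimal sketch of one way to populate it with ROW_NUMBER() (illustrative only; this isn't necessarily the exact statement used):
-- Illustrative only: number intervals per hole/evaluation by depth so that
-- Seq - 1 and Seq + 1 address the previous and next interval.
UPDATE t
SET    t.Seq = x.rn
FROM   [EUB\CB238].[TEMP] t
JOIN  (SELECT HOLEID, EvalNum, TopDepth,
              ROW_NUMBER() OVER (PARTITION BY HOLEID, EvalNum
                                 ORDER BY TopDepth) AS rn
       FROM   [EUB\CB238].[TEMP]) x
  ON  x.HOLEID   = t.HOLEID
  AND x.EvalNum  = t.EvalNum
  AND x.TopDepth = t.TopDepth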
April 21, 2008 at 11:02 am
I'm not sure what you mean by "bound by". If you mean that I can join the table to itself like this
SELECT CUR.*, PREV.OreFlag AS PrevOreFlag, NEX.OreFlag AS NextOreFlag
FROM ##TEMP CUR
LEFT JOIN ##TEMP PREV
ON PREV.HoleId = CUR.HoleId
AND PREV.EvalNum = CUR.EvalNum
AND PREV.Seq = CUR.Seq - 1
LEFT JOIN ##TEMP NEX
ON NEX.HoleId = CUR.HoleId
AND NEX.EvalNum = CUR.EvalNum
AND NEX.Seq = CUR.Seq + 1
to get the previous and next rows - then yes.
Again - Thanks!
April 21, 2008 at 11:34 am
Here is what I have so far to gather the records that need to be aggregated. How do I determine the length-weighted average of the MBit column?
select a.*,b.seq prec,c.seq nrec from wells a
join wells b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'
join wells c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'
where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length
union
select d.*,0,0 from wells d
join
(select a.*,b.seq prec,c.seq nrec from wells a
join wells b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'
join wells c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'
where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length
) set1 on d.seq=set1.prec
union
select d.*,0,0 from wells d
join
(select a.*,b.seq prec,c.seq nrec from wells a
join wells b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'
join wells c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'
where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length
) set1 on d.seq=set1.nrec
April 21, 2008 at 11:44 am
Length-weighted (it should actually be interval thickness rather than length; my bad on the naming) average is calculated like this:
((Prev.MBit * Prev.Length)
+ (Cur.MBit * Cur.Length)
+ (Nex.MBit * Nex.Length))
/(Prev.Length + Cur.Length + Nex.Length)
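For more than three rows it's the same idea: the sum of (value * Length) divided by the sum of Length over whatever rows are being combined. A sketch in set-based form (WellGroups and GroupId are placeholders for however the rows to be merged end up being tagged, not columns or tables that exist here):
-- Sketch only: length-weighted averages over an arbitrary group of intervals.
select
    GroupId,
    min(TopDepth)                    TopDepth,
    max(BaseDepth)                   BaseDepth,
    sum(MBit * Length) / sum(Length) MBit,
    sum(PHIE * Length) / sum(Length) PHIE,
    sum(SwE  * Length) / sum(Length) SwE,
    sum(Vsh  * Length) / sum(Length) Vsh,
    sum(Length)                      Length
from WellGroups
group by GroupId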
April 21, 2008 at 1:00 pm
Let's try this
select
a.holeid,a.evalnum,c.seq
/*a.seq,a.topdepth,a.basedepth,a.mbit,a.phie,a.swe,a.vsh,a.length,a.oreflag,b.seq prec,c.seq nrec,a.mbit*a.length wambit,a.phie*a.length waphie,a.vsh*a.length wavsh,a.swe*a.length waswe*/
,case when case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end > c.basedepth then case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end else c.basedepth end maxbasedepth
,case when case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end < c.topdepth then case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end else c.topdepth end mintopdepth
,((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) lwambit
,((a.phie*a.length) +(b.phie*b.length) + (c.phie*c.length)) /(a.length+b.length + c.length) lwaphie
,((a.swe*a.length) +(b.swe*b.length) + (c.swe*c.length)) /(a.length+b.length + c.length) lwaswe
,((a.vsh*a.length) +(b.vsh*b.length) + (c.vsh*c.length)) /(a.length+b.length + c.length) lwavsh
, (a.length+b.length + c.length) sumlength
,case when ((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) > .06 then 'Y' else 'N' end oreflag
from wells a
join wells b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'
join wells c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'
where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length
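The nested CASE expressions just pick the largest BaseDepth and smallest TopDepth of the three rows. That part could also be written like this (untested sketch, assuming SQL Server 2005 or later) by feeding the three rows through MIN/MAX with CROSS APPLY:
-- Untested sketch: same joins as above, min/max across the three rows via CROSS APPLY.
select
    a.holeid, a.evalnum, c.seq,
    d.maxbasedepth, d.mintopdepth
from wells a
join wells b on a.evalnum = b.evalnum and a.holeid = b.holeid and a.seq = b.seq - 1 and b.oreflag = 'y'
join wells c on a.evalnum = c.evalnum and a.holeid = c.holeid and a.seq = c.seq + 1 and c.oreflag = 'y'
cross apply
    (select max(v.basedepth) maxbasedepth, min(v.topdepth) mintopdepth
     from (select a.topdepth, a.basedepth
           union all select b.topdepth, b.basedepth
           union all select c.topdepth, c.basedepth) v
    ) d
where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length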
April 21, 2008 at 1:39 pm
This adds the newly aggregated records back in with the rest of the records.
select * from (
select
a.holeid,a.evalnum,c.seq
/*a.seq,a.topdepth,a.basedepth,a.mbit,a.phie,a.swe,a.vsh,a.length,a.oreflag,b.seq prec,c.seq nrec,a.mbit*a.length wambit,a.phie*a.length waphie,a.vsh*a.length wavsh,a.swe*a.length waswe*/
,case when case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end > c.basedepth then case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end else c.basedepth end maxbasedepth
,case when case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end < c.topdepth then case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end else c.topdepth end mintopdepth
,((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) lwambit
,((a.phie*a.length) +(b.phie*b.length) + (c.phie*c.length)) /(a.length+b.length + c.length) lwaphie
,((a.swe*a.length) +(b.swe*b.length) + (c.swe*c.length)) /(a.length+b.length + c.length) lwaswe
,((a.vsh*a.length) +(b.vsh*b.length) + (c.vsh*c.length)) /(a.length+b.length + c.length) lwavsh
, (a.length+b.length + c.length) sumlength
,case when ((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) > .06 then 'Y' else 'N' end oreflag
from wells a
join wells b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'
join wells c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'
where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length
union
select
holeid,evalnum,wells.seq,basedepth,topdepth,mbit,phie,swe,vsh,length,oreflag
from wells
left join
(select a.seq seq1,b.seq pseq,c.seq nseq
from wells a
join wells b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'
join wells c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'
where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length
) aggs on wells.seq = aggs.seq1 or wells.seq = aggs.pseq or wells.seq = aggs.nseq
where aggs.seq1 is null
) n
order by seq
How many times do you think these would need to be aggregated?
April 21, 2008 at 3:42 pm
Hi Old Hand;
Thanks for your time on this.
I don't know how many times the aggregation would have to be iterated. It sort of depends on what the initial MBit criterion value is. If it's set low (like 0.03) then most of the intervals get aggregated right off the bat and there's likely not more than one iteration. If it's set high (0.09 or more) then there could be many iterations.
How long do you think the script you posted above should take? I stopped it after 20 minutes.
April 22, 2008 at 7:23 am
It only took 1 second on the test data you gave me, and that was without any indexes on the table. To run multiple iterations I think we will need to insert the data from each iteration into a new table to build a new sequence, because after each iteration the sequence has gaps in it. My thought is to have 2 staging tables and flip back and forth until the number of rows matches between the 2. If you want to post your full data set I can see how long it takes to run.
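Roughly, the flip-flop skeleton would look like this (a sketch only; Stage1 and Stage2 are placeholder staging tables with the same layout as TEMP, and the plain copy below stands in for the aggregation query above):
DECLARE @prevCount int, @currCount int

SELECT @currCount = COUNT(*) FROM Stage1
SET @prevCount = @currCount + 1        -- force at least one pass

WHILE @currCount < @prevCount          -- keep going while the last pass removed rows
BEGIN
    SET @prevCount = @currCount

    TRUNCATE TABLE Stage2

    -- one aggregation pass: in the real script the aggregation query above
    -- replaces this plain copy; ROW_NUMBER() rebuilds a gap-free Seq
    INSERT INTO Stage2 (HOLEID, EvalNum, Seq, TopDepth, BaseDepth, MBit, PHIE, SwE, Vsh, Length, OreFlag)
    SELECT HOLEID, EvalNum,
           ROW_NUMBER() OVER (PARTITION BY HOLEID, EvalNum ORDER BY TopDepth),
           TopDepth, BaseDepth, MBit, PHIE, SwE, Vsh, Length, OreFlag
    FROM Stage1

    SELECT @currCount = COUNT(*) FROM Stage2

    -- flip: the output of this pass becomes the input of the next
    TRUNCATE TABLE Stage1
    INSERT INTO Stage1 SELECT * FROM Stage2
END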
April 22, 2008 at 7:48 am
I've been trying to upload the full data file (4 MB zipped) and there seems to be something wrong with the upload utility right now.
I'll try again in a couple of hours.
I have to confess that I'm having a little bit of trouble following what the script is actually doing.
Could you please explain the principles of the approach that you've put together?
Thanks!
April 22, 2008 at 7:49 am