Conditional Aggregation

Question

Post reply

Conditional Aggregation

Viewing 12 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply

mrpolecat SSCertifiable Points: 7074 More actions · Answer 1

To build the sequence can we order by wellnum,eval,topdepth or is there more to it?

CanuckBuck Hall of Fame Points: 3915 More actions · Answer 2

Re. Cursor being slow.

I stopped the process after 36 minutes. I then ran the query which finds the N records to be aggregated. It returned about 46,000 at the beginning of the 36 minutes it was 55,000 that means that in 36 minutes it had only processed about 9,000 records.

I think I can speed it up quite a bit by doing one query in the update statement. That code that I scavenged is some very early code I wrote about 4 years ago. I'd do it much differently now.

CanuckBuck Hall of Fame Points: 3915 More actions · Answer 3

Re. Building the Sequence number.

You've got it right.

mrpolecat SSCertifiable Points: 7074 More actions · Answer 4

ok here is the winner. 12 minutes for 65K rows.

here are the tables

CREATE TABLE [wells] (

[HOLEID] [varchar] (25) NOT NULL ,

[EvalNum] [int] NOT NULL ,

[Seq] [int] NULL ,

[TopDepth] [numeric](9, 2) NOT NULL ,

[BaseDepth] [numeric](9, 2) NULL ,

[MBit] [numeric](9, 2) NULL ,

[PHIE] [numeric](9, 2) NULL ,

[Swe] [numeric](9, 2) NULL ,

[Vsh] [numeric](9, 2) NULL ,

[Length] [numeric](9, 2) NOT NULL ,

[OreFlag] [varchar] (5) NOT NULL

) ON [PRIMARY]

GO

CREATE TABLE [wellsagg] (

[HOLEID] [varchar] (25) NOT NULL ,

[EvalNum] [int] NOT NULL ,

[Seq] [int] NULL ,

[TopDepth] [numeric](9, 2) NOT NULL ,

[BaseDepth] [numeric](9, 2) NULL ,

[MBit] [numeric](9, 2) NULL ,

[PHIE] [numeric](9, 2) NULL ,

[Swe] [numeric](9, 2) NULL ,

[Vsh] [numeric](9, 2) NULL ,

[Length] [numeric](9, 2) NOT NULL ,

[OreFlag] [varchar] (5) NOT NULL ,

[pseq] [int] NOT NULL ,

[nseq] [int] NOT NULL

) ON [PRIMARY]

GO

CREATE TABLE [WELLSA] (

[HOLEID] [varchar] (13) NOT NULL ,

[EvalNum] [int] NOT NULL ,

[Seq] [int] IDENTITY (1, 1) NOT NULL ,

[TopDepth] [numeric](6, 2) NOT NULL ,

[BaseDepth] [numeric](6, 2) NOT NULL ,

[MBit] [numeric](6, 2) NULL ,

[PHIE] [numeric](6, 2) NULL ,

[SwE] [numeric](6, 2) NULL ,

[Vsh] [numeric](6, 2) NULL ,

[Length] [numeric](6, 2) NULL ,

[OreFlag] [varchar] (1) NULL

) ON [PRIMARY]

GO

CREATE TABLE [WELLSB] (

[HOLEID] [varchar] (13) NOT NULL ,

[EvalNum] [int] NOT NULL ,

[Seq] [int] IDENTITY (1, 1) NOT NULL ,

[TopDepth] [numeric](6, 2) NOT NULL ,

[BaseDepth] [numeric](6, 2) NOT NULL ,

[MBit] [numeric](6, 2) NULL ,

[PHIE] [numeric](6, 2) NULL ,

[SwE] [numeric](6, 2) NULL ,

[Vsh] [numeric](6, 2) NULL ,

[Length] [numeric](6, 2) NULL ,

[OreFlag] [varchar] (1) NULL

) ON [PRIMARY]

GO

here is the code

truncate table wellsa

insert into wellsa (holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag )

select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag from wells

order by holeid,evalnum,topdepth

declare @wellsacnt int, @wellsbcnt int, @x int,@starttime datetime,@secs int

set @x = 1

set @starttime = getdate()

print 'Iteration ' + convert(varchar,@x) + ' : ' + convert(varchar,getdate())

select @wellsacnt = 0

select @wellsbcnt = 1

while @wellsacnt <> @wellsbcnt

begin

--move aggreagated batch from wellsa to wellsagg

truncate table wellsagg

insert into wellsagg (holeid,evalnum,seq,basedepth,topdepth,mbit,phie,swe,vsh,length,oreflag,pseq,nseq)

select holeid,evalnum,seq,basedepth,topdepth,mbit,phie,swe,vsh,length,oreflag,pseq,nseq from (

select

a.holeid,a.evalnum,c.seq

,case when case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end > c.basedepth then case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end else c.basedepth end basedepth

,case when case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end < c.topdepth then case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end else c.topdepth end topdepth

,((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) mbit

,((a.phie*a.length) +(b.phie*b.length) + (c.phie*c.length)) /(a.length+b.length + c.length) phie

,((a.swe*a.length) +(b.swe*b.length) + (c.swe*c.length)) /(a.length+b.length + c.length) swe

,((a.vsh*a.length) +(b.vsh*b.length) + (c.vsh*c.length)) /(a.length+b.length + c.length) vsh

, (a.length+b.length + c.length) length

,case when ((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) > .06 then 'Y' else 'N' end oreflag,

a.seq pseq,b.seq nseq

from wellsa a

join wellsa b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'

join wellsa c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'

join

(select a.holeid,a.evalnum,min(a.seq) seq

from wellsa a

join wellsa b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'

join wellsa c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'

where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length

group by a.holeid,a.evalnum) fs on fs.seq = a.seq

where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length

) n

order by seq asc

--move aggs and remaining to wellsb for seqencing

truncate table wellsb

insert into wellsb (holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag )

select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

from (

select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

from wellsagg

union

select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

from wellsa w

left join (

select seq from wellsagg

union select pseq from wellsagg

union select nseq from wellsagg

)a on w.seq = a.seq

where a.seq is null

)n order by holeid,evalnum,topdepth

--move aggreagated batch from wellsb to wellsagg

truncate table wellsagg

insert into wellsagg (holeid,evalnum,seq,basedepth,topdepth,mbit,phie,swe,vsh,length,oreflag,pseq,nseq)

select holeid,evalnum,seq,basedepth,topdepth,mbit,phie,swe,vsh,length,oreflag,pseq,nseq from (

select

a.holeid,a.evalnum,c.seq

,case when case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end > c.basedepth then case when a.basedepth > b.basedepth then a.basedepth else b.basedepth end else c.basedepth end basedepth

,case when case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end < c.topdepth then case when a.topdepth < b.topdepth then a.topdepth else b.topdepth end else c.topdepth end topdepth

,((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) mbit

,((a.phie*a.length) +(b.phie*b.length) + (c.phie*c.length)) /(a.length+b.length + c.length) phie

,((a.swe*a.length) +(b.swe*b.length) + (c.swe*c.length)) /(a.length+b.length + c.length) swe

,((a.vsh*a.length) +(b.vsh*b.length) + (c.vsh*c.length)) /(a.length+b.length + c.length) vsh

, (a.length+b.length + c.length) length

,case when ((a.mbit*a.length) +(b.mbit*b.length) + (c.mbit*c.length)) /(a.length+b.length + c.length) > .06 then 'Y' else 'N' end oreflag,

a.seq pseq,b.seq nseq

from wellsb a

join wellsb b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'

join wellsb c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'

join

(select a.holeid,a.evalnum,min(a.seq) seq

from wellsb a

join wellsb b on a.evalnum = b.evalnum and a.holeid=b.holeid and a.seq = b.seq -1 and b.oreflag = 'y'

join wellsb c on a.evalnum = c.evalnum and a.holeid=c.holeid and a.seq = c.seq +1 and c.oreflag = 'y'

where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length

group by a.holeid,a.evalnum) fs on fs.seq = a.seq

where a.oreflag = 'n' and a.length < 1 and a.length < b.length and a.length < c.length

) n

order by seq asc

--move aggs and remaining to wellsa for seqencing

truncate table wellsa

insert into wellsa (holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag )

select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

from (

select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

from wellsagg

union

select holeid,evalnum,topdepth,basedepth,mbit,phie,swe,vsh,length,oreflag

from wellsb w

left join (

select seq from wellsagg

union select pseq from wellsagg

union select nseq from wellsagg

)a on w.seq = a.seq

where a.seq is null

)n order by holeid,evalnum,topdepth

select @wellsacnt = count(*) from wellsa

print 'wellsa rowcount = ' + convert(varchar,(@wellsacnt))

select @wellsbcnt = count(*) from wellsb

print 'wellsb rowcount = ' + convert(varchar,(@wellsbcnt))

select @x=@x+1

print 'Iteration ' + convert(varchar,@x) + ' : ' + convert(varchar,getdate())

end

CanuckBuck Hall of Fame Points: 3915 More actions · Answer 5

I'll run this against the whole dataset and see how long it takes.

I revised that cursor-based code that I put up yesterday to use one query in the update statement. For some reason when I did it that way it messed with the cursor and so I was getting a @@fetch_status of -2. Not sure why that was happening so I built the cursor off of a copy of the table (not the one I was deleting records out of) and it seems to work.

It took 1:54:30 to run the whole thing. Still not very impressive. I don't know how long the old code would have taken but I'm sure it would have been longer.

mrpolecat SSCertifiable Points: 7074 More actions · Answer 6

I forgot to mention that I played around with a few indexing schemes and found that no indexes worked the fastest. Likely because they slow the inserts more than they speed up the selects. Not sure what indexes you were using on the tables with your cursor version but that may help as well.

CanuckBuck Hall of Fame Points: 3915 More actions · Answer 7

I haven't run your code yet but I think I've found my solution - I had an epiphany this morning (I have to check specific wells to make sure that the math is done right so I'm not quite sold yet. It completes in about 45 seconds!

It works by;

- marking the rows which need to be aggregated

- figuring out a grouping number (using a VERY slick method which I learned on this forum!)

- grouping the records

Voila!

As long as the math product of aggregating all contiguous rows at once is the same as aggregating one group of three and then the next group of three then I think that this solution works.

IF OBJECT_ID(N'TempDb..##TEST',N'U') IS NOT NULL DROP TABLE ##TEST

SELECT *, MBit * Length AS MBit_Length_Product, PHIE * Length AS PHIE_Length_Product, SwE * Length AS SwE_Length_Product, Vsh * Length AS Vsh_Length_Product, ' ' AS Assemble, 0 AS GroupNum INTO ##TEST FROM ##Aggregated_Intervals

CREATE UNIQUE CLUSTERED INDEX TEST_PK ON ##TEST(HoleId, EvalNum, Seq)

UPDATE PREV

SET Assemble = 'Y'

FROM ##TEST CUR

LEFT JOIN ##TEST PREV

ON PREV.HoleId = CUR.HoleId

AND PREV.EvalNum = CUR.EvalNum

AND PREV.Seq = CUR.Seq - 1

LEFT JOIN ##TEST NEX

ON NEX.HoleId = CUR.HoleId

AND NEX.EvalNum = CUR.EvalNum

AND NEX.Seq = CUR.Seq + 1

WHERE

CUR.OreFlag = 'N'

AND CUR.LENGTH <= 1

AND CUR.LENGTH <= PREV.LENGTH

AND CUR.LENGTH <= NEX.LENGTH

AND (((PREV.MBit * PREV.Length) + (CUR.MBit * CUR.Length) + (NEX.MBit * NEX.Length)) / (PREV.Length + CUR.Length + NEX.Length)) >= 0.06

UPDATE CUR

SET Assemble = 'Y'

FROM ##TEST CUR

LEFT JOIN ##TEST PREV

ON PREV.HoleId = CUR.HoleId

AND PREV.EvalNum = CUR.EvalNum

AND PREV.Seq = CUR.Seq - 1

LEFT JOIN ##TEST NEX

ON NEX.HoleId = CUR.HoleId

AND NEX.EvalNum = CUR.EvalNum

AND NEX.Seq = CUR.Seq + 1

WHERE

CUR.OreFlag = 'N'

AND CUR.LENGTH <= 1

AND CUR.LENGTH <= PREV.LENGTH

AND CUR.LENGTH <= NEX.LENGTH

AND (((PREV.MBit * PREV.Length) + (CUR.MBit * CUR.Length) + (NEX.MBit * NEX.Length)) / (PREV.Length + CUR.Length + NEX.Length)) >= 0.06

UPDATE NEX

SET Assemble = 'Y'

FROM ##TEST CUR

LEFT JOIN ##TEST PREV

ON PREV.HoleId = CUR.HoleId

AND PREV.EvalNum = CUR.EvalNum

AND PREV.Seq = CUR.Seq - 1

LEFT JOIN ##TEST NEX

ON NEX.HoleId = CUR.HoleId

AND NEX.EvalNum = CUR.EvalNum

AND NEX.Seq = CUR.Seq + 1

WHERE

CUR.OreFlag = 'N'

AND CUR.LENGTH <= 1

AND CUR.LENGTH <= PREV.LENGTH

AND CUR.LENGTH <= NEX.LENGTH

AND (((PREV.MBit * PREV.Length) + (CUR.MBit * CUR.Length) + (NEX.MBit * NEX.Length)) / (PREV.Length + CUR.Length + NEX.Length)) >= 0.06

DECLARE

@PrevHoleId AS VARCHAR(13)

,@PrevEvalNum AS INT

,@PrevAssemble AS VARCHAR(1)

,@GroupNum AS INT

SET @GroupNum = 0

SET @PrevHoleId = ' '

SET @PrevEvalNum = 0

SET @PrevAssemble = 'N'

-- Beginning of cool code from the forum

UPDATE ##TEST

SET

@GroupNum = GroupNum = CASE

WHEN @PrevHoleId <> HoleId OR @PrevEvalNum <> EvalNum THEN 1

WHEN Assemble = ' ' THEN @GroupNum + 1

WHEN @PrevHoleId = HoleId AND @PrevEvalNum = EvalNum AND @PrevAssemble = Assemble THEN @GroupNum

ELSE @GroupNum + 1

END

,@PrevHoleId = HoleId

,@PrevEvalNum = EvalNum

,@PrevAssemble = Assemble

FROM ##TEST WITH (tablock, INDEX(TEST_PK))

-- End of the cool code

SELECT

HoleId

,EvalNum

,MIN(TopDepth) AS TopDepth

,MAX(BaseDepth) AS BaseDepth

,SUM(MBit_Length_Product) / SUM(Length) AS MBit

,SUM(PHIE_Length_Product) / SUM(Length) AS PHIE

,SUM(SwE_Length_Product) / SUM(Length) AS SwE

,SUM(Vsh_Length_Product) / SUM(Length) AS Vsh

,MAX(OreFlag) AS OreFlag

,SUM(Length) AS Length

FROM ##TEST

GROUP BY

HoleId

,EvalNum

,GroupNum

ORDER BY

HoleId

,EvalNum

,TopDepth

mrpolecat SSCertifiable Points: 7074 More actions · Answer 8

Your new method comes up with less rows in the result than mine on t my test data. I am running a compare now to see where the differences lie. Also I noticed another error in my code where I left Vsh out of a few selects and it gets lost. I compared some of my result rows to yours and the math seemed to be the same.

Out of 65550 rows in the test data my method results in 50079 and yours comes up with 48253. We match on 46927 rows.

here is one of the differences

wellnum1001 evalnum 3

mine

WellNum1001376369.45369.75.07.26.40NULL.30Y

WellNum1001377369.75370.35.05.19.48NULL.60N

WellNum1001378370.35372.15.13.35.23NULL1.80Y

WellNum1001379372.15373.05.01.08.74NULL.90N

WellNum1001380373.05373.95.07.26.39NULL.90Y

WellNum1001381373.95375.15.02.13.68NULL1.20N

yours

WellNum10013369.45369.75.070000.260000.400000.190000Y.30

WellNum10013369.75370.35.050000.190000.480000.350000N.60

WellNum10013370.35373.95.085000.260000.397500.227500Y3.60

WellNum10013373.95375.15.020000.130000.680000.490000N1.20

CanuckBuck Hall of Fame Points: 3915 More actions · Answer 9

I think that the difference is attributable to the where clause where we are comparing the length value of the N row to the length of the bounding Y rows.

I'm using <= and you're using <

mrpolecat SSCertifiable Points: 7074 More actions · Answer 10

ok, I am changing it and running it again. All of the other calculations look like they match and it is definitely much faster 1m45s to 12m44s on my server with the test data. Also a lot less overhead having 1 temp table instead of 3.

mrpolecat SSCertifiable Points: 7074 More actions · Answer 11

I ran mine with >= and and < and ended up with 17 passes resulting in 45351 rows again I see a difference in wellnum1001 evalnum3 which resulted in 69 rows vs 71 rows from the new method.

mine

WellNum100134353.8554.45.02.13.61.50.60N

WellNum100134454.4555.95.06.22.43.271.50N

WellNum100134555.9556.85.04.16.51.42.90N

WellNum100134656.8557.15.06.24.42.23.30Y

WellNum100134757.1558.65.04.17.50.391.50N

yours

WellNum1001353.8554.45.020000.130000.610000.500000N.60

WellNum1001354.4555.35.060000.220000.430000.270000Y.90

WellNum1001355.3555.65.050000.210000.440000.300000N.30

WellNum1001355.6555.95.060000.240000.420000.240000Y.30

WellNum1001355.9556.85.040000.160000.510000.420000N.90

WellNum1001356.8557.15.060000.240000.420000.230000Y.30

mrpolecat SSCertifiable Points: 7074 More actions · Answer 12

I am guessing that the differnce may be rounding (my columns are decimal(9,2)) or averaging multiple rows at once vs averaging 3 rows and then using the resulting average to average another 3 rows.