September 16, 2010 at 10:52 am
Hey folks, hoping I can shortcut the full test harness thing and see if anyone's done some multi-column (to single result) aggregation recently. A quickie example:
CREATE TABLE #tmp
(tID INT IDENTITY(1, 1),
Acct_ID INT,
Category VARCHAR(20),
Value1 MONEY,
Value2 MONEY
)
GO
SELECT
Acct_ID,
SUM( SumMe) As Value
FROM
(SELECT
Acct_ID,
CASE WHEN Category IN ( 'a', 'bdq', 'rr') THEN Value1
WHEN Category IN ( 'zz', 'xx', 'qrq') THEN Value2
ELSE 0
END AS sumMe
FROM
#tmp
) AS t
GROUP BY
Acct_ID
What I'm doing is dealing with variable entry cycles where a 'positive increase to an account' could be in either Value1 or Value2 depending on the category, because the category may apply to another account that is either taking money from, or sending money to, the account I'm looking for.
The real query is wrapped around dates and a few other conditions, but my primary concern is that this is a wide, 60-million-plus-row recordset. While I can control the indexing, I'm hoping someone will share their research/results/links to a well-defined, already-done, generic optimization in this style. I just can't be the first person to bang my head on this wall. 🙂
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
September 16, 2010 at 2:57 pm
Generic optimizations are difficult to give since how long something takes will vary depending on the schema. One thing that may come in handy is using a CTE to handle some of this. You may be able to get the sums of the values separately that way and then sum them after that.
declare @Tmp table (tID INT IDENTITY(1, 1),
Acct_ID INT,
Category VARCHAR(20),
Value1 MONEY,
Value2 MONEY
);
with sum_stage as (select acct_id, 'value1' [col], sum(value1) [value] from @Tmp where category in ('a', 'bdq', 'rr')
group by acct_id
union all
select acct_id, 'value2' [col], sum(value2) [value] from @Tmp where category in ('zz', 'xx', 'qrq')
group by acct_id)
select acct_id, sum(value) from sum_stage
group by acct_id;
If you're taking in an account ID as a parameter, that can be included in the CTE. Or, since the description was a little vague, this may end up being too difficult to do in your environment depending on the number of columns.
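The staged approach above can be sanity-checked on a few toy rows. A minimal sketch, using Python's sqlite3 as a stand-in for SQL Server (the table name, data values, and final GROUP BY are assumptions for illustration; MONEY is modelled as REAL):

```python
import sqlite3

# Toy check of the staged CTE: per-column sums first, then a second SUM
# over the stage. Rows are hypothetical placeholders.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Tmp (Acct_ID INT, Category TEXT, Value1 REAL, Value2 REAL)")
con.executemany("INSERT INTO Tmp VALUES (?,?,?,?)",
                [(1, 'a', 10.0, 0.0),    # Value1 row for account 1
                 (1, 'zz', 0.0, 5.0),    # Value2 row for account 1
                 (2, 'rr', 3.0, 0.0)])   # Value1 row for account 2

result = con.execute("""
    WITH sum_stage AS (
        SELECT Acct_ID, SUM(Value1) AS value FROM Tmp
        WHERE Category IN ('a','bdq','rr') GROUP BY Acct_ID
        UNION ALL
        SELECT Acct_ID, SUM(Value2) FROM Tmp
        WHERE Category IN ('zz','xx','qrq') GROUP BY Acct_ID)
    SELECT Acct_ID, SUM(value) FROM sum_stage
    GROUP BY Acct_ID ORDER BY Acct_ID""").fetchall()

print(result)   # [(1, 15.0), (2, 3.0)]
```

Account 1 picks up 10.0 from Value1 and 5.0 from Value2; account 2 only the 3.0 from Value1.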
September 16, 2010 at 3:07 pm
cfradenburg (9/16/2010)
Generic optimizations are difficult to give since how long something takes will vary depending on the schema.
Yeah, I know, it's painful, but I was figuring this had been done a billion times and I could just apply it.
One thing that may come in handy is using a CTE to handle some of this. You may be able to get the sums of the values separately that way and then sum them after that.
declare @Tmp table (tID INT IDENTITY(1, 1),
Acct_ID INT,
Category VARCHAR(20),
Value1 MONEY,
Value2 MONEY
);
with sum_stage as (select acct_id, 'value1' [col], sum(value1) [value] from @Tmp where category in ('a', 'bdq', 'rr')
group by acct_id
union all
select acct_id, 'value2' [col], sum(value2) [value] from @Tmp where category in ('zz', 'xx', 'qrq')
group by acct_id)
select acct_id, sum(value) from sum_stage
group by acct_id;
If you're taking in an account ID as a parameter, that can be included in the CTE. Or, since the description was a little vague, this may end up being too difficult to do in your environment depending on the number of columns.
Yeah, the problem is I already know I'm scanning the index/table I'll be working from because, as your last comment indicates, I'm not passing any filtering other than some date work. This is for an ETL process, and I'm working off the original schema, which is vendor controlled. I'm trying to keep it simple; instead of ending up lost in the details, I was hoping to find a good baseline. The inside of that WITH statement is roughly equivalent to what I'm working with at the moment, with a wrapper SUM on the outside that's pretty cheap.
I'm still building out the general first pass against the structure. Once I do I'll post up my code/structures/etc. I do appreciate the initial attempt, however. Thank you.
September 16, 2010 at 3:32 pm
Would that be an easy baseline?
SELECT
Acct_ID,
SUM( CASE WHEN Category IN ( 'a', 'bdq', 'rr') THEN Value1
WHEN Category IN ( 'zz', 'xx', 'qrq') THEN Value2
ELSE 0
END) AS VALUE
FROM #tmp
GROUP BY Acct_ID
It still performs a table scan and the execution plan is identical to your original query.
All that's changed is syntax (maybe readability...).
So I'm not sure if it will be of any value for you...:unsure:
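For anyone wanting to confirm the two shapes really are interchangeable, here is a minimal sketch on hypothetical rows, using Python's sqlite3 as a stand-in for SQL Server (the #temp table becomes a plain table and MONEY becomes REAL):

```python
import sqlite3

# Hypothetical rows mirroring the #tmp example: 'a'/'bdq'/'rr' categories
# count Value1, 'zz'/'xx'/'qrq' count Value2, anything else counts 0.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tmp (tID INTEGER PRIMARY KEY,
               Acct_ID INT, Category TEXT, Value1 REAL, Value2 REAL)""")
con.executemany(
    "INSERT INTO tmp (Acct_ID, Category, Value1, Value2) VALUES (?,?,?,?)",
    [(1, 'a',     10.0, 99.0),   # 'a'  -> Value1 counts
     (1, 'zz',    99.0,  5.0),   # 'zz' -> Value2 counts
     (1, 'other',  7.0,  7.0),   # unmatched -> contributes 0
     (2, 'rr',     3.0, 99.0)])

# Original shape: CASE in a derived table, SUM outside.
derived = con.execute("""
    SELECT Acct_ID, SUM(SumMe) AS Value
    FROM (SELECT Acct_ID,
                 CASE WHEN Category IN ('a','bdq','rr')  THEN Value1
                      WHEN Category IN ('zz','xx','qrq') THEN Value2
                      ELSE 0 END AS SumMe
          FROM tmp) AS t
    GROUP BY Acct_ID ORDER BY Acct_ID""").fetchall()

# Collapsed shape: CASE directly inside SUM.
inline = con.execute("""
    SELECT Acct_ID,
           SUM(CASE WHEN Category IN ('a','bdq','rr')  THEN Value1
                    WHEN Category IN ('zz','xx','qrq') THEN Value2
                    ELSE 0 END) AS Value
    FROM tmp
    GROUP BY Acct_ID ORDER BY Acct_ID""").fetchall()

print(derived)   # [(1, 15.0), (2, 3.0)]
assert derived == inline
```

Same rows in, same totals out, which matches the observation that only the syntax (and perhaps readability) differs.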
September 16, 2010 at 3:54 pm
Yeah, Lutz, pretty much the same result, but I see what you're driving at. I had originally been tempted not to post any code at all and just ask for resources, but I knew some folks would get a better idea from a sample. 🙂
Here's a sample of what I'm trying to optimize (realize it's incomplete and not the final form I'd want to optimize, but it shows the beginning of the complexity). I believe the problem is I was hoping to find a few ideas about the principles of multi-field aggregation optimization.
CREATE TABLE [dbo].[vPortfolioTransaction](
[PortfolioID] [int] NOT NULL,
[PortfolioTransactionID] [int] NOT NULL,
[TradeDate] [datetime] NULL,
[SequenceNo] [int] NOT NULL,
[RecID] [tinyint] NOT NULL,
[TransactionCode] [char](2) NOT NULL,
[TranCodeLabel] [varchar](50) NULL,
[Comment] [varchar](70) NULL,
[SecTypeCode1] [char](2) NULL,
[SecurityID1] [int] NULL,
[SettleDate] [datetime] NULL,
[OriginalCostDate] [datetime] NULL,
[Quantity] [float] NULL,
[ClosingMethodCode] [char](1) NULL,
[SecTypeCode2] [char](2) NULL,
[SecurityID2] [int] NULL,
[TradeDateFX] [float] NULL,
[SettleDateFX] [float] NULL,
[OriginalFX] [float] NULL,
[MarkToMarket] [bit] NULL,
[TradeAmount] [float] NULL,
[OriginalCost] [float] NULL,
[WithholdingTax] [float] NULL,
[ExchangeID] [tinyint] NULL,
[ExchangeFee] [float] NULL,
[Commission] [float] NULL,
[ImpliedCommission] [bit] NULL,
[OtherFees] [float] NULL,
[CommissionPurposeID] [tinyint] NOT NULL,
[IsPledge] [bit] NULL,
[CustodianID] [int] NULL,
[IsDestPledge] [bit] NULL,
[DestCustodianID] [int] NULL,
[OriginalFace] [float] NULL,
[YieldOnCost] [float] NULL,
[DurationOnCost] [float] NULL,
[TransUserDef1ID] [tinyint] NOT NULL,
[TransUserDef2ID] [tinyint] NOT NULL,
[TransUserDef3ID] [tinyint] NOT NULL,
[TranID] [int] NULL,
[IPCounter] [varchar](16) NULL,
[SourceID] [int] NULL,
[PostDate] [datetime] NULL,
[LotNumber] [smallint] NULL,
[ReclaimAmount] [float] NULL,
[StrategyID] [int] NULL,
[RecordDate] [datetime] NULL,
[DivTradeDate] [datetime] NULL,
[PerfContributionOrWithdrawal] [bit] NULL,
[VersusDate] [datetime] NULL,
[FeePeriodDate] [datetime] NULL,
[BrokerFirmID] [int] NULL,
[BrokerRepSecurityID] [int] NULL,
[TradeBlotterLineID] [int] NULL
) ON [PRIMARY]
The initial aggregation code I'm currently working from (I'm still fleshing out the logic, trying to reduce the number of passes, etc.):
(Note to self: test even half-done code before posting. Sorry, it's better now...)
DECLARE @DataUntilDate DATETIME
SET @DataUntilDate = GETDATE()
SELECT
ISNULL( drvPos.pID, drvNeg.pID) AS pID,
ISNULL( drvPos.sID, drvNeg.sID) AS sID,
ISNULL( drvPos.qty, 0) - ISNULL( drvNeg.qty, 0) AS qty,
ISNULL( drvPos.amt, 0) - ISNULL( drvNeg.amt, 0) AS amt
FROM
(SELECT
pID,
sID,
ISNULL( sum( qty), 0) AS qty,
sum( amt) as amt
FROM
(
select
portfolioID AS pID,
securityID1 AS sID,
sum( CASE WHEN sectypecode1 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) AS qty,
SUM( TradeAmount) AS amt
FROM
vPortfolioTransaction
WHERE
transactionCode IN ( 'ti', 'li', 'dp', 'by', 'cs')
AND tradeDate <= @DataUntilDate
GROUP BY
portfolioID,
securityID1
UNION ALL
SELECT
PortfolioID,
securityID2,
SUM( CASE WHEN sectypecode2 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) ,
SUM( TradeAmount)
FROM
vPortfolioTransaction
WHERE
transactionCode IN ( 'sl', 'wd', 'in', 'dv')
AND tradeDate <= @DataUntilDate
GROUP BY
portfolioID,
securityID2
) AS drv
GROUP BY
pID,
sID
) AS drvPos
FULL OUTER JOIN
(SELECT
pID,
sID,
ISNULL( sum( qty), 0) AS qty,
sum( amt) as amt
FROM
(
select
portfolioID AS pID,
securityID1 AS sID,
sum( CASE WHEN sectypecode1 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) AS qty,
SUM( TradeAmount) AS amt
FROM
vPortfolioTransaction
WHERE
transactionCode IN ( 'to', 'lo', 'wd', 'sl', 'ss')
AND tradeDate <= @DataUntilDate
GROUP BY
portfolioID,
securityID1
UNION ALL
SELECT
PortfolioID,
securityID2,
SUM( CASE WHEN sectypecode2 IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) ,
SUM( TradeAmount)
FROM
vPortfolioTransaction
WHERE
transactionCode IN ( 'by', 'dp')
AND tradeDate <= @DataUntilDate
GROUP BY
portfolioID,
securityID2
) AS drv
GROUP BY
pID,
sID
) AS drvNeg
ON drvPos.pID = drvNeg.pID
AND drvPos.sID = drvNeg.sID
ORDER BY
ISNULL( drvPos.pID, drvNeg.pID),
ISNULL( drvPos.sID, drvNeg.sID)
No, it's not pretty, and I'm sure I'll come up with ways to reduce the number of passes, but this gives you an idea of where I'm going with it. It will end up a lot more complex, which is part of why I was hoping for help with generic principles. :blink:
September 16, 2010 at 4:34 pm
What I'd basically do is move the decision of whether to use securityID1 or securityID2 into a CASE expression instead of a UNION, to avoid running a table scan/seek more often than needed.
I'm not sure whether the FULL JOIN can be replaced with a CASE expression as well. At the moment it looks like it's possible. It would be great to have some sample data and an expected result set to play with.
DECLARE @DataUntilDate DATETIME
SET @DataUntilDate = GETDATE()
SELECT
ISNULL( drvPos.pID, drvNeg.pID) AS pID,
ISNULL( drvPos.sID, drvNeg.sID) AS sID,
ISNULL( drvPos.qty, 0) - ISNULL( drvNeg.qty, 0) AS qty,
ISNULL( drvPos.amt, 0) - ISNULL( drvNeg.amt, 0) AS amt
FROM
(SELECT
portfolioID AS pID,
CASE WHEN transactionCode IN ( 'sl', 'wd', 'in', 'dv') THEN securityID2 ELSE securityID1 END AS sID,
SUM( CASE WHEN CASE WHEN transactionCode IN ( 'sl', 'wd', 'in', 'dv') THEN sectypecode2 ELSE sectypecode1 END IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) AS qty, -- use the sectypecode matching the securityID chosen above
SUM( TradeAmount) AS amt
FROM
vPortfolioTransaction
WHERE
transactionCode IN ( 'ti', 'li', 'dp', 'by','sl', 'wd', 'in', 'dv')
AND tradeDate <= @DataUntilDate
GROUP BY
portfolioID,
CASE WHEN transactionCode IN ( 'sl', 'wd', 'in', 'dv') THEN securityID2 ELSE securityID1 END
) AS drvPos
FULL OUTER JOIN
(SELECT
portfolioID AS pID,
CASE WHEN transactionCode IN ( 'by', 'dp') THEN securityID2 ELSE securityID1 END AS sID,
SUM( CASE WHEN CASE WHEN transactionCode IN ( 'by', 'dp') THEN sectypecode2 ELSE sectypecode1 END IN ( 'ca', 'ex', 'ep') THEN TradeAmount ELSE quantity END) AS qty, -- use the sectypecode matching the securityID chosen above
SUM( TradeAmount) AS amt
FROM
vPortfolioTransaction
WHERE
transactionCode IN ( 'to', 'lo', 'wd', 'sl', 'by', 'dp')
AND tradeDate <= @DataUntilDate
GROUP BY
portfolioID,
CASE WHEN transactionCode IN ( 'by', 'dp') THEN securityID2 ELSE securityID1 END
) AS drvNeg
ON drvPos.pID = drvNeg.pID
AND drvPos.sID = drvNeg.sID
ORDER BY
ISNULL( drvPos.pID, drvNeg.pID) ,
ISNULL( drvPos.sID, drvNeg.sID)
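The claim that the CASE consolidation returns the same totals as the original two-branch UNION ALL can be exercised on toy data. A reduced sketch via Python's sqlite3 (only the columns that matter, hypothetical values; the transactionCode sets are the drvPos ones from the queries above):

```python
import sqlite3

# Compare the two-branch UNION ALL (securityID1 branch + securityID2
# branch) against a single pass with the ID choice folded into a CASE.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE trans (portfolioID INT, securityID1 INT,
               securityID2 INT, transactionCode TEXT, TradeAmount REAL)""")
con.executemany("INSERT INTO trans VALUES (?,?,?,?,?)", [
    (1, 100, 200, 'ti', 10.0),   # 'ti' -> uses securityID1
    (1, 100, 200, 'sl',  4.0),   # 'sl' -> uses securityID2
    (1, 300, 200, 'dv',  2.0),   # 'dv' -> uses securityID2
])

union_all = con.execute("""
    SELECT pID, sID, SUM(amt) FROM (
        SELECT portfolioID pID, securityID1 sID, TradeAmount amt
        FROM trans WHERE transactionCode IN ('ti','li','dp','by')
        UNION ALL
        SELECT portfolioID, securityID2, TradeAmount
        FROM trans WHERE transactionCode IN ('sl','wd','in','dv'))
    GROUP BY pID, sID ORDER BY pID, sID""").fetchall()

single_pass = con.execute("""
    SELECT portfolioID pID,
           CASE WHEN transactionCode IN ('sl','wd','in','dv')
                THEN securityID2 ELSE securityID1 END sID,
           SUM(TradeAmount)
    FROM trans
    WHERE transactionCode IN ('ti','li','dp','by','sl','wd','in','dv')
    GROUP BY pID, sID ORDER BY pID, sID""").fetchall()

print(single_pass)   # [(1, 100, 10.0), (1, 200, 6.0)]
assert union_all == single_pass
```

The single-pass form reads the table once instead of twice, which is the point of the rewrite; whether that wins on the real 60M-row table still depends on the indexing.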
September 16, 2010 at 4:38 pm
LutzM (9/16/2010)
I'm not sure if the FULL JOIN can be replaced with a CASE statement as well. At the moment it looks like it's possible. It would be great to have some sample data and expected result set to play with.
Yeah, that's part of the whole NDA problem at the moment. The results I'm looking for and the sample data are proprietary, and I'd need to set up a whole separate build to create the test data. I'll see what I can come up with on that front, but it's not simple.
Still going through your code, and I believe you're right... Get back to you on that after a few rounds with the data. 🙂
EDIT: I should have said manually create the test data. I have no way of automating test data that will also give intelligent results.
September 16, 2010 at 5:00 pm
I think all that's needed so far is one sample row for each transactionCode value mentioned. If there are values beyond the ones mentioned so far, you should include a sample of those as well.
From my point of view, the easy part is setting up the sample data. Coming up with values for all the remaining columns that lead to a result set which can be verified as correct or not is a totally different story... Absolutely understood.
From my point of view, you'll have plenty of time: I'm going to take a nap (it's 1am over here) followed by a few hours at the place where I make money for a living. So it's going to be at least 15hrs before I'll have a chance to get back here.
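Building that per-code sample set can be scripted. A rough sketch in Python (every value is a hypothetical placeholder chosen so results are easy to verify by hand; the codes listed are the distinct transactionCode values used in the queries posted so far):

```python
# One sample row per distinct transactionCode seen in the thread's queries.
codes = ['ti', 'li', 'dp', 'by', 'cs',      # drvPos / securityID1 codes
         'sl', 'wd', 'in', 'dv',            # drvPos / securityID2 codes
         'to', 'lo', 'ss']                  # remaining drvNeg codes
rows = [{"PortfolioID": 1,
         "SecurityID1": 100,
         "SecurityID2": 200,
         "TransactionCode": code,
         "SecTypeCode1": "ca",
         "SecTypeCode2": "cs",
         "Quantity": 10.0 * (i + 1),        # distinct per code, hand-checkable
         "TradeAmount": 100.0 * (i + 1)}
        for i, code in enumerate(codes)]
print(len(rows))   # 12
```

From here the rows could be turned into INSERT statements, and the expected result set worked out by hand before running the real queries.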