December 20, 2012 at 4:06 am
Hi folks,
I'm struggling to come up with ideas of how to improve the performance of the following query:
SELECT
datename(mm, dbo.Trade.TradeDate),
dbo.DataSource.Name,
datename(yy, dbo.Trade.TradeDate),
sum(dbo.Trade.SalesValueCcy * SalesAttribution.Percentage),
SUM(dbo.Trade.SalesCreditCcy * SalesAttribution.Percentage),
sum(dbo.Trade.VolumeCcy * SalesAttribution.Percentage),
SUM(dbo.Trade.MarkupCcy * SalesAttribution.Percentage),
TradingDepartment.Description,
TradingDesk.Description,
SalesDepartment.Description,
SalesDesk.Description,
TPIndustrySubSector.Description,
TPReportingResponsibility.Description,
ImmediateParent.Description,
TopParent.Description,
Customer.Description,
CustReportingCountry.Description,
CustReportingResponsibility.Description,
CustRepCustomerIndustry.Description,
ParentProduct.Description,
Product.Description
FROM
dbo.Trade
INNER JOIN dbo.TradeMember ON dbo.Trade.TradeID=dbo.TradeMember.TradeID
INNER JOIN dbo.Member Product ON dbo.TradeMember.ProductID=Product.MemberID
INNER JOIN dbo.Dimension ParentProduct ON Product.ParentDimensionID=ParentProduct.DimensionID
INNER JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID
INNER JOIN SalesAttribution ON dbo.Trade.SalesAttributionGroupID=SalesAttribution.SalesAttributionGroupID
INNER JOIN dbo.Member SalesDesk ON SalesDesk.MemberID=SalesAttribution.SalespersonDeskID
INNER JOIN dbo.Dimension SalesDepartment ON SalesDepartment.DimensionID=SalesDesk.ParentDimensionID
INNER JOIN dbo.Member Customer ON Customer.MemberID=dbo.TradeMember.CustomerID
INNER JOIN dbo.CustomerMemberDetail TradeCustomer ON TradeCustomer.MemberID=Customer.MemberID
INNER JOIN dbo.Member CustRepCustomerIndustry ON CustRepCustomerIndustry.MemberID=TradeCustomer.ReportingIndustryID
INNER JOIN dbo.Dimension CustRepCustomerIndustrySubSector ON CustRepCustomerIndustrySubSector.DimensionID=CustRepCustomerIndustry.ParentDimensionID
INNER JOIN dbo.Member CustReportingCountry ON CustReportingCountry.MemberID=TradeCustomer.ReportingCountryID
INNER JOIN dbo.Dimension CustReportingResponsibility ON CustReportingResponsibility.DimensionID=CustReportingCountry.ParentDimensionID
INNER JOIN dbo.Member ImmediateParent ON ImmediateParent.MemberID=TradeCustomer.ImmediateParentID
INNER JOIN dbo.CustomerMemberDetail ImmediateParentDetails ON ImmediateParentDetails.MemberID=ImmediateParent.MemberID
INNER JOIN dbo.Member IPRepCustomerIndustry ON IPRepCustomerIndustry.MemberID=ImmediateParentDetails.ReportingIndustryID
INNER JOIN dbo.Dimension IPRepCustomerIndustrySubSector ON IPRepCustomerIndustrySubSector.DimensionID=IPRepCustomerIndustry.ParentDimensionID
INNER JOIN dbo.Member IPReportingCountry ON IPReportingCountry.MemberID=ImmediateParentDetails.ReportingCountryID
INNER JOIN dbo.Dimension IPReportingResponsibility ON IPReportingResponsibility.DimensionID=IPReportingCountry.ParentDimensionID
INNER JOIN dbo.Member TopParent ON TopParent.MemberID=TradeCustomer.TopParentID
INNER JOIN dbo.CustomerMemberDetail TopParentDetails ON TopParentDetails.MemberID=TopParent.MemberID
INNER JOIN dbo.Member TPIndustry ON TPIndustry.MemberID=TopParentDetails.IndustryID
INNER JOIN dbo.Dimension TPIndustrySubSector ON TPIndustrySubSector.DimensionID=TPIndustry.ParentDimensionID
INNER JOIN dbo.Member TPReportingCountry ON TPReportingCountry.MemberID=TopParentDetails.ReportingCountryID
INNER JOIN dbo.Dimension TPReportingResponsibility ON TPReportingResponsibility.DimensionID=TPReportingCountry.ParentDimensionID
INNER JOIN dbo.Member TradingDesk ON TradingDesk.MemberID=dbo.TradeMember.TradingDeskID
INNER JOIN dbo.Dimension TradingDepartment ON TradingDepartment.DimensionID=TradingDesk.ParentDimensionID
WHERE
(
( dbo.Trade.TradeDate >= DATEADD(DD, -1 * DAY(GETDATE()), DATEADD(MM, -1 * MONTH(GETDATE()) + 1, GETDATE())))
)
GROUP BY
datename(mm, dbo.Trade.TradeDate),
dbo.DataSource.Name,
datename(yy, dbo.Trade.TradeDate),
TradingDepartment.Description,
TradingDesk.Description,
SalesDepartment.Description,
SalesDesk.Description,
TPIndustrySubSector.Description,
TPReportingResponsibility.Description,
ImmediateParent.Description,
TopParent.Description,
Customer.Description,
CustReportingCountry.Description,
CustReportingResponsibility.Description,
CustRepCustomerIndustry.Description,
ParentProduct.Description,
Product.Description
Here are the tables involved:
dbo.Member
CREATE TABLE [dbo].[Member](
[MemberID] [int] IDENTITY(1,1) NOT NULL,
[Code] [nvarchar](100) NOT NULL,
[Description] [nvarchar](200) NOT NULL,
[ParentDimensionID] [int] NOT NULL,
[DimensionTypeID] [int] NOT NULL,
CONSTRAINT [PK__Member] PRIMARY KEY CLUSTERED
(
[MemberID] ASC
)
dbo.Dimension
CREATE TABLE [dbo].[Dimension](
[DimensionID] [int] IDENTITY(1,1) NOT NULL,
[ParentDimensionID] [int] NULL,
[Code] [nvarchar](100) NOT NULL,
[Description] [nvarchar](200) NOT NULL,
[DimensionLevelID] [int] NOT NULL,
CONSTRAINT [PK__Dimension] PRIMARY KEY CLUSTERED
(
[DimensionID] ASC
)
dbo.Trade and dbo.TradeMember
Two tables that consist of captured trade features: Trade table mainly includes trade values and TradeMember stores all the IDs that are referenced to Member table(things like Product, Customer etc)
These tables are quite big(several millions rows usually) and wide: Trade table has about 70 columns, TradeMember - 15.
dbo.SalesAttribution
Again quite wide table(about 15 columns), but here are the most important ones
CREATE TABLE [dbo].[SalesAttribution](
[SalesAttributionID] [int] IDENTITY(1,1) NOT NULL,
[SalespersonID] [int] NOT NULL,
[SalespersonDeskID] [int] NOT NULL,
[Percentage] [numeric](11, 10) NOT NULL,
[SalesAttributionGroupID] [int] NOT NULL,
CONSTRAINT [PK_SalesAttributionID] PRIMARY KEY CLUSTERED
(
[SalesAttributionID] ASC
)
So every row in Trade table is tagged with SalesAttributionGroupID, which can correspond to at least one(could be more) rows in SalesAttribution table.
dbo.CustomerMemberDetail
Again really wide table(about 25 columns) that includes more detailed info about Customer. I will not include this table description here to save space, but just to mention that all the columns used in joins are indexed.
When I pull all data from big and most important tables(Trade JOIN TradeMember JOIN SalesAttribution) everything goes fine(takes appropriate amount of time), but when I start adding Member hierarchical info on top(joins with Member and Dimension tables, and therefore more grouping) this is when it all starts to go nuts.
Here are IO stats:
Table 'DataSource'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dimension'. Scan count 0, logical reads 5580024, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Member'. Scan count 1860008, logical reads 11160048, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesAttribution'. Scan count 465002, logical reads 930592, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CustomerMemberDetail'. Scan count 0, logical reads 2790012, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TradeMember'. Scan count 0, logical reads 1395006, physical reads 1029, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Trade'. Scan count 3, logical reads 204444, physical reads 335, read-ahead reads 169197, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
I am concerned about stats on Member and Dimension table(too many scans and logical reads). It's interesting that Clustered index seek is being used on Dimension table(I guess this is why scans = 0) and NonClustered on Member. I did try to play a bit with those, but no luck.
The problem with the query in general is that I can't implement covering indexes on Trade and TradeMember tables as this is Business Objects generated query and the fields are picked by user(so can be different every time). And in this case the index on TradeDate would not be selective enough anyway. And as I said before when I drop joins with Member and Dimension tables( and therefore grouping), everything works fine.
Please find the actual execution plan attached. I didn't include the indexes existing on used table, as then the post would be too long(it already is, sorry), but if needed I can provide that info.
Can I do anything to get rid of those scans and reduce the reads on Member and Dimension tables? Or do anything else to speed up the query? Any help or ideas will be highly appreciated.
Regards,
Igor
December 20, 2012 at 7:02 am
Hmm, odd.
Im interested in this part of the plan
Which is pretty much the very first join between datasource and trade.
The estimate from the hash join is 1 , the actual row count is 465,002 !
Not much of a difference.
2005 is getting a bit old now as is my memory of its differences over 2008/2012.
You may be better of 'caching' the result of this join into a temp table and driving the rest from that.
Edit:
Just to elaborate on why this is bad :
As it thinks that there is 1 row, it has chosen to do loop joins to the other tables in the plan.
This would be good if there were 1 row but theres not, theres 465,000.
If it knew there were 465,000 it would have chosen a hash (or merge) join which will be more efficient
Dave
December 20, 2012 at 7:56 am
Hello Dave,
Thank you very much for reply.
I was wondering about the fact that you noticed. I tried to execute a similar query in our different client's TEST environment
Here is the code(just to emphasize that the queries are similar)
SELECT
datename(mm, dbo.Trade.TradeDate),
dbo.DataSource.Name,
datename(yy, dbo.Trade.TradeDate),
sum(dbo.Trade.TradeValueCcy),
SUM(dbo.Trade.SalesCreditCcy),
sum(dbo.Trade.TradeVolumeCcy),
SUM(dbo.Trade.MarginValue),
TradingDepartment.Description,
TradingDesk.Description,
SalesDepartment.Description,
SalesDesk.Description,
Customer.Description,
ParentProduct.Description,
Product.Description
FROM
dbo.Trade
INNER JOIN dbo.TradeMember ON dbo.Trade.TradeID=dbo.TradeMember.TradeID
INNER JOIN dbo.Member Product ON dbo.TradeMember.ProductID=Product.MemberID
INNER JOIN dbo.Dimension ParentProduct ON Product.ParentDimensionID=ParentProduct.DimensionID
INNER JOIN dbo.Member SalesDesk ON SalesDesk.MemberID=TradeMember.SalesDeskID
INNER JOIN dbo.Dimension SalesDepartment ON SalesDepartment.DimensionID=SalesDesk.ParentDimensionID
INNER JOIN dbo.Member Customer ON Customer.MemberID=dbo.TradeMember.CustomerID
INNER JOIN dbo.CustomerMemberDetail TradeCustomer ON TradeCustomer.MemberID=Customer.MemberID
INNER JOIN dbo.Member TradingDesk ON TradingDesk.MemberID=dbo.TradeMember.TradingDeskID
INNER JOIN dbo.Dimension TradingDepartment ON TradingDepartment.DimensionID=TradingDesk.ParentDimensionID
INNER JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID
WHERE
(
( dbo.Trade.TradeDate >= DATEADD(DD, -1 * DAY(GETDATE()), DATEADD(MM, -1 * MONTH(GETDATE()) + 1, GETDATE())))
)
GROUP BY
datename(mm, dbo.Trade.TradeDate),
dbo.DataSource.Name,
datename(yy, dbo.Trade.TradeDate),
TradingDepartment.Description,
TradingDesk.Description,
SalesDepartment.Description,
SalesDesk.Description,
Customer.Description,
ParentProduct.Description,
Product.Description
And the execution plan proves that your idea might be right. Please find it attached.
Here are the IO stats:
Table 'DataSource'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dimension'. Scan count 9, logical reads 5694, physical reads 4, read-ahead reads 1715, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Member'. Scan count 12, logical reads 11492, physical reads 4, read-ahead reads 2620, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TradeMember'. Scan count 3, logical reads 224456, physical reads 398, read-ahead reads 204206, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Trade'. Scan count 3, logical reads 2451785, physical reads 485, read-ahead reads 2176232, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
I appreciate that the databases and queries are different and the performance difference can be caused by that.
Dave Ballantyne (12/20/2012)
Just to elaborate on why this is bad :As it thinks that there is 1 row, it has chosen to do loop joins to the other tables in the plan.
This would be good if there were 1 row but theres not, theres 465,000.
If it knew there were 465,000 it would have chosen a hash (or merge) join which will be more efficient
Sorry for probably stupid question, but how to let SQL engine know that there are 465,000 rows? I rebuilt all the indexes(thought it might help), but had no luck with it.
And one not related question: what program do you use to view SQL executions plans? Really liked number of estimated rows displayed next to the edge.
Thanks a lot once again.
Small update: the query I'm working on is generated by Business objects(reporting tool), so can't really use a temporary table or modify SQL(this is not 100 correct, I can change joins in BOBJ designer, which affects the way it generates SQL)
Igor
December 20, 2012 at 8:28 am
I would be interested in seeing the plan for this ..
Select TradeID,DataSource.DataSourceID
FROM
dbo.Trade
INNER JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID
WHERE
(
( dbo.Trade.TradeDate >= DATEADD(DD, -1 * DAY(GETDATE()), DATEADD(MM, -1 * MONTH(GETDATE()) + 1, GETDATE())))
)
As the orignal plan timed-out this *could* come back with a decent plan.
Sorry for probably stupid question, but how to let SQL engine know that there are 465,000 rows? I rebuilt all the indexes(thought it might help), but had no luck with it.
Thats the problem, both of the tables here have a decent estimate of rows but its the join output that doesent.
This could be a bug/hole in the optimizer the above will help prove/disprove that.
Im anycase it still sounds to me a better option to use the above query , store the results in a temp table and drive the remainder from that.
HTH
The plan view i use is sql sentry plan explorerer , its a free download too 🙂
Dave
December 20, 2012 at 1:19 pm
It looks like cardinality estimation error. Please update statistics WITH FULLSCAN on DataSource and Trade tables.
Also in the original query you can try to replace
INNER JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID
with
LEFT JOIN dbo.DataSource ON dbo.DataSource.DataSourceID=dbo.Trade.DataSourceID
December 21, 2012 at 4:34 am
Thank you both very much.
Updating statistics on Trade and DataSource tables resolved the issue.
Regards,
Igor
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply