June 14, 2010 at 5:51 am
Hi,
I have some tables that are used as fact tables for an SSAS cube.
For space reasons, the fact table is normalized into two tables, and two dimension tables that are referenced via foreign keys.
All tables have a column 'Version' that contains a unique integer.
Both fact tables have only one clustered index that contains only this version field.
The dimension tables have at least one key where the version is the first field.
The cube uses a view to access the data; the view is a JOIN between all the fact and dimension tables mentioned above. For selection, the cube uses a 'where Version = x' restriction on the view.
The result from the view contains about 4 million records for each version, whereas the two fact tables have approx. 60k records each and the dimension tables have 4k and 60k records, respectively.
My problem is that the view is extremely slow when it comes to queries like
SELECT COUNT(1) FROM View WHERE Version=@version
The estimated execution plan tells me that it would use only index seeks, parallelism, bitmaps and hash matches.
But the actual execution plan shows nested loops and an enormous number of intermediate rows (up to 30 billion).
What can I do to improve the view's performance?
Guenter from Frankfurt / Germany
June 14, 2010 at 7:02 am
Are you able to post the code for the view and the actual execution plan?
How many distinct values of version make up the clustered key on the fact table?
Also, using COUNT(1) serves no advantage, as the query optimizer converts it to COUNT(*).
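For illustration, these two forms should end up with the same execution plan (just a sketch using the generic names from your post, so substitute your actual view name):
DECLARE @version int
SET @version = 1 -- placeholder value
-- The optimizer rewrites COUNT(1) to COUNT(*), so both compile to the same plan.
SELECT COUNT(1) FROM dbo.[View] WHERE Version = @version
SELECT COUNT(*) FROM dbo.[View] WHERE Version = @version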
June 15, 2010 at 12:07 am
Hi Steve,
All tables behind the view contain about 100 distinct versions; all tables are loaded together, and versions are always loaded in full.
The view is (about) as follows:
select
sen.DatumID
,sen.PositionID
,sen.Riskofaktor1ID
,sen.Riskofaktor2ID
,pos.InstrumentID
,sen.GeschaeftID as RisikofaktorGeschaeftID
,pos.GeschaeftID
,pos.KategorieID
,pos.PortfolioID
,sen.SensiTypID
-- more columns from sen
,pa.AggregationID
from dbo.FaktSensitivitaet as sen
inner join dbo.FaktPosition as pos
    on sen.PositionID = pos.PositionID
    and sen.DatumID = pos.DatumID
inner join dbo.ZuoPositionAggregation as pa
    on pos.PositionID = pa.PositionID
    and pos.DatumID = pa.DatumID
inner join dbo.ZuoRisikofaktorAggregation as ra
    on sen.Riskofaktor1ID = ra.RisikofaktorID
    and sen.DatumID = ra.DatumID
    and ra.AggregationID = pa.AggregationID
    and ra.DatumID = pa.DatumID
The underlying tables contain roughly the following number of records per version (DatumID):
The primary use for the view is for Cube partition processing, where the partition query statement is something like
select * from dbo.VWFaktSensitivitaetAggregation where DatumID <= 2010053199 and DatumID > 2010052700
This query starts delivering records within the first second.
To examine if there is any data in the partition, I take the query from the cube definition and put a wrapper around it:
select top 1 1 from
(
select * from dbo.VWFaktSensitivitaetAggregation where DatumID=2010053100
) X
This produces the first execution plan (Execution Plan ViewSensiAgg TOP 1.sqlplan). What is strange to me is that it takes about 3 minutes before it delivers its single "1" as a result.
Even stranger: if I modify my query with a "TOP 1000"
select top 1000 1 from
(
select * from dbo.VWFaktSensitivitaetAggregation where DatumID=2010053100
) X
it is significantly faster and produces the second execution plan (Execution Plan ViewSensiAgg TOP 1000.sqlplan).
Unfortunately, it seems that with growing data in the tables even the "top 1000" query takes minutes to respond...
Guenter
June 15, 2010 at 1:27 am
The problem here seems to be caused by bad statistics.
The seek from ZuoPositionAggregation has estimated rows = 18, but actual rows = 113,566.
The seek from ZuoRisikofaktorAggregation has estimated rows = 1.8 M, but actual rows = 14.5 M.
This results in a very bad estimate of the number of rows on the bottom input to the nested loop:
estimated rows = 1.8 M, actual rows = 2,165 M.
So, to get a better execution plan you should make sure that the statistics on ZuoPositionAggregation and ZuoRisikofaktorAggregation are fully updated on the DatumID column.
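If you want to see how stale the current statistics are before rebuilding them, something like this should show the last update date per statistics object (just a sketch; the statistics names returned will of course be specific to your database):
-- List all statistics on the two join tables with their last update date.
SELECT OBJECT_NAME(s.[object_id]) AS TableName,
       s.name AS StatsName,
       STATS_DATE(s.[object_id], s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE s.[object_id] IN (OBJECT_ID('dbo.ZuoPositionAggregation'),
                        OBJECT_ID('dbo.ZuoRisikofaktorAggregation'));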
June 15, 2010 at 1:31 am
Have you updated the statistics?
I guess you need to update the statistics for
ZuoPositionAggregation (3.8 Mio)
ZuoRisikofaktorAggregation (1.8 Mio)
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
June 15, 2010 at 2:43 am
Thanks for your responses.
I have seen the bad statistics you pointed me to.
Until now I believed I had updated the statistics; between every load into the database and the processing of the cube, the following stored procedure is called:
ALTER PROCEDURE [dbo].[up_UpdateStatisticsOnFactTables]
WITH EXECUTE AS SELF -- needs elevated rights, so use the installer user
AS
BEGIN
    UPDATE STATISTICS dbo.ZuoPositionAggregation WITH SAMPLE 30000 ROWS, ALL, NORECOMPUTE
    UPDATE STATISTICS dbo.ZuoRisikofaktorAggregation WITH SAMPLE 30000 ROWS, ALL, NORECOMPUTE
    UPDATE STATISTICS dbo.FaktPosition WITH SAMPLE 30000 ROWS, ALL, NORECOMPUTE
    UPDATE STATISTICS dbo.FaktSensitivitaet WITH SAMPLE 30000 ROWS, ALL, NORECOMPUTE
    -- and all the other fact tables
END
I hoped that would do the job...
What's wrong with it?
June 15, 2010 at 2:52 am
gtschech (6/15/2010)
Thanks for your responses. I have seen the bad statistics you pointed me to. Until now I believed I had updated the statistics; between every load into the database and the processing of the cube, the stored procedure above (with SAMPLE 30000 ROWS on each table) is called... I hoped that would do the job... What's wrong with it?
It is possible that a sample of 30,000 rows is not enough.
Try using FULLSCAN instead and see if it helps.
The tables are fairly small so the time required for a FULLSCAN should probably be acceptable.
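Something along these lines (only a sketch of the change - adjust the table list to match your procedure, and keep NORECOMPUTE only if you really want to disable automatic statistics updates):
-- Same statements as in your procedure, but with a full scan instead of a 30000 row sample.
UPDATE STATISTICS dbo.ZuoPositionAggregation WITH FULLSCAN, ALL
UPDATE STATISTICS dbo.ZuoRisikofaktorAggregation WITH FULLSCAN, ALL
UPDATE STATISTICS dbo.FaktPosition WITH FULLSCAN, ALL
UPDATE STATISTICS dbo.FaktSensitivitaet WITH FULLSCAN, ALL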
June 15, 2010 at 6:08 am
Stefan and Bhuvnesh,
your guess was right. I did an UPDATE STATISTICS WITH FULLSCAN and the query is fast again.
Thanks for the hint - now I know a sample size of 30,000 is obviously too small.
Unfortunately, Stefan's remark that "the tables are fairly small" misses the point - the Zuo tables are 40 and 30 GB respectively, containing 735 million and 350 million records, so the update took about one hour...
So I have to find a way to update the statistics more quickly, but with the same quality.
June 15, 2010 at 6:24 am
I see.
I never understood what you meant by your "Version".
I now understand that the number of rows you were talking about originally should be multiplied by the number of versions.
I suppose you need to find a balance between statistics accuracy and time to update the statistics.
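One option (just a sketch - the index/statistics names below are placeholders, look up the real names in sys.stats) would be to run the FULLSCAN only on the statistics that cover DatumID and keep a sample for everything else:
-- Full scan only on the statistics the optimizer needs for the DatumID filter
-- (the IX_..._DatumID names are placeholders).
UPDATE STATISTICS dbo.ZuoPositionAggregation (IX_ZuoPositionAggregation_DatumID) WITH FULLSCAN
UPDATE STATISTICS dbo.ZuoRisikofaktorAggregation (IX_ZuoRisikofaktorAggregation_DatumID) WITH FULLSCAN
-- The remaining statistics can stay on a sampled update as before.
UPDATE STATISTICS dbo.FaktPosition WITH SAMPLE 30000 ROWS, ALL
UPDATE STATISTICS dbo.FaktSensitivitaet WITH SAMPLE 30000 ROWS, ALL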
June 15, 2010 at 7:15 am
Hi,
I started playing with the sample size. But even the 30,000-row sample size now gives very good performance for my query...
It seems I have to clear some SQL Server cache before re-querying?
Besides that, I played with the TOP operator.
It looks very strange...
When I do a TOP 1000 on the query, the execution plan shows me an expected number of 400 records on the FaktPosition table, whereas the actual number returned is 133,879.
When I do a TOP 100 on the very same query, the execution plan expects 1,200,000 records, but gets 17,438,000!
These figures are all the more strange as the query only filters by DatumID, so it should expect AND deliver the very same number of records for both approaches.
To make this even better: the actual number of rows in the FaktPosition table that match the DatumID used is 6,800,000...
SQL Server seems to pick data at random :ermm:
I have attached the SQL plans for the TOP 100 and TOP 1000 queries, respectively.
June 15, 2010 at 7:47 am
I think that you have to accept that in this case it is very difficult for the server to get good estimates of the number of rows needed.
When you use TOP 100 the server tries to estimate how many loop iterations will be needed to get 100 rows in the result set. If the guess is wrong, the results might get very bad.
Estimating the number of rows returned by a single condition is fairly easy with good statistics. It is much more difficult to estimate the number of rows resulting from a join between two large tables. It gets even more difficult when there are more than two tables involved.
Maybe you should try another approach.
If I understand you correctly you are only really trying to detect if there is data for a certain date in all the tables.
Perhaps you could just use a count of the number of matching rows based on the date in each source table. At least this will be very fast since you have an index on the DatumID column.
The problem is caused by the join of three really big tables.
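Something along these lines might do (a sketch using the table and column names from your view definition; note it checks each table on its own, so it only approximates whether the join itself would return rows):
-- Count the matching rows per source table; each count is a cheap index seek on DatumID.
DECLARE @DatumID int
SET @DatumID = 2010053100
SELECT
    (SELECT COUNT(*) FROM dbo.FaktSensitivitaet WHERE DatumID = @DatumID) AS SensiRows,
    (SELECT COUNT(*) FROM dbo.FaktPosition WHERE DatumID = @DatumID) AS PositionRows,
    (SELECT COUNT(*) FROM dbo.ZuoPositionAggregation WHERE DatumID = @DatumID) AS PositionAggRows,
    (SELECT COUNT(*) FROM dbo.ZuoRisikofaktorAggregation WHERE DatumID = @DatumID) AS RisikofaktorAggRows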
June 15, 2010 at 7:57 am
Stefan,
I'm afraid you're right.
The bad thing is that my approach is the only way to stay somewhat generic about the view. I am using this inside a custom SSIS component that creates partitions in a cube. The only information about the underlying data is the partition query of the cube partition, and that is just the SELECT * FROM the view. The component has no idea how the view is structured, or even whether it is a view at all; for other facts the respective partition query goes directly against a table...
So it comes down to the old rule - don't be generic when it comes to performance 🙁
Nevertheless - many thanks for your suggestions!
June 15, 2010 at 12:17 pm
Günter,
did you try using an EXISTS query to see if there is any data in your view?
Something like
IF EXISTS (select 1 from dbo.VWFaktSensitivitaetAggregation where DatumID=2010053100) ...
Since all you want to do is to check for existence I'd give it a try and compare the duration with the other options.
June 15, 2010 at 11:08 pm
So you finally got some improvement after updating the statistics, right?
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
June 15, 2010 at 11:41 pm
Yes,
updating the statistics did help.
But I wonder if maybe I should switch to an indexed view for better view retrieval performance.
Have you ever done that?
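From what I have read so far, it would look roughly like this (purely a sketch with an assumed name, not tested - an indexed view needs SCHEMABINDING, two-part table names, and a unique clustered index, which for this join may need more key columns than shown):
-- Sketch: a schema-bound view over the join, then a unique clustered index to materialize it.
CREATE VIEW dbo.VWFaktSensitivitaetAggregation_Indexed
WITH SCHEMABINDING
AS
SELECT sen.DatumID, sen.PositionID, sen.Riskofaktor1ID, pa.AggregationID
FROM dbo.FaktSensitivitaet AS sen
INNER JOIN dbo.FaktPosition AS pos
    ON sen.PositionID = pos.PositionID AND sen.DatumID = pos.DatumID
INNER JOIN dbo.ZuoPositionAggregation AS pa
    ON pos.PositionID = pa.PositionID AND pos.DatumID = pa.DatumID
GO
CREATE UNIQUE CLUSTERED INDEX IX_VWFaktSensiAgg_Indexed
    ON dbo.VWFaktSensitivitaetAggregation_Indexed (DatumID, PositionID, Riskofaktor1ID, AggregationID)
GO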
But maybe I should first browse the site 😉
Thanks for your suggestions!