October 19, 2015 at 8:15 pm
Hi All.
I was asked to create a report on very large data. Below are the requirements. Any help would be greatly appreciated. Thanks.
We have a very large fact table with 100 million records. I will have to create a report on that table. I just created a query and it is taking more than an hour to retrieve a month's worth of data. As this report will be used by our customers, it should not take longer than 15 seconds. Would creating tables with precalculated values help improve the performance?
Thanks.
October 19, 2015 at 8:36 pm
Make sure you have a clustered index on the "record date time" column.
Then selecting a monthly range of records will be really fast, and the rest of the conditions will apply only to that small subset, so whatever they are, they should not hurt the performance of the query.
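As a minimal sketch of what that looks like, assuming a table and column literally named dbo.FactTable and RecordDateTime (the real names will differ):

-- Illustration only: table and column names are assumptions
CREATE CLUSTERED INDEX CIX_Fact_RecordDateTime
    ON dbo.FactTable (RecordDateTime);

-- A month's worth of rows then resolves to a single clustered index range seek
SELECT COUNT(*)
FROM dbo.FactTable
WHERE RecordDateTime >= '20150901'
  AND RecordDateTime <  '20151001';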
_____________
Code for TallyGenerator
October 20, 2015 at 12:50 am
Sergiy (10/19/2015)
Make sure you have a clustered index on "record date time" column.
I would only create that clustered index if there isn't one already - any existing clustered index may well be critical to something else.
If a clustered index already exists then I would try creating a non-clustered index on "record date time" to improve the report.
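A minimal sketch, again with assumed names; the INCLUDE list would be whichever columns the report actually selects:

-- Illustration only: table, column, and included column names are assumptions
CREATE NONCLUSTERED INDEX IX_Fact_RecordDateTime
    ON dbo.FactTable (RecordDateTime)
    INCLUDE (SomeReportColumn1, SomeReportColumn2);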
October 20, 2015 at 2:43 am
Table definitions, index definitions, query and actual execution plan please.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 20, 2015 at 5:44 am
With 100 million rows you're better off using a columnstore index or moving the table to SSAS. If the report requires data on a monthly basis, partitioning it by month could help improve the performance. What you need to understand in this case is that most of the performance issue you're facing is likely caused by the massive IO of fetching data from disk, or by aggregating large data sets (a CPU issue). Indexes help seek to and eliminate unwanted rows, but if you still have millions of rows after that, you will still face performance issues that require a suitable technology.
Summary tables are a good solution, but you will lose granularity and might need to create many different summaries at different groupings for different reports, and this can become unmanageable over time.
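For the columnstore suggestion above, a rough sketch (table and column names are assumptions; a nonclustered columnstore index needs SQL Server 2012 or later, and before 2016 it makes the underlying table read-only):

-- Sketch only: assumed names
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactTable
    ON dbo.FactTable (TransactionDate, StoreItemID, CustomerID);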
October 20, 2015 at 6:58 am
A fact table with 100 million rows is not that large compared to the tables I had at my previous company. However, good indexing is important. We can't give any advice without the information Gail asked for. For a more detailed explanation of what we're asking for, read the following article: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
October 20, 2015 at 7:48 am
Thanks all for your suggestions. Here is more information on the table design etc:
I will need to create a report on the tables below.
The Order Facts table has the following columns:
ID bigint Clustered, Date(datetime), StoreItemID int, CustomerID bigint
We also have a non-clustered index on Date that includes (StoreItemID, CustomerID).
Store Items table:
ItemID int Clustered, ItemName nvarchar(100), Item Category nvarchar(100)
The same customer may purchase multiple times on the same day, plus we have about 100 million active users. As I said, the fact table is pretty big, with several million transactions each day.
The report should show customers by item purchased in a month and customers by item category in a month.
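For illustration, the two report queries described might look roughly like this (the join from StoreItemID to the Store Items table's ItemID is an assumption, as are the table names and date boundaries):

-- Sketch only: customers per item for one month
SELECT si.ItemName,
       COUNT(DISTINCT of1.CustomerID) AS CustomersInMonth
FROM dbo.OrderFacts AS of1
JOIN dbo.StoreItems AS si
    ON si.ItemID = of1.StoreItemID
WHERE of1.[Date] >= '20150901'
  AND of1.[Date] <  '20151001'
GROUP BY si.ItemName;

-- Same idea, customers per item category for one month
SELECT si.[Item Category],
       COUNT(DISTINCT of1.CustomerID) AS CustomersInMonth
FROM dbo.OrderFacts AS of1
JOIN dbo.StoreItems AS si
    ON si.ItemID = of1.StoreItemID
WHERE of1.[Date] >= '20150901'
  AND of1.[Date] <  '20151001'
GROUP BY si.[Item Category];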
October 20, 2015 at 7:53 am
GilaMonster (10/20/2015)
Table definitions, index definitions, query and actual execution plan please.
Table and indexes as their CREATE statements please, it makes things easier to read
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 20, 2015 at 7:53 am
sreeya (10/20/2015)
Thanks all for your suggestions. Here is more information on the table design etc: I will need to create a report on the tables below.
The Order Facts table has the following columns:
ID bigint Clustered, Date(datetime), StoreItemID int, CustomerID bigint
We also have a non-clustered index on Date that includes (StoreItemID, CustomerID).
Store Items table:
ItemID int Clustered, ItemName nvarchar(100), Item Category nvarchar(100)
The same customer may purchase multiple times on the same day, plus we have about 100 million active users. As I said, the fact table is pretty big, with several million transactions each day.
The report should show customers by item purchased in a month and customers by item category in a month.
GilaMonster (10/20/2015)
Table definitions, index definitions, query and actual execution plan please.
The details you have posted are insufficient. Gail's requests represent the minimum requirement for a decent chance of offering you any help.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 20, 2015 at 7:59 am
Hi,
Creating an index on the date column and partitioning the table would be good options for reporting on a big table.
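A rough sketch of monthly partitioning on the date column (function/scheme names, boundary values, and filegroup are assumptions only):

-- Sketch only: boundaries would be added for each month
CREATE PARTITION FUNCTION pf_MonthlyByDate (datetime)
    AS RANGE RIGHT FOR VALUES ('20150801', '20150901', '20151001');

CREATE PARTITION SCHEME ps_MonthlyByDate
    AS PARTITION pf_MonthlyByDate ALL TO ([PRIMARY]);

-- The fact table or its indexes can then be created ON ps_MonthlyByDate([Date])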
October 20, 2015 at 8:42 am
sreeya (10/20/2015)
The same Customer may purchase multiple times on the same day plus we have about 100 millions of active users. As I said facts table is pretty big with more than several millions of transactions each day.
That's a lot of sales a day... just being curious, what industry are you involved in?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
October 20, 2015 at 6:43 pm
Kristen-173977 (10/20/2015)
Sergiy (10/19/2015)
Make sure you have a clustered index on the "record date time" column.
I would only create that clustered index if there isn't one already - any existing clustered index may well be critical to something else.
If a clustered index already exists then I would try creating a non-clustered index on "record date time" to improve the report.
As we can see from the following post, there is nothing important about the existing clustered index. Just the common junior mistake of putting a clustered index on an identity column.
But I would not readily suggest recreating/changing a clustered index on a table with 100 million rows. It might be quite a long exercise.
There is still the option of using the existing clustered index:
Select MaxID = MAX(ID), MinID = MIN(ID)
From ... Where [Date] between @MinDate and @MaxDate
And then use those MaxID and MinID to limit the number of rows processed by all the other conditions in the WHERE clause.
Still need to apply the filter against the transaction date.
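A rough sketch of the two-step pattern being described, using the table and columns posted earlier in the thread (the date boundaries are just placeholders):

-- Sketch only; assumes the OrderFacts table described above
DECLARE @MinDate datetime = '20150901',
        @MaxDate datetime = '20150930';
DECLARE @MinID bigint, @MaxID bigint;

-- Step 1: the nonclustered index on [Date] finds the ID boundaries cheaply
SELECT @MinID = MIN(ID), @MaxID = MAX(ID)
FROM dbo.OrderFacts
WHERE [Date] BETWEEN @MinDate AND @MaxDate;

-- Step 2: the clustered index on ID limits the rows scanned,
--         while the date filter is still applied as noted above
SELECT CustomerID, StoreItemID, [Date]
FROM dbo.OrderFacts
WHERE ID BETWEEN @MinID AND @MaxID
  AND [Date] BETWEEN @MinDate AND @MaxDate;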
_____________
Code for TallyGenerator
October 21, 2015 at 8:24 am
Sergiy (10/20/2015)
Kristen-173977 (10/20/2015)
Sergiy (10/19/2015)
Make sure you have a clustered index on the "record date time" column.
I would only create that clustered index if there isn't one already - any existing clustered index may well be critical to something else.
If a clustered index already exists then I would try creating a non-clustered index on "record date time" to improve the report.
As we can see from the following post, there is nothing important about the existing clustered index. Just the common junior mistake of putting a clustered index on an identity column.
But I would not readily suggest recreating/changing a clustered index on a table with 100 million rows. It might be quite a long exercise.
There is still the option of using the existing clustered index:
Select MaxID = MAX(ID), MinID = MIN(ID)
From ... Where [Date] between @MinDate and @MaxDate
And then use those MaxID and MinID to limit the number of rows processed by all the other conditions in the WHERE clause.
Still need to apply the filter against the transaction date.
Why not index Date in one table, Item in the other?
Assuming that anyone can see all Customers and Stores.
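As a sketch (index names are assumptions, and the Date index on the fact table may already exist per the earlier post):

-- Sketch only
CREATE NONCLUSTERED INDEX IX_OrderFacts_Date
    ON dbo.OrderFacts ([Date])
    INCLUDE (StoreItemID, CustomerID);

CREATE NONCLUSTERED INDEX IX_StoreItems_Item
    ON dbo.StoreItems (ItemName, [Item Category]);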
It would be interesting to see the requested information for the table definitions, and the query plan from Profiler.
That would be very helpful for getting better feedback.
October 21, 2015 at 10:13 am
Jayanth_Kurup (10/20/2015)
With 100 million rows you're better off using a columnstore index or moving the table to SSAS. If the report requires data on a monthly basis, partitioning it by month could help improve the performance. What you need to understand in this case is that most of the performance issue you're facing is likely caused by the massive IO of fetching data from disk, or by aggregating large data sets (a CPU issue). Indexes help seek to and eliminate unwanted rows, but if you still have millions of rows after that, you will still face performance issues that require a suitable technology. Summary tables are a good solution, but you will lose granularity and might need to create many different summaries at different groupings for different reports, and this can become unmanageable over time.
Careful now... Partitioning it by month won't help any more than having the proper index on a monolithic table and a proper query to take advantage of it. All else being equal, partitioning can actually slow down queries thanks to the multiple B-Trees involved behind the scenes.
Not sure how you think moving a table to SSAS will actually improve performance.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2015 at 11:33 am
Jeff Moden (10/21/2015)
Jayanth_Kurup (10/20/2015)
With 100 million rows you're better off using a columnstore index or moving the table to SSAS. If the report requires data on a monthly basis, partitioning it by month could help improve the performance. What you need to understand in this case is that most of the performance issue you're facing is likely caused by the massive IO of fetching data from disk, or by aggregating large data sets (a CPU issue). Indexes help seek to and eliminate unwanted rows, but if you still have millions of rows after that, you will still face performance issues that require a suitable technology. Summary tables are a good solution, but you will lose granularity and might need to create many different summaries at different groupings for different reports, and this can become unmanageable over time.
Careful now... Partitioning it by month won't help any more than having the proper index on a monolithic table and a proper query to take advantage of it. All else being equal, partitioning can actually slow down queries thanks to the multiple B-Trees involved behind the scenes.
Not sure how you think moving a table to SSAS will actually improve performance.
Partitioning is more for having sliding windows of data for archiving. Surprising how many think it will improve performance.
If users are dumping details for all customers and all order lines anyway, you would be missing the whole point of SSAS.
Although just to get a cube set up and running, they would have to fix the core performance issue.
100 million rows in order table
100 million users who can have multiple orders per day
Millions of rows per day
Without a cross join between the tables, this just doesn't make sense to me
Hope the poster comes back with some details