January 27, 2004 at 9:04 am
I have a sales summary table that is used to keep track of our sales, with the following simplified structure. The table contains about 250 million records (sales history for 3 years) and is loaded as part of a DW job.
Sales
-------
SaleWeek (PK)
StoreId (PK) (150 possible values)
ItemId (PK) (~500,000 possible values)
SaleQuantity
SalesAmount
There are Item and Store tables too.
Item
------
ItemID
Category
Store
-------
StoreId
StoreChain
The clustered index (and PK) is on SaleWeek, StoreId, and ItemId. There is a NC index on ItemId, StoreId, and SaleWeek. Most of the queries use the clustered index. However, recently there is a requirement to calculate the 52-week sales sum for certain items (identified by Category) for certain stores (identified by StoreChain). The query, simplified below, takes a long time, and the query plan indicates it is using the NC index (as expected) and spending most of its time (over 80%) in the bookmark lookup.
select sum(SalesAmount) SalesSum
from Sales join Item on Sales.ItemId = Item.itemId
join Store on Sales.StoreID = Store.StoreID
where
saleweek between dateadd(ww, -52, @EndWeek) and @EndWeek -- @EndWeek: the week the report is run for
and item.Category = @Category
and Store.StoreChain = @Chain
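For reference, here is roughly how the structure described above might be declared (a sketch only; datatypes and the constraint/index names are assumptions):

create table Sales (
    SaleWeek     datetime not null,   -- datatype assumed
    StoreId      int      not null,   -- ~150 distinct values
    ItemId       int      not null,   -- ~500,000 distinct values
    SaleQuantity int,
    SalesAmount  money,
    constraint PK_Sales primary key clustered (SaleWeek, StoreId, ItemId)
)

create nonclustered index IX_Sales_Item_Store_Week on Sales (ItemId, StoreId, SaleWeek)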
I made a copy of the table and transposed the indexes (i.e., made the non-clustered index clustered and vice versa); now the query takes a fraction of the time it did before. The question is, we need to do the 52-week running sales sum for each Category every week. If I were to keep this solution, I would need to keep two tables that are identical except for their index structures (sketched below).
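The transposed copy might look something like this (again just a sketch with assumed names):

create table Sales_ByItem (
    SaleWeek     datetime not null,
    StoreId      int      not null,
    ItemId       int      not null,
    SaleQuantity int,
    SalesAmount  money
)

-- build the clustered index first so the nonclustered PK is not rebuilt afterwards
create clustered index CIX_Sales_ByItem on Sales_ByItem (ItemId, StoreId, SaleWeek)

alter table Sales_ByItem add constraint PK_Sales_ByItem
    primary key nonclustered (SaleWeek, StoreId, ItemId)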
I also need to modify the load process to load the weekly sales data into two tables.
Are there any other reasonable solutions?
Thanks.
January 28, 2004 at 4:40 am
First I would consider dropping the NC index. The reason is this.
The NC index does not contain pointers to data pages; each NC index row carries the associated clustered index key instead (unless the table has no clustered index, in which case it stores a physical row ID).
So say you have a record with:
StoreId 2831
SaleWeek 10
ItemId 784
Based on your described index structure, the NC index row looks like this (the NC key, followed by the clustered key it carries):
784, 2831, 10 -> 10, 2831, 784
The lookup then traverses the clustered index searching for 10, 2831, 784; that is the bookmark lookup you are seeing.
Although composite indexes are nice, they are considered poor candidates for clustered indexes because of what I just pointed out: the clustered key is replicated into every NC index to serve as the lookup value, so each NC index grows by the full width of the clustered key.
Also, for best performance during index selection, it is better to put the most selective column at the front of a composite index, because the statistics histogram is built on the first column only, and that is what the query engine uses for decision making. The more unique the leading column, the higher the likelihood the index will be used.
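If you want to see what the optimizer is working from, DBCC SHOW_STATISTICS shows the histogram, which covers only the leading column (the index name here is an assumption carried over from the sketch above):

-- the histogram is built on ItemId, the leading column of the NC index
dbcc show_statistics ('Sales', IX_Sales_Item_Store_Week)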
What I would consider for the index structure may not give the best performance for your current reporting needs, but I would keep the PK, make it nonclustered, and create the clustered index on the SaleWeek column alone, then see how things perform (a sketch follows). If you need to query by Store often then another NC index on Store may be useful, but that requires testing to know for sure.
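Something along these lines, reusing the assumed names from above:

-- drop the existing clustered PK, cluster on SaleWeek alone, re-add the PK as nonclustered
alter table Sales drop constraint PK_Sales

create clustered index CIX_Sales_SaleWeek on Sales (SaleWeek)

alter table Sales add constraint PK_Sales
    primary key nonclustered (SaleWeek, StoreId, ItemId)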
Ultimately, if this is a DW function you should look at using OLAP (Analysis Services) to present the data in a cube for this specific reporting need, especially if you think they may request this again in the future.
January 28, 2004 at 7:09 am
I typically use a date dimension when I build data warehouses. Performance is a big reason for this. Your query requires scanning a large number of index entries to find the dates that fall within the date range. With a date dimension, the scan to find the dates that satisfy the range is very small compared to scanning the entire fact table. The equijoin between the date dimension and the fact table should then yield fairly good performance.
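A rough sketch of what that query shape looks like; DimDate and its columns are assumptions for illustration:

-- DimDate is an assumed date dimension with one row per week, keyed by WeekKey
select sum(s.SalesAmount) as SalesSum
from DimDate d
join Sales s on s.SaleWeek = d.WeekKey        -- equijoin from the small dimension into the fact
join Item i on s.ItemId = i.ItemId
join Store st on s.StoreId = st.StoreId
where d.WeekEndDate between dateadd(ww, -52, @EndDate) and @EndDate   -- range filter on the small table
  and i.Category = @Category
  and st.StoreChain = @Chain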