July 27, 2016 at 9:03 pm
Hi,
We have a Dimension table in our SQL Server 2012 Datawarehouse that has over 6.5 billion records and sits currently at 1.5TB with 800GB of data and 700GB of Indexes.
This table hold prices changes for Bet records in our Fact tables (we are an online bookmakers). It is Page Compressed. Changes are recorded as Type 2.
We are running on a Hitachi VSP G400 Storage Array Storage array.
My question is what to do with this table? Its far and away the biggest table we have and it is a cause of performance issues across the environment. I don't really see partitioning as helping as there isnt an obvious partition key - would archiving me the only way forward?
Would be interesting to hear suggestions.
And just as a more general question do people still partition Fact tables even when running on a SAN? does it still matter?
Thanks for reading!
July 28, 2016 at 8:40 am
JayK (7/27/2016)
Hi,We have a Dimension table in our SQL Server 2012 Datawarehouse that has over 6.5 billion records and sits currently at 1.5TB with 800GB of data and 700GB of Indexes.
This table hold prices changes for Bet records in our Fact tables (we are an online bookmakers). It is Page Compressed. Changes are recorded as Type 2.
We are running on a Hitachi VSP G400 Storage Array Storage array.
My question is what to do with this table? Its far and away the biggest table we have and it is a cause of performance issues across the environment. I don't really see partitioning as helping as there isnt an obvious partition key - would archiving me the only way forward?
Would be interesting to hear suggestions.
And just as a more general question do people still partition Fact tables even when running on a SAN? does it still matter?
Thanks for reading!
1) Yes, partitioning absolutely matters on a SAN. But remember (as almost everyone initially assumes) that partitioning was NOT created to make queries run faster. It was created to facilitate large-scale data load/movement/maintenance.
2) Having said that, it certainly CAN make appropriately-crafted queries run much faster and have better concurrency if partition elimination can be achieved. But given your statement that there isn't an obvious partitioning key(s) this probably isn't relevant.
3) The design I am envisioning from your post seems flawed. At first blush I would think the actual bet values would be in the fact table, with a dimension that was the actual thing being bet on and a dimension of bettors. Maybe your dimension table is storing the changes in odds as the bets fluctuate? But I would think that may not need all values stored for ever - once the event was complete and the final odds calculated you could archive out all the values necessary for it's computation.
4) Which brings me into DEFINITE agreement with your thought about data archival. Any rows that don't need to be in that monster table should absolutely be deleted/archived out!!!! Disregarding the massive problem of having to scan that beast, even a single byte lookup in a nonclustered index could take a dozen or more pages in the balanced binary tree with 6.5B rows in the table!
5) Sounds like an exceptionally fun system to work on!! If you need someone to take a look and help out I would give you a discount rate to have a crack at it. 😎
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply