Partition of a large Table

  • I have a very large table with 3 million records, where the columns are stored as rows, i.e. instead of having 150+ columns for a business entity, the data for each entity is stored as 150+ rows of key/value pairs.

    These rows are then exposed through 4-5 views with N columns - N being a subset of these rows for each entity - using a PIVOT statement. I will not be able to revert the table design, as this is a business requirement.
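
    To make the layout concrete, here is a minimal sketch of this kind of key/value table and one such PIVOT view; all table, column, and view names (EntityAttributes, EntityID, AttrKey, AttrValue, vw_Customer) are made up for illustration, not the actual schema:

    -- Key/value ("attribute") table: one row per entity per attribute
    CREATE TABLE dbo.EntityAttributes
    (
        EntityID  INT           NOT NULL,  -- the business entity
        AttrKey   VARCHAR(100)  NOT NULL,  -- one of the 150+ attribute names
        AttrValue VARCHAR(4000) NULL,      -- the attribute value, stored as text
        CONSTRAINT PK_EntityAttributes PRIMARY KEY (EntityID, AttrKey)
    );
    GO

    -- One of the 4-5 views, pivoting a subset of the keys back into columns
    CREATE VIEW dbo.vw_Customer
    AS
    SELECT EntityID, [FirstName], [LastName], [Country]
    FROM
    (
        SELECT EntityID, AttrKey, AttrValue
        FROM dbo.EntityAttributes
        WHERE AttrKey IN ('FirstName', 'LastName', 'Country')
    ) AS src
    PIVOT
    (
        MAX(AttrValue) FOR AttrKey IN ([FirstName], [LastName], [Country])
    ) AS p;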

    Problem:

    1. Operations such as selects/deletes are getting slower as the table size increases

    I am thinking of partitioning this table on the key value, but need your valuable suggestions.

    Cheers,

    Got an idea..share it !!

    DBA_Vishal

  • I don't think partitioning the table on the Key column will speed anything up; you'll still end up doing 150+ scans for each query.

    You need to find a partition function that will group the records together by business entity, by date or by some kind of ID, so any query ends up querying a single partition (or as few as possible). Keep in mind you can have a maximum of 1000 partitions.
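
    For example, here is a minimal sketch of a partition function and scheme on an entity ID, reusing the placeholder names from the sketch above; the object names and boundary values are purely illustrative and would have to be chosen from your real data distribution:

    -- Range-right partition function on the entity ID (boundary values are illustrative)
    CREATE PARTITION FUNCTION pf_EntityRange (INT)
    AS RANGE RIGHT FOR VALUES (1000000, 2000000);

    -- Map every partition to PRIMARY for simplicity; real systems often use separate filegroups
    CREATE PARTITION SCHEME ps_EntityRange
    AS PARTITION pf_EntityRange ALL TO ([PRIMARY]);

    -- Rebuild the clustered primary key on the scheme so the rows are physically partitioned
    ALTER TABLE dbo.EntityAttributes DROP CONSTRAINT PK_EntityAttributes;
    ALTER TABLE dbo.EntityAttributes
        ADD CONSTRAINT PK_EntityAttributes PRIMARY KEY CLUSTERED (EntityID, AttrKey)
        ON ps_EntityRange (EntityID);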

    If you could give us the table structure and a small data sample (or a way to generate some of it) we could give you a more accurate answer.

    Your main problem is obviously the design required by the business; it's just plain bad. Some of the normalization rules are there for a reason.

    If you are using views to feed the output to the business, maybe it's possible to change the underlying data to a more efficient form without them having to change their application in any way.

  • Try some indexes. 3 million records... that is a small table for partitioning.

  • 3 million rows is nothing. I have a table with 60 billion rows of a similar design.

    But that table is not used the way yours is.

    I have 6 other tables with 200+ columns and more than 1 billion rows. With proper indexing things run quite fast.

    If your data is mostly read, or almost read-only, and as disk space is quite cheap, you might consider copying your data to one or more new tables and making your views point to this table/these tables.
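
    As a sketch of that idea, using the placeholder names from the earlier posts (the wide table name Customer_Wide is also made up), the pivoted output could be copied into a real table once and the view redefined over it, so the applications keep querying the same view:

    -- Copy the pivoted output into a plain wide table
    SELECT EntityID, [FirstName], [LastName], [Country]
    INTO dbo.Customer_Wide
    FROM dbo.vw_Customer;

    CREATE UNIQUE CLUSTERED INDEX CIX_Customer_Wide ON dbo.Customer_Wide (EntityID);
    GO

    -- Repoint the existing view at the new table; callers see no difference
    ALTER VIEW dbo.vw_Customer
    AS
    SELECT EntityID, [FirstName], [LastName], [Country]
    FROM dbo.Customer_Wide;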

    Table partitioning is IMHO mostly about ease of maintenance, not about improving speed. And it is Enterprise Edition only. Yes, we use table partitioning, but only so that we can insert, reindex and delete data without days of downtime.

    HIH,

    Henrik Staun Poulsen

  • This requires a classic tuning procedure.

    Collect a few typical queries, and start with the one that is used the most.

    Measure logical reads, e.g. like this:

    SET STATISTICS IO ON

    SELECT * FROM sys.all_objects -- put your query here instead of this

    Run the query and look at the "Messages" tab of SQL Server Management Studio.

    There you should see something like this:

    (9670 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'sysobjrdb'. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syspalnames'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'sysschobjs'. Scan count 1, logical reads 121, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syssingleobjrefs'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Total logical reads here is 0+29+4+121+8 = 162 pages (1 page = 8KB).

    Your goal is to alter the physical structure to make that number at least 10 times lower.

    You can try different things:

    1) Create one index that covers all 3 columns, e.g. (ID, KEY, VALUE) - see the sketch after this list

    2) Create clustered index on (ID,KEY)

    3) UPDATE STATISTICS yourtablename WITH FULLSCAN

    4) Maybe the query optimizer chose a suboptimal execution plan - check the execution plan

    5) Maybe other tables involved are the problem
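
    A minimal sketch of steps 1-3, using the placeholder table name yourtablename and the column names from step 1 (bracketed because KEY is a reserved word); these are separate options to try and measure, not one script to run as a whole:

    -- 1) One nonclustered index covering all three columns
    CREATE NONCLUSTERED INDEX IX_yourtablename_ID_KEY_VALUE
    ON dbo.yourtablename ([ID], [KEY]) INCLUDE ([VALUE]);

    -- 2) Clustered index on (ID, KEY) - only if the table does not already have one
    CREATE CLUSTERED INDEX CIX_yourtablename_ID_KEY
    ON dbo.yourtablename ([ID], [KEY]);

    -- 3) Refresh statistics with a full scan
    UPDATE STATISTICS dbo.yourtablename WITH FULLSCAN;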

    After each of these steps please MEASURE total logical reads to see if your changes did any good. Do that on a test system or a copy of the table. If you achieved 10x or 100x fewer logical reads - you did it!

    Good luck!

    P.S. It is really not a big table. Below 10 000 000 rows there is not much point in considering partitioning. Partitioning is not like setting a "FAST=TRUE" option. By partitioning you make one set of commands run faster, while another set will run SLOWER after partitioning. The kinds of queries that have the partition key in the WHERE clause, so they stick to one (or very few) partitions, will become faster, but other queries will become slower.

    _____________________________________________________
    Microsoft Certified Master: SQL Server 2008
    XDetails Addin - for SQL Developers
    blog.sqlxdetails.com - Transaction log myths

  • You have unfortunately chosen a horrible storage construct for your data, known as Entity-Attribute-Value Pairs (EAVP). Your use of it is absolutely unscalable and there is essentially nothing you can do about it other than refactor your application to stop using it. For "normal" access patterns EAVP is orders of magnitude more costly than row-based storage and retrieval. I have had to help several startups deal with this same issue. It works great on dev/test data volumes, then dies a horrible death once the rowcounts start increasing. Sorry to have to be the bearer of bad news.

    As for partitioning, as another poster said, it is primarily for management benefits, not performance. You CAN see performance benefits if the stars align (i.e. you can routinely get partition elimination to reduce the data you need to hit) but with EAVP that is a) unlikely and b) still won't keep you from having to rehydrate a row of data every time you need it.
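
    To illustrate partition elimination with the placeholder partitioning sketched earlier in the thread (names and values are made up), a query that filters on the partitioning column can be satisfied from a single partition:

    -- Filters on the partitioning column, so only one partition needs to be read
    SELECT AttrKey, AttrValue
    FROM dbo.EntityAttributes
    WHERE EntityID = 1500000;

    -- $PARTITION shows which partition a given key value maps to
    SELECT $PARTITION.pf_EntityRange(1500000) AS PartitionNumber;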

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I have had to deal with audit data stored like that. Then they (the supplier) decided it would be a good idea to add a tab to show the user the audit data... yeah, that lasted all of a couple of hours.

    As above, it's fine in DEV/TEST environments, but when large amounts of data are thrown at it (and 3 million records is not large - I have multiple tables with 200 million rows and 200+ columns that work fine :cool:) it simply does not work.
