July 7, 2010 at 11:49 am
Hello All.
I am using SQL Server 2005 Standard Edition. I am NOT a DBA, but I have enough knowledge to do damage. I will do some testing on the question that follows, but I am looking for someone who may have experience doing this and can give me a quick yes/no/maybe.
I store very large sets of sales transaction data. A typical table will have 100 million rows in it. Some have reached 500 million. The table looks similar to:
TRN_DT PK, INDEX
ITEM_NUM PK, INDEX
LOC_NUM INDEX
QTY
PRICE
COST
TRN_TYP
A typical query would be to search for a transaction with the WHERE clause containing a date and item number.
A thought I had was this:
If I added a column (let's call it PRTN_KY, for Partition Key) and gave it a value based on the date. For example, all records with a transaction date in Jan and Feb get PRTN_KY = 1; Mar and Apr get PRTN_KY = 2; and so on.
If this column is indexed and included in queries, should I get a performance benefit?
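For illustration, a minimal sketch of that idea. The table name and the computed-column approach are my assumptions based on the layout above, not the actual schema:

```sql
-- Hypothetical sketch: derive a coarse two-month bucket from TRN_DT.
-- A persisted computed column keeps the value in sync automatically,
-- so no application code has to maintain it.
ALTER TABLE dbo.SalesTrans
    ADD PRTN_KY AS ((MONTH(TRN_DT) + 1) / 2) PERSISTED;
    -- Jan/Feb -> 1, Mar/Apr -> 2, ... Nov/Dec -> 6 (integer division)

CREATE NONCLUSTERED INDEX IX_SalesTrans_PrtnKy
    ON dbo.SalesTrans (PRTN_KY, TRN_DT, ITEM_NUM);
```

Note the bucket here is month-based only; if the data spans more than a year, the expression would need the year folded in as well.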
A query today may take a minute to return values, so I'm not dealing with much pain.
So to repeat: I'm being lazy right now and probably looking for a theoretical "yes, that will improve things" or a "do it the right way and test it, dummy."
thanks in advance for your attention
July 7, 2010 at 12:28 pm
PeterG-377490 (7/7/2010)
I am NOT a DBA but have enough knowledge to do damage.
Don't worry. We all started that way.. @=)
PeterG-377490 (7/7/2010)
A typical query would be to search for a transaction with the WHERE clause containing a date and item number....
If this column is indexed and included in queries, should i get a performance benefit??
Why do you need to add a column when you have your index right there? Date and Item #.
Because you already have at least two indexes, possibly three, I doubt you'll do anything but slow down your queries if you add another one. The table is so narrow that covering it with indexes only makes it slower, not faster.
If the PK index is a composite index (two columns combined) and the Loc_Num column is not usually used in your queries, I'd advise removing the Loc_Num index and creating a non-clustered index on Item # that INCLUDEs the date field. That way most of your queries are covered by the new index, and you're not so much adding an index as substituting one. Then maybe it will improve performance.
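A sketch of that substitution. All index and table names here are hypothetical, since the actual ones weren't posted:

```sql
-- Hypothetical names; adjust to your schema.
-- SQL Server 2005 syntax for dropping the old index:
DROP INDEX IX_SalesTrans_LocNum ON dbo.SalesTrans;

-- Non-clustered index keyed on item, carrying the date at the
-- leaf level so a (date, item) lookup is covered by the index.
-- If most predicates are date *ranges*, making TRN_DT a second
-- key column instead of an INCLUDE column may work even better.
CREATE NONCLUSTERED INDEX IX_SalesTrans_ItemNum
    ON dbo.SalesTrans (ITEM_NUM)
    INCLUDE (TRN_DT);
```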
Adding more indexes is not always a good thing, and you should never have so many that the index count reaches half the number of columns in your table. Speaking from experience, it only causes issues.
BIG EDIT: I just re-read your OP and noticed you already have the two columns in question indexed. So, no, adding another index will NOT help your performance. It sounds like maybe you need a statistics update if you're having problems, or you need to seriously review your query designs.
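If stale statistics are the suspect, a quick way to rule them out (table name hypothetical):

```sql
-- Rebuild statistics on the table with a full scan.
-- On a 100M+ row table this can be heavy; a sampled update
-- (e.g. WITH SAMPLE 10 PERCENT) is a lighter first step.
UPDATE STATISTICS dbo.SalesTrans WITH FULLSCAN;
```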
July 7, 2010 at 12:45 pm
Thanks for the response. I like hearing from smart people!!!
My thought process was that the table would carry records spanning 18 months, and again possibly 500 million rows. So I would have used the fake partition key column with maybe 4 to 6 values. If I included the (indexed) partition key in my query along with the date, I thought there might be a chance of better performance.
I will try your suggestions. They make a lot of sense.
While i have your attention Brandie - I mentioned the typical query. That query will also return the sales information. Should i use those columns as an included column in an existing index??
thanks again
Peter
July 7, 2010 at 3:38 pm
A query retrieving a unique row by PK should be a sub-second query, no matter whether the table has 1 million or 1 billion rows.
Would you mind tracing the query and publishing the execution plan?
On the other hand, partitioning is usually a great tool for administrative purposes, like implementing archiving and/or purging strategies, but it usually doesn't help that much with performance. There is one big exception: when queries have to return large numbers of rows that all sit in a single partition, a full scan of that partition may be cheaper than an index scan plus table access.
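For reference, native table partitioning in SQL Server 2005 looks like the sketch below. Be aware that it requires Enterprise Edition, so on Standard Edition a manual scheme like the PRTN_KY idea is the only option. Boundary dates and names here are made up to illustrate two-month ranges over an 18-month span:

```sql
-- Native partitioning (Enterprise Edition only in SQL Server 2005).
-- RANGE RIGHT: each boundary value starts a new partition.
CREATE PARTITION FUNCTION pfTrnDate (DATETIME)
    AS RANGE RIGHT FOR VALUES
    ('2009-03-01', '2009-05-01', '2009-07-01',
     '2009-09-01', '2009-11-01', '2010-01-01',
     '2010-03-01', '2010-05-01');

-- Map every partition to the same filegroup for simplicity.
CREATE PARTITION SCHEME psTrnDate
    AS PARTITION pfTrnDate ALL TO ([PRIMARY]);

-- The table is then created (or its clustered index rebuilt)
-- on the scheme, e.g.:
-- CREATE CLUSTERED INDEX CIX_SalesTrans
--     ON dbo.SalesTrans (TRN_DT, ITEM_NUM)
--     ON psTrnDate (TRN_DT);
```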
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
July 8, 2010 at 4:20 am
PeterG-377490 (7/7/2010)
While i have your attention Brandie - I mentioned the typical query. That query will also return the sales information. Should i use those columns as an included column in an existing index??
Find the query columns most commonly used and put those in the INCLUDE clause, but do NOT try to index for every single query you might get. Otherwise you're shooting yourself in the foot, doing the same thing as if you had indexed every column in the table.
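A sketch of adding the commonly returned sales measures as included columns, using the column names from the original post (the table and index names are hypothetical):

```sql
-- Covering index: key on item and date, carry the sales measures
-- at the leaf level so the typical query never touches the table.
-- INCLUDE columns add leaf-level width but not key-level depth,
-- so they are cheaper than extra key columns.
CREATE NONCLUSTERED INDEX IX_SalesTrans_ItemDate_Covering
    ON dbo.SalesTrans (ITEM_NUM, TRN_DT)
    INCLUDE (QTY, PRICE, COST);
```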
Are you actually having performance issues on your queries? Or are you trying to be proactive?
July 8, 2010 at 8:35 am
Trying to be proactive.
I have left out some detail. Let me give the full scenario:
Sales transaction database, hundreds of millions of rows. I built an Access front end for users to view reporting. One such report is an "attachment" report: they want to know affinities between products. "When a customer buys a printer, how many buy ink?" would be an example. I allow any choice, so there is no pre-aggregation.
I have a stored procedure and pass it a date range (CLD_DT, indexed) and an item number (indexed). It returns the sales measures (units, dollars, cost). My WHERE clause is a subquery that selects the distinct TRANS_IDs containing the item they chose. The main query counts the number of transactions and sums the measures.
Using ADO, I dump the stored procedure's output into a recordset and the recordset into a local table on the user's desktop, which is the source of the report. The report comes back in about 2 minutes on average, maybe 3 on a slow day. Considering the question being asked, I'm happy with the performance.
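For context, the shape of that procedure might look something like the sketch below. Every table, column, and procedure name here is a guess reconstructed from the description above, since the actual code wasn't posted:

```sql
-- Hypothetical sketch of the "attachment" report logic.
CREATE PROCEDURE dbo.usp_AttachmentReport
    @StartDate DATETIME,
    @EndDate   DATETIME,
    @ItemNum   INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT  t.ITEM_NUM,
            COUNT(DISTINCT t.TRANS_ID) AS Transactions,
            SUM(t.QTY)                 AS Units,
            SUM(t.QTY * t.PRICE)       AS Dollars,
            SUM(t.QTY * t.COST)        AS Cost
    FROM    dbo.SalesTrans AS t
    WHERE   t.CLD_DT BETWEEN @StartDate AND @EndDate
      AND   t.TRANS_ID IN            -- transactions containing the chosen item
            (SELECT s.TRANS_ID
             FROM   dbo.SalesTrans AS s
             WHERE  s.CLD_DT BETWEEN @StartDate AND @EndDate
               AND  s.ITEM_NUM = @ItemNum)
    GROUP BY t.ITEM_NUM;
END;
```

If this is roughly the shape, the subquery is a good candidate for the covering-index discussion above: it only needs (CLD_DT, ITEM_NUM, TRANS_ID).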
So, as I mentioned, my thought was that if I had a column that immediately segmented the data into chunks of time, I could help things. But you're right: as I look at the table, index space could grow to twice the data space or more.
It sounds like if I make the sales columns included columns, I could speed things up. And again, I don't have unhappy users... just looking to "blow" their minds with performance.
thanks again
Peter
July 8, 2010 at 8:45 am
PeterG-377490 (7/8/2010)
Trying to be proactive.Sales transaction database. 100's millions of rows. I built an access front end for users to view reporting.
Sounds like if i let the sales columns be included columns, i could speed things up. And again, I don't have unhappy users...just looking to "blow" their minds with performance.
If you're looking to blow their minds with performance, the last front-end tool I'd use for reporting is MS Access. There are serious performance issues with using an Access front end against a SQL Server database, and the database itself is so huge that Access would have a hard time processing that kind of report anyway.
If you're being proactive and this isn't a problem, then why haven't you tested this solution yet? It sounds like you're fishing for a rubber-stamp agreement on your idea. Unfortunately, we don't know anything about your hardware and database structure, let alone your queries, so we can't just say "yes, that will work"; it varies depending on all of those things.
The best way to find out if it'll work is to test it both before and after you've made your changes. Then let us inquiring minds know after the tests which worked best.