May 22, 2014 at 10:00 am
I'm at a loss on query behavior for an environment I've inherited... and I'll admit now that there are parts about the environment I do not understand.
Here are the particulars... SQL 2005 running in a VCenter VM with a fibre attached Compellant SAN for physical storage. The box has 14g allocated to SQL and 20g allocated to the OS (I don't know why it is configured that way). MAXDOP is currently at Zero... I'm planning on changing that shortly.
Anyway... Here's the crazy part. I have a table with 92 million rows. The table is partitioned. The query is not complex and the index plan shows the index is being used.
Select load_date, account_nbr, open_date from share where load_date ='2014-05-21 00:00:00'
Load_date, account_nbr, open_date are indexed (not clustered).
This query returns the first row instantaneously, and fetches 400K rows in an impressive 45 seconds. I'm VERY happy with that.
Now, If I change the query to include a column outside the index (of any data type)... I won't see my first row for 3+ minutes, and it takes another 3 minutes to fetch the same 400K rows.
According to the Compellant guy... reads are occurring @ 400m/s. Processors on the box never get above 4%. There are no significant wait states visible in Activity Monitory
I'm at a loss as to where to look next... why would including a single column outside the index send me into such a tail spin.
Any ideas on what to try?
May 22, 2014 at 10:05 am
The query might be experiencing the bad type of parameter sniffing. Not sure if the table has a Clustered Index on it or not, but you are likely experiencing a key or RID lookup. The query is also probably going from a seek to a scan. That will add an extra table hit and extra cost.
But to really be able to help, we'd need to see the actual execution plans and table structure.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 22, 2014 at 10:21 am
I agree with SQLRNNR, and to test this you can add the extra field you are selecting to the current nonclustered index as an included field and see if that improves performance.
Be still, and know that I am God - Psalm 46:10
May 22, 2014 at 10:51 am
Thank you Jason. There are no clustered indexes... (there could be... I just don't know about clustered indexes with the partitions so I have not ventured down that road yet).
You are correct on the seek to scan (I checked the query plan for my "first" speedy version, and saw the index was being used, but never looked back after adding columns to see if it was still being used...turns out ... it's not).
My table is very wide... If seeing the layout of the table adds insight after this, I can post that as well. Based on your tip... here is what I see.
The fast version reports the following:
select sh.dl_load_date, member_nbr, share_nbr, open_date from share sh
where sh.dl_load_date = '2014-05-21 00:00:00'
and closed = 0
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Index Seek(OBJECT:([VCUWarehouse].[dbo].[SHARE].[idx_Share_Mbr] AS [sh]), SEEK:([sh].[DL_LOAD_DATE]='2014-05-21 00:00:00.000'), WHERE:([VCUWarehouse].[dbo].[SHARE].[CLOSED] as [sh].[CLOSED]=(0)) ORDERED FORWARD PARTITION ID:((180)))
When I add a column to the select that is not present in the index... we end up doing a table scan... which I now know is on Tier 3 of the compellant (10K drives) for 92 million rows... no wonder it takes 3 minutes. The million dollar question I have now is... why does adding a non indexed column to the select statement cause the index plan to not use the index on the first 2 keys in one of the indexes (the one it used before)...
StmtText
-------------------------------------------------------------------------------------------------------------------------------------------------------
select sh.dl_load_date, member_nbr, share_nbr, open_date, share_type
from share sh
where sh.dl_load_date = '2014-05-21 00:00:00'
and closed = 0
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Table Scan(OBJECT:([VCUWarehouse].[dbo].[SHARE] AS [sh]), WHERE:([VCUWarehouse].[dbo].[SHARE].[DL_LOAD_DATE] as [sh].[DL_LOAD_DATE]='2014-05-21 00:00:00.000' AND [VCUWarehouse].[dbo].[SHARE].[CLOSED] as [sh].[CLOSED]=(0)) PARTITION ID:((180)))
Modifying the index to include Share_Type does cause the index to get used again...and things are all good. Then I add balance, and blamo, back to table scan.
May 22, 2014 at 10:58 am
In this case you are going to a query that doesn't have a covering index. It also appears you do not have a clustered index because you are reverting to a table scan.
But this isn't the full execution plan.
Please post that so we can get the full picture.
Here is an article on how to do that.
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 22, 2014 at 1:09 pm
Attached are the query plans...one that shows the index seek (when only indexed columns are included in the select statement), and the second with the table scan (a numeric column outside the index) is included in the select statement.
Thank you again for your time.
Does it make sense to you though, that the query plan would change to a table scan just because one of the 4 columns in the select is not in the index? That's the part that I'm hung up on.
P.S. The query plan with the table scan does recommend a new index... to add the "non indexed" column to a new index. But it does that for every column I add to the query, and it is unrealistic to add every column on the table to an index...
May 22, 2014 at 1:13 pm
jchapman (5/22/2014)
Does it make sense to you though, that the query plan would change to a table scan just because one of the 4 columns in the select is not in the index? That's the part that I'm hung up on.
Yes it makes sense.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 22, 2014 at 1:17 pm
If the added field in the select list is not in the index, the query optimizer only has two options:
1. Look up the rows you need with the nonclustered index, then use the rowIDs to go back to the clustered index/data page and look up the additional field needed (called a key lookup)
2. Run through the clustered index/heap, which has all the data, and find all the rows needed.
The query optimizer will choose what it believes is the cheapest of those two options depending on stats.
Bottom line is that if you want to solely use a nonclustered index, you need to make sure any fields that are being used for filtering are part of the index, and any fields that are being returned are at least included.
Be still, and know that I am God - Psalm 46:10
May 22, 2014 at 1:19 pm
So looking at the queries, it really is nothing more than the lack of a covering index. This is normal behavior. The query is returning enough results that a table scan in the absence of a covering index is more efficient.
If you create a covering index for the second of the two queries, both queries should be able to use (better yet, modify the first index to add the extra column(s)).
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 22, 2014 at 1:54 pm
I've been experimenting... riddle me this...
My misbehaving share table is partitioned (with LightSpeed, this is 2005 environment) into 183ish partitions.
Here is what I did I just copied my 92 million row table into a new table, all in a single partition... I cannot make it table scan regardless of the number of columns I add to my query... indexes are the same.
Does this make sense (is this maybe a sql 2005 partitioning issue?) could it be that when I include a column not in the index (forcing a merge of data from multiple file groups) ... I am forced into a table scan.
My non partitioned table is subsequently faster than the partitioned table.
Here is my theory... Since partitioning is not native to SQL2005... I believe the plan to use the index breaks down when columns in the query are not all in the same filegroup. (The index with my 4 columns is in the default file group, when I include more columns, then I'm bounced into one of the other partition file groups... and sql server decides to not use the index because it has to work from a different file group?) Does this make any sense at all?
May 22, 2014 at 2:03 pm
jchapman (5/22/2014)
I've been experimenting... riddle me this...My misbehaving share table is partitioned (with LightSpeed, this is 2005 environment) into 183ish partitions.
Here is what I did I just copied my 92 million row table into a new table, all in a single partition... I cannot make it table scan regardless of the number of columns I add to my query... indexes are the same.
Is it doing an index scan and RID lookup instead?
Does this make sense (is this maybe a sql 2005 partitioning issue?) could it be that when I include a column not in the index (forcing a merge of data from multiple file groups) ... I am forced into a table scan.
My non partitioned table is subsequently faster than the partitioned table.
Here is my theory... Since partitioning is not native to SQL2005... I believe the plan to use the index breaks down when columns in the query are not all in the same filegroup. (The index with my 4 columns is in the default file group, when I include more columns, then I'm bounced into one of the other partition file groups... and sql server decides to not use the index because it has to work from a different file group?) Does this make any sense at all?
Your columns would all be available in all of the files of the filegroups. Your rows would be in different files/filegroups for the partitioning.
Since you are partitioning, the partitions are probably aligned with the columns of that initial query. Adding a new column to the query now causes the QO to have to use a table scan because it is cheaper to execute that way.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 22, 2014 at 2:22 pm
RID Index: Yes my all in one partition is...
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000], [Expr1007]) WITH UNORDERED PREFETCH)
|--Index Seek(OBJECT:([VCUWarehouse].[dbo].[share_jc3].[idx_Share_Mbr]), SEEK:([VCUWarehouse].[dbo].[share_jc3].[DL_LOAD_DATE]='2014-05-21 00:00:00.000'), WHERE:([VCUWarehouse].[dbo].[share_jc3].[CLOSED]=(0)) ORDERED FORWARD)
|--RID Lookup(OBJECT:([VCUWarehouse].[dbo].[share_jc3]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)
So in the absense of indexing a large chunk of the relevant 240 columns ... I can either ditch the partitions, or deal with poor query performance for non-index mentioned colums?
Would partitions in SQL 2008 behave the same?
May 22, 2014 at 2:30 pm
I would expect the behavior in 2008 to be more in line with what your single partition example was.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 22, 2014 at 2:37 pm
Your columns would all be available in all of the files of the filegroups. Your rows would be in different files/filegroups for the partitioning.
I am not 100% certain this is a true statement (but given your answer, it may not matter); so I'll toss this out for my general education. I intentionally put my index in a filegroup called P_FG_Default; there are no tables in this filegroup (it was actually empty until I created my index).
The Share table, broken into all those partitions all live in monthly derived/named partitions. The only columns in the default file group for the share table would be in my index... does that matter?
(Thank you very much once again for your time, and the education).
May 22, 2014 at 2:47 pm
Your question appeared to be about the partitioning and that is what my answer was in reference to.
Filegroups have no impact on table scans or index seeks when just splitting indexes out to different filegroups.
As an aside, I recommend not building your indexes in a separate filegroup. I know there is a lot of places on the internet that make the recommendation to split indexes from data. That was a well and good back in SQL 2000. With larger and larger databases and newer versions of SQL Server - it's not that great.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply