Dropping primary key improves performance

Question

Post reply

Dropping primary key improves performance

Craig Smith

SSChasing Mays

Points: 600
More actions
March 16, 2016 at 1:47 pm

#314120

I am continually plagued by performance problems. When I display the estimated execution plan, the estimated cost and number of rows are usually way off, the plans it chooses are often really poor, and it often suggests new indexes that don't help.
Right now I have a specific recurring issue I'd like to throw out there. I have a 500,000 row table with a two-column clustered primary key, let's call them A and B. I have some queries that join to this table on those two columns and they take an hour to run, returning about 25000 rows. SQL Server suggests I need a new nonclustered index on column B, and include column A. When I drop the primary key entirely, and don't create a new index, the query runs in a few seconds.
First, why would it suggest that index, and second, why does a primary key kill the query performance? I've already tried rebuilding indexes and updating stats with no improvement. I also tried creating that suggested index and it didn't help; it just came up with another suggested index that doesn't help either.
We're on 2008 R2. Thanks.

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 1

Have you tried updating the statistics on that table?

Lowell SSC Guru Points: 323755 More actions · Answer 2

i'd like to see the actual query and an execution plan;

you might have a non-Sargable query, functions in the Where clause, a catch all query containing a lot of OR statements, or something else that is preventing a decent plan and performance.

i have lots of queries joining multiple tables with tens of millions of rows, and they don't perform poorly.

if you can provide more details, i'm willing to bet we can offer some solid guidance.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Craig Smith SSChasing Mays Points: 600 More actions · Answer 3

Thanks, yes, I updated statistics. In the interest of being able to show it conveniently here I reduced the query down to a very simple example. The original was more complicated and had a where clause of just a single column equal to a constant that has no effect on the issue.

I've attached two plan images. The first one is without a primary key; you will see a full table scan on the EKPO table. The second with a primary key (on columns EBELN and EBELP) has a clustered index scan. From what I see here I don't think the access of the EKPO table is the issue; you will see that the access to the MRP_RESULT_FACT table changed. This is what I see all the time -- a change is made to a table or query and triggers the optimizer to choose a completely different plan, changing things that had nothing to do with the actual change that was made.

SELECT

PURCH_DOC_FACT.PURCH_DOC_ID

,PURCH_DOC_FACT.PURCH_LINE_ITEM_ID

,PURCH_DOC_FACT.MATERIAL_ID

,PURCH_DOC_FACT.PURCH_DOC_DATE

,EKPO.ADPRI

,MRP_RESULT_FACT.MRP_DATE

FROM

PURCH_DOC_FACT

INNER JOIN EKPO ON (PURCH_DOC_FACT.PURCH_DOC_ID = EKPO.EBELN

AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = EKPO.EBELP)

LEFT OUTER JOIN MRP_RESULT_FACT ON (PURCH_DOC_FACT.PURCH_DOC_ID = MRP_RESULT_FACT.MRP_ELEM_NUM

AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = MRP_RESULT_FACT.MRP_ELEM_LINE_NUM)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

smithhead (3/16/2016)
Thanks, yes, I updated statistics. In the interest of being able to show it conveniently here I reduced the query down to a very simple example. The original was more complicated and had a where clause of just a single column equal to a constant that has no effect on the issue.
I've attached two plan images. The first one is without a primary key; you will see a full table scan on the EKPO table. The second with a primary key (on columns EBELN and EBELP) has a clustered index scan. From what I see here I don't think the access of the EKPO table is the issue; you will see that the access to the MRP_RESULT_FACT table changed. This is what I see all the time -- a change is made to a table or query and triggers the optimizer to choose a completely different plan, changing things that had nothing to do with the actual change that was made.
SELECT
PURCH_DOC_FACT.PURCH_DOC_ID
,PURCH_DOC_FACT.PURCH_LINE_ITEM_ID
,PURCH_DOC_FACT.MATERIAL_ID
,PURCH_DOC_FACT.PURCH_DOC_DATE
,EKPO.ADPRI
,MRP_RESULT_FACT.MRP_DATE
FROM
PURCH_DOC_FACT
INNER JOIN EKPO ON (PURCH_DOC_FACT.PURCH_DOC_ID = EKPO.EBELN
AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = EKPO.EBELP)
LEFT OUTER JOIN MRP_RESULT_FACT ON (PURCH_DOC_FACT.PURCH_DOC_ID = MRP_RESULT_FACT.MRP_ELEM_NUM
AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = MRP_RESULT_FACT.MRP_ELEM_LINE_NUM)

Any chance of you posting the actual execution plans by saving the actual plans to file instead of snipping pictures? It would be interesting to see how many scans are actually being done.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110209 More actions · Answer 5

What about the other tables in the query?

What indexes do they have?

Are there any FK constraints between these tables?

_____________
Code for TallyGenerator

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

There's no warning on the result operator to back this up (if the warning string is too long to display in the "hover over" property sheet, it can still be seen in the right-click property sheet), but that pattern of compute scalars, one for each table, between read and join, from a query which has eccentric performance characteristics, is often synonymous with implicit conversions.

What are the data types of the join columns in each table?

Use table aliasing to reduce noise in your queries:

SELECT

pd.PURCH_DOC_ID

,pd.PURCH_LINE_ITEM_ID

,pd.MATERIAL_ID

,pd.PURCH_DOC_DATE

,EKPO.ADPRI

,r.MRP_DATE

FROM PURCH_DOC_FACT pd

INNER JOIN EKPO

ON (pd.PURCH_DOC_ID = EKPO.EBELN

AND pd.PURCH_LINE_ITEM_ID = EKPO.EBELP)

LEFT OUTER JOIN MRP_RESULT_FACT r

ON (pd.PURCH_DOC_ID = r.MRP_ELEM_NUM

AND pd.PURCH_LINE_ITEM_ID = r.MRP_ELEM_LINE_NUM)

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Craig Smith SSChasing Mays Points: 600 More actions · Answer 7

Thanks everyone. I can post more details later, but here are some quick comments/questions:

Good to know about compute scalars, I was wondering what that meant.

What do you mean by "warning on the result operator"?

There are no foreign keys in this database. (I didn't create this database.)

I generally use table aliasing; this query was auto-generated by a BI tool (but with a little cleanup by myself).

At any rate, I very much appreciate your expertise and will spend some more time looking at those things, but my main goal with this post is more general. I've used other databases for over 20 years and have some tuning experience, but I am relatively new to SQL Server and have never been so frustrated. It continually makes horrible plan choices, and query or table changes often trigger a strategy change that has nothing whatsoever to do with the change that was made. Its estimates are often way off, but supposedly we rebuild indexes and update statistics every day (I am not the DBA). Could there be something wrong with our configuration or management, or is the optimizer for 2008R2 just really bad? (We are on a path to upgrade in a few months so maybe that will help some?)

Thanks.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

smithhead (3/17/2016)
Thanks everyone. I can post more details later, but here are some quick comments/questions:
Good to know about compute scalars, I was wondering what that meant.
What do you mean by "warning on the result operator"?
There are no foreign keys in this database. (I didn't create this database.)
I generally use table aliasing; this query was auto-generated by a BI tool (but with a little cleanup by myself).
At any rate, I very much appreciate your expertise and will spend some more time looking at those things, but my main goal with this post is more general. I've used other databases for over 20 years and have some tuning experience, but I am relatively new to SQL Server and have never been so frustrated. It continually makes horrible plan choices, and query or table changes often trigger a strategy change that has nothing whatsoever to do with the change that was made. Its estimates are often way off, but supposedly we rebuild indexes and update statistics every day (I am not the DBA). Could there be something wrong with our configuration or management, or is the optimizer for 2008R2 just really bad? (We are on a path to upgrade in a few months so maybe that will help some?)
Thanks.

Any chance you could post up one of those execution plans? As a .sqlplan attachment? There are plenty of questions here which could be answered with just a quick peek...

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

The Danish Dynamo SSC-Addicted Points: 464 More actions · Answer 9

The Danish Dynamo

SSC-Addicted

Points: 464

March 17, 2016 at 9:56 am

#1865913

What is the data type of the keys?

ScottPletcher SSC Guru Points: 101238 More actions · Answer 10

You need to review the indexes from scratch. That is, analyze SQL's index usage stats, missing index stats and index operational stats. The longer time period you have these stats for the better, as they will better reflect the actual usage of the table and its indexes.

Hopefully you have another copy of the db, either live or as a backup, where you haven't dropped the existing pk yet, so you can still see all the stats on that index too. When an index is dropped, the corresponding stats are also dropped (except perhaps missing index stats, that one depends).

By analyzing the stats you can determine:

1) the best clustered index on every table -- this is the most critical factor in performance.

2) any nonclus index(es) you still need, with included column(s).

There is an iterative process. But it gets better as it goes along.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 11

smithhead (3/16/2016)
Thanks, yes, I updated statistics. In the interest of being able to show it conveniently here I reduced the query down to a very simple example. The original was more complicated and had a where clause of just a single column equal to a constant that has no effect on the issue.
I've attached two plan images. The first one is without a primary key; you will see a full table scan on the EKPO table. The second with a primary key (on columns EBELN and EBELP) has a clustered index scan. From what I see here I don't think the access of the EKPO table is the issue; you will see that the access to the MRP_RESULT_FACT table changed. This is what I see all the time -- a change is made to a table or query and triggers the optimizer to choose a completely different plan, changing things that had nothing to do with the actual change that was made.
SELECT
PURCH_DOC_FACT.PURCH_DOC_ID
,PURCH_DOC_FACT.PURCH_LINE_ITEM_ID
,PURCH_DOC_FACT.MATERIAL_ID
,PURCH_DOC_FACT.PURCH_DOC_DATE
,EKPO.ADPRI
,MRP_RESULT_FACT.MRP_DATE
FROM
PURCH_DOC_FACT
INNER JOIN EKPO ON (PURCH_DOC_FACT.PURCH_DOC_ID = EKPO.EBELN
AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = EKPO.EBELP)
LEFT OUTER JOIN MRP_RESULT_FACT ON (PURCH_DOC_FACT.PURCH_DOC_ID = MRP_RESULT_FACT.MRP_ELEM_NUM
AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = MRP_RESULT_FACT.MRP_ELEM_LINE_NUM)

Just an FYI, A table scan on a heap (no clustered index) and a clustered index scan are actually the same thing, a table scan.

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 12

Lynn Pettis (3/17/2016)
smithhead (3/16/2016)
Thanks, yes, I updated statistics. In the interest of being able to show it conveniently here I reduced the query down to a very simple example. The original was more complicated and had a where clause of just a single column equal to a constant that has no effect on the issue.
I've attached two plan images. The first one is without a primary key; you will see a full table scan on the EKPO table. The second with a primary key (on columns EBELN and EBELP) has a clustered index scan. From what I see here I don't think the access of the EKPO table is the issue; you will see that the access to the MRP_RESULT_FACT table changed. This is what I see all the time -- a change is made to a table or query and triggers the optimizer to choose a completely different plan, changing things that had nothing to do with the actual change that was made.
SELECT
PURCH_DOC_FACT.PURCH_DOC_ID
,PURCH_DOC_FACT.PURCH_LINE_ITEM_ID
,PURCH_DOC_FACT.MATERIAL_ID
,PURCH_DOC_FACT.PURCH_DOC_DATE
,EKPO.ADPRI
,MRP_RESULT_FACT.MRP_DATE
FROM
PURCH_DOC_FACT
INNER JOIN EKPO ON (PURCH_DOC_FACT.PURCH_DOC_ID = EKPO.EBELN
AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = EKPO.EBELP)
LEFT OUTER JOIN MRP_RESULT_FACT ON (PURCH_DOC_FACT.PURCH_DOC_ID = MRP_RESULT_FACT.MRP_ELEM_NUM
AND PURCH_DOC_FACT.PURCH_LINE_ITEM_ID = MRP_RESULT_FACT.MRP_ELEM_LINE_NUM)
Just an FYI, A table scan on a heap (no clustered index) and a clustered index scan are actually the same thing, a table scan.

Almost, except you won't get a merge join from a table scan without a sort first.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Wayne West SSC-Insane Points: 22586 More actions · Answer 13

That's not really a lot of records. I'm in the data type mismatch of your join fields camp, but also, how much memory does your server have and what else is it doing? Could your instance be starved for memory?

-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]