Advice on an OR statement

Question

Advice on an OR statement

DrJogalog

Old Hand

Points: 333
More actions
March 31, 2010 at 5:22 am

#222721

The PK on my table is PREFIX, DOCMENT and SEQNO. This is the clustered index.
If I run this statement after clearing the SQL buffer (DBCC DROPCLEANBUFFERS) etc, it will take over 15 seconds to return the 4 fields.
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
(Prefix='C' and Document='STCR124171' and seqno=1) or
(Prefix='C' and Document='STCR124172' and seqno=2)
If I split the query into 2, it returns the same results instantaeneously.
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
Prefix='C' and Document='STCR124171' and seqno=1
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
Prefix='C' and Document='STCR124172' and seqno=2
Both execution plans show it is using a clustered index seek. The first query does use parallelism but even if switch this off with (OPTION (maxdop 1)) on the query, the results are the same. Both execution plans show exactly the same except there is 2 for the 'split' results.
After the query has been run once, it then also comes back instantly that is why I was clearing the buffer each time. Why is there a 40x increase in the completion time on what would esseentially be the same query?
Anybody any ideas?
If any more info is required, I can easily provide this if it is relevant.
Thanks.

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

DrJogalog Old Hand Points: 333 More actions · Answer 1

Also, if the SEQNO is the same value on the 2 clauses, it returns the results instantly too. If they are different, the delay is there. < This is really confusing!

The order of the clustered index in PREFIX, DOCUMENT & SEQNO, all ascending.

I have also just rebuilt the entire index yet the problem still exists.

Paul White SSC Guru Points: 150468 More actions · Answer 2

It all boils down to the way SQL Server estimates cardinality and cost when an OR is present.

Are you using SQL Server 2008? The optimizer frequently manages this type of query much better in 2008.

One solution is to rewrite the query without the OR, using a UNION ALL.

SELECT Document, rc

FROM [AP].[dbo].[ILines]

WHERE Prefix = 'C'

AND Document = 'STCR124171'

AND seqno=1

UNION ALL

SELECT Document, rc

FROM [AP].[dbo].[ILines]

WHERE Prefix = 'C'

AND Document = 'STCR124172'

AND seqno=2;

You might also try the new FORCESEEK hint.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

DrJogalog Old Hand Points: 333 More actions · Answer 3

Thanks for the reply.

No, this test was on a SQL2000 server (using 2008 Managemeent studio). I'll move the data to a 2008 SQL sever to see if the results are returned in the same manner. Although, this won;t be a complete solution as we have hundreds of customers still using 2000 and 2005, very few have actually moved to 2008 at this point.

If the union returns the results quick, like the 2 seperate statements, I will ask our developers to change the code to work in this manner.

As stated previously, what confuses me is if the SEQNO is the same on both clauses, it works fine.

If this has been fixed in 2008, then fair enough but should still be implemented into pevious systems. If it hasn't it seems like a 'real' issue to me as the optimizer doesn't seem to be optimizing anything.

Thanks for the reply. If anyone else has any further suggestions, feel free to let me know.

WayneS SSC Guru Points: 95461 More actions · Answer 4

DrJogalog (3/31/2010)
The PK on my table is PREFIX, DOCMENT and SEQNO. This is the clustered index.
If I run this statement after clearing the SQL buffer (DBCC DROPCLEANBUFFERS) etc, it will take over 15 seconds to return the 4 fields.
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
(Prefix='C' and Document='STCR124171' and seqno=1) or
(Prefix='C' and Document='STCR124172' and seqno=2)
If I split the query into 2, it returns the same results instantaeneously.
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
Prefix='C' and Document='STCR124171' and seqno=1
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
Prefix='C' and Document='STCR124172' and seqno=2
Both execution plans show it is using a clustered index seek. The first query does use parallelism but even if switch this off with (OPTION (maxdop 1))on the query, the results are the same. Both execution plans show exactly the same except there is 2 for the 'split' results.
After the query has been run once, it then also comes back instantly that is why I was clearing the buffer each time. Why is there a 40x increase in the completion time on what would esseentially be the same query?
Anybody any ideas?
If any more info is required, I can easily provide this if it is relevant.
Thanks.

In addition to what Paul suggests (using a UNION), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Paul White SSC Guru Points: 150468 More actions · Answer 5

WayneS (3/31/2010)
In addition to what Paul suggests (using a UNION ALL), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉

Observant readers will have noticed that the code I posted dispensed with that hint 😎

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

DrJogalog Old Hand Points: 333 More actions · Answer 6

In addition to what Paul suggests (using a UNION), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉

Thanks for the reply.

Our devlopers have a passion for writing 'no lock's' in their statements. The data returned in these queries should not actually change once written so there is no real potential to return inaccurate data in this example. We had quite a few issues with deadlocking a fair while back and the solution seemed to be to use 'no lock' clauses on every statement they could. I personally use them very sparingly, depending on what I am doing, but at this point, its not my choice.

You might also try the new FORCESEEK hint.

Im not at work at the moment but I know the execution plans showed that both queries were already using the clustered index seek. There was no difference in the execution plan itself, just the time it took to return the results. Although I will still give this a try too.

Thanks.

Paul White SSC Guru Points: 150468 More actions · Answer 7

DrJogalog (3/31/2010)
As stated previously, what confuses me is if the SEQNO is the same on both clauses, it works fine.

Could you post the exact definition (CREATE INDEX statements) for *all* the indexes on the table please?

It will probably help explain why this is happening.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

DrJogalog Old Hand Points: 333 More actions · Answer 8

DrJogalog

Old Hand

Points: 333

March 31, 2010 at 12:24 pm

#1143347

Comments removed as no longer valid.

DrJogalog Old Hand Points: 333 More actions · Answer 9

Ok, there wasn't as many as I thought.

The primary key is PREFIX, DOCUMENT, SEQNO in that order, all ascending.

CREATE INDEX [Corder] ON [dbo].[ILines]([COrder]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [Part] ON [dbo].[ILines]([Part]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [PG] ON [dbo].[ILines]([PG]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [RC] ON [dbo].[ILines]([Rc]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [Supp] ON [dbo].[ILines]([Supp]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [BinCard] ON [dbo].[ILines]([StkPart], [Bincard]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [Today] ON [dbo].[ILines]([Today]) WITH FILLFACTOR = 90 ON [PRIMARY]

GO

CREATE INDEX [Credit] ON [dbo].[ILines]([Credit]) ON [PRIMARY]

GO

CREATE INDEX [Invoice] ON [dbo].[ILines]([Invoice]) ON [PRIMARY]

GO

CREATE INDEX [BOrder] ON [dbo].[ILines]([BOrder]) ON [PRIMARY]

GO

WayneS SSC Guru Points: 95461 More actions · Answer 10

Paul White NZ (3/31/2010)
WayneS (3/31/2010)
In addition to what Paul suggests (using a UNION ALL), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉
Observant readers will have noticed that the code I posted dispensed with that hint 😎

Paul, I apologize. I did notice that, and meant to mention it, but it slipped past me.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

vidya_pande SSCertifiable Points: 5275 More actions · Answer 11

it will use only PK. Union all could give improved performance. Please post the results

Paul White SSC Guru Points: 150468 More actions · Answer 12

Thanks for the index definitions - though in this case, they did not help. Never mind.

If your query plan shows a single seek instead of two, it is likely that the SQL Server 2000 optimizer is combining the two searches into a single range seek with a predicate. The range seek will not be very selective, so the predicate has to be evaluated on a large number of rows. Because the seek portion selects a relatively large number of rows (before the predicate), a large number of data pages must be scanned. The effect is made more pronounced by the fact that the clustered index is the widest possible index - it includes all the data at the leaf level.

My guess is that your plan shows a seek on (Prefix = 'C') only, with the predicate evaluating the remaining conditions. Obviously, you have a lot of records where Prefix = 'C', and this is what causes the large number of physical reads. The fillfactor of the recently rebuilt clustered index will also make the scan less efficient, and the presence or absence of read-ahead reads will also make a difference.

Here is a test rig I set up to see if I could reproduce your problem. I tested this on 2005 and 2008, since I do not have an instance of SQL Server 2000. Both are correctly optimized to two seeks and a concatenation (exactly as the UNION ALL form of the query), and therefore complete in milliseconds, even with a cleared buffer pool.

-- For demonstration purposes only

USE tempdb;

GO

-- Drop the test table if it exists

IF OBJECT_ID(N'dbo.Test', N'U')

IS NOT NULL

DROP TABLE dbo.Test;

GO

-- Create the test table

-- Padding added to widen the rows

CREATE TABLE dbo.Test

(

prefix CHAR(1) NOT NULL,

document CHAR(10) NOT NULL,

seq_no INTEGER NOT NULL,

rc INTEGER NOT NULL,

padding CHAR(1000) NOT NULL DEFAULT ''

);

GO

-- Add 100,000 random-ish rows

INSERT dbo.Test

(prefix, document, seq_no, rc)

SELECT TOP (100000)

CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65),

'STCR' + RIGHT(1000000 + CHECKSUM(NEWID()) % 1000000, 6),

CHECKSUM(NEWID()) % 5 + 1,

CHECKSUM(NEWID())

FROM master.sys.columns C1,

master.sys.columns C2,

master.sys.columns C3;

GO

-- Remove any duplicates

-- (should be very few)

WITH CTE

AS (

SELECT *,

rn = ROW_NUMBER() OVER(

PARTITION BY prefix, document, seq_no

ORDER BY (SELECT 0))

FROM dbo.Test

)

DELETE CTE

WHERE rn > 1;

GO

-- Ensure our target rows exist

INSERT dbo.Test

(prefix, document, seq_no, rc)

SELECT 'C', 'STCR124171', 1, 12345678

WHERE NOT EXISTS

(

SELECT *

FROM dbo.Test T2

WHERE T2.prefix = 'C'

AND T2.document = 'STCR124171'

AND seq_no = 1

);

INSERT dbo.Test

(prefix, document, seq_no, rc)

SELECT 'C', 'STCR124172', 2, 12345678

WHERE NOT EXISTS

(

SELECT *

FROM dbo.Test T2

WHERE T2.prefix = 'C'

AND T2.document = 'STCR124172'

AND seq_no = 2

);

GO

-- Add the primary key

ALTER TABLE dbo.Test

ADD PRIMARY KEY CLUSTERED

(prefix, document, seq_no)

WITH (FILLFACTOR = 80);

GO

-- Clear the buffer pool

CHECKPOINT;

DBCC DROPCLEANBUFFERS;

GO

-- Test the query

SELECT Document, rc

FROM dbo.Test

WHERE (prefix = 'C' AND document = 'STCR124171' AND seq_no = 1)

OR (prefix = 'C' AND document = 'STCR124172' AND seq_no = 2);

-- Tidy up

DROP TABLE dbo.Test;

This is the (actual) execution plan:

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 13

Another thing I should mention: the current clustered index is not a great choice for the clustered index.

The key is not outrageously wide (at 15 bytes), but inserts are likely to be spread across the index, resulting in either a sub-optimal page density (due to a low fillfactor), or high levels and page splitting and therefore high fragmentation levels. Fragmentation can have a very detrimental effect on range scan performance. The key is included in every non-clustered index (it is the row locator), and this adds 15 bytes to every row of every index too.

I would be tempted to change the primary key to NONCLUSTERED, and create a clustered index on a narrow column which monotonically increases and never changes. An INTEGER IDENTITY column is convenient for this, but by no means the only option.

An updated test rig to demonstrate this follows.

-- For demonstration purposes only

USE tempdb;

GO

-- Drop the test table if it exists

IF OBJECT_ID(N'dbo.Test', N'U')

IS NOT NULL

DROP TABLE dbo.Test;

GO

-- Create the test table

-- Padding added to widen the rows

CREATE TABLE dbo.Test

(

row_id INTEGER IDENTITY NOT NULL

UNIQUE CLUSTERED

WITH (FILLFACTOR = 100),

prefix CHAR(1) NOT NULL,

document CHAR(10) NOT NULL,

seq_no INTEGER NOT NULL,

rc INTEGER NOT NULL,

padding CHAR(1000) NOT NULL DEFAULT ''

);

GO

-- Add 100,000 random-ish rows

INSERT dbo.Test

(prefix, document, seq_no, rc)

SELECT TOP (100000)

CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65),

'STCR' + RIGHT(1000000 + CHECKSUM(NEWID()) % 1000000, 6),

CHECKSUM(NEWID()) % 5 + 1,

CHECKSUM(NEWID())

FROM master.sys.columns C1,

master.sys.columns C2,

master.sys.columns C3;

GO

-- Remove any duplicates

-- (should be very few)

WITH CTE

AS (

SELECT *,

rn = ROW_NUMBER() OVER(

PARTITION BY prefix, document, seq_no

ORDER BY (SELECT 0))

FROM dbo.Test

)

DELETE CTE

WHERE rn > 1;

GO

-- Ensure our target rows exist

INSERT dbo.Test

(prefix, document, seq_no, rc)

SELECT 'C', 'STCR124171', 1, 12345678

WHERE NOT EXISTS

(

SELECT *

FROM dbo.Test T2

WHERE T2.prefix = 'C'

AND T2.document = 'STCR124171'

AND seq_no = 1

);

INSERT dbo.Test

(prefix, document, seq_no, rc)

SELECT 'C', 'STCR124172', 2, 12345678

WHERE NOT EXISTS

(

SELECT *

FROM dbo.Test T2

WHERE T2.prefix = 'C'

AND T2.document = 'STCR124172'

AND seq_no = 2

);

GO

-- Add the primary key

ALTER TABLE dbo.Test

ADD PRIMARY KEY NONCLUSTERED

(prefix, document, seq_no)

WITH (FILLFACTOR = 80);

GO

-- Clear the buffer pool

CHECKPOINT;

DBCC DROPCLEANBUFFERS;

GO

-- Test the query

SELECT Document, rc

FROM dbo.Test

WHERE (prefix = 'C' AND document = 'STCR124171' AND seq_no = 1)

OR (prefix = 'C' AND document = 'STCR124172' AND seq_no = 2);

-- Tidy up

DROP TABLE dbo.Test;

The query plan in this case is:

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

DrJogalog Old Hand Points: 333 More actions · Answer 14

Thanks for the detailed analysis Paul, I understand why the query is slow from your explanation.

Moving the data to a SQL2008 server stops the speed issue. The updated optimizer must be able to deal with this type of query a little better.

Using the UNION ALL clause returns the data at the same speed as the 2 separate statements but in the same format as the OR statement so, it looks like this is the option we will take.

Here are the execution plans

CHECKPOINT

DBCC DROPCLEANBUFFERS

GO

SELECTDocument, rc

FROM[ILines]

WHERE(Prefix='C' and Document='STCR124171' and seqno=2)

OR(Prefix='C' and Document='STCR124172' and seqno=1)

OPTION (maxdop 1))

The maxdop command is there to keep the comparison the same across the queries.

That took over 15 seconds to complete.

CHECKPOINT

DBCC DROPCLEANBUFFERS

GO

SELECTDocument, rc

FROM[ILines]

WHEREPrefix='C'

ANDDocument='STCR124171'

ANDseqno=2

SELECTDocument, rc

FROM[ILines]

WHEREPrefix='C'

ANDDocument='STCR124172'

ANDseqno=1

That takes milliseconds.

CHECKPOINT

DBCC DROPCLEANBUFFERS

GO

SELECTDocument, rc

FROM[ILines]

WHEREPrefix='C'

ANDDocument='STCR124171'

ANDseqno=2

UNION ALL

SELECTDocument, rc

FROM[ILines]

WHEREPrefix='C'

ANDDocument='STCR124172'

ANDseqno=1

That takes milliseonds again but returns 1 set of data as in the original OR query.

With regard to the clustered index on this table, we are fully aware of the issues that it causes. This has been pointed out quite a while ago by ourselves and external consultants (as well as the 'no lock' clauses too). Again, we can only advise, but do not make the decisions to allow these fundamental flaws to be changed. These databases were designed many years ago when the development team knew very little about SQL design and structure. Newer tables that are created are designed with speed in mind so the indexes are controlled in a different manner. We are starting to have quite a few issues with speed of the queries at the moment so it is inevitable that a BIG change will be on the cards soon. Either that or we grind to a halt eventually! lol

I appreciate ALL your help and will be a regular visitor to this site now.