March 31, 2010 at 5:22 am
The PK on my table is PREFIX, DOCMENT and SEQNO. This is the clustered index.
If I run this statement after clearing the SQL buffer (DBCC DROPCLEANBUFFERS) etc, it will take over 15 seconds to return the 4 fields.
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
(Prefix='C' and Document='STCR124171' and seqno=1) or
(Prefix='C' and Document='STCR124172' and seqno=2)
If I split the query into 2, it returns the same results instantaeneously.
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
Prefix='C' and Document='STCR124171' and seqno=1
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
Prefix='C' and Document='STCR124172' and seqno=2
Both execution plans show it is using a clustered index seek. The first query does use parallelism but even if switch this off with (OPTION (maxdop 1)) on the query, the results are the same. Both execution plans show exactly the same except there is 2 for the 'split' results.
After the query has been run once, it then also comes back instantly that is why I was clearing the buffer each time. Why is there a 40x increase in the completion time on what would esseentially be the same query?
Anybody any ideas?
If any more info is required, I can easily provide this if it is relevant.
Thanks.
March 31, 2010 at 5:52 am
Also, if the SEQNO is the same value on the 2 clauses, it returns the results instantly too. If they are different, the delay is there. < This is really confusing!
The order of the clustered index in PREFIX, DOCUMENT & SEQNO, all ascending.
I have also just rebuilt the entire index yet the problem still exists.
March 31, 2010 at 9:59 am
It all boils down to the way SQL Server estimates cardinality and cost when an OR is present.
Are you using SQL Server 2008? The optimizer frequently manages this type of query much better in 2008.
One solution is to rewrite the query without the OR, using a UNION ALL.
SELECT Document, rc
FROM [AP].[dbo].[ILines]
WHERE Prefix = 'C'
AND Document = 'STCR124171'
AND seqno=1
UNION ALL
SELECT Document, rc
FROM [AP].[dbo].[ILines]
WHERE Prefix = 'C'
AND Document = 'STCR124172'
AND seqno=2;
You might also try the new FORCESEEK hint.
March 31, 2010 at 11:49 am
Thanks for the reply.
No, this test was on a SQL2000 server (using 2008 Managemeent studio). I'll move the data to a 2008 SQL sever to see if the results are returned in the same manner. Although, this won;t be a complete solution as we have hundreds of customers still using 2000 and 2005, very few have actually moved to 2008 at this point.
If the union returns the results quick, like the 2 seperate statements, I will ask our developers to change the code to work in this manner.
As stated previously, what confuses me is if the SEQNO is the same on both clauses, it works fine.
If this has been fixed in 2008, then fair enough but should still be implemented into pevious systems. If it hasn't it seems like a 'real' issue to me as the optimizer doesn't seem to be optimizing anything.
Thanks for the reply. If anyone else has any further suggestions, feel free to let me know.
March 31, 2010 at 11:55 am
DrJogalog (3/31/2010)
The PK on my table is PREFIX, DOCMENT and SEQNO. This is the clustered index.If I run this statement after clearing the SQL buffer (DBCC DROPCLEANBUFFERS) etc, it will take over 15 seconds to return the 4 fields.
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
(Prefix='C' and Document='STCR124171' and seqno=1) or
(Prefix='C' and Document='STCR124172' and seqno=2)
If I split the query into 2, it returns the same results instantaeneously.
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
Prefix='C' and Document='STCR124171' and seqno=1
Select Document, rc from [AP].[dbo].[ILines] with (NoLock) where
Prefix='C' and Document='STCR124172' and seqno=2
Both execution plans show it is using a clustered index seek. The first query does use parallelism but even if switch this off with (OPTION (maxdop 1))on the query, the results are the same. Both execution plans show exactly the same except there is 2 for the 'split' results.
After the query has been run once, it then also comes back instantly that is why I was clearing the buffer each time. Why is there a 40x increase in the completion time on what would esseentially be the same query?
Anybody any ideas?
If any more info is required, I can easily provide this if it is relevant.
Thanks.
In addition to what Paul suggests (using a UNION), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
March 31, 2010 at 12:03 pm
WayneS (3/31/2010)
In addition to what Paul suggests (using a UNION ALL), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉
Observant readers will have noticed that the code I posted dispensed with that hint 😎
March 31, 2010 at 12:07 pm
In addition to what Paul suggests (using a UNION), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉
Thanks for the reply.
Our devlopers have a passion for writing 'no lock's' in their statements. The data returned in these queries should not actually change once written so there is no real potential to return inaccurate data in this example. We had quite a few issues with deadlocking a fair while back and the solution seemed to be to use 'no lock' clauses on every statement they could. I personally use them very sparingly, depending on what I am doing, but at this point, its not my choice.
You might also try the new FORCESEEK hint.
Im not at work at the moment but I know the execution plans showed that both queries were already using the clustered index seek. There was no difference in the execution plan itself, just the time it took to return the results. Although I will still give this a try too.
Thanks.
March 31, 2010 at 12:17 pm
DrJogalog (3/31/2010)
As stated previously, what confuses me is if the SEQNO is the same on both clauses, it works fine.
Could you post the exact definition (CREATE INDEX statements) for *all* the indexes on the table please?
It will probably help explain why this is happening.
March 31, 2010 at 12:24 pm
Comments removed as no longer valid.
March 31, 2010 at 12:39 pm
Ok, there wasn't as many as I thought.
The primary key is PREFIX, DOCUMENT, SEQNO in that order, all ascending.
CREATE INDEX [Corder] ON [dbo].[ILines]([COrder]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [Part] ON [dbo].[ILines]([Part]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [PG] ON [dbo].[ILines]([PG]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [RC] ON [dbo].[ILines]([Rc]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [Supp] ON [dbo].[ILines]([Supp]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [BinCard] ON [dbo].[ILines]([StkPart], [Bincard]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [Today] ON [dbo].[ILines]([Today]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [Credit] ON [dbo].[ILines]([Credit]) ON [PRIMARY]
GO
CREATE INDEX [Invoice] ON [dbo].[ILines]([Invoice]) ON [PRIMARY]
GO
CREATE INDEX [BOrder] ON [dbo].[ILines]([BOrder]) ON [PRIMARY]
GO
March 31, 2010 at 2:35 pm
Paul White NZ (3/31/2010)
WayneS (3/31/2010)
In addition to what Paul suggests (using a UNION ALL), you should remove the NOLOCK hint. Unless you don't mind sending potentially wrong data back to the user. 😉Observant readers will have noticed that the code I posted dispensed with that hint 😎
Paul, I apologize. I did notice that, and meant to mention it, but it slipped past me.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
March 31, 2010 at 2:44 pm
it will use only PK. Union all could give improved performance. Please post the results
April 1, 2010 at 12:31 am
Thanks for the index definitions - though in this case, they did not help. Never mind.
If your query plan shows a single seek instead of two, it is likely that the SQL Server 2000 optimizer is combining the two searches into a single range seek with a predicate. The range seek will not be very selective, so the predicate has to be evaluated on a large number of rows. Because the seek portion selects a relatively large number of rows (before the predicate), a large number of data pages must be scanned. The effect is made more pronounced by the fact that the clustered index is the widest possible index - it includes all the data at the leaf level.
My guess is that your plan shows a seek on (Prefix = 'C') only, with the predicate evaluating the remaining conditions. Obviously, you have a lot of records where Prefix = 'C', and this is what causes the large number of physical reads. The fillfactor of the recently rebuilt clustered index will also make the scan less efficient, and the presence or absence of read-ahead reads will also make a difference.
Here is a test rig I set up to see if I could reproduce your problem. I tested this on 2005 and 2008, since I do not have an instance of SQL Server 2000. Both are correctly optimized to two seeks and a concatenation (exactly as the UNION ALL form of the query), and therefore complete in milliseconds, even with a cleared buffer pool.
-- For demonstration purposes only
USE tempdb;
GO
-- Drop the test table if it exists
IF OBJECT_ID(N'dbo.Test', N'U')
IS NOT NULL
DROP TABLE dbo.Test;
GO
-- Create the test table
-- Padding added to widen the rows
CREATE TABLE dbo.Test
(
prefix CHAR(1) NOT NULL,
document CHAR(10) NOT NULL,
seq_no INTEGER NOT NULL,
rc INTEGER NOT NULL,
padding CHAR(1000) NOT NULL DEFAULT ''
);
GO
-- Add 100,000 random-ish rows
INSERT dbo.Test
(prefix, document, seq_no, rc)
SELECT TOP (100000)
CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65),
'STCR' + RIGHT(1000000 + CHECKSUM(NEWID()) % 1000000, 6),
CHECKSUM(NEWID()) % 5 + 1,
CHECKSUM(NEWID())
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
-- Remove any duplicates
-- (should be very few)
WITH CTE
AS (
SELECT *,
rn = ROW_NUMBER() OVER(
PARTITION BY prefix, document, seq_no
ORDER BY (SELECT 0))
FROM dbo.Test
)
DELETE CTE
WHERE rn > 1;
GO
-- Ensure our target rows exist
INSERT dbo.Test
(prefix, document, seq_no, rc)
SELECT 'C', 'STCR124171', 1, 12345678
WHERE NOT EXISTS
(
SELECT *
FROM dbo.Test T2
WHERE T2.prefix = 'C'
AND T2.document = 'STCR124171'
AND seq_no = 1
);
INSERT dbo.Test
(prefix, document, seq_no, rc)
SELECT 'C', 'STCR124172', 2, 12345678
WHERE NOT EXISTS
(
SELECT *
FROM dbo.Test T2
WHERE T2.prefix = 'C'
AND T2.document = 'STCR124172'
AND seq_no = 2
);
GO
-- Add the primary key
ALTER TABLE dbo.Test
ADD PRIMARY KEY CLUSTERED
(prefix, document, seq_no)
WITH (FILLFACTOR = 80);
GO
-- Clear the buffer pool
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO
-- Test the query
SELECT Document, rc
FROM dbo.Test
WHERE (prefix = 'C' AND document = 'STCR124171' AND seq_no = 1)
OR (prefix = 'C' AND document = 'STCR124172' AND seq_no = 2);
-- Tidy up
DROP TABLE dbo.Test;
This is the (actual) execution plan:
April 1, 2010 at 12:43 am
Another thing I should mention: the current clustered index is not a great choice for the clustered index.
The key is not outrageously wide (at 15 bytes), but inserts are likely to be spread across the index, resulting in either a sub-optimal page density (due to a low fillfactor), or high levels and page splitting and therefore high fragmentation levels. Fragmentation can have a very detrimental effect on range scan performance. The key is included in every non-clustered index (it is the row locator), and this adds 15 bytes to every row of every index too.
I would be tempted to change the primary key to NONCLUSTERED, and create a clustered index on a narrow column which monotonically increases and never changes. An INTEGER IDENTITY column is convenient for this, but by no means the only option.
An updated test rig to demonstrate this follows.
-- For demonstration purposes only
USE tempdb;
GO
-- Drop the test table if it exists
IF OBJECT_ID(N'dbo.Test', N'U')
IS NOT NULL
DROP TABLE dbo.Test;
GO
-- Create the test table
-- Padding added to widen the rows
CREATE TABLE dbo.Test
(
row_id INTEGER IDENTITY NOT NULL
UNIQUE CLUSTERED
WITH (FILLFACTOR = 100),
prefix CHAR(1) NOT NULL,
document CHAR(10) NOT NULL,
seq_no INTEGER NOT NULL,
rc INTEGER NOT NULL,
padding CHAR(1000) NOT NULL DEFAULT ''
);
GO
-- Add 100,000 random-ish rows
INSERT dbo.Test
(prefix, document, seq_no, rc)
SELECT TOP (100000)
CHAR(RAND(CHECKSUM(NEWID())) * 26 + 65),
'STCR' + RIGHT(1000000 + CHECKSUM(NEWID()) % 1000000, 6),
CHECKSUM(NEWID()) % 5 + 1,
CHECKSUM(NEWID())
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
-- Remove any duplicates
-- (should be very few)
WITH CTE
AS (
SELECT *,
rn = ROW_NUMBER() OVER(
PARTITION BY prefix, document, seq_no
ORDER BY (SELECT 0))
FROM dbo.Test
)
DELETE CTE
WHERE rn > 1;
GO
-- Ensure our target rows exist
INSERT dbo.Test
(prefix, document, seq_no, rc)
SELECT 'C', 'STCR124171', 1, 12345678
WHERE NOT EXISTS
(
SELECT *
FROM dbo.Test T2
WHERE T2.prefix = 'C'
AND T2.document = 'STCR124171'
AND seq_no = 1
);
INSERT dbo.Test
(prefix, document, seq_no, rc)
SELECT 'C', 'STCR124172', 2, 12345678
WHERE NOT EXISTS
(
SELECT *
FROM dbo.Test T2
WHERE T2.prefix = 'C'
AND T2.document = 'STCR124172'
AND seq_no = 2
);
GO
-- Add the primary key
ALTER TABLE dbo.Test
ADD PRIMARY KEY NONCLUSTERED
(prefix, document, seq_no)
WITH (FILLFACTOR = 80);
GO
-- Clear the buffer pool
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO
-- Test the query
SELECT Document, rc
FROM dbo.Test
WHERE (prefix = 'C' AND document = 'STCR124171' AND seq_no = 1)
OR (prefix = 'C' AND document = 'STCR124172' AND seq_no = 2);
-- Tidy up
DROP TABLE dbo.Test;
The query plan in this case is:
April 1, 2010 at 3:00 am
Thanks for the detailed analysis Paul, I understand why the query is slow from your explanation.
Moving the data to a SQL2008 server stops the speed issue. The updated optimizer must be able to deal with this type of query a little better.
Using the UNION ALL clause returns the data at the same speed as the 2 separate statements but in the same format as the OR statement so, it looks like this is the option we will take.
Here are the execution plans
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO
SELECTDocument, rc
FROM[ILines]
WHERE(Prefix='C' and Document='STCR124171' and seqno=2)
OR(Prefix='C' and Document='STCR124172' and seqno=1)
OPTION (maxdop 1))
The maxdop command is there to keep the comparison the same across the queries.
That took over 15 seconds to complete.
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO
SELECTDocument, rc
FROM[ILines]
WHEREPrefix='C'
ANDDocument='STCR124171'
ANDseqno=2
SELECTDocument, rc
FROM[ILines]
WHEREPrefix='C'
ANDDocument='STCR124172'
ANDseqno=1
That takes milliseconds.
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO
SELECTDocument, rc
FROM[ILines]
WHEREPrefix='C'
ANDDocument='STCR124171'
ANDseqno=2
UNION ALL
SELECTDocument, rc
FROM[ILines]
WHEREPrefix='C'
ANDDocument='STCR124172'
ANDseqno=1
That takes milliseonds again but returns 1 set of data as in the original OR query.
With regard to the clustered index on this table, we are fully aware of the issues that it causes. This has been pointed out quite a while ago by ourselves and external consultants (as well as the 'no lock' clauses too). Again, we can only advise, but do not make the decisions to allow these fundamental flaws to be changed. These databases were designed many years ago when the development team knew very little about SQL design and structure. Newer tables that are created are designed with speed in mind so the indexes are controlled in a different manner. We are starting to have quite a few issues with speed of the queries at the moment so it is inevitable that a BIG change will be on the cards soon. Either that or we grind to a halt eventually! lol
I appreciate ALL your help and will be a regular visitor to this site now.
Viewing 15 posts - 1 through 15 (of 24 total)
You must be logged in to reply to this topic. Login to reply