Existance Check - Performance

Question

Existance Check - Performance

Pat Neb

SSC Rookie

Points: 41
More actions
July 23, 2009 at 2:51 pm

#207364

I am wanting to call one of these queries from my code to check if an invoice # already exists. So, I want to return a scalar value of integer.
Which query is faster (invoice # is not indexed, there are approx. 200,000 records)?:
IF EXISTS(SELECT 1 FROM [dbo].[Invoices] WITH(NOLOCK) WHERE [InvoiceNumber] = 'ABC') SELECT 1 ELSE SELECT 0
OR
SELECT COUNT(1) FROM [dbo].[Invoices] WITH(NOLOCK) WHERE [InvoiceNumber] = 'ABC'
Cheers,
Pat

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Jack Corbett SSC Guru Points: 184394 More actions · Answer 1

I would say that either will be the same. Depending on other queries you probably will want a non-clustered index on InvoiceNumber.

I also wonder about using the NOLOCK hint in this case because you could get a dirty read where the InvoiceNumber exists and then that transaction is rolled back and the InvoiceNumber no longer exists. A non-clustered index on InvoiceNumber would help there as well as the select will likely just take a key lock on the non-clustered index and not affect the rest of the rows in the table.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 2

It sounds like you have the data to do a test... why not just test it and see?

"A Developer must not guess... a Developer must KNOW." 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

happycat59 One Orange Chip Points: 29389 More actions · Answer 3

IF EXISTS is potentially quicker because once any record has been found SQL Server has finished what is needed.

Using COUNT(1) requires SQL Server to return all records that meet the criteria. It does not have the advantage that you have - i.e. it does not know that the invoice # is unique.

Pat Neb SSC Rookie Points: 41 More actions · Answer 4

Thanks for all the replies.

I will do the following:

1. Do a test from code prior to making any index changes with both queries to see which one runs faster.

2. Once that is determined, then, I will add a unique index on the column and re-run the tests.

The only issue I have with adding the index is that you do not always have the ability to do something to the database (at least I don't).

I used the NOLOCK table hint because the table runs inside of transactions for other things and did not want this query to affect their performance. I understand the risk of reading dirty data.

Cheers,

Pat

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 5

happycat59 (7/23/2009)
Using COUNT(1) requires SQL Server to return all records that meet the criteria.

Ummmm.... nope... it doesn't... it stops searching as soon as it finds one record that meets the criteria. The only time it may have to return all records is if a sort or some other anomoly like an aggregate is required.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

matt stockham SSCrazy Eights Points: 9942 More actions · Answer 6

Jeff Moden (7/24/2009)
happycat59 (7/23/2009)
Using COUNT(1) requires SQL Server to return all records that meet the criteria.
Ummmm.... nope... it doesn't... it stops searching as soon as it finds one record that meets the criteria. The only time it may have to return all records is if a sort or some other anomoly like an aggregate is required.

Actually neither is true .... count(1) will return a count of all records meeting the criteria. EXISTS will stop searching when it finds a match.

Paul White SSC Guru Points: 150467 More actions · Answer 7

That illustrates nicely why I dislike the COUNT(1) construction so much.

It is pointless and potentially confusing. It seems to be born out of some idea that COUNT(constant) is somehow more efficient than COUNT(*). I have lost count of how many times I have read that COUNT(*) accesses all columns. Good grief.

COUNT(*) and COUNT(1) always and without exception result in identical plans and identical results.

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Jeff Moden SSC Guru Points: 1004362 More actions · Answer 8

matt stockham (7/24/2009)
Jeff Moden (7/24/2009)
happycat59 (7/23/2009)
Using COUNT(1) requires SQL Server to return all records that meet the criteria.
Ummmm.... nope... it doesn't... it stops searching as soon as it finds one record that meets the criteria. The only time it may have to return all records is if a sort or some other anomoly like an aggregate is required.
Actually neither is true .... count(1) will return a count of all records meeting the criteria. EXISTS will stop searching when it finds a match.

I don't know what the heck is going on with my eyes... I swear that I saw that as TOP 1 (says I as I pencil in an appointment for the eye doctor). Thank you for the catch .

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

MusabUmair Ten Centuries Points: 1071 More actions · Answer 9

I suggest you to use this solution as its much better when comparing text values if you need an exact math

1. Add an (e.g. CSV_Inv) INTEGER COLUMN to the table

2. UPDATE IT with mentioned below statement and bind it with the computed value i.e. CHECKSUM(InvoiceNumber)

UPDATE Invoices

SET CSV_Inv = CHECKSUM(InvoiceNumber)

3.Create an index on CSV_Inv with name like Ind_CSV_Inv

4. Then use

SELECT 1

FROM Invoices WITH (INDEX = Ind_CSV_Inv)

WHERE CSV_Inv = CHECKSUM('ABC')

it will give you the better results

Musab
http://www.sqlhelpline.com

Paul White SSC Guru Points: 150467 More actions · Answer 10

musab (8/4/2009)
I suggest you to use this solution as its much better when comparing text values if you need an exact math
1. Add an (e.g. CSV_Inv) INTEGER COLUMN to the table
2. UPDATE IT with mentioned below statement and bind it with the computed value i.e. CHECKSUM(InvoiceNumber)
UPDATE Invoices
SET CSV_Inv = CHECKSUM(InvoiceNumber)
3.Create an index on CSV_Inv with name like Ind_CSV_Inv
4. Then use
SELECT 1
FROM Invoices WITH (INDEX = Ind_CSV_Inv)
WHERE CSV_Inv = CHECKSUM('ABC')
it will give you the better results

Using a hash can be valuable if the number of rows justifies it, but you have to be very careful with the implementation.

Just because the hash matches does not mean the values are the same - they might be, but you still have to check.

The advantage is that rows where the hash doesn't match definitely do not match.

In the example given:

SELECT 1

FROM Invoices WITH (INDEX = Ind_CSV_Inv)

WHERE CSV_Inv = CHECKSUM('ABC')

AND InvoiceNumber = 'ABC'

That bold bit is absolutely required for correctness.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi