Validating existence of a records against two fields.

Question

Validating existence of a records against two fields.

Eric_Gagne

SSCrazy

Points: 2062
More actions
August 13, 2008 at 6:55 am

#211141

Hi forum,
I'm writing a procedure that will receive a few parameters which will update or insert records in a 2 tables. One of the parameters, a barcode value has to be validated to make sure it exists but depending on what is in the barcode, I could find it either in column1 or column2.
So far I've done this as follows.
if exists (select 1 from LookupTable where @Parm in (Column1, Column2))
Begin
...blah blah blah
End
I've been told by someone that it's a very bad way of doing it because SQL can't use indexes when coded like this and that my lookups are slower than they could be.
They recommend that I do this with 2 queries either as;
declare @RecordFound int
set @RecordFound = 0
if exists (select 1 from LookupTable where @Parm = Column1)
set @RecordFound = 1
if exists (select 1 from LookupTable where @Parm = Column2)
set @RecordFound = 1
if (@RecordFound = 1)
Begin
...blah blah blah
End
or
if exists (select 1 from LookupTable where @Parm = Column1) or exists (select 1 from LookupTable where @Parm = 2)
Begin
...blah blah blah
End
If I have unique indexes on column1 and column2, wouldn't my method be faster or at least equal than the other two methods which requires two queries ?

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 1

Well... It depends...

The IN clause in this case is probably going to cause a table scan. I'd say that you're better off using IF EXISTS in a nested fashion such that you check the value that fails the most first. If it fails, you don't have to make the second check at all. You make the second check if the first check passes. That should minimize the number of unecessary reads.

But, what are you doing? Frequently people use EXISTS statement where they should have used a WHERE statement or some other construct. I recently found where a developer had used EXISTS prior to an UPDATE to "save time" on the UPDATE and to determine if no rows were updated. I showed how he was forcing two reads for each update and that the value was most likely going to be there, not missing. So doing an UPDATE and then checking for rows affected was the better approach.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jack Corbett SSC Guru Points: 184394 More actions · Answer 2

In the case of your query the IN is like OR'ing which will likely cause an table scan because you have 2 indexes to check. In the other options the 2 queries will both do index seeks which will likely be faster than scans.

Here is a little test to show you what I mean (done in adventureworks):

[font="Courier New"]-- clean up test table

IF OBJECT_ID('dbo.Products') IS NOT NULL

BEGIN

DROP TABLE dbo.Products

END

-- create & populate test table

SELECT

ProductId,

CONVERT(VARCHAR(3), productid) + LEFT(ProductNumber, 7) AS prodcode,

CONVERT(VARCHAR(3), productid) + LEFT(Name, 7) AS prodcode2

INTO

dbo.Products

FROM

production.product

-- create primary key on test table

ALTER TABLE dbo.Products

ADD CONSTRAINT PK_dbo_Products PRIMARY KEY CLUSTERED (ProductId)

-- create unique non-clustered indexes on test table

CREATE UNIQUE INDEX UX_Prodcode ON dbo.Products(ProdCode)

CREATE UNIQUE INDEX UX_Prodcode2 ON dbo.Products(ProdCode2)

-- Declare and fill the variable

DECLARE @prodcode VARCHAR(10)

SELECT @prodcode = MIN(prodcode) FROM dbo.products

-- start test getting IO and time stats

PRINT 'Begin In clause'

SET STATISTICS IO ON

SET STATISTICS Time ON

IF EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode IN (Prodcode, prodcode2))

BEGIN

PRINT 'ENd In clause'

END

ELSE

BEGIN

PRINT 'ENd In clause'

END

SET STATISTICS Io OFF

SET STATISTICS Time OFF

-- clean up cache and buffer so it is a valid test

DBCC FreeProcCache()

DBCC DropCleanBuffers()

PRINT 'Begin = prodcode'

SET STATISTICS IO ON

SET STATISTICS Time ON

IF EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode = Prodcode)

BEGIN

PRINT 'End = prodcode'

END

ELSE

BEGIN

PRINT 'End = prodcode'

END

SET STATISTICS Io OFF

SET STATISTICS Time OFF

-- clean up cache and buffer so it is a valid test

DBCC FreeProcCache()

DBCC DropCleanBuffers()

PRINT 'Begin = prodcode2'

SET STATISTICS IO ON

SET STATISTICS Time ON

IF EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode = Prodcode2)

BEGIN

PRINT 'End = prodcode2'

END

ELSE

BEGIN

PRINT 'End = prodcode2'

END

SET STATISTICS Io OFF

SET STATISTICS Time OFF

-- clean up cache and buffer so it is a valid test

DBCC FreeProcCache()

DBCC DropCleanBuffers()

PRINT 'Begin Or Exists'

SET STATISTICS IO ON

SET STATISTICS Time ON

IF EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode = Prodcode) OR EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode = Prodcode2)

BEGIN

PRINT 'End Or Exists'

END

ELSE

BEGIN

PRINT 'End Or Exists'

END

SET STATISTICS Io OFF

SET STATISTICS Time OFF

-- clean up cache and buffer so it is a valid test for re-run

DBCC FreeProcCache()

DBCC DropCleanBuffers()

[/font]

Here are the results for TIME and IO Stats for each query:

Begin In clause

Table 'Products'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 19 ms.

ENd In clause

Begin = prodcode

Table 'Products'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

End = prodcode

Begin = prodcode2

Table 'Products'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 8 ms.

End = prodcode2

Begin Or Exists

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Products'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 12 ms.

End Or Exists

You can see that using IN is the least efficient and actually does a clustered index scan while the others have no scans. I am also attaching the Execution plan.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Eric_Gagne SSCrazy Points: 2062 More actions · Answer 3

Grant,

Here's a short description of what I do. The .NET app that I'm writing will run in Windows CE Device. This is a typical physical inventory count.

Each product in our warehouse is barcoded and this can be either our barcode value or the customer's. We sometimes have both but it's a rare case because most of our customers don't want more than 1 barcode on the products so when they have their own we can't have ours.

When the user scans a barcode I need to check that it's valid. If it is, I insert/create records in my tables add it to the count and return details of the products to the app, if the check fails I return a message to alert the user that the barcode isn't valid (misprinted, damaged ticket, bad scan etc).

There is no processing done in the CE device, after each barcode scan, I call a procedure that does whatever needs to be done and only return data to the app that is displayed on screen.

Here's a simplified sample Skeleton (removed some output parms) of the procedure.

Create Procedure UpdatePhysicalCount

(

@Warehouse char(2),

@Location char(10),

@BarcodeValue Char(25),

@ValidBarcode bit output,

@ProdNo char(9) output,

@ProdColor char(5) output,

@ProdSize char(2) output,

@TotBin int output,

@TotSku int output

)

AS

set @ValidBarcode = 1

Set @ProdNo = ''

set @ProdColor = ''

set @ProdSize = ''

if not exists (select 1 from ProductList where BarcodeValue in (OurBarcode, CustomerBarcode)

Begin

Set @ValidBarcode = 0

Return

End

/* Each item in our product table has a primary key called ProdID which is used to retrieve other columns from that table */

declare @ProdID int

set @ProdID = (select prodid from ProductList where BarcodeValue in (OurBarcode, CustomerBarcode)

set @ProdNo = (Select Prodno from ProductList where ProdID = @ProdID)

set @ProdColor = (Select ProdColor from ProductList where ProdId = @ProdID)

set @ProdSize = (Select @ProdSize from ProductList where ProdID = @ProdID)

/* Then I have a few other statements to update

counters in the inventory count tables

and output parameters */

The remote device calls this procedure and upon return display an error message if @ValidBarode = 1, otherwise it displays the content of the other output parameters and the user goes on with the next scan.

This is the base of all my warehouse applications, I have 4 apps that do different things and they all need to validate the barcodes for existence in either the OurBarcode or CustomerBarcode fields and retrieve data for the corresponding ProdID value.

Eric_Gagne SSCrazy Points: 2062 More actions · Answer 4

Jack,

I modified your code to run it with my tables. The results are very telling......and scary :unsure:

The "IN Claude" takes an average of 575ms to run. There's one thing that I find odd though, it took "only" 274ms the first time then it took between 550ms and 625ms and subsequent runs.

The "Or exists" method takes 11ms.

I guess I have my answer. Thanks !

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 5

OK. You may or may not want to continue the existence check. From what I can see though, I think you could skip it. It looks like you're doing a bunch of select statements to populate properties then you use those properties to insert values. Instead, why don't you do a direct INSERT ... SELECT:

INSERT INTO Something

(ProdId

,ProdNo

,ProdColor

,ProdSize)

SELECT p.ProdId

,p.ProdNo

,p.ProdColor

,p.ProdSize

FROM ProductList p

WHERE BarCodeValue IN (OurBarcode, CustomerBarcode)

Although I'd try to find a way to get rid of the IN too.

Or, if you really do have to load parameters, do it in a single statement like this:

SELECT @ProdId = ProdId

,@ProdNo = Prodno

,@ProdColor = ProdColor

,@ProdSize = ProdSize

FROM ProductList

WHERE BarCodeValue IN (OurBarcode, CustomerBarcode)

That will at least radically reduce the number of reads & scans you have in the system.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jack Corbett SSC Guru Points: 184394 More actions · Answer 6

Glad I could help. That is the difference between SEEKS and SCANS. This is also why you should TEST different code.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Jack Corbett SSC Guru Points: 184394 More actions · Answer 7

I did not look at your code, but as Grant has pointed out, assigning the variables in a single select would work best.

I also think you could eliminate the check as well. You could do your select and then do:

If @@ROWCOUNT = 0

Begin

Set @ValidBarcode = 1

END

ELSE

Begin

Set @ValidBarcode = 0

End

Of course this does not take into account the other logic you have in the procedure that we have not seen.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Eric_Gagne SSCrazy Points: 2062 More actions · Answer 8

Thanks guys. I'm a beginner at SQL and I still have a lot to learn. For instance I didn't know that it's possible to populaire multiple variables in 1 select statement as Grant has demonstrated.

I wouldn't have thought of using the @@ROWCOUNT value either 🙂

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 9

Don't worry about it. There's always more to learn, for everyone. Speaking of which, I need to download 2008 and get started learning new stuff.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning