Validating existence of a records against two fields.

  • Hi forum,

    I'm writing a procedure that will receive a few parameters which will update or insert records in a 2 tables. One of the parameters, a barcode value has to be validated to make sure it exists but depending on what is in the barcode, I could find it either in column1 or column2.

    So far I've done this as follows.

    if exists (select 1 from LookupTable where @Parm in (Column1, Column2))

    Begin

    ...blah blah blah

    End

    I've been told by someone that it's a very bad way of doing it because SQL can't use indexes when coded like this and that my lookups are slower than they could be.

    They recommend that I do this with 2 queries either as;

    declare @RecordFound int

    set @RecordFound = 0

    if exists (select 1 from LookupTable where @Parm = Column1)

    set @RecordFound = 1

    if exists (select 1 from LookupTable where @Parm = Column2)

    set @RecordFound = 1

    if (@RecordFound = 1)

    Begin

    ...blah blah blah

    End

    or

    if exists (select 1 from LookupTable where @Parm = Column1) or exists (select 1 from LookupTable where @Parm = 2)

    Begin

    ...blah blah blah

    End

    If I have unique indexes on column1 and column2, wouldn't my method be faster or at least equal than the other two methods which requires two queries ?

  • Well... It depends...

    The IN clause in this case is probably going to cause a table scan. I'd say that you're better off using IF EXISTS in a nested fashion such that you check the value that fails the most first. If it fails, you don't have to make the second check at all. You make the second check if the first check passes. That should minimize the number of unecessary reads.

    But, what are you doing? Frequently people use EXISTS statement where they should have used a WHERE statement or some other construct. I recently found where a developer had used EXISTS prior to an UPDATE to "save time" on the UPDATE and to determine if no rows were updated. I showed how he was forcing two reads for each update and that the value was most likely going to be there, not missing. So doing an UPDATE and then checking for rows affected was the better approach.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • In the case of your query the IN is like OR'ing which will likely cause an table scan because you have 2 indexes to check. In the other options the 2 queries will both do index seeks which will likely be faster than scans.

    Here is a little test to show you what I mean (done in adventureworks):

    [font="Courier New"]-- clean up test table

    IF OBJECT_ID('dbo.Products') IS NOT NULL

       BEGIN

           DROP TABLE dbo.Products

       END

    -- create & populate test table

    SELECT

       ProductId,

       CONVERT(VARCHAR(3), productid) + LEFT(ProductNumber, 7) AS prodcode,

       CONVERT(VARCHAR(3), productid) + LEFT(Name, 7)  AS prodcode2

    INTO

       dbo.Products

    FROM

       production.product

    -- create primary key on test table

    ALTER TABLE dbo.Products

       ADD CONSTRAINT PK_dbo_Products PRIMARY KEY CLUSTERED  (ProductId)

    -- create unique non-clustered indexes on test table

    CREATE UNIQUE INDEX UX_Prodcode ON dbo.Products(ProdCode)

    CREATE UNIQUE INDEX UX_Prodcode2 ON dbo.Products(ProdCode2)

    -- Declare and fill the variable

    DECLARE @prodcode VARCHAR(10)

    SELECT @prodcode = MIN(prodcode) FROM dbo.products

    -- start test getting IO and time stats

    PRINT 'Begin In clause'

    SET STATISTICS IO ON

    SET STATISTICS Time ON

    IF EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode IN (Prodcode, prodcode2))

       BEGIN

           PRINT 'ENd In clause'

       END

    ELSE

       BEGIN

           PRINT 'ENd In clause'

       END

    SET STATISTICS Io OFF

    SET STATISTICS Time OFF

    -- clean up cache and buffer so it is a valid test

    DBCC FreeProcCache()

    DBCC DropCleanBuffers()

    PRINT 'Begin = prodcode'

    SET STATISTICS IO ON

    SET STATISTICS Time ON

    IF EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode = Prodcode)

       BEGIN

           PRINT 'End = prodcode'

       END

    ELSE

       BEGIN

           PRINT 'End = prodcode'

       END

    SET STATISTICS Io OFF

    SET STATISTICS Time OFF

    -- clean up cache and buffer so it is a valid test

    DBCC FreeProcCache()

    DBCC DropCleanBuffers()

    PRINT 'Begin = prodcode2'

    SET STATISTICS IO ON

    SET STATISTICS Time ON

    IF EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode = Prodcode2)

       BEGIN

           PRINT 'End = prodcode2'

       END

    ELSE

       BEGIN

           PRINT 'End = prodcode2'

       END

    SET STATISTICS Io OFF

    SET STATISTICS Time OFF

    -- clean up cache and buffer so it is a valid test

    DBCC FreeProcCache()

    DBCC DropCleanBuffers()

    PRINT 'Begin Or Exists'

    SET STATISTICS IO ON

    SET STATISTICS Time ON

    IF   EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode = Prodcode) OR EXISTS (SELECT 1 FROM dbo.PRoducts WHERE @prodcode = Prodcode2)

       BEGIN

           PRINT 'End Or Exists'

       END

    ELSE

       BEGIN

           PRINT 'End Or Exists'

       END

    SET STATISTICS Io OFF

    SET STATISTICS Time OFF

    -- clean up cache and buffer so it is a valid test for re-run

    DBCC FreeProcCache()

    DBCC DropCleanBuffers()

    [/font]

    Here are the results for TIME and IO Stats for each query:

    Begin In clause

    Table 'Products'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 19 ms.

    ENd In clause

    Begin = prodcode

    Table 'Products'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    End = prodcode

    Begin = prodcode2

    Table 'Products'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 8 ms.

    End = prodcode2

    Begin Or Exists

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Products'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 12 ms.

    End Or Exists

    You can see that using IN is the least efficient and actually does a clustered index scan while the others have no scans. I am also attaching the Execution plan.

  • Grant,

    Here's a short description of what I do. The .NET app that I'm writing will run in Windows CE Device. This is a typical physical inventory count.

    Each product in our warehouse is barcoded and this can be either our barcode value or the customer's. We sometimes have both but it's a rare case because most of our customers don't want more than 1 barcode on the products so when they have their own we can't have ours.

    When the user scans a barcode I need to check that it's valid. If it is, I insert/create records in my tables add it to the count and return details of the products to the app, if the check fails I return a message to alert the user that the barcode isn't valid (misprinted, damaged ticket, bad scan etc).

    There is no processing done in the CE device, after each barcode scan, I call a procedure that does whatever needs to be done and only return data to the app that is displayed on screen.

    Here's a simplified sample Skeleton (removed some output parms) of the procedure.

    Create Procedure UpdatePhysicalCount

    (

    @Warehouse char(2),

    @Location char(10),

    @BarcodeValue Char(25),

    @ValidBarcode bit output,

    @ProdNo char(9) output,

    @ProdColor char(5) output,

    @ProdSize char(2) output,

    @TotBin int output,

    @TotSku int output

    )

    AS

    set @ValidBarcode = 1

    Set @ProdNo = ''

    set @ProdColor = ''

    set @ProdSize = ''

    if not exists (select 1 from ProductList where BarcodeValue in (OurBarcode, CustomerBarcode)

    Begin

    Set @ValidBarcode = 0

    Return

    End

    /* Each item in our product table has a primary key called ProdID which is used to retrieve other columns from that table */

    declare @ProdID int

    set @ProdID = (select prodid from ProductList where BarcodeValue in (OurBarcode, CustomerBarcode)

    set @ProdNo = (Select Prodno from ProductList where ProdID = @ProdID)

    set @ProdColor = (Select ProdColor from ProductList where ProdId = @ProdID)

    set @ProdSize = (Select @ProdSize from ProductList where ProdID = @ProdID)

    /* Then I have a few other statements to update

    counters in the inventory count tables

    and output parameters */

    The remote device calls this procedure and upon return display an error message if @ValidBarode = 1, otherwise it displays the content of the other output parameters and the user goes on with the next scan.

    This is the base of all my warehouse applications, I have 4 apps that do different things and they all need to validate the barcodes for existence in either the OurBarcode or CustomerBarcode fields and retrieve data for the corresponding ProdID value.

  • Jack,

    I modified your code to run it with my tables. The results are very telling......and scary :unsure:

    The "IN Claude" takes an average of 575ms to run. There's one thing that I find odd though, it took "only" 274ms the first time then it took between 550ms and 625ms and subsequent runs.

    The "Or exists" method takes 11ms.

    I guess I have my answer. Thanks !

  • OK. You may or may not want to continue the existence check. From what I can see though, I think you could skip it. It looks like you're doing a bunch of select statements to populate properties then you use those properties to insert values. Instead, why don't you do a direct INSERT ... SELECT:

    INSERT INTO Something

    (ProdId

    ,ProdNo

    ,ProdColor

    ,ProdSize)

    SELECT p.ProdId

    ,p.ProdNo

    ,p.ProdColor

    ,p.ProdSize

    FROM ProductList p

    WHERE BarCodeValue IN (OurBarcode, CustomerBarcode)

    Although I'd try to find a way to get rid of the IN too.

    Or, if you really do have to load parameters, do it in a single statement like this:

    SELECT @ProdId = ProdId

    ,@ProdNo = Prodno

    ,@ProdColor = ProdColor

    ,@ProdSize = ProdSize

    FROM ProductList

    WHERE BarCodeValue IN (OurBarcode, CustomerBarcode)

    That will at least radically reduce the number of reads & scans you have in the system.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Glad I could help. That is the difference between SEEKS and SCANS. This is also why you should TEST different code.

  • I did not look at your code, but as Grant has pointed out, assigning the variables in a single select would work best.

    I also think you could eliminate the check as well. You could do your select and then do:

    If @@ROWCOUNT = 0

    Begin

    Set @ValidBarcode = 1

    END

    ELSE

    Begin

    Set @ValidBarcode = 0

    End

    Of course this does not take into account the other logic you have in the procedure that we have not seen.

  • Thanks guys. I'm a beginner at SQL and I still have a lot to learn. For instance I didn't know that it's possible to populaire multiple variables in 1 select statement as Grant has demonstrated.

    I wouldn't have thought of using the @@ROWCOUNT value either 🙂

  • Don't worry about it. There's always more to learn, for everyone. Speaking of which, I need to download 2008 and get started learning new stuff.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply