Using bit flags in large tables

  • Got any sample code... I'd surely could use that, someday!

  • "One measurement is worth a thousand speculations"

    Mathew,

    Index Scans are NOT always a bad thing... here's proof (and a couple o' tricks)...

    First, let's make a million row test table...

    --===== Let's do the demo in a "safe" database

        USE TempDB

    --drop table jbmtest

    GO

    --===== Create and populate a 1,000,000 row test table.

         -- Column RowNum has a range of 1 to 1,000,000 unique numbers

         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

         -- Takes about 15 seconds to execute including the indexes...

     SELECT TOP 1000000

            RowNum     = IDENTITY(INT,1,1),

            SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),

            IsActiveBit = CAST(1 AS BIT)

       INTO dbo.JBMTest

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

      ALTER TABLE dbo.JBMTest

            ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Populate a handful of bits with 0

     UPDATE dbo.JBMTest

        SET IsActiveBit = 0

      WHERE RowNum % 2 = 0

        AND RowNum < 1000

    --===== Create another index just for demo purposes

     CREATE INDEX IX_JBMTest_IsActiveBit ON dbo.JBMTest (SomeInt,IsActiveBit)

    Ok... now for some tests... one features an interesting little trick to force and Index Seek instead of an Index Scan... read the comments in the code below...

    DECLARE @BitBucket1 INT, @BitBucket2 BIT, @StartTime DATETIME

    --===== This is what you probably get... Index Scan

        SET @StartTime = GETDATE()

     SELECT @BitBucket1 = SomeInt, @BitBucket2 = IsActiveBit

       FROM dbo.JBMTest

      WHERE IsActiveBit = 1

      PRINT STR(@@RowCount) +' rows ' + STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Duration (ms) for Index Scan'

    --===== This will cause an Index SEEK

        SET @StartTime = GETDATE()

     SELECT @BitBucket1 = SomeInt, @BitBucket2 = IsActiveBit

       FROM dbo.JBMTest

      WHERE SomeInt>0 AND IsActiveBit = 1

      PRINT STR(@@RowCount) +' rows ' + STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Duration (ms) for Index Seek'

    --===== This DOESN'T help

        SET @StartTime = GETDATE()

     SELECT @BitBucket1 = SomeInt, @BitBucket2 = IsActiveBit

       FROM dbo.JBMTest WITH(INDEX(IX_JBMTest_IsActiveBit))

      WHERE IsActiveBit = 1

      PRINT STR(@@RowCount) +' rows ' + STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Duration (ms) for another Index Scan'

    Dunno about your machine, but here's what I get for results...

        999501 rows        856 Duration (ms) for Index Scan

        999501 rows        846 Duration (ms) for Index Seek

        999501 rows        856 Duration (ms) for another Index Scan

    Now... why are you so worried about those Index Scans, again?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One other thing to consider about the index scan...

    I'd much rather have the server scan 1M rows of 50 Bytes each VS scanning 1M row or 2 K each.  Scan for scan, the index scan will beat the crap out of the clustered index scan in that case....

     

    Unless you now have to do a bookmark lookup and go back to the data with a query that returns loads of data.  The good ole it depends.

     

    PS Thanks for the neat trick Jeff.  First time I see this one .

  • IMHO, only one thing will likely provide any benefit here, and it is a very slim hope.  With 99% of your data being IsActive = 1, pruning inactive records is useless.  Any filter that includes IsActive = 1 will still result in a table scan.  The thing that will help out a tiny bit is including IsActive in NC index(es), which will allow a very small percentage (unless there are outlier situations) to be filtered out prior to the very costly bookmark lookups. 

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I'm with rubes on this issue. I avoid bit columns like the plague. In a 2000 environment I had tables with 4 million rows with an isActive (Bit) column. I switched it to a tinyint column and immediately saw a dramatic increase in performance. I don't exactly know why the optimizer did an index seek instead of an index scan but my query result times returned to those similar to a table with a few hundred thousand rows.

    This article goes into some good explination to why an indexed bit column isn't that good of a choice: http://sqlserver2000.databases.aspfaq.com/can-i-create-an-index-on-a-bit-column.html

    John

  • Try again with the bit column, you'll most likely see a convert_implicit in the execution plan.

     

    Change to sarg like this : WHERE BitCol = CONVERT(BIT, 1)

    That'll most likely push you back to seek.

  • ok, here is a real dumb question I'm sure... but I've seen it enough and if I don't ask I will continue to be

    what does "sarg" mean?

    Regards,
    Rubes

  • There are quite a few posts on this, and I didn't feel like reading them all, so I will appologize if this was said already. 

    If your number is correct that 99% of the rows are Active=1 and your queries are mostly where Active=1  neither removing those rows, or leaving them, or indexing them or whatever will help.  The field is needed and you still need to add it to your where clause, but from a performance prespective, look elsewhere. 

    Now if you do run queries where Active=0 (that's a different story). 

  • sarg = searchable argument.

     

    This is a where clause that can use a index : WHERE MyDate BETWEEN '2007-07-01' AND '2007-07-02'.

     

    Non searchable argument in that exemple would be this : Where DATEADD(MyDate, D, 1) = '2007-07-01'.  This version forces an index scan because there is a calculation to be made on the indexed data.

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply