Using bit flags in large tables

  • We have a table with about 2 million rows. The rows are flagged as being active or inactive using an "IsActive" (bit) flag.

    Most queries against this table will need to include a "WHERE IsActive = 1" to ensure we only return the active rows.

    Seems like an indexing nightmare, prone to index scans... Should I begin re-thinking this flag? e.g. Maybe move inactive rows to a second table?

    Or is there a better approach?

    Thanks!

    Matthew

  • I don't know how many records are flagged as active...

    What do you think about creating a view with all the records with isActive = 1 and running the queries on this view?

     

  • The view will still have to scan the table.  Indexed view may be an option.  But this actually creates a second table.  I think I'd go with with separating the data.

     

    But then again, do you have any performance problems using this design?  Maybe the other where conditions of the queries make them run fast enough.

  • Thank you for the replies.

    Actually, we have many queries that hit this table, using a variety of column combinations in joins/where clause. Most utilize the "IsActive = 1".

    I noticed a report that used to run quite well against this table is now using an index scan on the clustered index (PK), and thought it might be related - if not, this IsActive thing still may be a concern for the future.

    More than 99% of the rows have an IsActive value of 1.

  • Archive the inactive rows somewhere else.  Add a check constraint that Active must equal 1 and run that new constraint with check.

     

    Also you can post the exec plan of that report, myabe it's just something simple that changed.

     

    Also when is the last time you updated the stats on that table?

  • Stats update is definitely overdue, and something I will run later this evening.

    I modified the query in question, eliminating the references to IsActive, and the execution plan is lot more pleasing, as is the overall performance of the query.

    We will begin the necessary steps to move our inactive data to a second table.

    Thanks to all for your help!

  • We had to add that type of column to all existing tables on an existing db, and similar to you, every query was modified to refer to that column.

    I added that column as a covered column to every NC index.

    Worked quite well. Though I would have preferred to move the inactive data to another table-I was overruled...

    jg

  • First of all, I try to avoid bit columns altogether.  I am not sure why, but when working with 2000 I saw performance problems when querying bit columns across linked servers.  Changing bit to tinyint solved it and ever since I've used tinyint for boolean type data.

    Secondly, if there is the chance that multiple boolean fields may exist in one table, I like to combine them into a bigint bitmap field.  For example, create a biProductOptions field to represent IsActive, IsAvailable, IsDiscontinued, etc.  IsActive = 1, IsAvailable = 2, IsDiscontinued = 4.  The cardinality is higher, the field can be indexed for performance, and adding more "options" will only increase performance and utilize the index even better.  The drawback is that biProductOptions with a value of 6 can look a bit obscure.  But throw in a lookup table to join off of and your queries will be more intuitive.  For example...

    create

    table tblProductOptions (biProductOptionID bigint, IsActive tinyint, IsDiscontinued tinyint, IsAvailable tinyint)

    GO

    Insert

    Into tblProductOptions( biProductOptionID, IsActive, IsDiscontinued, IsAvailable ) Values ( 0, 0, 0, 0 )

    Insert

    Into tblProductOptions( biProductOptionID, IsActive, IsDiscontinued, IsAvailable ) Values ( 1, 1, 0, 0 )

    Insert

    Into tblProductOptions( biProductOptionID, IsActive, IsDiscontinued, IsAvailable ) Values ( 2, 0, 1, 0 )

    Insert

    Into tblProductOptions( biProductOptionID, IsActive, IsDiscontinued, IsAvailable ) Values ( 3, 1, 1, 0 )

    Insert

    Into tblProductOptions( biProductOptionID, IsActive, IsDiscontinued, IsAvailable ) Values ( 4, 0, 0, 1 )

    Insert

    Into tblProductOptions( biProductOptionID, IsActive, IsDiscontinued, IsAvailable ) Values ( 5, 1, 0, 1 )

    Insert

    Into tblProductOptions( biProductOptionID, IsActive, IsDiscontinued, IsAvailable ) Values ( 6, 0, 1, 1 )

    Insert

    Into tblProductOptions( biProductOptionID, IsActive, IsDiscontinued, IsAvailable ) Values ( 7, 1, 1, 1 )

    GO

    select

    * from tblProducts join tblProductOptions on biProductOptionID = biProductOptionIDs Where IsActive = 1 and IsAvailable = 1

    Regards,
    Rubes

  • Just out of curiosity what percent of your 2 million rows are active?  Depending on the percentage of the rows that are active/inactive the position of the bit field in your indexes can make a huge difference in the performance of your queries (e.g. if 50% of your rows are active, put it at the end if 5% are active put it first) the idea being that your indexes should have fields/items with the highest cardinality at the "front" of the columns in an index, particularly if it's a covering index.  You may also consider using the "with" operator in your index to include the value but not force a sort of the index on that particular key.

    Joe

     

  • I use status fields typed as bit or tinyint quite frequently with good results. I like the space savings of the bit and if its the right data type fo the job, I use it. Absolutely if you only query against the status column you will probably get a table scan, but I rarely query on that column. As soon as you add other columns to the mix then you can index effectively and the bit column is just a final filter that gets applied.

  • About 99% of the rows in our table have an IsActive value of 1. I am still leaning toward moving the inactive rows to another table.

    After running an index defrag and updating stats last night (hadn't been done in over a month) the performance of various queries has increased dramatically.

    rubes, we do have some columns we have combined into "bitmask" columns and I agree on all your points.

  • Before moving the inactive rows to another table be sure to take into account the potential effects on reporting, data warehouse, etc.  If you're "archiving" your inactive customers to another table what happens to their orders, etc?

    Joe

     

  • Thanks for the warning, Joe. Fortunately these rows in question are already child rows themselves. They will theoretically never need to be accessed by the end user and only exist for auditing purposes.

    -Matthew

  • The question then becomes, why do you need them at all?  Does your organization have a data retention policy?  If the data is unimportant/not required for retention why not delete the data?

    Joe

     

  • Not necessarily a suggestion, but an interesting trick.  If you look at the partitioning views in SQL you will find that if you create two tables that have all of the same fields, and have a check constraint on the primary key field of the table, you can create a UNION ALL view that the query optimizer will automatically determine which table to get the data from based on the check constraints.  This is an awful summary of the feature, but read about it an you will get a sense of it.

    In addition to this neat little feature, you can do the same type of partitioning with the use of check constraints on a column without it being the primary key and queries with a TOP clause will also only search the appropriate tables in most cases.

    So, you could put all of your inactive records into a second table, create a view and a couple of check constraints, and when you use SELECT TOP 100 PERCENT, you will find the query optimizer will scan only your table full of active records.  You would also have to probably add some INSTEAD OF triggers to move your data when the IsActive flag is checked, but it could all be made to work pretty easily.

    Now, I will end this with it probably not being a good solution for your archiving needs, but it may be something of interest, so I figured I would share.

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply