SQL design approach for searching a table with an unlimited number of bit fields

  • Consider searching a table that contains Apartment Rental Information: A client using the interface selects a number of criteria that are represented as bit fields in the DB, for instance:

    * AllowsPets

    * HasParking

    * HasDeck

    * ModernKitchen

    etc..

    We are facing a situation where each new client of our software has additional fields they want to allow their end users to search on. The number of bit fields could reach into the hundreds.

    I have three approaches that I'm considering and hoping for input and/or a different approach.

    * Current approach: Add more bit fields, sql queries are built dynamically and executed using EXEC: SET @sql = @sql + 'l.[NumUnits],' exec(@SQL))

    Continue to add more bit fields. (table with 300 columns?)

    *

    Represent the data as a series of bits in one field. I'm unclear on if this approach will work, consider the 4 sample bit fields I offered above. The field could look like this: 1011 which would indicate false for 'hasparking' but true for all others. What I'm unclear on is how you would structure a query where you didn't care if it was false or true, for instance 1?11 where the person searching needs 1,3 and 4 to be true but doesn't care if 'HasParking' is true or false.

    *

    Move to an Attribute based approach where you have a table 'AttributeTypeID' and a table PropertyAttributes, which joins the PropertyID to the AttributeTypeId, new bit fields are simply a row in the AttributeTypeID table.

    some other approach? Is this a well known SQL design pattern?

    Thanks for any help

  • tdemille-1102706 (10/1/2009)


    ...

    I have three approaches that I'm considering and hoping for input and/or a different approach.

    * Current approach: Add more bit fields, sql queries are built dynamically and executed using EXEC: SET @sql = @sql + 'l.[NumUnits],' exec(@SQL))

    Continue to add more bit fields. (table with 300 columns?)

    The number of columns could be a problem, however in SQL Server 2008, there is a new feature called "sparse columns" for just this kind of thing. In this approach it is assumed the most of the columns for most of the rows will be unspecified (NULL), which will allow it support literally thousands of these types of columns. The downside is that you need to be on 2008, and there are some querying restrictions.

    *

    Represent the data as a series of bits in one field. I'm unclear on if this approach will work, consider the 4 sample bit fields I offered above. The field could look like this: 1011 which would indicate false for 'hasparking' but true for all others. What I'm unclear on is how you would structure a query where you didn't care if it was false or true, for instance 1?11 where the person searching needs 1,3 and 4 to be true but doesn't care if 'HasParking' is true or false.

    It's easy enough to do, "Must have 1,3 and 4, but don't care about 3" would be [font="Courier New"](ColBits & (1 | 4 | 8)) = (1 | 4 | 8)[/font] (there are other ways also). The problems here are first that you normally cannot use indexes to search for conditions like these and secondly, you have no way to represent "Unspecified". which you will probably want.

    *

    Move to an Attribute based approach where you have a table 'AttributeTypeID' and a table PropertyAttributes, which joins the PropertyID to the AttributeTypeId, new bit fields are simply a row in the AttributeTypeID table.

    This is the Attribute-Values approach, and it has the advantages of being very flexible, easy to design, setup, and to program for dynamic attributes-list matching, it can hanlde "Unspecified" (by just not having an entry for that attribute), plus it can make use of indexes. The downsides are: it can use up diskspace inefficiently, it can be a real headache to administer, especially if you start to use this techinque for everything, transforming any matching more complicated than "must have all of these attributes" or "must have at least one of these attributes" into the corresponding SQL query can be extremely complicated and a real mind-bender, although it can represent "unspecified", taking it into account in your queries is also a mind-bender, and finally, even for these simple list matching examples, and even using the indexes, the lookups can still be a tad inefficient.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thank you for your answer,

    I am most interested in performance, storage is cheap 🙂 Given that constraint, should I just go with regular bit columns?

    Any thoughts on Attribute-Value vs. Bit Fields in one table in terms of performance?

  • On balance, my recommendation would probably be to use the Sparse columns if you have SQL Server 2008, and then to use the Attributes approach, within limits. That is, stick to just AND lists and OR lists of attirbutes (don't mix them), and attributes that almost every landlord uses should be come regular columns, and don't try to apply this technique to everything in your database design. The apparently "unlimited" flexibility can be very seductive, but it comes at the price of subtly but rapidly increasing complexity and overhead.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Comparative performance is tough to generalize between these options because it ultimately depends how many columns there will really be, and how large a range will you have to search over if you cannot use the indexes, and on average, how many attributes per apartment will actually be in use?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • generally a client will start searching with a broad search and then start adding attributes, for instance:

    bedrooms = 3, area = 'boston'

    bedrooms = 3, area = 'boston', sqfeet > 1500

    bedrooms = 3, area = 'boston', sqfeet > 1500, allowspets = true, hasview = true, haspool = true

    etc.

    hard to quantify how many attributes are used per client or per search, consider if you were looking to search MLS listings for a home to purchase you might make your query progressively more narrow

    we have NO intention of changing our schema to be EAV, and in fact it does not seem very enticing to me 🙂

    We just need to solve this one problem... keep adding columns or add attributes

    Someone had mentioned using dynamic pivot to view an EAV, anyone have any ideas on how that is accomplished?

  • Dynamic pivoting can be useful for summarizing and consolidating Attribute type relations, but not for filtering it. And certainly not if performance is a concern.

    And frankly, most client languages can do what dynamic pivots do better than dynamic pivot.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • If you can do sparse columns, that would be ideal. If not, EAV is a very workable way to do this. Yes, it has some drawbacks, but it's easy enough to overcome them.

    Another option would be to add a number of generic bit columns, and then have a metadata table that defines what they are on a per-customer basis. BitCol1 could be AllowsPets for company 1, and could be HasPool for company 2, that kind of thing. Adds a layer to the software, since you have to interpret that, but it can be done and can be quite efficient.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • any thoughts on performance of sparse bit columns vs. just regular (non sparse) bit columns?

  • I haven't had a chance to use them in a production environment, so really can't say how they do under real loads.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply