Using CLR to work with binary strings in VARBINARY field.

  • I have a situation where I need to record for each user in the database 20K different binary settings (each setting is either set or not). There is potentially up to 50,000 Users each with their own collection of 20K settings.

    When loading the user, the application will need to pull the entire list of settings for that user.

    Updates to these settings will be additions only (turning on of the corresponding bits), the string will begin with all 0s.

    The settings can be mapped via integer IDs from 1 to 20,000 which relate to a position in the binary string.

    So, my question is, would it be better to store this information in the relational way, (with a table containing user_id, and setting id, and a bit flag), which would incur up to 1,000,000,000 rows. Or create the 50,000 row table with a varbinary field to hold the 20000 bit binary string?

    Going with the binary string, I would need to create CLR functions for:

    1) Updating the binary string but toggling the bit in the position relative to the ID passed in. (Returns a new binary string)

    2) Walking through the string, and returning an Integer array of the ordinal positions currently set to 1 (i.e. settings which are turned on), this could be used in an "in" statement, or I have a regular function to parse arrays to a table.

    Any thoughts?

    Any Thoughts on this?

  • Hey Ian,

    Just a quick question before I post a proper reply: how many 'bits' are likely to be set per user?

    The relational solution will be much quicker and more maintainable, if the number of bits set is small, compared to the 20K possibles.

    One would only store the set bits.

    Out of interest, what is the nature of these flags? It's seems a very unusual requirement...

    Cheers,

    Paul

  • All 20K bits would be in the string if a binary solution was chosen, because the position in the string would be the identifier of the particular setting.

    It's being used for data storage of a collection of settings for an online video game, in order to maintain if certain aspects of the game have been accomplished by a character.

    There is a static mapping of about 20K aspects (up to 40K), and I need to record for each character if they've completed them or not. Certainly in the beginning of the application, a traditional approach would be faster, but as more characters complete more aspects of the game, the table would grow quite large, (index defragmentation time may be limited too). Going with a binary solution, limits the rows to just the max number of characters, and removes the 20K repeats of the character ID and object type columns for each collection of settings.

  • IlanP (4/24/2009)


    ...

    (A)When loading the user, the application will need to pull the entire list of settings for that user.

    (B)Updates to these settings will be additions only (turning on of the corresponding bits), the string will begin with all 0s.

    ... I would need to create CLR functions for:

    1) Updating the binary string but toggling the bit in the position relative to the ID passed in. (Returns a new binary string)

    2) Walking through the string, and returning an Integer array of the ordinal positions currently set to 1 (i.e. settings which are turned on), this could be used in an "in" statement, or I have a regular function to parse arrays to a table.

    ...

    So, my question is, would it be better to store this information in the relational way, (with a table containing user_id, and setting id, and a bit flag), which would incur up to 1,000,000,000 rows. Or create the 50,000 row table with a varbinary field to hold the 20000 bit binary string?

    Well, you never really answered Paul's question which is an important one.

    Absent that information my instinct would be to use explicit structured denormalization by going the Varbinary route with this design:

    CREATE Table UserSettings(

    [user_id] int primary key clustered,

    settings varbinary(2500)

    );

    Then I would implement your functional requirements (bolded indexes above) as follows:

    (A):SELECT settings from UserSettings Where user_id = @user_id;

    (B):CREATE Proc spSetFlagON(

    @user_id as int,

    @setting as int-- note: assume zero-based indexing

    ) As

    -- note: not sure if I got the byte-ordering right here...

    UPDATE UserSettings

    Set settings

    = CAST(STUFF(settings, 1+(@setting/8), 1

    , SUBSTRING(settings, 1+(@setting/8), 1)

    | CAST(POWER(2, (@setting%8)) as tinyint)

    ) as BINARY(2500))

    Where user_id = @user_id

    1): Don't use a CLR function, use the sProc above.

    2): Do this in the client code.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Barry's advice is sound but I would like to add an alternative view:

    If there are to be a large number of users and updates would be frequent (the number of bits implies that you are recording pretty much every piece of information possible!) then I would start to worry that the overhead of rewriting the entire varbinary structure each time might impose an artificial limit of overall responsiveness - which is kinda important in an on-line game.

    Latching and locking activity may be concentrated by bit-stuffing the information in this way. This is especially true if the user's varbinary structure is read to manage other users' in-game activities.

    Even with a row-level versioning isolation level, tempdb will be hot with row-versions and latching may still be a problem.

    I should imagine that the updates would need to be immediate too (rather than batched up) and that there would be significant read activity on the table too?

    The logging implications of writing a full undo and redo copy of the varbinary structure each time should be considered carefully too.

    The normalized solution may sound unwieldy - but SQL Server is heavily optimized toward those sorts of operations. Given sufficient CPU and (importantly) buffer pool memory, a properly designed normalized solution could be very performant and scalable.

    A suitably capable system with scale-out potential and high-availability features will not be cheap, however.

    Cheers,

    Paul

  • Paul White (5/1/2009)


    If there are to be a large number of users and updates would be frequent (the number of bits implies that you are recording pretty much every piece of information possible!) then I would start to worry that the overhead of rewriting the entire varbinary structure each time might impose an artificial limit of overall responsiveness - which is kinda important in an on-line game.

    Latching and locking activity may be concentrated by bit-stuffing the information in this way. This is especially true if the user's varbinary structure is read to manage other users' in-game activities.

    Heh, I was going to disagree with you until I read this last sentence. I hadn't thought of that and it's an excellent point.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • There is also a hybrid approach that you can take. Instead of just 20,000 bit flags, in 2500 binary bytes or 20,000 records, break your flags into "zones" or "groups" that are strongly associated.

    So for instance, if your game had 20 levels and each bit had a specific relationship to a single level you could then divide them up into 20 different records, one for each level. Depending on how generically/specifically you map them, they would each be in a separate table (specific) or in the same table with a GroupID, like so:

    CREATE Table UserSettings(

    [user_id] int,

    group_id smallint,

    settings varbinary(125) -- depends on how many groups

    );

    How big the varbinary actually is would depend on how many groups you used. This could just be done arbitrarily also, just pick a number and divide the bits up into that many groups. Offhand I would guess that you would want to have somewhere between 10 and 50 groups.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I thought I answered Paul's first question. If we use the binary string, it would need to start out with ~20K bits. This is because the settings would be mapped based on the bit positions and thus the string length would need to be initialized for all the settings at the time the character was created, not just the ones completed by the user.

    For this requirement, we decided to go with individual records per setting, since we would only need to record the settings for which the user has accomplished so far, and they are only one way settings (once achieved, it is set and never toggled). This would eliminate the need for updates, so the table would be accessed with just inserts and selects. As the DB develops, we decided we could just monitor performance of the table, and if this model begins to slow down too much as the table grows, we could look at the implementation at that time.

    There is another section of the application where we did actually choose to use the binary approach, and your suggestions are exactly how we implemented it. (by breaking the binary string into groups, and having the application do the bit updating and passing in the new string to the database).

    Thank you guys for your feedback and advice!

  • Glad we could help.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Yep, you're welcome Ian.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply