Aggregate bit operator in TSQL

  • OK, here is the Set-Based solution, using Jeff's data setup (I also stole Jeff's other two tricks for compactness):

    --===== Create the test data

    -- (NOT PART OF THE SOLUTION)

    DECLARE @SomeTable TABLE

    (BitMask INT)

    INSERT INTO @SomeTable

    (BitMask)

    SELECT 1 UNION ALL

    SELECT 5 UNION ALL

    SELECT 7 UNION ALL

    SELECT 17 UNION ALL

    SELECT 19 UNION ALL

    SELECT 29

    --===== OR it all together

    SELECT SIGN(SUM((BitMask & 1)+0))

    + SIGN(SUM(((BitMask & 2)/2)+0))*2

    + SIGN(SUM(((BitMask & 4)/4)+0))*4

    + SIGN(SUM(((BitMask & 8)/8)+0))*8

    + SIGN(SUM(((BitMask & 16)/16)+0))*16

    + SIGN(SUM(((BitMask & 32)/32)+0))*32

    + SIGN(SUM(((BitMask & 64)/64)+0))*64

    + SIGN(SUM(((BitMask & 128)/128)+0))*128

    + SIGN(SUM(((BitMask & 256)/256)+0))*256

    + SIGN(SUM(((BitMask & 512)/512)+0))*512

    + SIGN(SUM(((BitMask & 1024)/1024)+0))*1024

    AS AggregateBitmask

    From @SomeTable

    This is just for the first 11 bits, though I am sure that you can figure out how to extend it. 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Just out of curiosity, I tested both of these methods across a million rows:

    --====== Stats Off

    SET STATISTICS TIME Off

    SET STATISTICS IO Off

    GO

    DROP TABLE #SomeTable

    GO

    --===== Create the test data

    -- (NOT PART OF THE SOLUTION)

    CREATE TABLE #SomeTable

    (BitMask INT)

    --====== Insert a Million Rows

    -- (NOT PART OF THE SOLUTION)

    INSERT INTO #SomeTable

    Select TOP (1000000) Ascii(Left(c1.Name,1)) + 256*Ascii(Left(c2.Name,1))

    From master.sys.system_columns c1, master.sys.system_columns c2

    --====== Stats ON

    SET STATISTICS TIME ON

    SET STATISTICS IO ON

    --===== pseudo-cursor:

    DECLARE @OredData INT

    SELECT @OredData = ISNULL(@OredData,0)|BitMask

    FROM #SomeTable

    --===== Display the result

    -- (NOT PART OF THE SOLUTION)

    SELECT @OredData

    --===== set-based:

    SELECT @OredData = SIGN(SUM((BitMask & 1)+0))

    + SIGN(SUM(((BitMask & 2)/2)+0))*2

    + SIGN(SUM(((BitMask & 4)/4)+0))*4

    + SIGN(SUM(((BitMask & 8)/8)+0))*8

    + SIGN(SUM(((BitMask & 16)/16)+0))*16

    + SIGN(SUM(((BitMask & 32)/32)+0))*32

    + SIGN(SUM(((BitMask & 64)/64)+0))*64

    + SIGN(SUM(((BitMask & 128)/128)+0))*128

    + SIGN(SUM(((BitMask & 256)/256)+0))*256

    + SIGN(SUM(((BitMask & 512)/512)+0))*512

    + SIGN(SUM(((BitMask & 1024)/1024)+0))*1024

    + SIGN(SUM(((BitMask & 2048)/2048)+0))*2048

    + SIGN(SUM(((BitMask & 4096)/4096)+0))*4096

    + SIGN(SUM(((BitMask & 8192)/8192)+0))*8192

    + SIGN(SUM(((BitMask & 16384)/16384)+0))*16384

    From #SomeTable

    --===== Display the result

    -- (NOT PART OF THE SOLUTION)

    SELECT @OredData

    The results were:

    Pseudo-cursor:.5 seconds

    Set-based:5.7 seconds

    Not surprising considering how much more calculation has to go into the set-based approach. Of course the advantage of set-base is that it can be more easily incorporated into other set operations and more easily adapted to things like GROUP BY etc., whereas pseudo-cursors generally have to be held out as a separate step.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Overall, I think that I would have to agree that Bitwise aggregates really would be a useful extension to SQL Server, especially if the bitwise operations would fully support binary strings. The bitwise operators are fully Abelian so there are no ordering concerns like there would be with the oft-proposed string concatenation aggregates. Maybe someone should propose it on Connect?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I'm not sure why folks try to use so many bitwise integer columns. It makes the column "multi-purpose" which is not what the definition of a column actually is. It doesn't really save on any code and not sure it lends itself to any performance advantages.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/24/2008)


    I'm not sure why folks try to use so many bitwise integer columns. It makes the column "multi-purpose" which is not what the definition of a column actually is. It doesn't really save on any code and not sure it lends itself to any performance advantages.

    I agree to a point which is "the proper tool for the job".

    In our case, which is very similar to ACLS or the permission settings on files in the Windows file system, objects can have the following permissions: Read, Update, Delete, Create, and Assign (i.e., grant to others). The permissions can be granted at the user level and at the group level. There can be multiple groups and users can belong to more than one group.

    When we have to check permissions on an object we need to combine all of the permissions from all sources.

    So trying to map this into individual permission columns and/or permission rows with all of the 1:M relationships would yield a pretty heavy table along with the DML to manipulate it. Note that we also store "last modified" by and timestamp on the object's permissions settings.

    Yes, I did think of modeling this in a [purist] proper relational model either as permission rows or permission columns with millions and millions of rows. But I opted for the bit mask approach. One of my few "exceptions to the rule".

    So for our application the bit mask works quite well.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • Jeff Moden (12/24/2008)


    I'm not sure why folks try to use so many bitwise integer columns. It makes the column "multi-purpose" which is not what the definition of a column actually is.

    "Yeah, but Chris Date says that it's OK." Except on the days when he says that it isn't. 😛

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • You know... I've never read one of his books... they any good?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I find his books really irritating. Codd was better than Date, IMHO, but they both have this habit of saying contradictory things all the time. And Date in particular seems to have no regard for practical issues at all.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I find both those things to be true with a lot of authors.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SELECTMAX(SIGN(col1)),

    MAX(SIGN(col2)),

    MAX(SIGN(col3))


    N 56°04'39.16"
    E 12°55'05.25"

  • Peso (12/31/2008)


    SELECTMAX(SIGN(col1)),

    MAX(SIGN(col2)),

    MAX(SIGN(col3))

    Simplicity personified.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/25/2008)


    I find both those things to be true with a lot of authors.

    True, but when you're on the seventh edition of your book, and you're still trying to come up with a cogent message that actually has a practical application (I mean - ANY practical application, EVER), it might be worth stepping down off of the high horse and not come across as "I know better than all of you peon idiots"....

    Codd had a LOT of good things to say, and a lot of useful stuff. The 12 "laws" - really useful at focusing on what needs to happen. Boyce Codd Normal form - hoorah yeah, hit the ground running, and storm that beach looking for data. Date, on the other hand, just about worthless, and too whiny to stomach. My opinion of course, so feel free to form your own.....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I am with you there, Matt. We wouldn't even have NULLs if Codd hadn't fought so hard for them.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Select CAST( MAX(Cast(Col1 as int)) as bit)

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply