A questions about summarisation

  • Hi All,

    I have some data which includes a "shift" flag which contains an integer with bits set depending upon which shift/s the data falls into. For instance Prime shift will have the first bit set for a shift value of 1, overnight shift will have the second bit set for a shift value of 2, weekend will have the 3 bit set for a shift value of 4 etc...

    I'd like to summarise this data by shift, which on the face of it should be fairly easy, except that some data can be in more than one shift, for instance a record can be Prime and overnight having both first and second bits set for a shift value of 3.

    My questions is, how to summarise this data such that I get a separate group for each shift, without a grouping for 3, that is I want the data with shift=3 to be included in the Prime and Overnight groups.

    I'm guessing that this is not possible without having to first process the data converting the shift flag into something more discrete.

    Any ideas anyone?

  • Please provide sample data and expected output in a ready to use format as described in the first link in my signature.

    You have visited this side often enough to know better, right? 😉



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Is this something even remotely like you need?

    CREATE TABLE #bitwise

    ( a_int_value int NOT NULL,

    Hours INT NOT NULL);

    INSERT #bitwise --VALUES (1);

    SELECT 1,8 UNION ALL --Prime shift

    SELECT 2,6 UNION ALL --Overnight shift

    SELECT 3,12 --Week end

    DECLARE @Mask INT

    SET @Mask =1

    SELECT SUM(Hours) AS 'Hours',a_int_value & @Mask

    FROM #bitwise GROUP BY a_int_value & @Mask

    /* Output

    Hours(No column name)

    6 0

    20 1

    */

    --Clean up

    DROP TABLE #bitwise

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Kudos to Ron for creating sample data for you.

    I have to say though, that this is usually a poor design idea - storing more than one attribute in a single column using flags. See http://www.simple-talk.com/sql/learn-sql-server/facts-and-fallacies-about-first-normal-form/ for a relevant discussion.

    As implied in your question, your task would be much easier (and often faster) if you were to store each attribute in its own column instead 🙂

    SQL Server also optimises storage for BIT columns - a single byte can store up to eight BIT fields in the same row. The current integer column requires four bytes, complicates processing, and usually precludes useful indexing.

  • To add emphasis to Paul White's comment:

    I have to say though, that this is usually a poor design idea - storing more than one attribute in a single column using flags

    Developers would be sore tempted to extend the "flags" by adding values, say a shift value of 4 for weekend work at an overtime pay rate -- you name the situation it is so easy to for a developer not to understand your problem(s).

    I have expanded the "Shift designators" to provide a more or less worst case" situation as:

    CREATE TABLE #bitwise

    ( id INT IDENTITY(1,1),a_int_value int NOT NULL,Hours INT NOT NULL);

    INSERT #bitwise (a_int_value,Hours)

    SELECT 1,-1 UNION ALL --Prime shift

    SELECT 2,3 UNION ALL --Overnight shift

    SELECT 3,1 UNION ALL --Week end

    -- new shift designators

    SELECT 4,1 UNION ALL

    SELECT 5,-1 UNION ALL

    SELECT 6,1 UNION ALL

    SELECT 7,300 UNION ALL

    SELECT 8,-1 UNION ALL

    SELECT 9,300 UNION ALL

    SELECT 10,-1

    DECLARE @Mask INT

    SET @Mask =1

    SELECT id, Hours,a_int_value & @Mask AS'Mask Result'

    FROM #bitwise

    /* Returns

    idHoursMask Result

    1-11

    230

    311

    410

    5-11

    610

    73001

    8-10

    93001

    10-10

    Further extending the possible complexity of sorting the above all out I

    altered the select statemet to attempt to utilize the additional a_int meaning

    */

    DECLARE @Mask1 INT

    DECLARE @Mask2 INT

    SET @Mask1 =1

    SET @Mask2 = 5

    SELECT id, a_int_value & @Mask1 AS 'Mask1 Result',

    a_int_value & @Mask2 AS 'MAsk2 Result'

    FROM #bitwise

    /* Results:

    idMask1 MAsk2

    111

    200

    311

    404

    515

    604

    715

    800

    911

    1000

    --Clean up

    DROP TABLE #bitwise

    Well enough of my attempt to re-inforce Paul White's statement

    I have to say though, that this is usually a poor design idea - storing more than one attribute in a single column using flags

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Firstly, I must appologise for not including some sample data, while I have visited the site many times, I haven't asked many questions and really just didn't think about it too much yesterday when I posed this one.

    Secondly, thanks to all who are contributing to this discussion.

    The data contains individual records for item production and the shift flag which designates which shift the record falls into, this is done with a bit pattern, bit 1 being shift 1, bit 2 being shift 2 etc. A record can be in more than one shift as indicated by the last two records spanning the prime and evening shifts.

    select1 as shift, --prime 00000001 08:00 - 18:00

    10 as items

    union all

    select1 as shift, --prime 00000001 08:00 - 18:00

    15 as items

    union all

    select2 as shift, --evening 00000010 18:00 - 08:00

    20 as items

    union all

    select2 as shift, --evening 00000010 18:00 - 08:00

    25 as items

    union all

    select3 as shift, -- both prime and evening 00000011 16:00 - 20:00

    30 as items

    union all

    select3 as shift, --both prime and evening 00000011 16:00 - 20:00

    35 as items

    What is required is that the data be summarised by the bit positions so that the peak shift data (the last two records) is included in both the prime and evening shifts and doesn't create a group in its own right.

    I think I've figured it out.

    with shifts as

    (

    select 1 as shift,

    'Prime' as name

    union

    select2 as shift,

    'Evening' as name

    union

    -- these next two records cause the data to be in both Prime and Evening shifts

    select3 as shift,

    'Prime' as name

    union

    select3 as shift,

    'Evening' as name

    ), testdata as

    (

    select1 as shift, --prime 00000001 08:00 - 18:00

    10 as items

    union all

    select1 as shift, --prime 00000001 08:00 - 18:00

    15 as items

    union all

    select2 as shift, --evening 00000010 18:00 - 08:00

    20 as items

    union all

    select2 as shift, --evening 00000010 18:00 - 08:00

    25 as items

    union all

    select3 as shift, -- both prime and evening 00000011 16:00 - 20:00

    30 as items

    union all

    select3 as shift, --both prime and evening 00000011 16:00 - 20:00

    35 as items

    ), classified as

    (

    -- the join creates additional records for the data that falls in multiple shifts

    -- therefor allowing that data to be included in both

    select a.*,b.name

    from testdata a

    left join shifts b

    on a.shift = b.shift

    )

    select * from classified

    --select name,

    --sum(items) as totItems

    --from classified

    --group by name

    The reason for having the shift information encoded into a single field is to provide flexibility when defining shifts. At the moment there are 2 shifts, occasionally there are extra shifts defined for special projects, these may span and include multiple existing shifts. To have separate columns would require a table definition change whenever a new temporary shift was required. Using the single column approach allows for a number of shifts to be defined with one of the following codes 1,2,4,8,16,32 etc. this allows a single record to be in more than one shift for instance if shift=12 then record is in shifts 3(4) and 4(8). Any additional shifts are included by simply adding the shift definition to a table for example Prime shift has the following definition.

    (datepart(hh,date_time) >= 9 and

    datepart(hh,date_time) <= 18 and

    datepart(dw,date_time) in (2,3,4,5,6))

    This code is added to a control table along with the shift flag value of 1. When the source data is read in, we have a function that uses each of the shift definitions to accumulate the shift value, i.e. it tests the datetime value against each of the shift definitions and summs the shift flag value if the condition is true.

    Then summarisation is performed by firstly having a simple CTE

    with classified as

    (

    select a.*,b.name

    from testdata a

    left join shifts b

    on a.shift = b.shift

    )

    which classifies the data.

    Lastly this data comes in batches of around 1,600,000,000 records so I really REALLY wanted to do it all in just one pass, which I think the above does.

    Thanks again for all your input.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply