A questions about summarisation

Question

Post reply

A questions about summarisation

Adrian Heald

SSCommitted

Points: 1754
More actions
April 11, 2010 at 5:25 pm

#231494

Hi All,
I have some data which includes a "shift" flag which contains an integer with bits set depending upon which shift/s the data falls into. For instance Prime shift will have the first bit set for a shift value of 1, overnight shift will have the second bit set for a shift value of 2, weekend will have the 3 bit set for a shift value of 4 etc...
I'd like to summarise this data by shift, which on the face of it should be fairly easy, except that some data can be in more than one shift, for instance a record can be Prime and overnight having both first and second bits set for a shift value of 3.
My questions is, how to summarise this data such that I get a separate group for each shift, without a grouping for 3, that is I want the data with shift=3 to be included in the Prime and Overnight groups.
I'm guessing that this is not possible without having to first process the data converting the shift flag into something more discrete.
Any ideas anyone?

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

Please provide sample data and expected output in a ready to use format as described in the first link in my signature.

You have visited this side often enough to know better, right? 😉

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Ron McCullough SSC Guru Points: 63877 More actions · Answer 2

Is this something even remotely like you need?

CREATE TABLE #bitwise

( a_int_value int NOT NULL,

Hours INT NOT NULL);

INSERT #bitwise --VALUES (1);

SELECT 1,8 UNION ALL --Prime shift

SELECT 2,6 UNION ALL --Overnight shift

SELECT 3,12 --Week end

DECLARE @Mask INT

SET @Mask =1

SELECT SUM(Hours) AS 'Hours',a_int_value & @Mask

FROM #bitwise GROUP BY a_int_value & @Mask

/* Output

Hours(No column name)

6 0

20 1

*/

--Clean up

DROP TABLE #bitwise

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Paul White SSC Guru Points: 150468 More actions · Answer 3

Kudos to Ron for creating sample data for you.

I have to say though, that this is usually a poor design idea - storing more than one attribute in a single column using flags. See http://www.simple-talk.com/sql/learn-sql-server/facts-and-fallacies-about-first-normal-form/ for a relevant discussion.

As implied in your question, your task would be much easier (and often faster) if you were to store each attribute in its own column instead 🙂

SQL Server also optimises storage for BIT columns - a single byte can store up to eight BIT fields in the same row. The current integer column requires four bytes, complicates processing, and usually precludes useful indexing.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Ron McCullough SSC Guru Points: 63877 More actions · Answer 4

To add emphasis to Paul White's comment:

I have to say though, that this is usually a poor design idea - storing more than one attribute in a single column using flags

Developers would be sore tempted to extend the "flags" by adding values, say a shift value of 4 for weekend work at an overtime pay rate -- you name the situation it is so easy to for a developer not to understand your problem(s).

I have expanded the "Shift designators" to provide a more or less worst case" situation as:

CREATE TABLE #bitwise

( id INT IDENTITY(1,1),a_int_value int NOT NULL,Hours INT NOT NULL);

INSERT #bitwise (a_int_value,Hours)

SELECT 1,-1 UNION ALL --Prime shift

SELECT 2,3 UNION ALL --Overnight shift

SELECT 3,1 UNION ALL --Week end

-- new shift designators

SELECT 4,1 UNION ALL

SELECT 5,-1 UNION ALL

SELECT 6,1 UNION ALL

SELECT 7,300 UNION ALL

SELECT 8,-1 UNION ALL

SELECT 9,300 UNION ALL

SELECT 10,-1

DECLARE @Mask INT

SET @Mask =1

SELECT id, Hours,a_int_value & @Mask AS'Mask Result'

FROM #bitwise

/* Returns

idHoursMask Result

1-11

230

311

410

5-11

610

73001

8-10

93001

10-10

Further extending the possible complexity of sorting the above all out I

altered the select statemet to attempt to utilize the additional a_int meaning

*/

DECLARE @Mask1 INT

DECLARE @Mask2 INT

SET @Mask1 =1

SET @Mask2 = 5

SELECT id, a_int_value & @Mask1 AS 'Mask1 Result',

a_int_value & @Mask2 AS 'MAsk2 Result'

FROM #bitwise

/* Results:

idMask1 MAsk2

111

200

311

404

515

604

715

800

911

1000

--Clean up

DROP TABLE #bitwise

Well enough of my attempt to re-inforce Paul White's statement

I have to say though, that this is usually a poor design idea - storing more than one attribute in a single column using flags

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Adrian Heald SSCommitted Points: 1754 More actions · Answer 5

Firstly, I must appologise for not including some sample data, while I have visited the site many times, I haven't asked many questions and really just didn't think about it too much yesterday when I posed this one.

Secondly, thanks to all who are contributing to this discussion.

The data contains individual records for item production and the shift flag which designates which shift the record falls into, this is done with a bit pattern, bit 1 being shift 1, bit 2 being shift 2 etc. A record can be in more than one shift as indicated by the last two records spanning the prime and evening shifts.

select1 as shift, --prime 00000001 08:00 - 18:00

10 as items

union all

select1 as shift, --prime 00000001 08:00 - 18:00

15 as items

union all

select2 as shift, --evening 00000010 18:00 - 08:00

20 as items

union all

select2 as shift, --evening 00000010 18:00 - 08:00

25 as items

union all

select3 as shift, -- both prime and evening 00000011 16:00 - 20:00

30 as items

union all

select3 as shift, --both prime and evening 00000011 16:00 - 20:00

35 as items

What is required is that the data be summarised by the bit positions so that the peak shift data (the last two records) is included in both the prime and evening shifts and doesn't create a group in its own right.

I think I've figured it out.

with shifts as

(

select 1 as shift,

'Prime' as name

union

select2 as shift,

'Evening' as name

union

-- these next two records cause the data to be in both Prime and Evening shifts

select3 as shift,

'Prime' as name

union

select3 as shift,

'Evening' as name

), testdata as

(

select1 as shift, --prime 00000001 08:00 - 18:00

10 as items

union all

select1 as shift, --prime 00000001 08:00 - 18:00

15 as items

union all

select2 as shift, --evening 00000010 18:00 - 08:00

20 as items

union all

select2 as shift, --evening 00000010 18:00 - 08:00

25 as items

union all

select3 as shift, -- both prime and evening 00000011 16:00 - 20:00

30 as items

union all

select3 as shift, --both prime and evening 00000011 16:00 - 20:00

35 as items

), classified as

(

-- the join creates additional records for the data that falls in multiple shifts

-- therefor allowing that data to be included in both

select a.*,b.name

from testdata a

left join shifts b

on a.shift = b.shift

)

select * from classified

--select name,

--sum(items) as totItems

--from classified

--group by name

The reason for having the shift information encoded into a single field is to provide flexibility when defining shifts. At the moment there are 2 shifts, occasionally there are extra shifts defined for special projects, these may span and include multiple existing shifts. To have separate columns would require a table definition change whenever a new temporary shift was required. Using the single column approach allows for a number of shifts to be defined with one of the following codes 1,2,4,8,16,32 etc. this allows a single record to be in more than one shift for instance if shift=12 then record is in shifts 3(4) and 4(8). Any additional shifts are included by simply adding the shift definition to a table for example Prime shift has the following definition.

(datepart(hh,date_time) >= 9 and

datepart(hh,date_time) <= 18 and

datepart(dw,date_time) in (2,3,4,5,6))

This code is added to a control table along with the shift flag value of 1. When the source data is read in, we have a function that uses each of the shift definitions to accumulate the shift value, i.e. it tests the datetime value against each of the shift definitions and summs the shift flag value if the condition is true.

Then summarisation is performed by firstly having a simple CTE

with classified as

(

select a.*,b.name

from testdata a

left join shifts b

on a.shift = b.shift

)

which classifies the data.

Lastly this data comes in batches of around 1,600,000,000 records so I really REALLY wanted to do it all in just one pass, which I think the above does.

Thanks again for all your input.