April 11, 2010 at 5:25 pm
Hi All,
I have some data which includes a "shift" flag which contains an integer with bits set depending upon which shift/s the data falls into. For instance Prime shift will have the first bit set for a shift value of 1, overnight shift will have the second bit set for a shift value of 2, weekend will have the 3 bit set for a shift value of 4 etc...
I'd like to summarise this data by shift, which on the face of it should be fairly easy, except that some data can be in more than one shift, for instance a record can be Prime and overnight having both first and second bits set for a shift value of 3.
My questions is, how to summarise this data such that I get a separate group for each shift, without a grouping for 3, that is I want the data with shift=3 to be included in the Prime and Overnight groups.
I'm guessing that this is not possible without having to first process the data converting the shift flag into something more discrete.
Any ideas anyone?
April 11, 2010 at 6:50 pm
Please provide sample data and expected output in a ready to use format as described in the first link in my signature.
You have visited this side often enough to know better, right? 😉
April 11, 2010 at 6:51 pm
Is this something even remotely like you need?
CREATE TABLE #bitwise
( a_int_value int NOT NULL,
Hours INT NOT NULL);
INSERT #bitwise --VALUES (1);
SELECT 1,8 UNION ALL --Prime shift
SELECT 2,6 UNION ALL --Overnight shift
SELECT 3,12 --Week end
DECLARE @Mask INT
SET @Mask =1
SELECT SUM(Hours) AS 'Hours',a_int_value & @Mask
FROM #bitwise GROUP BY a_int_value & @Mask
/* Output
Hours(No column name)
6 0
20 1
*/
--Clean up
DROP TABLE #bitwise
April 12, 2010 at 1:03 am
Kudos to Ron for creating sample data for you.
I have to say though, that this is usually a poor design idea - storing more than one attribute in a single column using flags. See http://www.simple-talk.com/sql/learn-sql-server/facts-and-fallacies-about-first-normal-form/ for a relevant discussion.
As implied in your question, your task would be much easier (and often faster) if you were to store each attribute in its own column instead 🙂
SQL Server also optimises storage for BIT columns - a single byte can store up to eight BIT fields in the same row. The current integer column requires four bytes, complicates processing, and usually precludes useful indexing.
April 12, 2010 at 6:49 am
To add emphasis to Paul White's comment:
I have to say though, that this is usually a poor design idea - storing more than one attribute in a single column using flags
Developers would be sore tempted to extend the "flags" by adding values, say a shift value of 4 for weekend work at an overtime pay rate -- you name the situation it is so easy to for a developer not to understand your problem(s).
I have expanded the "Shift designators" to provide a more or less worst case" situation as:
CREATE TABLE #bitwise
( id INT IDENTITY(1,1),a_int_value int NOT NULL,Hours INT NOT NULL);
INSERT #bitwise (a_int_value,Hours)
SELECT 1,-1 UNION ALL --Prime shift
SELECT 2,3 UNION ALL --Overnight shift
SELECT 3,1 UNION ALL --Week end
-- new shift designators
SELECT 4,1 UNION ALL
SELECT 5,-1 UNION ALL
SELECT 6,1 UNION ALL
SELECT 7,300 UNION ALL
SELECT 8,-1 UNION ALL
SELECT 9,300 UNION ALL
SELECT 10,-1
DECLARE @Mask INT
SET @Mask =1
SELECT id, Hours,a_int_value & @Mask AS'Mask Result'
FROM #bitwise
/* Returns
idHoursMask Result
1-11
230
311
410
5-11
610
73001
8-10
93001
10-10
Further extending the possible complexity of sorting the above all out I
altered the select statemet to attempt to utilize the additional a_int meaning
*/
DECLARE @Mask1 INT
DECLARE @Mask2 INT
SET @Mask1 =1
SET @Mask2 = 5
SELECT id, a_int_value & @Mask1 AS 'Mask1 Result',
a_int_value & @Mask2 AS 'MAsk2 Result'
FROM #bitwise
/* Results:
idMask1 MAsk2
111
200
311
404
515
604
715
800
911
1000
--Clean up
DROP TABLE #bitwise
Well enough of my attempt to re-inforce Paul White's statement
I have to say though, that this is usually a poor design idea - storing more than one attribute in a single column using flags
April 12, 2010 at 5:19 pm
Firstly, I must appologise for not including some sample data, while I have visited the site many times, I haven't asked many questions and really just didn't think about it too much yesterday when I posed this one.
Secondly, thanks to all who are contributing to this discussion.
The data contains individual records for item production and the shift flag which designates which shift the record falls into, this is done with a bit pattern, bit 1 being shift 1, bit 2 being shift 2 etc. A record can be in more than one shift as indicated by the last two records spanning the prime and evening shifts.
select1 as shift, --prime 00000001 08:00 - 18:00
10 as items
union all
select1 as shift, --prime 00000001 08:00 - 18:00
15 as items
union all
select2 as shift, --evening 00000010 18:00 - 08:00
20 as items
union all
select2 as shift, --evening 00000010 18:00 - 08:00
25 as items
union all
select3 as shift, -- both prime and evening 00000011 16:00 - 20:00
30 as items
union all
select3 as shift, --both prime and evening 00000011 16:00 - 20:00
35 as items
What is required is that the data be summarised by the bit positions so that the peak shift data (the last two records) is included in both the prime and evening shifts and doesn't create a group in its own right.
I think I've figured it out.
with shifts as
(
select 1 as shift,
'Prime' as name
union
select2 as shift,
'Evening' as name
union
-- these next two records cause the data to be in both Prime and Evening shifts
select3 as shift,
'Prime' as name
union
select3 as shift,
'Evening' as name
), testdata as
(
select1 as shift, --prime 00000001 08:00 - 18:00
10 as items
union all
select1 as shift, --prime 00000001 08:00 - 18:00
15 as items
union all
select2 as shift, --evening 00000010 18:00 - 08:00
20 as items
union all
select2 as shift, --evening 00000010 18:00 - 08:00
25 as items
union all
select3 as shift, -- both prime and evening 00000011 16:00 - 20:00
30 as items
union all
select3 as shift, --both prime and evening 00000011 16:00 - 20:00
35 as items
), classified as
(
-- the join creates additional records for the data that falls in multiple shifts
-- therefor allowing that data to be included in both
select a.*,b.name
from testdata a
left join shifts b
on a.shift = b.shift
)
select * from classified
--select name,
--sum(items) as totItems
--from classified
--group by name
The reason for having the shift information encoded into a single field is to provide flexibility when defining shifts. At the moment there are 2 shifts, occasionally there are extra shifts defined for special projects, these may span and include multiple existing shifts. To have separate columns would require a table definition change whenever a new temporary shift was required. Using the single column approach allows for a number of shifts to be defined with one of the following codes 1,2,4,8,16,32 etc. this allows a single record to be in more than one shift for instance if shift=12 then record is in shifts 3(4) and 4(8). Any additional shifts are included by simply adding the shift definition to a table for example Prime shift has the following definition.
(datepart(hh,date_time) >= 9 and
datepart(hh,date_time) <= 18 and
datepart(dw,date_time) in (2,3,4,5,6))
This code is added to a control table along with the shift flag value of 1. When the source data is read in, we have a function that uses each of the shift definitions to accumulate the shift value, i.e. it tests the datetime value against each of the shift definitions and summs the shift flag value if the condition is true.
Then summarisation is performed by firstly having a simple CTE
with classified as
(
select a.*,b.name
from testdata a
left join shifts b
on a.shift = b.shift
)
which classifies the data.
Lastly this data comes in batches of around 1,600,000,000 records so I really REALLY wanted to do it all in just one pass, which I think the above does.
Thanks again for all your input.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply