December 6, 2010 at 3:02 pm
I have been given a data dump from a 3rd party system we use that allows employees to report work they've done. The data is denormalized and contains a flag that indicates whether or not the employee reported hours for that month. What I am trying to determine is who had a gap of at least one month between reporting. More specifically, I need to find employees who didn't report hours for one or more months, but that period of not reported data is "bookended" with months were data WAS reported. The whole goal of this has to do with managing the licenses for this system (which is used by several thousand people). My data dump contains approximately 28,000 rows, one row per employee with columns for months reported from October of 2009 through September of 2010. And every employee has a value of 0 or 1 for every month, there are no nulls.
Sample Data (using a smaller range of data for example purposes):
if OBJECT_ID('tempdb..#Employee') is not null
drop table #Employee
go
create table #Employee
(EmployeeID int,
Work0110 bit,
Work0210 bit,
Work0310 bit,
Work0410 bit,
Work0510 bit,
Work0610 bit)
insert into #Employee values (1000, 1, 1, 1, 1, 1, 1)
insert into #Employee values (1001, 0, 0, 0, 0, 0, 0)
insert into #Employee values (1002, 0, 0, 1, 1, 1, 1)
insert into #Employee values (1003, 0, 0, 1, 1, 0, 0)
insert into #Employee values (1004, 1, 1, 1, 0, 1, 0)
insert into #Employee values (1005, 1, 0, 0, 0, 1, 1)
insert into #Employee values (1006, 0, 0, 1, 0, 1, 0)
insert into #Employee values (1007, 1, 0, 1, 0, 1, 0)
insert into #Employee values (1008, 1, 0, 0, 0, 0, 0)
insert into #Employee values (1009, 0, 0, 0, 0, 0, 1)
Desired Result:
EmployeeID Work0110 Work0210 Work0310 Work0410 Work0510 Work0610 FLAG
----------- -------- -------- -------- -------- -------- -------- ----
1000 1 1 1 1 1 1 N
1001 0 0 0 0 0 0 N
1002 0 0 1 1 1 1 N
1003 0 0 1 1 0 0 N
1004 1 1 1 0 1 0 Y
1005 1 0 0 0 1 1 Y
1006 0 0 1 0 1 0 Y
1007 1 0 1 0 1 0 Y
1008 1 0 0 0 0 0 N
1009 0 0 0 0 0 1 N
Strategy #1 - Unpivot the data and use ranking function to determine if a 0 is bookended by a 1. Then I realized that wouldn't work because my initial thought was to check if row 1 and row 6 were a 1 and if there was a zero anywhere in the other rows, then flag. But then I would miss employees such as 1004, 1006 and 1007.
Strategy #2 - Using the original data dump as-is (denormalized), use multiple case statements and....and....this is where I just fell on my face. Lunacy. This will never work.
Strategy #3 - Somehow utilize the famous Tally table. But here, I'm simply scratching my head.
I think somewhere in here, I have my answer, but I can't seem to put it all together.
Thank you -
Lisa
December 6, 2010 at 4:09 pm
Probably not the most elegant answer you will receive....
select *,
CASE WHEN
CONVERT(char(1),Work0110)
+CONVERT(char(1),Work0210)
+CONVERT(char(1),Work0310)
+CONVERT(char(1),Work0410)
+CONVERT(char(1),Work0510)
+CONVERT(char(1),Work0610) LIKE '%1%0%1%' THEN 'Y'
ELSE 'N'
END As [FLAG]
from #Employee
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 6, 2010 at 4:12 pm
Solution, Round 1:
if OBJECT_ID('tempdb..#Employee') is not null
drop table #Employee
go
create table #Employee
(EmployeeID int,
Work0110 bit,
Work0210 bit,
Work0310 bit,
Work0410 bit,
Work0510 bit,
Work0610 bit)
insert into #Employee values (1000, 1, 1, 1, 1, 1, 1)
insert into #Employee values (1001, 0, 0, 0, 0, 0, 0)
insert into #Employee values (1002, 0, 0, 1, 1, 1, 1)
insert into #Employee values (1003, 0, 0, 1, 1, 0, 0)
insert into #Employee values (1004, 1, 1, 1, 0, 1, 0)
insert into #Employee values (1005, 1, 0, 0, 0, 1, 1)
insert into #Employee values (1006, 0, 0, 1, 0, 1, 0)
insert into #Employee values (1007, 1, 0, 1, 0, 1, 0)
insert into #Employee values (1008, 1, 0, 0, 0, 0, 0)
insert into #Employee values (1009, 0, 0, 0, 0, 0, 1)
;WITH cte AS
(SELECT
*
from
#Employee
UNPIVOT ( Worked For MonthCol IN ( Work0110, Work0210, Work0310, Work0410, Work0510, Work0610)
) AS drv
)
,cte2 AS
(
SELECT
EmployeeID,
Worked,
CASE MonthCol
WHEN 'Work0110' THEN 1
WHEN 'Work0210' THEN 2
WHEN 'Work0310' THEN 3
WHEN 'Work0410' THEN 4
WHEN 'Work0510' THEN 5
WHEN 'Work0610' THEN 6
END AS OrderingColumn
FROM
cte
)
, cte3 AS
(
SELECT
c2.EmployeeID,
MAX( CASE WHEN drvBefore.EmployeeID IS NOT NULL AND drvAfter.EmployeeID IS NOT NULL
THEN 1
ELSE 0
END) AS Flagged
FROM
cte2 AS c2
LEFT JOIN
(SELECT EmployeeID, OrderingColumn
FROMcte2
WHEREWorked = 1
) AS drvBefore
ONc2.EmployeeID = drvBefore.EmployeeID
AND c2.OrderingColumn > drvBefore.OrderingColumn
LEFT JOIN
(SELECT EmployeeID, OrderingColumn
FROMcte2
WHEREWorked = 1
) AS drvAfter
ONc2.EmployeeID = drvAfter.EmployeeID
AND c2.OrderingColumn < drvAfter.OrderingColumn
WHERE
Worked = 0
GROUP BY
c2.EmployeeID
)
SELECT e.*, c3.Flagged
FROM
#Employee AS e
JOIN
cte3 AS c3
ONe.EmployeeID = c3.EmployeeID
You were on the right track with the unpivot. Now, hopefully someone can come by and clean this up, the triangle joins are terrible.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 6, 2010 at 4:13 pm
Oh man! That's MUCH cleaner than where I was going! I was capturing the min and max rows (using row_number()) where hours were entered. Then capturing rows (using row_number()) where hours were not entered. Then I was going to see which employees had a not entered month that fell between the min and max of reported months. Etc. Very messy! Thank you so much for this simple solution!!
Lisa
December 6, 2010 at 4:16 pm
mister.magoo (12/6/2010)
Probably not the most elegant answer you will receive....
Just a head's up, that code will fail for EmployeeId 1005.
I need more coffee, or glasses.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 6, 2010 at 4:18 pm
HI Craig, can you explain why? It produces the correct answer for me...
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 6, 2010 at 4:20 pm
mister.magoo (12/6/2010)
HI Craig, can you explain why? It produces the correct answer for me...
It works just fine. Apologies, very nice solution. I'll be --------------> way for a bit.
I misread the wildcards. Self.smack.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 6, 2010 at 4:28 pm
No problem. It is ugly code, so by rights should have something wrong!
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 6, 2010 at 4:29 pm
Another "ugly" method...just for the sake of it...
select *,
CASE WHEN 32*Work0110+16*Work0210+8*Work0310+4*Work0410+2*Work0510+Work0610 NOT IN (0,1,2,3,4,6,7,8,12,14,15,24,28,30,31,32,48,56,60,62,63) THEN 'Y' ELSE 'N' END
from #Employee
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
December 6, 2010 at 4:32 pm
mister.magoo (12/6/2010)
Another "ugly" method...just for the sake of it...
select *,
CASE WHEN 32*Work0110+16*Work0210+8*Work0310+4*Work0410+2*Work0510+Work0610 NOT IN (0,1,2,3,4,6,7,8,12,14,15,24,28,30,31,32,48,56,60,62,63) THEN 'Y' ELSE 'N' END
from #Employee
Ewww.... yeah, I'll agree with you on this one. Definitely "ugly". The first one? Most definitely elegant!
December 6, 2010 at 4:41 pm
mister.magoo (12/6/2010)
Another "ugly" method...just for the sake of it...
select *,
CASE WHEN 32*Work0110+16*Work0210+8*Work0310+4*Work0410+2*Work0510+Work0610 NOT IN (0,1,2,3,4,6,7,8,12,14,15,24,28,30,31,32,48,56,60,62,63) THEN 'Y' ELSE 'N' END
from #Employee
I believe this method is for when your juniors get involved in the process you want a way to say "You do not touch this without talking to me..."
I love it though. Beautifully random use of a bitmask. 😀
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 6, 2010 at 5:24 pm
CELKO (12/6/2010)
A little change in the DDL makes life more relational. Never sue BIT flags in SQL; that is assembley language.m Alo, MySQL has a nice conventon for numeric momth names based on ISO-8601:
So, to understand this, instead of one byte per row physical storage (6 bits, 1 byte can hold them), you're recommending INT, which is 4 bytes per flag, for 24 bytes per row of storage. Besides the fact that it's "archaic", what does this data expansion bring to the table? Does it run faster? Does it optimize better? I can't see over a million rows that more physical pages being necessary will be helpful.
Even going to tinyint at a byte a piece is 6 bytes to the 1 it takes up.
As per the MySQL, it's a shame that nifty function isn't around for us, but this is the SQL Server 2k8 board. Though I hope you leveraging your knowledge of it will produce a locally usable workaround. Will be interesting to see.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 6, 2010 at 5:44 pm
Craig Farrell (12/6/2010)
mister.magoo (12/6/2010)
HI Craig, can you explain why? It produces the correct answer for me...It works just fine. Apologies, very nice solution. I'll be --------------> way for a bit.
I misread the wildcards. Self.smack.
No problem, I can beat that. Me.smack.Craig!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
December 6, 2010 at 5:56 pm
WayneS (12/6/2010)
No problem, I can beat that. Me.smack.Craig!
Me.Duck (WayneS.smack.Craig, const_ImitateNeo). 😛 :w00t:
What can I tell ya? Looked too good to be that easy and my eyes played tricks on me.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
December 6, 2010 at 9:18 pm
mister.magoo (12/6/2010)
Probably not the most elegant answer you will receive....
select *,
CASE WHEN
CONVERT(char(1),Work0110)
+CONVERT(char(1),Work0210)
+CONVERT(char(1),Work0310)
+CONVERT(char(1),Work0410)
+CONVERT(char(1),Work0510)
+CONVERT(char(1),Work0610) LIKE '%1%0%1%' THEN 'Y'
ELSE 'N'
END As [FLAG]
from #Employee
I guess were both not elegant then because that's similar to the way I was going to do it. The only thing I may have done differently is the column names to make it a bit more generic. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply