November 18, 2011 at 9:48 pm
I have a large table of data that needs to be analyzed. This is a simplified example of what the first step I need. I'm trying to create the "Need" column. When there is a new worker assigned to the account, I'd like to indicate this is the Xth person to work on the account. If it passes back to a previous worker, count is increased.
Is there a way to do this without looping, etc? Thanks.
RecordSeqAccountDateRepNeed
1000A100108/01/11Jim1
1001A100108/06/11Jim1
1002A100108/11/11Jim1
1003A100108/16/11Doug2
1004A100108/21/11Stan3
1005A100108/26/11Stan3
1006A100108/31/11Stan3
1007A100109/05/11Stan3
1008A100109/10/11Stan3
1009A100109/15/11Stan3
1010A100109/20/11Stan3
1011A100109/25/11Stan3
1012A100109/30/11Jim4
1013A100110/05/11Jim4
1014A100110/10/11Jim4
1015A100110/15/11Jim4
1016A100110/20/11Jim4
1017A100110/25/11Jim4
1018A100110/30/11Jim4
1019A100111/04/11Doug5
1020A100208/01/11Jim1
1021A100208/06/11Jim1
1022A100208/11/11Jim1
1023A100208/16/11Doug2
1024A100208/21/11Stan3
1025A100208/26/11Jim4
1026A100208/31/11Jim4
1027A100209/05/11Jim4
1028A100209/10/11Jim4
1029A100209/15/11Jim4
1030A100209/20/11Jim4
1031A100209/25/11Jim4
1032A100209/30/11Jim4
1033A100210/05/11Jim4
1034A100210/10/11Jim4
1035A100210/15/11Jim4
1036A100210/20/11Jim4
1037A100210/25/11Jim4
1038A100210/30/11Jim4
1039A100211/04/11Jim4
November 19, 2011 at 9:01 am
How large is "large"? How many rows? Also, is this something that will only need to be done once (so performance isn't as much of an issue) or something that will be queried regularly?
November 19, 2011 at 10:12 am
OK, here's my first stab at it. This is pretty tricky without using a cursor. Hopefully someone else with more experience with similar problems will chime in; this works (with some caveats), but I don't think it will scale well at all. There's got to be a better way of doing this, but I'm stumped at the moment...
-- For testing, create a temp table
CREATE TABLE #Cases
(
RecordSeq INT,
Account CHAR(5) NOT NULL,
ActivityDate DATETIME NOT NULL,
Rep VARCHAR(30) NOT NULL,
PRIMARY KEY (RecordSeq)
);
-- Adding this index converts the majority of scans to seeks
CREATE INDEX Account_ActivityDate ON #Cases (Account, ActivityDate) INCLUDE (Rep)
-- Generate some test data and insert it into the temp table
INSERT INTO #Cases(RecordSeq, Account, ActivityDate, Rep)
SELECT ROW_NUMBER() OVER (ORDER BY Account, ActivityDate) AS RecordSeq,
Account, ActivityDate, Rep
FROM
(
SELECT 'A1001', '8/1/2011', 'Jim' UNION ALL
SELECT 'A1001', '8/2/2011', 'Jim' UNION ALL
SELECT 'A1001', '8/3/2011', 'Jim' UNION ALL
SELECT 'A1001', '8/4/2011', 'Doug' UNION ALL
SELECT 'A1001', '8/5/2011', 'Doug' UNION ALL
SELECT 'A1001', '8/6/2011', 'Jim' UNION ALL
SELECT 'B1002', '8/2/2011', 'Jim' UNION ALL
SELECT 'B1002', '8/3/2011', 'Stan' UNION ALL
SELECT 'B1002', '8/4/2011', 'Stan' UNION ALL
SELECT 'B1002', '8/5/2011', 'Doug' UNION ALL
SELECT 'B1002', '8/6/2011', 'Stan'
) AS d (Account, ActivityDate, Rep)
-- Don't need this to be a CTE, just makes the final query easier to read
;WITH RepChanges
AS
(
SELECT Account, ActivityDate, Rep,
ROW_NUMBER() OVER (PARTITION BY Account ORDER BY ActivityDate) AS ChangeNum
-- (The PARTITION BY resets the numbering for each new Account)
FROM #Cases AS c OUTER APPLY
(
-- Inline TVF to find the rep who last worked on this account
SELECT Rep
FROM #Cases
WHERE Account = c.Account
AND ActivityDate = (SELECT MAX(ActivityDate)
FROM #Cases
WHERE Account = c.Account
AND ActivityDate < c.ActivityDate)
) AS prev (PrevRep)
WHERE Rep != PrevRep OR PrevRep IS NULL-- Return 1 row for each change of rep (including initial row)
)
SELECT c2.*, rc.ChangeNum
FROM #Cases c2 CROSS APPLY
(
-- Inline TVF to return the ChangeNum for each row in #Cases
SELECT MAX(ChangeNum) -- Avoid duplicates when the case is reassigned to a previous rep
FROM RepChanges
WHERE Account = c2.Account
AND Rep = c2.Rep
AND ActivityDate <= c2.ActivityDate
) as rc (ChangeNum)
ORDER BY c2.Account, c2.ActivityDate
DROP TABLE #Cases
DISCLAIMERS:
I'm wondering if a cursor may actually be a better solution here, but I'm not sure. There are some folks here who are absolute wizards at building efficient set-based solutions to problems like this, hopefully they will know of a better way to solve this...
November 19, 2011 at 10:54 am
OK, you really don't want to use my previous code on a large set of data. The total number of seeks generated is roughly equal to the square of the number of rows in the table; that will get out of hand in a hurry. That approach was an interesting exercise, but I'm afraid it's not going to be usable unless your total number of rows in this table is reasonably low. 🙁
November 19, 2011 at 6:54 pm
texpic (11/18/2011)
I have a large table of data that needs to be analyzed. This is a simplified example of what the first step I need. I'm trying to create the "Need" column. When there is a new worker assigned to the account, I'd like to indicate this is the Xth person to work on the account. If it passes back to a previous worker, count is increased.Is there a way to do this without looping, etc? Thanks.
RecordSeqAccountDateRepNeed
1000A100108/01/11Jim1
1001A100108/06/11Jim1
1002A100108/11/11Jim1
1003A100108/16/11Doug2
1004A100108/21/11Stan3
1005A100108/26/11Stan3
1006A100108/31/11Stan3
1007A100109/05/11Stan3
1008A100109/10/11Stan3
1009A100109/15/11Stan3
1010A100109/20/11Stan3
1011A100109/25/11Stan3
1012A100109/30/11Jim4
1013A100110/05/11Jim4
1014A100110/10/11Jim4
1015A100110/15/11Jim4
1016A100110/20/11Jim4
1017A100110/25/11Jim4
1018A100110/30/11Jim4
1019A100111/04/11Doug5
1020A100208/01/11Jim1
1021A100208/06/11Jim1
1022A100208/11/11Jim1
1023A100208/16/11Doug2
1024A100208/21/11Stan3
1025A100208/26/11Jim4
1026A100208/31/11Jim4
1027A100209/05/11Jim4
1028A100209/10/11Jim4
1029A100209/15/11Jim4
1030A100209/20/11Jim4
1031A100209/25/11Jim4
1032A100209/30/11Jim4
1033A100210/05/11Jim4
1034A100210/10/11Jim4
1035A100210/15/11Jim4
1036A100210/20/11Jim4
1037A100210/25/11Jim4
1038A100210/30/11Jim4
1039A100211/04/11Jim4
I'm confused because of the NEED column you have posted. Is what you have posted what you currently have or what you want for a final result?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 19, 2011 at 6:56 pm
Never mind... I see it now.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 19, 2011 at 7:00 pm
If you use JonFox' sample data...
SELECT RecordSeq, Account, ActivityDate, Rep,dense_rank() OVER(PARTITION BY Account ORDER BY rep)
from #Cases
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
November 19, 2011 at 7:08 pm
mister.magoo (11/19/2011)
If you use JonFox' sample data...
SELECT RecordSeq, Account, ActivityDate, Rep,dense_rank() OVER(PARTITION BY Account ORDER BY rep)
from #Cases
If I'm understanding the OP correctly, that's not quite what he's looking for.
texpic
If it passes back to a previous worker, count is increased.
That would seem to mean that for Account A1001, Jim should have "1" for the first three rows, then Doug should have "2" for the next two, then when it gets assigned back to Jim it should change to "3".
November 19, 2011 at 7:20 pm
JonFox (11/19/2011)
mister.magoo (11/19/2011)
If you use JonFox' sample data...
SELECT RecordSeq, Account, ActivityDate, Rep,dense_rank() OVER(PARTITION BY Account ORDER BY rep)
from #Cases
If I'm understanding the OP correctly, that's not quite what he's looking for.
texpic
If it passes back to a previous worker, count is increased.That would seem to mean that for Account A1001, Jim should have "1" for the first three rows, then Doug should have "2" for the next two, then when it gets assigned back to Jim it should change to "3".
yes you are right - I misread it - i should probably sleep -thanks
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
November 19, 2011 at 8:12 pm
First the test data:
IF OBJECT_ID('TempDB..#Table') IS NOT NULL
DROP TABLE #Table;
CREATE TABLE #Table
(
RecordSeq INT PRIMARY KEY CLUSTERED
,Account VARCHAR(10)
,Date DATE
,Rep VARCHAR(10)
,ExpectedResult INT
);
insert #Table
SELECT 1000 ,'A1001', '08/01/11' ,'Jim', 1
UNION ALL SELECT 1001 ,'A1001', '08/06/11' ,'Jim', 1
UNION ALL SELECT 1002 ,'A1001', '08/11/11' ,'Jim', 1
UNION ALL SELECT 1003 ,'A1001', '08/16/11' ,'Doug', 2
UNION ALL SELECT 1004 ,'A1001', '08/21/11' ,'Stan', 3
UNION ALL SELECT 1005 ,'A1001', '08/26/11' ,'Stan', 3
UNION ALL SELECT 1006 ,'A1001', '08/31/11' ,'Stan', 3
UNION ALL SELECT 1007 ,'A1001', '09/05/11' ,'Stan', 3
UNION ALL SELECT 1008 ,'A1001', '09/10/11' ,'Stan', 3
UNION ALL SELECT 1009 ,'A1001', '09/15/11' ,'Stan', 3
UNION ALL SELECT 1010 ,'A1001', '09/20/11' ,'Stan', 3
UNION ALL SELECT 1011 ,'A1001', '09/25/11' ,'Stan', 3
UNION ALL SELECT 1012 ,'A1001', '09/30/11' ,'Jim', 4
UNION ALL SELECT 1013 ,'A1001','10/05/11' ,'Jim', 4
UNION ALL SELECT 1014 ,'A1001','10/10/11' ,'Jim', 4
UNION ALL SELECT 1015 ,'A1001','10/15/11' ,'Jim', 4
UNION ALL SELECT 1016 ,'A1001','10/20/11' ,'Jim', 4
UNION ALL SELECT 1017 ,'A1001','10/25/11' ,'Jim', 4
UNION ALL SELECT 1018 ,'A1001','10/30/11' ,'Jim', 4
UNION ALL SELECT 1019 ,'A1001','11/04/11' ,'Doug', 5
UNION ALL SELECT 1020 ,'A1002', '08/01/11' ,'Jim', 1
UNION ALL SELECT 1021 ,'A1002', '08/06/11' ,'Jim', 1
UNION ALL SELECT 1022 ,'A1002', '08/11/11' ,'Jim', 1
UNION ALL SELECT 1023 ,'A1002', '08/16/11' ,'Doug', 2
UNION ALL SELECT 1024 ,'A1002', '08/21/11' ,'Stan', 3
UNION ALL SELECT 1025 ,'A1002', '08/26/11' ,'Jim', 4
UNION ALL SELECT 1026 ,'A1002', '08/31/11' ,'Jim', 4
UNION ALL SELECT 1027 ,'A1002', '09/05/11' ,'Jim', 4
UNION ALL SELECT 1028 ,'A1002', '09/10/11' ,'Jim', 4
UNION ALL SELECT 1029 ,'A1002', '09/15/11' ,'Jim', 4
UNION ALL SELECT 1030 ,'A1002', '09/20/11' ,'Jim', 4
UNION ALL SELECT 1031 ,'A1002', '09/25/11' ,'Jim', 4
UNION ALL SELECT 1032 ,'A1002', '09/30/11' ,'Jim', 4
UNION ALL SELECT 1033 ,'A1002','10/05/11' ,'Jim', 4
UNION ALL SELECT 1034 ,'A1002','10/10/11' ,'Jim', 4
UNION ALL SELECT 1035 ,'A1002','10/15/11' ,'Jim', 4
UNION ALL SELECT 1036 ,'A1002','10/20/11' ,'Jim', 4
UNION ALL SELECT 1037 ,'A1002','10/25/11' ,'Jim', 4
UNION ALL SELECT 1038 ,'A1002','10/30/11' ,'Jim', 4
UNION ALL SELECT 1039 ,'A1002','11/04/11' ,'Jim', 4;
CREATE NONCLUSTERED INDEX NIX_Table_Acct_Date
ON #Table( Account , Date) INCLUDE ( Rep );
Now the code:
WITH NumberedData AS
(
SELECT *
, RN1 = ROW_NUMBER() OVER ( PARTITION BY Account,Rep ORDER BY RecordSeq )
, RN2 = ROW_NUMBER() OVER ( PARTITION BY Account ORDER BY RecordSeq )
FROM #Table
)
, MaxRecNumber AS
(
SELECT RecordSeq, Account, Rep , Date , ExpectedResult
, RMax = MAX(RecordSeq) OVER (PARTITION BY Account,Rep,(RN1-RN2))
FROM NumberedData
)
SELECT * , ActualResult = DENSE_RANK() OVER(PARTITION BY Account ORDER BY RMax)
FROM MaxRecNumber
ORDER BY RecordSeq
November 19, 2011 at 9:54 pm
First, nicely done there Cold Coffee. I came up with an almost identical solution and saw yours when I was getting ready to post it. Your solution uses MAX() OVER (brilliant idea, BTW) which I believe is going to be more effecient than the method I'm using to calculate a pseudo-group number because I use an extra ROW_NUMBER() OVER to do it. That's provided that the RecordSeq column always reflects the correct order by Account and Date. If it doesn't, then there will be a problem. Replace the 8/6 date for the second row of "Jim" with 9/6 and see what I mean.
Here's the test data setup using the OP's original data...
--=======================================================================================
-- Create and populate a test table from the data given in the post.
-- Note that NOTHING in this section is a part of the solution. We're just creating
-- test data to demonstrate the solution here.
--=======================================================================================
--===== Conditionally drop the test table to make reruns in SSMS easier
IF OBJeCT_ID('TempDB..#TestTable','U') IS NOT NULL DROP TABLE #TestTable;
GO
--===== Create and populate the test table on the fly.
SELECT RecordSeq = ISNULL(CAST(d.RecordSeq AS INT),0), --ISNULL makes a NOT NULL column
Account = CAST(d.Account AS CHAR(5)),
Date = CAST(d.Date AS DATETIME),
Rep = CAST(d.Rep AS VARCHAR(10))
INTO #TestTable
FROM (
SELECT '1000','A1001','08/01/11','Jim' UNION ALL
SELECT '1001','A1001','08/06/11','Jim' UNION ALL
SELECT '1002','A1001','08/11/11','Jim' UNION ALL
SELECT '1003','A1001','08/16/11','Doug' UNION ALL
SELECT '1004','A1001','08/21/11','Stan' UNION ALL
SELECT '1005','A1001','08/26/11','Stan' UNION ALL
SELECT '1006','A1001','08/31/11','Stan' UNION ALL
SELECT '1007','A1001','09/05/11','Stan' UNION ALL
SELECT '1008','A1001','09/10/11','Stan' UNION ALL
SELECT '1009','A1001','09/15/11','Stan' UNION ALL
SELECT '1010','A1001','09/20/11','Stan' UNION ALL
SELECT '1011','A1001','09/25/11','Stan' UNION ALL
SELECT '1012','A1001','09/30/11','Jim' UNION ALL
SELECT '1013','A1001','10/05/11','Jim' UNION ALL
SELECT '1014','A1001','10/10/11','Jim' UNION ALL
SELECT '1015','A1001','10/15/11','Jim' UNION ALL
SELECT '1016','A1001','10/20/11','Jim' UNION ALL
SELECT '1017','A1001','10/25/11','Jim' UNION ALL
SELECT '1018','A1001','10/30/11','Jim' UNION ALL
SELECT '1019','A1001','11/04/11','Doug' UNION ALL
SELECT '1020','A1002','08/01/11','Jim' UNION ALL
SELECT '1021','A1002','08/06/11','Jim' UNION ALL
SELECT '1022','A1002','08/11/11','Jim' UNION ALL
SELECT '1023','A1002','08/16/11','Doug' UNION ALL
SELECT '1024','A1002','08/21/11','Stan' UNION ALL
SELECT '1025','A1002','08/26/11','Jim' UNION ALL
SELECT '1026','A1002','08/31/11','Jim' UNION ALL
SELECT '1027','A1002','09/05/11','Jim' UNION ALL
SELECT '1028','A1002','09/10/11','Jim' UNION ALL
SELECT '1029','A1002','09/15/11','Jim' UNION ALL
SELECT '1030','A1002','09/20/11','Jim' UNION ALL
SELECT '1031','A1002','09/25/11','Jim' UNION ALL
SELECT '1032','A1002','09/30/11','Jim' UNION ALL
SELECT '1033','A1002','10/05/11','Jim' UNION ALL
SELECT '1034','A1002','10/10/11','Jim' UNION ALL
SELECT '1035','A1002','10/15/11','Jim' UNION ALL
SELECT '1036','A1002','10/20/11','Jim' UNION ALL
SELECT '1037','A1002','10/25/11','Jim' UNION ALL
SELECT '1038','A1002','10/30/11','Jim' UNION ALL
SELECT '1039','A1002','11/04/11','Jim'
) d (RecordSeq,Account,Date,Rep)
;
--===== Add what seems to be the PK of this table
ALTER TABLE #TestTable
ADD PRIMARY KEY CLUSTERED (RecordSeq)
;
Here's my solution to the problem. Like I said, almost identical but will withstand out-of-order dates.
--=======================================================================================
-- Solve the problem using cascaded list sequencing to create groupings along the
-- way and then correctly number the groupings to produce the NEED column
--=======================================================================================
WITH
cteEnumerate AS
(
SELECT *,
Sequence1 = ROW_NUMBER() OVER (PARTITION BY Account ORDER BY Account, Date, Rep),
Sequence2 = ROW_NUMBER() OVER (PARTITION BY Account, Rep ORDER BY Account, Date, Rep)
FROM #TestTable
)
,
cteOrderedGroups AS
(
SELECT RecordSeq, Account, Date, Rep,
OrderedGroup = Sequence1 - ROW_NUMBER() OVER (PARTITION BY Account, (Sequence1-Sequence2) ORDER BY Sequence1)
FROM cteEnumerate
)
SELECT RecordSeq, Account, Date, Rep,
Need = DENSE_RANK() OVER (PARTITION BY Account ORDER BY OrderedGroup)
FROM cteOrderedGroups
ORDER BY Account, Date
;
--Jeff Moden
Change is inevitable... Change for the better is not.
November 19, 2011 at 11:43 pm
Jeff Moden (11/19/2011)
First, nicely done there Cold Coffee. I came up with an almost identical solution and saw yours when I was getting ready to post it. Your solution uses MAX() OVER (brilliant idea, BTW) which I believe is going to be more effecient than the method I'm using to calculate a pseudo-group number because I use an extra ROW_NUMBER() OVER to do it.
Thanks Jeff; this means a lot, really a lot, to me 🙂
Jeff Moden (11/19/2011)
That's provided that the RecordSeq column always reflects the correct order by Account and Date. If it doesn't, then there will be a problem. Replace the 8/6 date for the second row of "Jim" with 9/6 and see what I mean
Yeah, i see your point Jeff. But i assumed that RecordSeq will move forward sequentially with date and for every block it will be in incremental order. The OP told us "When there is a new worker assigned to the account, I'd like to indicate this is the Xth person to work on the account" which i assumed to be in chronological.
November 20, 2011 at 12:31 am
Understood. I never assume the OP's description is correct in their request and I try to protect them as best I can when I think I see a less than bullet-proof set of requirements. I'll do the same thing at work when given requirements that I don't agree with except I'll get the requirements changed there so cuffs-match-collar between the code, the documentation, and the requirements. 🙂 I'll admit that it sometimes takes a bit of "what if" mind reading, though. :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
November 20, 2011 at 6:39 am
I knew there had to be a better way than my original attempt! Looking forward to spending some time later today digging in and understanding how these solutions work...
November 20, 2011 at 8:12 am
JonFox (11/20/2011)
I knew there had to be a better way than my original attempt! Looking forward to spending some time later today digging in and understanding how these solutions work...
This site has more than a million members. Only 4 of us responded to this thread and you were the first of the 4 to crank out some test data on this thread which is incredibly important IMHO on any of these threads. You took a great shot at a possible solution and then you were humble enough to report a potential performance problem in your own code.
Whether your solution was effecient or not, my hat is off to you for trying, Jon. Please don't ever change.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply