June 15, 2015 at 5:52 pm
My data looks like this:
I need to calculate the last two columns (noofgrp and grpsize) No of Groups (count of Clientid) and Group Size (number of clients in each group) according to begtim and endtime. So I tried the following in the first Temp table
GrpSize= count(clientid) over (partition by begtime,endtime) else 0 end
and in the second Temp Table, I have
select
,GrpSize=sum(grpsize)
,NoofGrp=count(distinct grpsize)
From Temp1
The issue is for the date of 5/26, the begtime and endtime are not consistent. in Grp1 (group 1) all clients starts the session at 1030 and ends at 1200 (90 minutes session) except one who starts at 11 and end at 1200 (row 8). For this client since his/her endtime is the same as others, I want that client to be in the first group(Grp1). Reverse is true for the second group (Grp2). All clients begtime is 12:30 and endtime is 1400 but clientid=2 (row 9) who begtime =1230 but endtime = 1300. However, since this client begtime is the same as the rest, I wan that client to be in the second group (grp2) My partition over creates 4 groups rather than two. Any suggestions?
Thank You,
Helal
June 15, 2015 at 10:54 pm
Please post consumable data -- in the form of CREATE TABLE/INSERT statements or a SELECT statement with UNIONS...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
June 16, 2015 at 3:26 am
pietlinden (6/15/2015)
http://www.sqlservercentral.com/articles/Best+Practices/61537/%5B/quote%5D
I've generated some test data based on your attachment for people to use:
USE tempdb;
DECLARE @GroupTestData TABLE
(
DT DATETIME NOT NULL,
ClientID INT NOT NULL,
BegTime TIME NOT NULL,
EndTime TIME NOT NULL,
Duration AS CAST(DATEDIFF(MI,BegTime,EndTime) AS INT)
);
INSERT INTO @GroupTestData (DT,ClientID,BegTime,EndTime)
VALUES('20150506',1,'10:30','12:00'),
('20150506',2,'10:30','12:00'),
('20150506',4,'10:30','12:00'),
('20150506',5,'10:30','12:00'),
('20150506',6,'10:30','12:00'),
('20150506',8,'10:30','12:00'),
('20150506',10,'11:00','12:00'),
('20150506',2,'12:30','13:00'),
('20150506',3,'12:30','14:00'),
('20150506',7,'12:30','14:00'),
('20150506',9,'12:30','14:00'),
('20150507',10,'12:30','14:00');
One question I have, you've included client 10 from 20150507 in "Group2" even though its on a different day. Is there a business reason for this or just a typo?
June 16, 2015 at 6:01 am
Sure it can be done more efficiently than this, but this appears to work:
WITH AddRank
AS
(
SELECTGT.DT,
GT.ClientID,
GT.BegTime,
GT.EndTime,
GT.Duration,
--Remove Partition by DT if you want times grouped regardless of day)
BegRank = DENSE_RANK() OVER (PARTITION BY GT.DT ORDER BY GT.BegTime),
EndRank = DENSE_RANK() OVER (PARTITION BY GT.DT ORDER BY GT.EndTime)
FROM@GroupTestData AS GT
),
AddCount
AS
(
SELECTAR.DT,
AR.ClientID,
AR.BegTime,
AR.EndTime,
AR.Duration,
AR.BegRank,
AR.EndRank,
BegCount = COUNT(AR.ClientID) OVER (PARTITION BY AR.DT, AR.BegRank),
EndCount = COUNT(AR.ClientID) OVER (PARTITION BY AR.DT, AR.EndRank)
FROMAddRank AS AR
),
DeriveGroupCount
AS
(
SELECTAT.DT,
AT.ClientID,
AT.BegTime,
AT.EndTime,
AT.Duration,
AT.BegRank,
AT.EndRank,
AT.BegCount,
AT.EndCount,
GroupCount = CASE
WHEN AT.BegCount < AT.EndCount THEN AT.EndCount
WHEN AT.BegCount > AT.EndCount THEN AT.BegCount
ELSE AT.BegCount
END,
GroupRank = CASE
WHEN AT.BegCount < AT.EndCount THEN AT.EndRank
WHEN AT.BegCount > AT.EndCount THEN AT.BegRank
ELSE AT.EndRank
END
FROMAddCount AS AT
)
SELECTDG.DT,
DG.ClientID,
DG.BegTime,
DG.EndTime,
DG.Duration,
GroupNum = 'Group' + CAST(DENSE_RANK() OVER (PARTITION BY DG.DT ORDER BY DG.GroupRank) AS VARCHAR),
DG.GroupCount
FROMDeriveGroupCount AS DG;
I've using the windows functions partitioning by the day, removing this gets the results you put as expected, but wasn't sure if that was a typo or not
June 16, 2015 at 4:04 pm
Thank you for sharing the code. For most part, the code is working since I can get no of groups from your GroupNum column. However, GrpSize (GroupCount) should be 7 for the Group1 and 5 for Group2. Yours come up like:
DTClientIDBegTimeEndTimeDurationGroupNumGroupCount
2015-05-06 00:00:00.000110:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000210:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000410:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000510:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000610:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000810:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.0001011:00:00.000000012:00:00.000000060Group17
2015-05-06 00:00:00.000212:30:00.000000013:00:00.000000030Group24
2015-05-06 00:00:00.000312:30:00.000000014:00:00.000000090Group24
2015-05-06 00:00:00.000712:30:00.000000014:00:00.000000090Group24
2015-05-06 00:00:00.000912:30:00.000000014:00:00.000000090Group24
2015-05-07 00:00:00.0001012:30:00.000000014:00:00.000000090Group11
Thank You,
Helal
June 17, 2015 at 1:14 am
helal.mobasher 13209 (6/16/2015)
Thank you for sharing the code. For most part, the code is working since I can get no of groups from your GroupNum column. However, GrpSize (GroupCount) should be 7 for the Group1 and 5 for Group2. Yours come up like:DTClientIDBegTimeEndTimeDurationGroupNumGroupCount
2015-05-06 00:00:00.000110:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000210:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000410:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000510:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000610:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.000810:30:00.000000012:00:00.000000090Group17
2015-05-06 00:00:00.0001011:00:00.000000012:00:00.000000060Group17
2015-05-06 00:00:00.000212:30:00.000000013:00:00.000000030Group24
2015-05-06 00:00:00.000312:30:00.000000014:00:00.000000090Group24
2015-05-06 00:00:00.000712:30:00.000000014:00:00.000000090Group24
2015-05-06 00:00:00.000912:30:00.000000014:00:00.000000090Group24
2015-05-07 00:00:00.0001012:30:00.000000014:00:00.000000090Group11
Thank You,
Helal
Helal - if you would see my post above, I was questioning whether that particular row should be grouped with the rest, as it is on a different day. If day is irrelevant see the comments in the code, you can remove the PARTITION BY DT from the window functions, it should then provide the expected output
USE tempdb;
DECLARE @GroupTestData TABLE
(
DT DATETIME NOT NULL,
ClientID INT NOT NULL,
BegTime TIME NOT NULL,
EndTime TIME NOT NULL,
Duration AS CAST(DATEDIFF(MI,BegTime,EndTime) AS INT)
);
INSERT INTO @GroupTestData (DT,ClientID,BegTime,EndTime)
VALUES('20150506',1,'10:30','12:00'),
('20150506',2,'10:30','12:00'),
('20150506',4,'10:30','12:00'),
('20150506',5,'10:30','12:00'),
('20150506',6,'10:30','12:00'),
('20150506',8,'10:30','12:00'),
('20150506',10,'11:00','12:00'),
('20150506',2,'12:30','13:00'),
('20150506',3,'12:30','14:00'),
('20150506',7,'12:30','14:00'),
('20150506',9,'12:30','14:00'),
('20150507',10,'12:30','14:00');
WITH AddRank
AS
(
SELECTGT.DT,
GT.ClientID,
GT.BegTime,
GT.EndTime,
GT.Duration,
--Remove Partition by DT if you want times grouped regardless of day)
--BegRank = DENSE_RANK() OVER (PARTITION BY GT.DT ORDER BY GT.BegTime),
--EndRank = DENSE_RANK() OVER (PARTITION BY GT.DT ORDER BY GT.EndTime)
BegRank = DENSE_RANK() OVER (ORDER BY GT.BegTime),
EndRank = DENSE_RANK() OVER (ORDER BY GT.EndTime)
FROM@GroupTestData AS GT
),
AddCount
AS
(
SELECTAR.DT,
AR.ClientID,
AR.BegTime,
AR.EndTime,
AR.Duration,
AR.BegRank,
AR.EndRank,
--BegCount = COUNT(AR.ClientID) OVER (PARTITION BY AR.DT, AR.BegRank),
--EndCount = COUNT(AR.ClientID) OVER (PARTITION BY AR.DT, AR.EndRank)
BegCount = COUNT(AR.ClientID) OVER (PARTITION BY AR.BegRank),
EndCount = COUNT(AR.ClientID) OVER (PARTITION BY AR.EndRank)
FROMAddRank AS AR
),
DeriveGroupCount
AS
(
SELECTAT.DT,
AT.ClientID,
AT.BegTime,
AT.EndTime,
AT.Duration,
AT.BegRank,
AT.EndRank,
AT.BegCount,
AT.EndCount,
GroupCount = CASE
WHEN AT.BegCount < AT.EndCount THEN AT.EndCount
WHEN AT.BegCount > AT.EndCount THEN AT.BegCount
ELSE AT.BegCount
END,
GroupRank = CASE
WHEN AT.BegCount < AT.EndCount THEN AT.EndRank
WHEN AT.BegCount > AT.EndCount THEN AT.BegRank
ELSE AT.EndRank
END
FROMAddCount AS AT
)
SELECTDG.DT,
DG.ClientID,
DG.BegTime,
DG.EndTime,
DG.Duration,
--GroupNum = 'Group' + CAST(DENSE_RANK() OVER (PARTITION BY DG.DT ORDER BY DG.GroupRank) AS VARCHAR),
GroupNum = 'Group' + CAST(DENSE_RANK() OVER (ORDER BY DG.GroupRank) AS VARCHAR),
DG.GroupCount
FROMDeriveGroupCount AS DG;
June 17, 2015 at 9:43 am
Dohsan,
My bad of overlooking your question in the first post. You were right and that was a typo. Thus date matters and 5/7 should have been 5/6. Thank you for your help.
Helal
June 18, 2015 at 1:54 pm
Here's another solution... I haven't tested it against Dohsan's solution, so I can't tell you which will perform better.
(This vast majority of the cost on the following is in sorts... The majority of which should be able to be addressed with the correct covering index)
-- Dosan's Test Data --
DECLARE @GroupTestData TABLE
(
DT DATETIME NOT NULL,
ClientID INT NOT NULL,
BegTime TIME NOT NULL,
EndTime TIME NOT NULL,
Duration AS CAST(DATEDIFF(MI,BegTime,EndTime) AS INT)
);
INSERT INTO @GroupTestData (DT,ClientID,BegTime,EndTime)
VALUES('20150506',1,'10:30','12:00'),
('20150506',2,'10:30','12:00'),
('20150506',4,'10:30','12:00'),
('20150506',5,'10:30','12:00'),
('20150506',6,'10:30','12:00'),
('20150506',8,'10:30','12:00'),
('20150506',10,'11:00','12:00'),
('20150506',2,'12:30','13:00'),
('20150506',3,'12:30','14:00'),
('20150506',7,'12:30','14:00'),
('20150506',9,'12:30','14:00'),
('20150507',10,'12:30','14:00');
-- The Actual Solution --
;WITH FindGroupStarts AS (
SELECT
gtd.DT,
gtd.ClientID,
gtd.BegTime,
gtd.EndTime,
ROW_NUMBER() OVER (PARTITION BY gtd.ClientID ORDER BY gtd.DT, gtd.BegTime, gtd.ClientID) rn
FROM
@GroupTestData gtd
), PutRowInGroups AS (
SELECT
fgs.DT,
fgs.ClientID,
fgs.BegTime,
fgs.EndTime,
MAX(fgs.rn) OVER (ORDER BY fgs.DT, fgs.BegTime, fgs.ClientID ROWS UNBOUNDED PRECEDING) GroupNum
FROM
FindGroupStarts fgs
), MaxClientOfGroup AS (
SELECT
prg.GroupNum,
MAX(prg.ClientID) AS MaxClientID
FROM
PutRowInGroups prg
GROUP BY
prg.GroupNum
)
SELECT
prg.DT,
prg.ClientID,
prg.BegTime,
prg.EndTime,
CASE
WHEN prg.ClientID = mcg.MaxClientID
THEN 'Grp' + CAST(prg.GroupNum AS VARCHAR(10)) + '=' + CAST(COUNT(*) OVER (PARTITION BY prg.GroupNum) AS VARCHAR(10))
ELSE ''
END AS GrpSize
FROM
PutRowInGroups prg
JOIN MaxClientOfGroup mcg
ON prg.GroupNum = mcg.GroupNum
June 18, 2015 at 2:12 pm
Doohsan,
I needed to add couple more variables into the partition statement. Thus, I modified the data and codes accordingly (see below). First I get this error message:
The Parallel Data Warehouse (PDW) features are not enabled.
Second, am I doing this correctly since GroupCount seems to be ok but not GroupRank that is all 1!
Thank You Helal
DECLARE @GroupTestData TABLE
(
DT DATETIME NOT NULL,
ClientID INT NOT NULL,
ProvID INT NOT NULL,
Svctype char (5) NOT NULL,
BegTime TIME NOT NULL,
EndTime TIME NOT NULL,
Duration AS CAST(DATEDIFF(MI,BegTime,EndTime) AS INT)
);
INSERT INTO @GroupTestData (DT,ClientID,BegTime,EndTime)
VALUES
('20150506',1,10,'K7','10:30','12:00'),
('20150506',2,10,'K7','10:30','12:00'),
('20150506',4,10,'K7','10:30','12:00'),
('20150506',5,10,'K7','10:30','12:00'),
('20150506',6,10,'K7','10:30','12:00'),
('20150506',8,10,'K7','10:30','12:00'),
('20150506',10,10,'K7','11:00','12:00'),
('20150506',2,10,'K7','12:30','13:00'),
('20150506',3,10,'K7','12:30','14:00'),
('20150506',7,10,'K7','12:30','14:00'),
('20150506',9,10,'K7','12:30','14:00'),
('20150506',10,10,'K7','12:30','14:00');
; WITH AddRank
AS
(
SELECTGT.DT,
GT.ClientID,
GT.BegTime,
GT.EndTime,
GT.Duration,
--Remove Partition by DT if you want times grouped regardless of day)
BegRank = DENSE_RANK() OVER (PARTITION BY GT.PROVID, GT.SVCTYPE, GT.DT ORDER BY GT.BegTime),
EndRank = DENSE_RANK() OVER (PARTITION BY GT.PROVID, GT.SVCTYPE, GT.DT ORDER BY GT.EndTime)
FROM@GroupTestData AS GT
),
AddCount
AS
(
SELECTAR.DT,
AR.ClientID,
AR.BegTime,
AR.EndTime,
AR.Duration,
AR.BegRank,
AR.EndRank,
BegCount = COUNT(AR.ClientID) OVER (PARTITION BY GT.PROVID, GT.SVCTYPE, GT.DT ORDER BY AR.BegRank),
EndCount = COUNT(AR.ClientID) OVER (PARTITION BY GT.PROVID, GT.SVCTYPE, GT.DT ORDER BY AR.EndRank)
FROMAddRank AS AR
),
DeriveGroupCount
AS
(
SELECTAT.DT,
AT.ClientID,
AT.BegTime,
AT.EndTime,
AT.Duration,
AT.BegRank,
AT.EndRank,
AT.BegCount,
AT.EndCount,
GroupCount = CASE
WHEN AT.BegCount < AT.EndCount THEN AT.EndCount
WHEN AT.BegCount > AT.EndCount THEN AT.BegCount
ELSE AT.BegCount
END,
GroupRank = CASE
WHEN AT.BegCount < AT.EndCount THEN AT.EndRank
WHEN AT.BegCount > AT.EndCount THEN AT.BegRank
ELSE AT.EndRank
END
FROMAddCount AS AT
)
SELECTDG.DT,
DG.ClientID,
DG.BegTime,
DG.EndTime,
DG.Duration,
GroupNum = 'Group' + CAST(DENSE_RANK() OVER (PARTITION BY GT.PROVID, GT.SVCTYPE, GT.DT ORDER BY DG.GroupRank) AS VARCHAR),
DG.GroupCount
FROMDeriveGroupCount AS DG;
June 18, 2015 at 2:19 pm
Thank You. Coupl eissues:
1. I think there was a missing close parentheses in this statement right before ROWS. I added that.
\MAX(fgs.rn) OVER (ORDER BY fgs.DT, fgs.BegTime, fgs.ClientID) ROWS UNBOUNDED PRECEDING) GroupNum
Then, I get this error:
Msg 11305, Level 15, State 10, Line 43
The Parallel Data Warehouse (PDW) features are not enabled.
last, how do I add more variables such ProvID and SvcType in to the partition code?
Can I do this:
MAX(fgs.rn) OVER (ORDER BY fgs.provid, fgs.svctype,fgs.DT, fgs.BegTime, fgs.ClientID) ROWS UNBOUNDED PRECEDING) GroupNum
Given the new data:
Thank You
DECLARE @GroupTestData TABLE
(
DT DATETIME NOT NULL,
ClientID INT NOT NULL,
BegTime TIME NOT NULL,
EndTime TIME NOT NULL,
Duration AS CAST(DATEDIFF(MI,BegTime,EndTime) AS INT)
);
INSERT INTO @GroupTestData (DT,ClientID,BegTime,EndTime)
VALUES('20150506',1,'10:30','12:00'),
('20150506',2,'10:30','12:00'),
('20150506',4,'10:30','12:00'),
('20150506',5,'10:30','12:00'),
('20150506',6,'10:30','12:00'),
('20150506',8,'10:30','12:00'),
('20150506',10,'11:00','12:00'),
('20150506',2,'12:30','13:00'),
('20150506',3,'12:30','14:00'),
('20150506',7,'12:30','14:00'),
('20150506',9,'12:30','14:00'),
('20150507',10,'12:30','14:00');
June 18, 2015 at 2:45 pm
helal.mobasher 13209 (6/18/2015)
Thank You. Coupl eissues:1. I think there was a missing close parentheses in this statement right before ROWS. I added that.
\MAX(fgs.rn) OVER (ORDER BY fgs.DT, fgs.BegTime, fgs.ClientID) ROWS UNBOUNDED PRECEDING) GroupNum
Then, I get this error:
Msg 11305, Level 15, State 10, Line 43
The Parallel Data Warehouse (PDW) features are not enabled.
last, how do I add more variables such ProvID and SvcType in to the partition code?
Can I do this:
MAX(fgs.rn) OVER (ORDER BY fgs.provid, fgs.svctype,fgs.DT, fgs.BegTime, fgs.ClientID) ROWS UNBOUNDED PRECEDING) GroupNum
Given the new data:
Thank You
DECLARE @GroupTestData TABLE
(
DT DATETIME NOT NULL,
ClientID INT NOT NULL,
BegTime TIME NOT NULL,
EndTime TIME NOT NULL,
Duration AS CAST(DATEDIFF(MI,BegTime,EndTime) AS INT)
);
INSERT INTO @GroupTestData (DT,ClientID,BegTime,EndTime)
VALUES('20150506',1,'10:30','12:00'),
('20150506',2,'10:30','12:00'),
('20150506',4,'10:30','12:00'),
('20150506',5,'10:30','12:00'),
('20150506',6,'10:30','12:00'),
('20150506',8,'10:30','12:00'),
('20150506',10,'11:00','12:00'),
('20150506',2,'12:30','13:00'),
('20150506',3,'12:30','14:00'),
('20150506',7,'12:30','14:00'),
('20150506',9,'12:30','14:00'),
('20150507',10,'12:30','14:00');
The code executes fine exactly as I posted it. No missing parentheses... (tested on SQL Server Dev Edition)
The following is the correct syntax...
MAX(fgs.rn) OVER (ORDER BY fgs.DT, fgs.BegTime, fgs.ClientID ROWS UNBOUNDED PRECEDING)
I've never had the opportunity to get my mitts on a PDW so I can't say if there are any SQL syntax differences or if any additional functionality needs to be enabled or installed...
HTH,
Jason
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply