September 22, 2014 at 7:16 pm
I have the following table called "test":
CREATE TABLE test
(
call_state tinyint,
call_type tinyint,
caller int,
localtime datetime,
s_id tinyint,
callee int,
duration int
);
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:35:47' , 6 , 1801 , 2540 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:35:47' , 3 , 1801 , 2540 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:35:48' , 2 , 1801 , 2540 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:35:47' , 1 , 1801 , 2540 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:04' , 6 , 1801 , 6060 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:04' , 3 , 1801 , 6070 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:04' , 2 , 1801 , 6070 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:04' , 1 , 1801 , 6070 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:14' , 6 , 1801 , 3850 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:14' , 3 , 1801 , 3850 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:14' , 2 , 1801 , 3850 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:14' , 1 , 1801 , 3860 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:03' , 1 , 1801 , 4470 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:04' , 11 , 1801 , 4470 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:04' , 6 , 1801 , 4460 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:04' , 3 , 1801 , 4470 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:04' , 2 , 1801 , 4460 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 22:21:20' , 2 , 1801 , 4200 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 22:21:20' , 3 , 1801 , 4210 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 22:21:20' , 1 , 1801 , 4210 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 22:21:20' , 6 , 1801 , 4210 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:20' , 6 , 1801 , 2730 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:20' , 3 , 1801 , 2730 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:20' , 1 , 1801 , 2730 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:31' , 5 , 1801 , 3760 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:31' , 2 , 1801 , 3770 )
INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:31' , 1 , 1801 , 3750 )
I would like to have the result set back in the following format with 2 more columns. Please see the attached result.jpg.
Group column = If caller and callee are the same, localtime between each row is within 1 second, then it will flag as 1. Next group will be flag as 2 ... etc.
Filter column = If s_id for all the group calls has 6 and 2, then mark 1 for filter column where s_id = 2. Like group 1.
If all the group calls has only 6, then mark 1 for filter column where s_id = 6. Like group 6.
If all the group calls has only 2, then mark 1 for filter column where s_id = 2. Like group 7.
All the s_id other than 2 or 6, they don't need to be flag as 1 on the filter column. Basically, each group call can only have 1 or no row flag as 1 for the filter column.
Thanks very much!
September 23, 2014 at 3:20 am
-- Since group allocation must take place sequentially row by row, your good options
-- are quite limited. This solution uses a recursive CTE, you could also consider the
-- Quirky Update, probably the fastest method, or a cursor.
-- Performance will be horrible unless you have an existing column which models rn in "OrderedSet",
-- or you can add one. You will need an index on it too.
;WITH OrderedSet AS (
SELECT
rn = ROW_NUMBER() OVER(ORDER BY [caller], callee, localtime),
*
FROM test
),
rCTE AS (
SELECT tr.*, grp = 1
FROM OrderedSet tr
WHERE rn = 1
UNION ALL
SELECT tr.*,
grp = CASE
WHEN tr.[caller] = lr.[caller] AND tr.callee = lr.callee
AND DATEDIFF(SECOND, lr.localtime, tr.localtime ) <= 1 THEN lr.grp
ELSE lr.grp+1 END
FROM rCTE lr
INNER JOIN OrderedSet tr
ON tr.rn = lr.rn+1
)
SELECT r.*,
filter = CASE
WHEN x.minsid = 2 AND x.maxsid = 6 AND r.s_id = 2 THEN 1
WHEN x.minsid = 6 AND r.s_id = 6 THEN 1
WHEN x.minsid = 2 AND x.maxsid = 2 AND r.s_id = 2 THEN 1
ELSE NULL END
FROM rCTE r
CROSS APPLY (
SELECT minsid = MIN(ri.s_id), maxsid = MAX(ri.s_id)
FROM rCTE ri
WHERE ri.grp = r.grp
AND ri.s_id IN (2,6)
) x
OPTION (MAXRECURSION 0);
-- Here's how not to do it: a query which resolves the group number but uses a "triangular join".
-- For each and every row in the set, the whole set up to that point (rn) is scanned.
-- Horrible if you have anything except a very small number of rows.
;WITH OrderedSet AS (
SELECT
rn = ROW_NUMBER() OVER(ORDER BY [caller], callee, localtime),
*
FROM test
),
MarkedSet AS (
SELECT tr.*, Marker = ISNULL(x.Marker,1)
FROM OrderedSet tr
OUTER APPLY (
SELECT Marker = 0
FROM OrderedSet lr
WHERE lr.rn + 1 = tr.rn
AND lr.[caller] = tr.[caller]
AND lr.callee = tr.callee
AND DATEDIFF(SECOND, lr.localtime, tr.localtime ) <= 1
) x
)
SELECT *
FROM MarkedSet m
CROSS APPLY (
SELECT grp = SUM(Marker)
FROM MarkedSet pr
WHERE pr.rn <= m.rn
) x
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 23, 2014 at 3:05 pm
Thank you so much for the query and it worked flawlessly. However, is there another way to get the same results faster? My data set has 20 millions records.
September 23, 2014 at 3:56 pm
As Chris said in the comments, the quirky update should be a lot faster. I've been bitten by the rCTEs ' performance when using a lot of information. You can read about the quirky update (QU) in the following article: http://www.sqlservercentral.com/articles/T-SQL/68467/
This could do the trick, but remember to test and follow the QU rules.
CREATE TABLE #test
(
call_state tinyint,
call_type tinyint,
caller int,
localtime datetime,
s_id tinyint,
callee int,
duration int,
[group] int,
filter int
);
CREATE CLUSTERED INDEX IX_TempTest ON #Test(caller, callee, localtime)
INSERT INTO #test(
call_state,
call_type ,
caller ,
localtime ,
s_id ,
callee ,
duration
)
SELECT call_state,
call_type ,
caller ,
localtime ,
s_id ,
callee ,
duration
FROM test
DECLARE @Caller int,
@Callee int,
@Time datetime,
@Group int = 0
UPDATE t SET
@Group = [Group] = @Group + CASE WHEN Caller = @Caller AND callee = @Callee AND ABS( DATEDIFF( ss, localtime, @Time)) <= 1 THEN 0 ELSE 1 END,
@Caller = caller,
@Callee = callee,
@Time = localtime,
filter = CASE WHEN s_id = 2 THEN 1 END
FROM #test t WITH(TABLOCKX)
OPTION(MAXDOP 1)
UPDATE t SET
filter = 1
FROM #test t
WHERE s_id = 6
AND [group] IN (SELECT [group] FROM #test GROUP BY [group] HAVING MAX(filter) IS NULL)
SELECT *,
--If you don't want to update the table twice, you can calculate the filter on the fly.
CASE ROW_NUMBER() OVER( PARTITION BY [group] ORDER BY CASE s_id WHEN 2 THEN 1 WHEN 6 THEN 2 ELSE 3 END) WHEN 1 THEN 1 END filter2
FROM #test
DROP TABLE #test
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply