Grouping of data based on moving time period

  • Afternoon, looking to pick some brains here to what I think is a grouping problem. I hope my explanation isn't as muddy as i thought it was when trying to type it.

    To give you some context we have a new amendments application (nothing fancy, excel based with SQL Server back end) that allows users to submit amendments to product data (Product Info, PO Prices, Dates etc.). There is also an admin tool that a team uses to action these amendments in the various systems.

    The old version of this tool, users submitted amendments by style and could if need be submit multiple amendments against one product at the same time. The new tool, I believe for audit reasons, users submit by amendment type, so for example I would submit a cost price change for a given style.

    The issue now is that on the occasions where a user has multiple amendments, they now come through separately. So cost price would be Amendment 1 and a date change would be amendment 2 even though they could be the same product. This could potentially mean that the admin team would be duplicating work if the paperwork is updated and sent after each amendment, whereas before they would make both changes and only send the paperwork once.

    Having not built either of these tools, I've been tasked with trying to fix this, my two thoughts being either to amend the user form to somehow capture/allow users to submit amendments together or try to use the existing data and doing the grouping dynamically in the back end. At the moment I'm trying to look at the latter, as preferably I don’t want to have to do too much on the tool itself.

    In my mind I'm looking at putting in a time delay of a couple of hours, so amendments do not appear in the admin tool right away.

    Use that lag to look at grouping any submitted amendments that occur within 30mins of the first occurrence of that style

    This grouping would then be given a joint time so when the 'time lag' period passes the amendments will be visible together.

    I've tried a few things and a few head on desk moments trying to get a set based approach but haven't been able to get where i want, its either an issue where amendments span an hour, such as 9:59 and then 10:03 or grouping together amendments that happen after the 30mins of the first one.

    Here is some sample data

    USE FF_Winning_Together;

    IF OBJECT_ID(N'tempdb..#AmendTest',N'U') IS NOT NULL

    DROP TABLE #AmendTest;

    CREATE TABLE #AmendTest

    (

    AmendmentIDINT IDENTITY(1,1)NOT NULL,

    StyleCHAR(1)NOT NULL,

    AmendmentStatusVARCHAR(10)NOT NULL,

    DTDATETIMENOT NULL

    )

    INSERT INTO #AmendTest (Style,AmendmentStatus,DT)

    VALUES('C','Submitted','2014-03-13 08:53:37.110'),

    ('F','Submitted','2014-03-13 08:54:00.000'),

    ('A','Submitted','2014-03-13 08:55:37.110'),

    ('C','Submitted','2014-03-13 08:59:27.323'),

    ('B','Submitted','2014-03-13 09:03:37.110'),

    ('B','Submitted','2014-03-13 09:04:46.013'),

    ('C','Submitted','2014-03-13 09:04:50.230'),

    ('A','Submitted','2014-03-13 09:05:46.013'),

    ('B','Submitted','2014-03-13 09:05:50.230'),

    ('D','Submitted','2014-03-13 09:23:37.110'),

    ('D','Submitted','2014-03-13 09:24:46.013'),

    ('D','Submitted','2014-03-13 09:29:27.323'),

    ('D','Submitted','2014-03-13 09:34:50.230'),

    ('A','Submitted','2014-03-13 09:56:00.323'),

    ('A','Submitted','2014-03-13 10:06:50.230'),

    ('B','Submitted','2014-03-13 11:09:27.323');

    select * from #AmendTest

    order by Style, dt;

    -- one of my attempts

    WITH DeriveTimeGroup (Style,StyleCnt,MinDt,MaxDt)

    AS

    (

    SELECTP.Style,

    COUNT(P.Style),

    MIN(P.DT),

    DATEADD(MINUTE,30,MIN(P.DT))

    --,DATEADD(MINUTE,(DATEDIFF(MINUTE,0,AT.DT)/30)*30,0))

    --DATEADD(MINUTE,(DATEDIFF(MINUTE,0,AT.DT)/30)*30+30,0))

    FROM#AmendTest AS P

    GROUP

    BYP.Style

    )

    SELECTP.AmendmentID,

    P.Style,

    P.DT,

    D.StyleCnt,

    D.MinDt,

    D.MaxDt,

    CASE

    WHEN D.StyleCnt = 1 THEN 0

    WHEN P.DT >= D.MinDt AND P.DT <= D.MaxDt THEN 1

    ELSE 0

    END AS Grp,

    CASE

    WHEN P.DT > D.MaxDt THEN DATEADD(MINUTE,120,P.DT)

    ELSE DATEADD(MINUTE,120,D.MaxDt)

    END AS ViewTime

    FROM#AmendTest AS P

    INNER

    JOINDeriveTimeGroup AS D

    ONP.Style = D.Style

    WHEREP.DT <= DATEADD(MINUTE,-120,GETDATE())

    ORDER

    BYP.Style,

    P.DT;

    so using this data, they would be grouped as follows:

    A 3,8

    A 13,14

    B 5,6,9

    B 16

    C 1,4,7

    D 10,11,12,13

    F 2

    Any help welcome and if any more detail or clarification happy to add.

  • Think I'm getting closer:

    WITH GroupTime (Style,GrpDT)

    AS

    (

    SELECTAT.Style,

    DATEADD(HOUR,DATEDIFF(HOUR,0,DATEADD(MINUTE,30,AT.DT)),0)

    FROM#AmendTest AS AT

    GROUP

    BYAT.Style,

    DATEADD(HOUR,DATEDIFF(HOUR,0,DATEADD(MINUTE,30,AT.DT)),0)

    ),

    DeriveNew (AmendmentID,Style,GrpDT,GrpNum)

    AS

    (

    SELECTAT.AmendmentID,

    AT.Style,

    CA1.GrpDT,

    DENSE_RANK() OVER (PARTITION BY AT.Style ORDER BY CA1.GrpDT ASC)

    FROM#AmendTest AS AT

    CROSS APPLY (SELECT DATEADD(HOUR,DATEDIFF(HOUR,0,DATEADD(MINUTE,30,AT.DT)),0)) AS CA1(GrpDT)

    INNER

    JOINGroupTime GT

    ONAT.Style = GT.Style

    AND CA1.GrpDT = GT.GrpDT

    ),

    StyleCnt (Style,GrpNum,GrpCnt)

    AS

    (

    SELECTDN.Style,

    DN.GrpNum,

    COUNT(DN.GrpNum)

    FROMDeriveNew AS DN

    GROUP

    BYDN.Style,

    DN.GrpNum

    )

    SELECTDN.AmendmentID,

    DN.Style,

    DN.GrpDT,

    DN.GrpNum,

    SC.GrpCnt,

    --If only one Style group as 0

    CASE

    WHEN SC.GrpCnt = 1 THEN 0

    ELSE DN.GrpNum

    END AS Chuff

    FROMDeriveNew AS DN

    INNER

    JOINStyleCnt AS SC

    ONDN.Style = SC.Style

    AND DN.GrpNum = SC.GrpNum

  • This is certainly ugly, I might come back to improve this query.

    SELECT at.Style,

    STUFF((SELECT ',' + CAST( x.AmendmentID AS varchar(10))

    FROM #AmendTest x

    WHERE at.Style = x.Style

    AND x.DT BETWEEN at.DT AND DATEADD(MI, 30, at.DT)

    ORDER BY x.AmendmentID

    FOR XML PATH('')), 1,1, '')

    FROM #AmendTest at

    JOIN #AmendTest at2 ON at.Style = at2.Style AND at2.DT BETWEEN at.DT AND DATEADD(MI, 30, at.DT)

    WHERE at.AmendmentID NOT IN(

    SELECT b.AmendmentID

    FROM #AmendTest a

    JOIN #AmendTest b ON a.Style = b.Style AND b.DT > a.DT AND b.DT <= DATEADD(MI, 30, a.DT))

    GROUP BY at.AmendmentID, at.Style, at.DT

    ORDER BY at.Style, at.DT

    Should you group within 30 minutes from the first AmendmentID? or the previous one?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Perhaps this is not quite so ugly:

    SELECT Style, GroupedAmendmentIDs

    FROM

    (

    SELECT Style

    ,GroupedAmendmentIDs=STUFF((

    SELECT ',' + CAST(AmendmentID AS VARCHAR(5))

    FROM #AmendTest b

    WHERE a.Style = b.Style AND 30 >= ABS(DATEDIFF(minute, a.DT, b.DT))

    FOR XML PATH('')

    ), 1, 1, '')

    FROM #AmendTest a

    ) a

    GROUP BY Style, GroupedAmendmentIDs

    ORDER BY Style, GroupedAmendmentIDs;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks guys I'll take a look at these this morning.

    The 'requirements' as such aren't set in concrete, 30mins was just a finger in the air estimate on my behalf of what a suitable time would be to allow someone to submit multiple amendments, similar with the length of the time lag to stop them showing before they've had a change to be grouped.

    I actually wasn't specifically looking to concatenate the IDs together, more just highlighting how they would be grouped together, although the concatenation part has got me looking at this in a different way!

  • And to answer luis's question I was originally basing it on 30mins from the first amendment, although again had loosely thought about resetting it and taking it as 30mins from the previous one

  • Right, been able to spend a bit more time on this and revisit and the good news is seems I have something workable uses Luis's code

    IF OBJECT_ID(N'dbo.VW_AmendmentsGroupTest1',N'V') IS NOT NULL

    DROP VIEW dbo.VW_AmendmentsGroupTest1;

    GO

    CREATE VIEW dbo.VW_AmendmentsGroupTest1 WITH SCHEMABINDING

    AS

    WITH BaseGroup (Product_Ref,Submitted_Timestamp,ViewTime,GroupedIDs)

    AS

    (

    SELECTAT.Product_Ref,

    AT.Submitted_Timestamp,

    --Add hour to help calculate a lag/delay so data cannot be viewed before its grouped

    DATEADD(HOUR,1,AT.Submitted_Timestamp),

    STUFF((

    SELECT ',' + CAST( x.Amendment_ID AS varchar(10))

    FROM Amendments.Amendments_Header x

    WHEREAT.Product_Ref = x.Product_Ref

    AND x.Submitted_Timestamp BETWEEN at.Submitted_Timestamp AND DATEADD(MI, 30, at.Submitted_Timestamp)

    ORDER BY x.Amendment_ID

    FOR XML PATH('')

    ), 1,1, '')

    FROMAmendments.Amendments_Header AS AT

    INNER

    JOINAmendments.Amendments_Header AS AT2

    ONAT.Product_Ref = AT2.Product_Ref

    AND AT2.Submitted_Timestamp BETWEEN at.Submitted_Timestamp AND DATEADD(MI, 30, at.Submitted_Timestamp)

    WHEREAT.Amendment_ID NOT IN (

    SELECTB.Amendment_ID

    FROMAmendments.Amendments_Header AS A

    INNER

    JOINAmendments.Amendments_Header AS B

    ONA.Product_Ref = B.Product_Ref

    AND B.Submitted_Timestamp > A.Submitted_Timestamp

    AND B.Submitted_Timestamp <= DATEADD(MI, 30, A.Submitted_Timestamp)

    )

    GROUP

    BYAT.Amendment_ID,

    AT.Product_Ref,

    AT.Submitted_Timestamp

    )

    --re-split out the delimited IDs and determine by count if it is a multiple or not

    SELECTBG.Product_Ref,

    --round the time

    DATEADD(HOUR,DATEDIFF(HOUR,0,DATEADD(MI,30,BG.ViewTime)),0) AS ViewTime,

    CA1.Item AS AmdendmentID,

    CASE

    WHEN COUNT(*) OVER (PARTITION BY BG.Product_Ref, BG.ViewTime) > 1 THEN 'Y'

    ELSE 'N'

    END AS Multiple

    FROMBaseGroup AS BG

    CROSS

    APPLYdbo.DelimitedSplit8K(BG.GroupedIDs,',') AS CA1;

    GO

    With this view I can take an amendment ID and see if it is a multiple, if it is return the product and grouped time to then return the rest of the grouped IDs. Something along the lines of:

    DECLARE@ProductRef VARCHAR(10),

    @dt DATETIME,

    @id int = 2544;

    SELECT@ProductRef = A.Product_Ref,

    @dt = A.ViewTime

    FROMdbo.VW_AmendmentsGroupTest1 AS A

    WHEREA.AmdendmentID = @id;

    SELECTB.AmdendmentID

    FROMdbo.VW_AmendmentsGroupTest1 AS B

    WHEREB.Product_Ref = @ProductRef

    AND B.ViewTime = @dt;

    One small data issue I'm still having is if the submitted amendments span over a time greater than 1/2 hour its not returning all the IDs.

    Here's an example:

    IF OBJECT_ID(N'tempdb..#GroupTest',N'U') IS NOT NULL

    DROP TABLE #GroupTest;

    CREATE TABLE #GroupTest

    (

    Product VARCHAR(10) NOT NULL,

    ID INT NOT NULL,

    DT DATETIME NOT NULL

    )

    INSERT INTO #GroupTest

    VALUES('CT42203599',2382,'2014-03-10 15:06:00.980'),

    ('CT42203599',2384,'2014-03-10 15:19:09.590'),

    ('CT42203599',2385,'2014-03-10 15:19:57.450'),

    ('CT42203599',2386,'2014-03-10 15:21:26.013'),

    ('CT42203599',2387,'2014-03-10 15:29:56.370'),

    ('CT42203599',2388,'2014-03-10 15:35:18.197'),

    ('CT42203599',2389,'2014-03-10 15:36:57.200'),

    ('CT42203599',2390,'2014-03-10 15:37:57.900'),

    ('CT42203599',2391,'2014-03-10 15:39:41.087'),

    ('CT42203599',2392,'2014-03-10 15:56:20.067'),

    ('CT42203599',2393,'2014-03-10 15:59:51.007'),

    ('CT42203599',2394,'2014-03-10 16:01:00.573'),

    ('CT42203599',2395,'2014-03-10 16:01:49.550'),

    ('CT42203599',2396,'2014-03-10 16:03:23.453')

    --Should be 2 groups 2382--2388 and 2389-2396; however, 2389+ are being ignored completely

    --as 2389 is within 30mins of some of the values in the 2382-2388 range

    SELECTAT.Product,

    AT.DT,

    --AT2.ID,

    DATEADD(HOUR,1,AT.DT)

    ,STUFF((

    SELECT ',' + CAST( x.Amendment_ID AS varchar(10))

    FROM Amendments.Amendments_Header x

    WHEREAT.Product = x.Product_Ref

    AND x.Submitted_Timestamp BETWEEN at.DT AND DATEADD(MI, 30, at.DT)

    ORDER BY x.Amendment_ID

    FOR XML PATH('')

    ), 1,1, '')

    FROM#GroupTest AS AT

    INNER

    JOIN#GroupTest AS AT2

    ONAT.Product = AT2.Product

    AND AT2.DT BETWEEN at.DT AND DATEADD(MI, 30, at.DT)

    WHEREAT.ID NOT IN (

    SELECTB.ID--A.*,B.ID, B.*

    FROM#GroupTest AS A

    INNER

    JOIN#GroupTest AS B

    ONA.Product = B.Product

    AND B.DT > A.DT

    AND B.DT <= DATEADD(MI, 30, A.DT)

    )

    GROUP

    BYAT.ID,

    AT.Product,

    AT.DT

    Can't seem to get the second lot to show correctly, so any ideas...

  • Think I finally solved this, went with a different tact started by finding the Min and Max dates by product by day and using the numbers function to produce 30min intervals from the min to max dates. Seems to do the business for this purpose, still needs some more thorough checks

    Anyone that is interested here is the code

    WITH GroupBaseData (OptionID,DTDay,MinDT,MaxDT)

    AS

    (

    SELECT--GT.ID,

    GT.OptionID,

    ca1.DTDay,

    --CA1.DTDay,

    --Find min and max dates by day to get range values

    MIN(GT.DT),

    MAX(CA2.DT1)

    --,1440 - MIN(CA3.Mins)

    FROMdbo.GroupTest AS GT

    --Find day start to group on

    CROSS

    APPLY(SELECTDATEADD(DAY,DATEDIFF(DAY,0,GT.DT),0)) AS CA1(DTDay)

    CROSS

    APPLY(SELECTDATEADD(MI,1,GT.DT)) AS CA2(DT1)

    --CROSS

    --APPLY(SELECT DATEDIFF(MI,CA1.DTday,CA2.DT)/30*30) AS CA3(Mins)

    GROUP

    BYGT.OptionID,

    CA1.DTDay

    ),

    GroupIntervals

    AS

    (

    SELECTGB.OptionID,

    GB.DTDay,

    GB.MinDT,

    GB.MaxDT,

    CA1.StartInterval,

    CA2.EndInterval,

    ROW_NUMBER() OVER (PARTITION BY GB.optionID,GB.DTDay ORDER BY CA1.StartInterval ASC) AS RN

    FROMGroupBaseData AS GB

    --1440mins in a day, so 48 30min intervals

    CROSS APPLY dbo.GetNums(0,48) AS GN

    CROSS APPLY (selectDATEADD(mi,GN.N * 30,mindt)) AS CA1(StartInterval) --Use Min date as start point and increase in 30mins intervals

    CROSS APPLY (select DATEADD(mi,(GN.N + 1) * 30,mindt)) AS CA2(EndInterval) --End of 30min Interval

    WHERE --(N * 30) <= bd.mins

    --OptionID = 'f'

    CA2.EndInterval <= (GB.MaxDT + 30)

    )

    SELECTGT.ID,

    --GT.DT,

    GT.OptionID,

    --GI.StartInterval,

    --GI.EndInterval,

    --GI.RN,

    --DATEADD(HOUR,1,GI.StartInterval),

    CASE

    WHEN COUNT(CA2.Viewtime) OVER (PARTITION BY GT.OptionID, GI.RN) > 1 then 'Y'

    ELSE 'N'

    END AS Multiple,

    CA2.Viewtime

    FROMGroupTest as gt

    CROSS

    APPLY(SELECTDATEADD(DAY,DATEDIFF(DAY,0,GT.DT),0)) AS CA1(DTDay)

    INNER

    JOINGroupIntervals AS GI

    ONGT.OptionID = GI.OptionID

    AND CA1.DTDay = GI.DTDay

    CROSS

    APPLY(SELECT DATEADD(MINUTE,DATEDIFF(MINUTE,0,DATEADD(HOUR,1,GI.StartInterval)) / 30 * 30, 0)) AS CA2(Viewtime)

    WHEREGT.DT >= GI.StartInterval

    ANDGT.DT < GI.EndInterval;

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply