How to do this in SQL?

  • if given a table A

    (Customer_ID, Cost, Trip_EndDate)

    sorted by Customer_ID, Trip_EndDate

    Customer_ID COST TRIP_ENDDATE

    ------------------------------------

    001 200.00 01/14/1997

    001 300.00 03/06/1997

    001 300.00 09/06/1997

    001 300.00 09/22/1997

    001 300.00 09/26/1997

    determine if they are the same trip by assigning a trip ID in new table called it B such that

    Customer ID COST TRIP_ENDDATE TRIP_ID

    ------------------------------------------------

    001 200.00 01/14/1997 1

    001 300.00 03/06/1997 2

    001 300.00 09/16/1997 3

    001 300.00 09/22/1997 3

    001 300.00 09/26/1997 3

    determine whether each record is the same trip by

    if the difference between first the records' trip_enddate with the same customer_ID is within 7 days

    (e.g. 09/16/1997 and 09/22/1997 is within 7 days and then 09/22/1997 and 09/26/1997 is within 7 days) then assign same trip ID

    how do you do this in SQL? does a CURSOR has to be implemented?

  • Ng,

    You present a unique question. I need a little more info:

    You said that the CustomerID records need to match within 7 days of each other. It looks like the Cost column helps to better identify the trip too. For any given trip, would the cost be different on one of the lines?

  • I believe this is what you are looking for

    use pubs

    go

    -------------------------------------------------

    Create table taba(

    Customer_ID int

    ,COST money

    ,TRIP_ENDDATE datetime

    )

    Insert into taba select 001, 200.00, '01/14/1997'

    Insert into taba select 001, 300.00, '03/06/1997'

    Insert into taba select 001, 300.00, '09/06/1997'

    Insert into taba select 001, 300.00, '09/22/1997'

    Insert into taba select 001, 300.00, '09/26/1997'

    --determine if they are the same trip by assigning a trip ID in new table called it B such that

    ----------------------------------------------------

    Create table tabb(

    Customer_ID int

    ,COST money

    ,TRIP_ENDDATE datetime

    ,TRIP_ID int

    )

    ------------------------------------------------

    INsert into tabB select 001, 200.00, '01/14/1997', 1

    INsert into tabB select 001, 300.00, '03/06/1997', 2

    INsert into tabB select 001, 300.00, '09/16/1997', 3

    INsert into tabB select 001, 300.00, '09/22/1997', 3

    INsert into tabB select 001, 300.00, '09/26/1997', 3

    Select b.*, a.*

    from tabb as b

    Inner join taba as a

    on a.Customer_ID = b.Customer_ID

    AndAbs( datediff( day, a.TRIP_ENDDATE, b.TRIP_ENDDATE ) ) <= 7

    DRop table taba

    DRop table tabb

  • You will need to use at least a WHILE loop or a CURSOR to get the data properly. However, if you need to perform this often I would add a column to Table A and create a Trigger to do this each time a record is created to keep up.

  • I don't see a solution to your problem in a single query. The problem lies in the assignment of a unique Trip_ID in the new table.

    However, you could do it in two goes. First insert one record for each trip using an identity field for Trip_id. After that remove the identity (or 'set IDENTITY_INSERT OFF') and insert all records that are not yet present.

    The query that was posted by AnzioBake is a good start to identify records belonging to the same trip.

  • Here is another stab at it. I am assuming you will not have a customer with two trips ending on the same day. If so you need some sort of unique key on this table to help with the id generation.

    drop table test1

    go

    create table test1 (customer_id int, cost float, trip_enddate datetime)

    go

    insert into test1

    select1,200.00,'1/14/1997'

    union all

    select1,300.00,'3/06/1997'

    union all

    select1,300.00,'9/16/1997'

    union all

    select1,300.00,'9/22/1997'

    union all

    select1,300.00,'9/26/1997'

    union all

    select1,300.00,'10/26/1997'

    union all

    select1,300.00,'10/27/1997'

    union all

    select1,300.00,'10/30/1997'

    union all

    select1,300.00,'11/26/1997'

    go

    drop table test2

    go

    create table test2 (customer_id int, cost float, trip_enddate datetime, trip_id int not null)

    go

    insert into test2 (customer_id, cost, trip_enddate, trip_id)

    select customer_id,

    cost,

    trip_enddate,

    (select count(*) from test1 as b where a.customer_id = b.customer_id and a.trip_enddate >= b.trip_enddate)

    from test1 a

    go

    update t2

    set t2.trip_id = dt.dup_trip_id

    from test2 t2

    join (select a.customer_id,

    a.cost,

    a.trip_enddate,

    a.trip_id as current_trip_id,

    (select min(trip_id)

    from test2 c

    where c.customer_id = a.customer_id

    and datediff(day,c.trip_enddate, b.trip_enddate) <= 7

    and c.trip_enddate < a.trip_enddate) as dup_trip_id

    from test2 a

    join test2 b

    onb.customer_id = a.customer_id

    anddatediff(day,b.trip_enddate, a.trip_enddate) <= 7

    and b.trip_enddate < a.trip_enddate

    ) as dt

    on dt.customer_id = t2.customer_id

    and dt.current_trip_id = t2.trip_id

    go

    select * from test2

    go

  • Another quick note about my previous post. The business rule my script follows is that trips after the first duplicate must be within seven days of the first duplicate to also be considered duplicates. In other words, trip #1 is on 01/01, trip#2 is on 01/05 (duplicate), trip #3 is on 01/10 (duplicate), trip #4 is on 1/12 (duplicate) trip #5 is on 1/13 (not a duplicate since it is seven days from first dup). If this is no good, you are going to have to loop!

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply