Previous row where condition exists

  • Hello,

    I have a table of data in which I've identified some duplicate records.  I need to pull all of the duplicate records for someone to review.  My problem is that I'm only able to pull one of the duplicates and not both.  The sample below just includes those values which are actually matching (duplicates) between the rows but there are many other columns that the user will need to review between the duplicates to ensure things on their end.  This is why I need to include BOTH of the matching rows.

    Below is the sample data:

    CREATE TABLE #duplicatesample(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Rownum] [bigint] NULL,
    [ldate] [varchar](10) NULL,
    [RIN#] [nvarchar](255) NULL,
    [Provider_Name] [nvarchar](255) NULL,
    [License#] [nvarchar](255) NULL,
    [pick_up] [nvarchar](5) NULL,
    [drop_off] [nvarchar](5) NULL,
    [Modifier] [nvarchar](519) NULL,
    [CALCULATED_FARE] [float] NULL
    ) ON [PRIMARY]
    GO
    SET IDENTITY_INSERT #duplicatesample ON
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (1, 1, N'19760317', N'015953094', N'ElmedTransportation', N'808547AM', N'22:34', N'23:33', N'A0428', 340.01)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (2, 1, N'20171124', N'134745595', N'ElmedTransportation', N'8511MC', N'13:01', N'13:23', N'A0425, A0130', 43)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (3, 2, N'20171124', N'134745595', N'ElmedTransportation', N'8511MC', N'13:01', N'13:23', N'A0425, A0130', 43)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (4, 1, N'20190503', N'023245509', N'RACE AMB INC.', N'794931', N'02:56', N'03:16', N'A0428', 292.44)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (5, 1, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (6, 2, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (7, 1, N'20190702', N'027693860', N'MED E', N'895583', N'21:02', N'22:00', N'A0428, A0422', 208.08)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (8, 1, N'20190707', N'332066786', N'MED E', N'895550', N'09:43', N'10:38', N'A0428', 193.03)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (9, 1, N'20190717', N'969316942', N'SUPAMB SERV', N'7-214-69', N'13:30', N'14:49', N'A0428', 223.13)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (10, 1, N'20190720', N'165815440', N'ILMEDCAR INC', N'18302 PT', N'17:25', N'18:12', N'A0425, A0130', 16.509999999999998)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (11, 1, N'20190723', N'176854016', N'ILMEDCAR INC', N'8364 MC', N'15:50', N'16:03', N'A0425, A0130', 21.66)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (12, 1, N'20190724', N'107238503', N'MED E', N'8789MC', N'15:03', N'15:47', N'A0120', 15.22)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (13, 1, N'20190724', N'317386456', N'MED E', N'895568', N'10:54', N'11:30', N'A0428', 148.23)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (14, 1, N'20190731', N'112529763', N'MED E', N'895552', N'23:53', N'00:48', N'A0428', 176.23)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (15, 1, N'20190731', N'028625614', N'MED E', N'895582', N'16:09', N'17:13', N'A0426', 250.31)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (16, 1, N'20190801', N'099906182', N'MED E', N'895569', N'15:00', N'15:49', N'A0428', 170.63)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (17, 1, N'20190801', N'177446952', N'MED E', N'895577', N'13:06', N'13:36', N'A0428, A0422', 202.48000000000002)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (18, 1, N'20190801', N'177446952', N'MED E', N'895577', N'14:09', N'15:07', N'A0428, A0422', 202.48000000000002)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (19, 1, N'20190802', N'222767865', N'MED E', N'895582', N'07:35', N'08:35', N'A0428', 176.23)
    GO
    INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (20, 1, N'20190805', N'195079058', N'MED E', N'895560', N'12:02', N'12:47', N'A0428', 159.43)
    GO
    SET IDENTITY_INSERT #duplicatesample OFF
    GO

    I can identify the duplicates by the RowNum column; if it's > 1 then it's a duplicate of the previous row.  (There are some cases where there are more than 1 duplicate so the rownum will be 3).  What I need is to be able to pull BOTH rows that are duplicates of eachother.  I feel like I should be using the LAG function in here for previous row, but not sure how to apply it with a condition (where rownum = 2).  Something like:

    SELECT ID
    ,Rownum
    ,CASE WHEN ROWNUM = 2 THEN (SELECT LAG(ID) OVER (ORDER BY ID)) ELSE 0 END PREVROW
    ,ldate
    ,RIN#
    ,Provider_Name
    ,License#
    ,pick_up
    ,drop_off
    ,Modifier
    ,CALCULATED_FARE
    FROM #temp3
    ORDER BY ID

    This doesn't work for what I need, but it's kind of the explanation of what I'm looking for.  How I think this could be done is If the rowum = 2, put an X (or something) in a column for the previous row.  Then I could just do a select from the final result where rownum > 1 OR column = X to pull both rows that are duplicates of each other.

    Anyone have any ideas how to do this?

    Much appreciated.

  • Do you mean something like this?

    SELECT *
    FROM duplicateSample ds2
    WHERE ds2.RIN IN (SELECT RIN
    FROM duplicateSample ds1
    WHERE rowNum = 2 );
  • Not quite.  The RIN# is not a unique number here, that number will exist multiple times on a day and provider so this will not return the  expected results across the entire dataset.

    It's really just about the rownum = 2 and capturing the row previous to that one, not matter what else exists.

  • Which columns are used to determine if a row is a duplicate or not?  I'm not talking about after you use ROW_NUMBER() to mark them... I'm talking about what did you used in the PARTITION BY of the ROW_NUMBER() function to "count" the dupes?  The reason I ask, of course, is that there's an easy way to do this and I'd like to demo it against the test data that you've provided.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Actually, I used all of the values listed in the table as the values which identify duplicates (those are what the Row_Number used to create the RowNum column).  Does that help?  Should I post a new table with more values in addition to these, to help with what you're thinking?

  •  

    Does this do what you want?

    select * 
    from #duplicatesample a
    left join #duplicatesample b
    on b.Rownum=2
    and b.ldate=a.ldate
    and b.ID <> a.id
    where a.Rownum<> 2

    It would be helpful if you could define or just provide a list of the output you are expecting.

  • tacy.highland wrote:

    Actually, I used all of the values listed in the table as the values which identify duplicates (those are what the Row_Number used to create the RowNum column).  Does that help?  Should I post a new table with more values in addition to these, to help with what you're thinking?

    So, basically, all of the columns you have in the #duplicatesample table you provided EXCEPT the ID and RowNum columns are what you used to identify dupes in whatever the original table was... Is that correct?

    And, if so, yes... it would be helpful if you could post more of the original table OR I can just add some other columns to this one.  What I'd be looking for is not only rows that have dupes but those that do not so that I can demonstrate the solution I'm thinking  of.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Using EXISTS:

     Select *
    From #duplicatesample d
    Where Exists (Select *
    From #duplicatesample d2
    Where d2.Rownum > 1
    And d2.ldate = d.ldate
    And d2.[RIN#] = d.[RIN#]);

    Assuming the columns ldate and RIN# are the key columns.  That will depend on how you identified the Rownum column.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Here's code that relies on the sequential ordering of the ID column of the underlying data set.  You could generate ID using a SEQUENCE if nothing else to insure it is sequential in order of data.

    I also added dups 3 and 4 to one sample just to test that part of it.

    TRUNCATE TABLE #duplicatesample;

    INSERT #duplicatesample ([Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES

    (1, N'19760317', N'015953094', N'ElmedTransportation', N'808547AM', N'22:34', N'23:33', N'A0428', 340.01),

    (1, N'20171124', N'134745595', N'ElmedTransportation', N'8511MC', N'13:01', N'13:23', N'A0425, A0130', 43),

    (2, N'20171124', N'134745595', N'ElmedTransportation', N'8511MC', N'13:01', N'13:23', N'A0425, A0130', 43),

    (1, N'20190503', N'023245509', N'RACE AMB INC.', N'794931', N'02:56', N'03:16', N'A0428', 292.44),

    (1, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97),

    (2, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97),

    (3, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97), <<--<<

    (4, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97), <<--<<

    (1, N'20190702', N'027693860', N'MED E', N'895583', N'21:02', N'22:00', N'A0428, A0422', 208.08),

    (1, N'20190707', N'332066786', N'MED E', N'895550', N'09:43', N'10:38', N'A0428', 193.03),

    (1, N'20190717', N'969316942', N'SUPAMB SERV', N'7-214-69', N'13:30', N'14:49', N'A0428', 223.13),

    (1, N'20190720', N'165815440', N'ILMEDCAR INC', N'18302 PT', N'17:25', N'18:12', N'A0425, A0130', 16.509999999999998),

    (1, N'20190723', N'176854016', N'ILMEDCAR INC', N'8364 MC', N'15:50', N'16:03', N'A0425, A0130', 21.66),

    (1, N'20190724', N'107238503', N'MED E', N'8789MC', N'15:03', N'15:47', N'A0120', 15.22),

    (1, N'20190724', N'317386456', N'MED E', N'895568', N'10:54', N'11:30', N'A0428', 148.23),

    (1, N'20190731', N'112529763', N'MED E', N'895552', N'23:53', N'00:48', N'A0428', 176.23),

    (1, N'20190731', N'028625614', N'MED E', N'895582', N'16:09', N'17:13', N'A0426', 250.31),

    (1, N'20190801', N'099906182', N'MED E', N'895569', N'15:00', N'15:49', N'A0428', 170.63),

    (1, N'20190801', N'177446952', N'MED E', N'895577', N'13:06', N'13:36', N'A0428, A0422', 202.48000000000002),

    (1, N'20190801', N'177446952', N'MED E', N'895577', N'14:09', N'15:07', N'A0428, A0422', 202.48000000000002),

    (1, N'20190802', N'222767865', N'MED E', N'895582', N'07:35', N'08:35', N'A0428', 176.23),

    (1, N'20190805', N'195079058', N'MED E', N'895560', N'12:02', N'12:47', N'A0428', 159.43);

    SELECT ca1.*
    FROM #duplicatesample ds
    CROSS APPLY (
    SELECT
    ds.ID, ds.Rownum,
    ds.ldate, ds.RIN#,
    ds.Provider_Name, ds.License#,
    ds.pick_up, ds.drop_off,
    ds.Modifier, ds.CALCULATED_FARE /*, ds....*/
    WHERE ds.Rownum = 2
    UNION ALL
    SELECT
    ds1.ID, ds1.Rownum,
    ds1.ldate, ds1.RIN#,
    ds1.Provider_Name, ds1.License#,
    ds1.pick_up, ds1.drop_off,
    ds1.Modifier, ds1.CALCULATED_FARE /*, ds1....*/
    FROM #duplicatesample ds1
    WHERE ds.Rownum = 2 AND
    ds1.ID = ds.ID - 1
    UNION ALL
    SELECT
    ds3.ID, ds3.Rownum,
    ds3.ldate, ds3.RIN#,
    ds3.Provider_Name, ds3.License#,
    ds3.pick_up, ds3.drop_off,
    ds3.Modifier, ds3.CALCULATED_FARE /*, ds3....*/
    FROM #duplicatesample ds3
    WHERE ds3.Rownum > 2 AND ds3.ID = ds.ID + ds3.Rownum - 2
    ) AS ca1
    WHERE ds.Rownum = 2
    ORDER BY ID

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Scott, I'm looking through the query, running against my full dataset and it looks pretty close. I'll need to add more columns to my final dataset as there are a number of other columns that users will need to review, but I think this will work.  It missed about 4 records which had triple matches (but you added that scenario in your example and it worked so not sure why it didn't pick up those in mine). It's close enough for me to be able to move forward, which is what I really needed.

    This will save hours and hours of work for staff to go through and validate data.  Thank you so much!

  • tacy.highland wrote:

    It missed about 4 records which had triple matches...

    This is what I'm trying to get at.  If you used the old standby of using ROW_NUMBER() to find dupes and then are trying to use the result of ROW_NUMBER() to list the full rows of the dupes, you're making it a bit difficult on yourself.  I think I can help but I need the couple of answers to my short question on my previous post.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Below is a slightly bigger example dataset, 30 rows, more columns.

    CREATE TABLE #duplicatetrips2(
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [RowNum] [bigint] NULL,
    [TRIP_DATE] [varchar](10) NULL,
    [LDATE] [varchar](10) NULL,
    [CL_ID] [nvarchar](255) NULL,
    [PROVIDER_NAME] [nvarchar](255) NULL,
    [VEH] [nvarchar](4) NULL,
    [EMP_NAME] [varchar](1) NOT NULL,
    [BookingID] [nvarchar](50) NULL,
    [PICK_UP] [nvarchar](5) NULL,
    [DROP_OFF] [nvarchar](5) NULL,
    [PADDRESS] [nvarchar](9) NULL,
    [DADDRESS] [nvarchar](9) NULL,
    [PM] [nvarchar](255) NULL,
    [DM] [nvarchar](255) NULL,
    [MODIFIER] [nvarchar](519) NULL,
    [BASE_RATE] [float] NULL,
    [DIST] [float] NULL,
    [RATE_PER_MILE] [float] NULL,
    [COST_OF_MILEAGE] [float] NULL,
    [CALCULATED_FARE] [float] NULL,
    [VARIANCE] [float] NULL,
    [type] [varchar](20) NOT NULL
    ) ON [PRIMARY]
    GO
    SET IDENTITY_INSERT #duplicatetrips2 ON
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5901, 1, N'01/15/2020', N'20200115', N'143136380', N'LifeLine Ambulance LLC', N'8430', N'', N'3954-20', N'11:15', N'11:19', N'5130 W JA', N'1725 W HA', N'R', N'P', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5902, 1, N'01/15/2020', N'20200115', N'212636062', N'LifeLine Ambulance LLC', N'8430', N'', N'5238-20', N'09:50', N'10:45', N'645 S CE', N'5130 W JA', N'P', N'R', N'A0425, A0130', 14.45, 1, 1.03, 1.03, 15.479999999999999, 15.479999999999999, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5903, 1, N'01/15/2020', N'20200115', N'326959210', N'LifeLine Ambulance LLC', N'8431', N'', N'5466-20', N'14:36', N'15:27', N'850 W IRV', N'2451 W TO', N'P', N'R', N'A0425, A0130', 14.45, 6, 0.17166666666666666, 1.03, 20.63, 20.63, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5904, 1, N'01/15/2020', N'20200115', N'080889777', N'LifeLine Ambulance LLC', N'8434', N'', N'5143-20', N'14:16', N'15:35', N'7531 S ST', N'2425 EAST', N'D', N'R', N'A0425, A0130', 14.45, 2, 0.515, 1.03, 16.509999999999998, 16.509999999999998, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5905, 1, N'01/15/2020', N'20200115', N'029404324', N'LifeLine Ambulance LLC', N'8436', N'', N'5226-20', N'10:57', N'12:28', N'1969 W OG', N'4437 S C', N'P', N'R', N'A0425, A0130', 14.45, 7, 0.14714285714285716, 1.03, 21.66, 21.66, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5906, 1, N'01/15/2020', N'20200115', N'210811857', N'LifeLine Ambulance LLC', N'8436', N'', N'5218-20', N'08:51', N'10:52', N'1969 W OG', N'4437 S C', N'P', N'R', N'A0425, A0130', 14.45, 7, 0.14714285714285716, 1.03, 21.66, 21.66, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5907, 1, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8437', N'', N'240-20', N'05:45', N'06:02', N'5130 W JA', N'1725 W HA', N'R', N'P', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5908, 2, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8437', N'', N'241-20', N'05:45', N'06:02', N'5130 W JA', N'1725 W HA', N'R', N'P', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5909, 3, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8437', N'', N'241-20', N'05:45', N'06:02', N'5130 W JA', N'1725 W HA', N'R', N'P', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular A0120')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5910, 1, N'01/15/2020', N'20200115', N'029404324', N'LifeLine Ambulance LLC', N'8438', N'', N'5225-20', N'06:30', N'08:30', N'4437 S C', N'1969 W OG', N'R', N'P', N'A0425, A0130', 14.45, 7, 0.14714285714285716, 1.03, 21.66, 21.66, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5911, 1, N'01/15/2020', N'20200115', N'127312668', N'LifeLine Ambulance LLC', N'8769', N'', N'5284-20', N'08:06', N'09:07', N'4800 W CH', N'4437 S C', N'P', N'R', N'A0425, A0130', 14.45, 6, 0.17166666666666666, 1.03, 20.63, 20.63, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5912, 1, N'01/15/2020', N'20200115', N'210811857', N'LifeLine Ambulance LLC', N'8769', N'', N'5217-20', N'06:00', N'07:24', N'4437 S C', N'1969 W OG', N'R', N'P', N'A0425, A0130', 14.45, 7, 0.14714285714285716, 1.03, 21.66, 21.66, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5913, 1, N'01/15/2020', N'20200115', N'212636062', N'LifeLine Ambulance LLC', N'8769', N'', N'5237-20', N'08:15', N'08:14', N'5130 W JA', N'645 S CE', N'R', N'P', N'A0425, A0130', 14.45, 1, 1.03, 1.03, 15.479999999999999, 15.479999999999999, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5914, 1, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8933', N'', N'242-20', N'08:16', N'09:12', N'1725 W HA', N'5130 W JA', N'P', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular A0120')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5915, 2, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8933', N'', N'241-20', N'08:16', N'09:12', N'1725 W HA', N'5130 W JA', N'P', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5916, 3, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8933', N'', N'242-20', N'08:16', N'09:12', N'1725 W HA', N'5130 W JA', N'P', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5917, 1, N'01/15/2020', N'20200115', N'070390075', N'LifeLine Ambulance LLC', N'9052', N'', N'5148-20', N'07:17', N'07:30', N'6141 N PU', N'1029 W HO', N'R', N'D', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5918, 2, N'01/15/2020', N'20200115', N'070390075', N'LifeLine Ambulance LLC', N'9052', N'', N'5148-20', N'07:17', N'07:30', N'6141 N PU', N'1029 W HO', N'R', N'D', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular A0120')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5919, 1, N'01/15/2020', N'20200115', N'070390075', N'LifeLine Ambulance LLC', N'9052', N'', N'5149-20', N'07:41', N'07:58', N'1029 W HO', N'6141 N PU', N'D', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular A0120')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5920, 2, N'01/15/2020', N'20200115', N'070390075', N'LifeLine Ambulance LLC', N'9052', N'', N'5149-20', N'07:41', N'07:58', N'1029 W HO', N'6141 N PU', N'D', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5921, 1, N'01/15/2020', N'20200115', N'326959210', N'LifeLine Ambulance LLC', N'9052', N'', N'5005-20', N'12:00', N'13:01', N'2451 W TO', N'850 W IRV', N'R', N'P', N'A0425, A0130', 14.45, 6, 0.17166666666666666, 1.03, 20.63, 20.63, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5922, 1, N'01/15/2020', N'20200115', N'112628763', N'MEDEX', N'8468', N'', N'20-02444', N'16:49', N'17:57', N'5700 S MA', N'2649 E 75', N'H', N'R', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular A0120')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5923, 2, N'01/15/2020', N'20200115', N'112628763', N'MEDEX', N'8468', N'', N'20-02444', N'16:49', N'17:57', N'5700 S MA', N'2649 E 75', N'H', N'R', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5924, 1, N'01/15/2020', N'20200115', N'213439797', N'MEDEX', N'8469', N'', N'20-02338', N'11:27', N'12:25', N'5700 S MA', N'2525 S MI', N'H', N'H', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular A0120')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5925, 2, N'01/15/2020', N'20200115', N'213439797', N'MEDEX', N'8469', N'', N'20-02338', N'11:27', N'12:25', N'5700 S MA', N'2525 S MI', N'H', N'H', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5926, 1, N'01/15/2020', N'20200115', N'100380807', N'MEDEX', N'8788', N'', N'20-02358', N'10:52', N'12:32', N'2544 W MO', N'912 E 133', N'H', N'R', N'A0120', 8.01, 22, 0.046818181818181821, 1.03, 30.67, 30.67, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5927, 2, N'01/15/2020', N'20200115', N'100380807', N'MEDEX', N'8788', N'', N'20-02358', N'10:52', N'12:32', N'2544 W MO', N'912 E 133', N'H', N'R', N'A0120', 8.01, 22, 0.046818181818181821, 1.03, 30.67, 30.67, N'Master Regular A0120')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5928, 1, N'01/15/2020', N'20200115', N'163225998', N'MEDEX', N'8955', N'', N'20-02460', N'17:43', N'18:46', N'1740 W TA', N'11045 S W', N'H', N'R', N'A0428', 208.5, 13, 0.43076923076923074, 5.6, 281.3, 281.3, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5929, 1, N'01/15/2020', N'20200115', N'019278001', N'MEDEX', N'8955', N'', N'20-02299', N'03:30', N'04:01', N'518 N AUS', N'8311 ROOS', N'H', N'H', N'A0428', 208.5, 6, 0.93333333333333324, 5.6, 242.1, 242.1, N'Master Regular')
    GO
    INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5930, 1, N'01/15/2020', N'20200115', N'093216711', N'MEDEX', N'8955', N'', N'20-02493', N'22:51', N'23:38', N'518 N AUS', N'836 W WEL', N'H', N'H', N'A0428', 208.5, 14, 0.39999999999999997, 5.6, 286.9, 286.9, N'Master Regular')
    GO
    SET IDENTITY_INSERT #duplicatetrips2 OFF
    GO

    but I ran the code Scott provided against this dataset and it still works, pulling up two sets of 3 dups.  I'm stumped.

     

  • tacy.highland wrote:

    but I ran the code Scott provided against this dataset and it still works, pulling up two sets of 3 dups.  I'm stumped.

    Oh no... I'm NOT saying that Scotts code didn't work (although I've not actually tried it).  I'm just going to try to demonstrate a different way and, hopefully, get it right.

    My question is, are all of these rows dupes of at least a 2 count or is there, hopefully, some rows with no dupes?

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There are definitely rows without dupes. And in the full dataset that's the vast majority.  Any records that have a 1 rownum, (which are not followed by a 2 or 3 rownum in the next row) are non-duplicates.  And you were also right before, where I used the Row_number to create the rownum column.  If you have a better way, I'm all ears!

    Thanks!

  • And, to be sure, you said the following columns are what you used to identify dupes?

    ,ldate

    ,RIN#

    ,Provider_Name

    ,License#

    ,pick_up

    ,drop_off

    ,Modifier

    ,CALCULATED_FARE

    If that's true, then there's trouble in paradise because there is no RIN# or License# column in the new test data you've provided.

    So which columns are you actually using to determine if dupes exist?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 28 total)

You must be logged in to reply to this topic. Login to reply