August 6, 2020 at 4:27 pm
Hello,
I have a table of data in which I've identified some duplicate records. I need to pull all of the duplicate records for someone to review. My problem is that I'm only able to pull one of the duplicates and not both. The sample below just includes those values which are actually matching (duplicates) between the rows but there are many other columns that the user will need to review between the duplicates to ensure things on their end. This is why I need to include BOTH of the matching rows.
Below is the sample data:
CREATE TABLE #duplicatesample(
[ID] [int] IDENTITY(1,1) NOT NULL,
[Rownum] [bigint] NULL,
[ldate] [varchar](10) NULL,
[RIN#] [nvarchar](255) NULL,
[Provider_Name] [nvarchar](255) NULL,
[License#] [nvarchar](255) NULL,
[pick_up] [nvarchar](5) NULL,
[drop_off] [nvarchar](5) NULL,
[Modifier] [nvarchar](519) NULL,
[CALCULATED_FARE] [float] NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT #duplicatesample ON
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (1, 1, N'19760317', N'015953094', N'ElmedTransportation', N'808547AM', N'22:34', N'23:33', N'A0428', 340.01)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (2, 1, N'20171124', N'134745595', N'ElmedTransportation', N'8511MC', N'13:01', N'13:23', N'A0425, A0130', 43)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (3, 2, N'20171124', N'134745595', N'ElmedTransportation', N'8511MC', N'13:01', N'13:23', N'A0425, A0130', 43)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (4, 1, N'20190503', N'023245509', N'RACE AMB INC.', N'794931', N'02:56', N'03:16', N'A0428', 292.44)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (5, 1, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (6, 2, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (7, 1, N'20190702', N'027693860', N'MED E', N'895583', N'21:02', N'22:00', N'A0428, A0422', 208.08)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (8, 1, N'20190707', N'332066786', N'MED E', N'895550', N'09:43', N'10:38', N'A0428', 193.03)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (9, 1, N'20190717', N'969316942', N'SUPAMB SERV', N'7-214-69', N'13:30', N'14:49', N'A0428', 223.13)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (10, 1, N'20190720', N'165815440', N'ILMEDCAR INC', N'18302 PT', N'17:25', N'18:12', N'A0425, A0130', 16.509999999999998)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (11, 1, N'20190723', N'176854016', N'ILMEDCAR INC', N'8364 MC', N'15:50', N'16:03', N'A0425, A0130', 21.66)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (12, 1, N'20190724', N'107238503', N'MED E', N'8789MC', N'15:03', N'15:47', N'A0120', 15.22)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (13, 1, N'20190724', N'317386456', N'MED E', N'895568', N'10:54', N'11:30', N'A0428', 148.23)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (14, 1, N'20190731', N'112529763', N'MED E', N'895552', N'23:53', N'00:48', N'A0428', 176.23)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (15, 1, N'20190731', N'028625614', N'MED E', N'895582', N'16:09', N'17:13', N'A0426', 250.31)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (16, 1, N'20190801', N'099906182', N'MED E', N'895569', N'15:00', N'15:49', N'A0428', 170.63)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (17, 1, N'20190801', N'177446952', N'MED E', N'895577', N'13:06', N'13:36', N'A0428, A0422', 202.48000000000002)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (18, 1, N'20190801', N'177446952', N'MED E', N'895577', N'14:09', N'15:07', N'A0428, A0422', 202.48000000000002)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (19, 1, N'20190802', N'222767865', N'MED E', N'895582', N'07:35', N'08:35', N'A0428', 176.23)
GO
INSERT #duplicatesample ([ID], [Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES (20, 1, N'20190805', N'195079058', N'MED E', N'895560', N'12:02', N'12:47', N'A0428', 159.43)
GO
SET IDENTITY_INSERT #duplicatesample OFF
GO
I can identify the duplicates by the RowNum column; if it's > 1 then it's a duplicate of the previous row. (There are some cases where there are more than 1 duplicate so the rownum will be 3). What I need is to be able to pull BOTH rows that are duplicates of eachother. I feel like I should be using the LAG function in here for previous row, but not sure how to apply it with a condition (where rownum = 2). Something like:
SELECT ID
,Rownum
,CASE WHEN ROWNUM = 2 THEN (SELECT LAG(ID) OVER (ORDER BY ID)) ELSE 0 END PREVROW
,ldate
,RIN#
,Provider_Name
,License#
,pick_up
,drop_off
,Modifier
,CALCULATED_FARE
FROM #temp3
ORDER BY ID
This doesn't work for what I need, but it's kind of the explanation of what I'm looking for. How I think this could be done is If the rowum = 2, put an X (or something) in a column for the previous row. Then I could just do a select from the final result where rownum > 1 OR column = X to pull both rows that are duplicates of each other.
Anyone have any ideas how to do this?
Much appreciated.
August 6, 2020 at 5:27 pm
Do you mean something like this?
SELECT *
FROM duplicateSample ds2
WHERE ds2.RIN IN (SELECT RIN
FROM duplicateSample ds1
WHERE rowNum = 2 );
August 6, 2020 at 5:57 pm
Not quite. The RIN# is not a unique number here, that number will exist multiple times on a day and provider so this will not return the expected results across the entire dataset.
It's really just about the rownum = 2 and capturing the row previous to that one, not matter what else exists.
August 6, 2020 at 6:01 pm
Which columns are used to determine if a row is a duplicate or not? I'm not talking about after you use ROW_NUMBER() to mark them... I'm talking about what did you used in the PARTITION BY of the ROW_NUMBER() function to "count" the dupes? The reason I ask, of course, is that there's an easy way to do this and I'd like to demo it against the test data that you've provided.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2020 at 6:06 pm
Actually, I used all of the values listed in the table as the values which identify duplicates (those are what the Row_Number used to create the RowNum column). Does that help? Should I post a new table with more values in addition to these, to help with what you're thinking?
August 6, 2020 at 6:14 pm
Does this do what you want?
select *
from #duplicatesample a
left join #duplicatesample b
on b.Rownum=2
and b.ldate=a.ldate
and b.ID <> a.id
where a.Rownum<> 2
It would be helpful if you could define or just provide a list of the output you are expecting.
August 6, 2020 at 6:38 pm
Actually, I used all of the values listed in the table as the values which identify duplicates (those are what the Row_Number used to create the RowNum column). Does that help? Should I post a new table with more values in addition to these, to help with what you're thinking?
So, basically, all of the columns you have in the #duplicatesample table you provided EXCEPT the ID and RowNum columns are what you used to identify dupes in whatever the original table was... Is that correct?
And, if so, yes... it would be helpful if you could post more of the original table OR I can just add some other columns to this one. What I'd be looking for is not only rows that have dupes but those that do not so that I can demonstrate the solution I'm thinking of.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2020 at 6:38 pm
Using EXISTS:
Select *
From #duplicatesample d
Where Exists (Select *
From #duplicatesample d2
Where d2.Rownum > 1
And d2.ldate = d.ldate
And d2.[RIN#] = d.[RIN#]);
Assuming the columns ldate and RIN# are the key columns. That will depend on how you identified the Rownum column.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
August 6, 2020 at 7:04 pm
Here's code that relies on the sequential ordering of the ID column of the underlying data set. You could generate ID using a SEQUENCE if nothing else to insure it is sequential in order of data.
I also added dups 3 and 4 to one sample just to test that part of it.
TRUNCATE TABLE #duplicatesample;
INSERT #duplicatesample ([Rownum], [ldate], [RIN#], [Provider_Name], [License#], [pick_up], [drop_off], [Modifier], [CALCULATED_FARE]) VALUES
(1, N'19760317', N'015953094', N'ElmedTransportation', N'808547AM', N'22:34', N'23:33', N'A0428', 340.01),
(1, N'20171124', N'134745595', N'ElmedTransportation', N'8511MC', N'13:01', N'13:23', N'A0425, A0130', 43),
(2, N'20171124', N'134745595', N'ElmedTransportation', N'8511MC', N'13:01', N'13:23', N'A0425, A0130', 43),
(1, N'20190503', N'023245509', N'RACE AMB INC.', N'794931', N'02:56', N'03:16', N'A0428', 292.44),
(1, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97),
(2, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97),
(3, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97), <<--<<
(4, N'20190621', N'095609756', N'ILMEDCAR INC', N'19533PT', N'15:45', N'19:44', N'A0425, A0130', 100.97), <<--<<
(1, N'20190702', N'027693860', N'MED E', N'895583', N'21:02', N'22:00', N'A0428, A0422', 208.08),
(1, N'20190707', N'332066786', N'MED E', N'895550', N'09:43', N'10:38', N'A0428', 193.03),
(1, N'20190717', N'969316942', N'SUPAMB SERV', N'7-214-69', N'13:30', N'14:49', N'A0428', 223.13),
(1, N'20190720', N'165815440', N'ILMEDCAR INC', N'18302 PT', N'17:25', N'18:12', N'A0425, A0130', 16.509999999999998),
(1, N'20190723', N'176854016', N'ILMEDCAR INC', N'8364 MC', N'15:50', N'16:03', N'A0425, A0130', 21.66),
(1, N'20190724', N'107238503', N'MED E', N'8789MC', N'15:03', N'15:47', N'A0120', 15.22),
(1, N'20190724', N'317386456', N'MED E', N'895568', N'10:54', N'11:30', N'A0428', 148.23),
(1, N'20190731', N'112529763', N'MED E', N'895552', N'23:53', N'00:48', N'A0428', 176.23),
(1, N'20190731', N'028625614', N'MED E', N'895582', N'16:09', N'17:13', N'A0426', 250.31),
(1, N'20190801', N'099906182', N'MED E', N'895569', N'15:00', N'15:49', N'A0428', 170.63),
(1, N'20190801', N'177446952', N'MED E', N'895577', N'13:06', N'13:36', N'A0428, A0422', 202.48000000000002),
(1, N'20190801', N'177446952', N'MED E', N'895577', N'14:09', N'15:07', N'A0428, A0422', 202.48000000000002),
(1, N'20190802', N'222767865', N'MED E', N'895582', N'07:35', N'08:35', N'A0428', 176.23),
(1, N'20190805', N'195079058', N'MED E', N'895560', N'12:02', N'12:47', N'A0428', 159.43);
SELECT ca1.*
FROM #duplicatesample ds
CROSS APPLY (
SELECT
ds.ID, ds.Rownum,
ds.ldate, ds.RIN#,
ds.Provider_Name, ds.License#,
ds.pick_up, ds.drop_off,
ds.Modifier, ds.CALCULATED_FARE /*, ds....*/
WHERE ds.Rownum = 2
UNION ALL
SELECT
ds1.ID, ds1.Rownum,
ds1.ldate, ds1.RIN#,
ds1.Provider_Name, ds1.License#,
ds1.pick_up, ds1.drop_off,
ds1.Modifier, ds1.CALCULATED_FARE /*, ds1....*/
FROM #duplicatesample ds1
WHERE ds.Rownum = 2 AND
ds1.ID = ds.ID - 1
UNION ALL
SELECT
ds3.ID, ds3.Rownum,
ds3.ldate, ds3.RIN#,
ds3.Provider_Name, ds3.License#,
ds3.pick_up, ds3.drop_off,
ds3.Modifier, ds3.CALCULATED_FARE /*, ds3....*/
FROM #duplicatesample ds3
WHERE ds3.Rownum > 2 AND ds3.ID = ds.ID + ds3.Rownum - 2
) AS ca1
WHERE ds.Rownum = 2
ORDER BY ID
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
August 7, 2020 at 12:04 am
Scott, I'm looking through the query, running against my full dataset and it looks pretty close. I'll need to add more columns to my final dataset as there are a number of other columns that users will need to review, but I think this will work. It missed about 4 records which had triple matches (but you added that scenario in your example and it worked so not sure why it didn't pick up those in mine). It's close enough for me to be able to move forward, which is what I really needed.
This will save hours and hours of work for staff to go through and validate data. Thank you so much!
August 7, 2020 at 12:08 am
It missed about 4 records which had triple matches...
This is what I'm trying to get at. If you used the old standby of using ROW_NUMBER() to find dupes and then are trying to use the result of ROW_NUMBER() to list the full rows of the dupes, you're making it a bit difficult on yourself. I think I can help but I need the couple of answers to my short question on my previous post.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 7, 2020 at 1:00 am
Hi Jeff,
Below is a slightly bigger example dataset, 30 rows, more columns.
CREATE TABLE #duplicatetrips2(
[ID] [int] IDENTITY(1,1) NOT NULL,
[RowNum] [bigint] NULL,
[TRIP_DATE] [varchar](10) NULL,
[LDATE] [varchar](10) NULL,
[CL_ID] [nvarchar](255) NULL,
[PROVIDER_NAME] [nvarchar](255) NULL,
[VEH] [nvarchar](4) NULL,
[EMP_NAME] [varchar](1) NOT NULL,
[BookingID] [nvarchar](50) NULL,
[PICK_UP] [nvarchar](5) NULL,
[DROP_OFF] [nvarchar](5) NULL,
[PADDRESS] [nvarchar](9) NULL,
[DADDRESS] [nvarchar](9) NULL,
[PM] [nvarchar](255) NULL,
[DM] [nvarchar](255) NULL,
[MODIFIER] [nvarchar](519) NULL,
[BASE_RATE] [float] NULL,
[DIST] [float] NULL,
[RATE_PER_MILE] [float] NULL,
[COST_OF_MILEAGE] [float] NULL,
[CALCULATED_FARE] [float] NULL,
[VARIANCE] [float] NULL,
[type] [varchar](20) NOT NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT #duplicatetrips2 ON
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5901, 1, N'01/15/2020', N'20200115', N'143136380', N'LifeLine Ambulance LLC', N'8430', N'', N'3954-20', N'11:15', N'11:19', N'5130 W JA', N'1725 W HA', N'R', N'P', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5902, 1, N'01/15/2020', N'20200115', N'212636062', N'LifeLine Ambulance LLC', N'8430', N'', N'5238-20', N'09:50', N'10:45', N'645 S CE', N'5130 W JA', N'P', N'R', N'A0425, A0130', 14.45, 1, 1.03, 1.03, 15.479999999999999, 15.479999999999999, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5903, 1, N'01/15/2020', N'20200115', N'326959210', N'LifeLine Ambulance LLC', N'8431', N'', N'5466-20', N'14:36', N'15:27', N'850 W IRV', N'2451 W TO', N'P', N'R', N'A0425, A0130', 14.45, 6, 0.17166666666666666, 1.03, 20.63, 20.63, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5904, 1, N'01/15/2020', N'20200115', N'080889777', N'LifeLine Ambulance LLC', N'8434', N'', N'5143-20', N'14:16', N'15:35', N'7531 S ST', N'2425 EAST', N'D', N'R', N'A0425, A0130', 14.45, 2, 0.515, 1.03, 16.509999999999998, 16.509999999999998, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5905, 1, N'01/15/2020', N'20200115', N'029404324', N'LifeLine Ambulance LLC', N'8436', N'', N'5226-20', N'10:57', N'12:28', N'1969 W OG', N'4437 S C', N'P', N'R', N'A0425, A0130', 14.45, 7, 0.14714285714285716, 1.03, 21.66, 21.66, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5906, 1, N'01/15/2020', N'20200115', N'210811857', N'LifeLine Ambulance LLC', N'8436', N'', N'5218-20', N'08:51', N'10:52', N'1969 W OG', N'4437 S C', N'P', N'R', N'A0425, A0130', 14.45, 7, 0.14714285714285716, 1.03, 21.66, 21.66, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5907, 1, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8437', N'', N'240-20', N'05:45', N'06:02', N'5130 W JA', N'1725 W HA', N'R', N'P', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5908, 2, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8437', N'', N'241-20', N'05:45', N'06:02', N'5130 W JA', N'1725 W HA', N'R', N'P', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5909, 3, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8437', N'', N'241-20', N'05:45', N'06:02', N'5130 W JA', N'1725 W HA', N'R', N'P', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular A0120')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5910, 1, N'01/15/2020', N'20200115', N'029404324', N'LifeLine Ambulance LLC', N'8438', N'', N'5225-20', N'06:30', N'08:30', N'4437 S C', N'1969 W OG', N'R', N'P', N'A0425, A0130', 14.45, 7, 0.14714285714285716, 1.03, 21.66, 21.66, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5911, 1, N'01/15/2020', N'20200115', N'127312668', N'LifeLine Ambulance LLC', N'8769', N'', N'5284-20', N'08:06', N'09:07', N'4800 W CH', N'4437 S C', N'P', N'R', N'A0425, A0130', 14.45, 6, 0.17166666666666666, 1.03, 20.63, 20.63, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5912, 1, N'01/15/2020', N'20200115', N'210811857', N'LifeLine Ambulance LLC', N'8769', N'', N'5217-20', N'06:00', N'07:24', N'4437 S C', N'1969 W OG', N'R', N'P', N'A0425, A0130', 14.45, 7, 0.14714285714285716, 1.03, 21.66, 21.66, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5913, 1, N'01/15/2020', N'20200115', N'212636062', N'LifeLine Ambulance LLC', N'8769', N'', N'5237-20', N'08:15', N'08:14', N'5130 W JA', N'645 S CE', N'R', N'P', N'A0425, A0130', 14.45, 1, 1.03, 1.03, 15.479999999999999, 15.479999999999999, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5914, 1, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8933', N'', N'242-20', N'08:16', N'09:12', N'1725 W HA', N'5130 W JA', N'P', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular A0120')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5915, 2, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8933', N'', N'241-20', N'08:16', N'09:12', N'1725 W HA', N'5130 W JA', N'P', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5916, 3, N'01/15/2020', N'20200115', N'182779694', N'LifeLine Ambulance LLC', N'8933', N'', N'242-20', N'08:16', N'09:12', N'1725 W HA', N'5130 W JA', N'P', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5917, 1, N'01/15/2020', N'20200115', N'070390075', N'LifeLine Ambulance LLC', N'9052', N'', N'5148-20', N'07:17', N'07:30', N'6141 N PU', N'1029 W HO', N'R', N'D', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5918, 2, N'01/15/2020', N'20200115', N'070390075', N'LifeLine Ambulance LLC', N'9052', N'', N'5148-20', N'07:17', N'07:30', N'6141 N PU', N'1029 W HO', N'R', N'D', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular A0120')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5919, 1, N'01/15/2020', N'20200115', N'070390075', N'LifeLine Ambulance LLC', N'9052', N'', N'5149-20', N'07:41', N'07:58', N'1029 W HO', N'6141 N PU', N'D', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular A0120')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5920, 2, N'01/15/2020', N'20200115', N'070390075', N'LifeLine Ambulance LLC', N'9052', N'', N'5149-20', N'07:41', N'07:58', N'1029 W HO', N'6141 N PU', N'D', N'R', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5921, 1, N'01/15/2020', N'20200115', N'326959210', N'LifeLine Ambulance LLC', N'9052', N'', N'5005-20', N'12:00', N'13:01', N'2451 W TO', N'850 W IRV', N'R', N'P', N'A0425, A0130', 14.45, 6, 0.17166666666666666, 1.03, 20.63, 20.63, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5922, 1, N'01/15/2020', N'20200115', N'112628763', N'MEDEX', N'8468', N'', N'20-02444', N'16:49', N'17:57', N'5700 S MA', N'2649 E 75', N'H', N'R', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular A0120')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5923, 2, N'01/15/2020', N'20200115', N'112628763', N'MEDEX', N'8468', N'', N'20-02444', N'16:49', N'17:57', N'5700 S MA', N'2649 E 75', N'H', N'R', N'A0425, A0130', 14.45, 5, 0.20600000000000002, 1.03, 19.6, 19.6, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5924, 1, N'01/15/2020', N'20200115', N'213439797', N'MEDEX', N'8469', N'', N'20-02338', N'11:27', N'12:25', N'5700 S MA', N'2525 S MI', N'H', N'H', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular A0120')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5925, 2, N'01/15/2020', N'20200115', N'213439797', N'MEDEX', N'8469', N'', N'20-02338', N'11:27', N'12:25', N'5700 S MA', N'2525 S MI', N'H', N'H', N'A0425, A0130', 14.45, 4, 0.2575, 1.03, 18.57, 18.57, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5926, 1, N'01/15/2020', N'20200115', N'100380807', N'MEDEX', N'8788', N'', N'20-02358', N'10:52', N'12:32', N'2544 W MO', N'912 E 133', N'H', N'R', N'A0120', 8.01, 22, 0.046818181818181821, 1.03, 30.67, 30.67, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5927, 2, N'01/15/2020', N'20200115', N'100380807', N'MEDEX', N'8788', N'', N'20-02358', N'10:52', N'12:32', N'2544 W MO', N'912 E 133', N'H', N'R', N'A0120', 8.01, 22, 0.046818181818181821, 1.03, 30.67, 30.67, N'Master Regular A0120')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5928, 1, N'01/15/2020', N'20200115', N'163225998', N'MEDEX', N'8955', N'', N'20-02460', N'17:43', N'18:46', N'1740 W TA', N'11045 S W', N'H', N'R', N'A0428', 208.5, 13, 0.43076923076923074, 5.6, 281.3, 281.3, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5929, 1, N'01/15/2020', N'20200115', N'019278001', N'MEDEX', N'8955', N'', N'20-02299', N'03:30', N'04:01', N'518 N AUS', N'8311 ROOS', N'H', N'H', N'A0428', 208.5, 6, 0.93333333333333324, 5.6, 242.1, 242.1, N'Master Regular')
GO
INSERT #duplicatetrips2 ([ID], [RowNum], [TRIP_DATE], [LDATE], [CL_ID], [PROVIDER_NAME], [VEH], [EMP_NAME], [BookingID], [PICK_UP], [DROP_OFF], [PADDRESS], [DADDRESS], [PM], [DM], [MODIFIER], [BASE_RATE], [DIST], [RATE_PER_MILE], [COST_OF_MILEAGE], [CALCULATED_FARE], [VARIANCE], [type]) VALUES (5930, 1, N'01/15/2020', N'20200115', N'093216711', N'MEDEX', N'8955', N'', N'20-02493', N'22:51', N'23:38', N'518 N AUS', N'836 W WEL', N'H', N'H', N'A0428', 208.5, 14, 0.39999999999999997, 5.6, 286.9, 286.9, N'Master Regular')
GO
SET IDENTITY_INSERT #duplicatetrips2 OFF
GO
but I ran the code Scott provided against this dataset and it still works, pulling up two sets of 3 dups. I'm stumped.
August 7, 2020 at 1:27 am
but I ran the code Scott provided against this dataset and it still works, pulling up two sets of 3 dups. I'm stumped.
Oh no... I'm NOT saying that Scotts code didn't work (although I've not actually tried it). I'm just going to try to demonstrate a different way and, hopefully, get it right.
My question is, are all of these rows dupes of at least a 2 count or is there, hopefully, some rows with no dupes?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 7, 2020 at 1:35 am
There are definitely rows without dupes. And in the full dataset that's the vast majority. Any records that have a 1 rownum, (which are not followed by a 2 or 3 rownum in the next row) are non-duplicates. And you were also right before, where I used the Row_number to create the rownum column. If you have a better way, I'm all ears!
Thanks!
August 7, 2020 at 1:45 am
And, to be sure, you said the following columns are what you used to identify dupes?
,ldate
,RIN#
,Provider_Name
,License#
,pick_up
,drop_off
,Modifier
,CALCULATED_FARE
If that's true, then there's trouble in paradise because there is no RIN# or License# column in the new test data you've provided.
So which columns are you actually using to determine if dupes exist?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 28 total)
You must be logged in to reply to this topic. Login to reply