Finding "sort of" duplicate records - multiple column aggregate??.. HELP

Question

Finding "sort of" duplicate records - multiple column aggregate??.. HELP

stephenmbell

SSC Eights!

Points: 926
More actions
December 9, 2011 at 10:41 am

#249401

I have a table full of customer transactions. I need to dump "possible" duplicates into a different table. The fields in my table are:
id, date, time, location, cardnumber, amount, fee, cardtype.
I need to put transactions with the same location / card number / amount into this table. It does not matter if they happened 12 hours apart, my users need to see this information.
This process is done during a daily import, when we are really only dealing with 1 days transactions.
I have given this some thought from multiple directions and I am not sure that I have been able to see it all the way through.
Any guidance would be appreciated.
Thanks
sb

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

ChazMan SSCrazy Points: 2259 More actions · Answer 1

I believe i understand what you typed, but I'm not sure if you are asking a question here.

--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.

stephenmbell SSC Eights! Points: 926 More actions · Answer 2

Sorry about that - my brain is in a cloud trying to figure this one out --

How would I write this query? I have thought of using multiple aggregates on the columns but I don't think that will show me the correct information. I have also thought of building an alias column comprised of a string of all 3 column values - but that may involve some conversions of numbers to strings and that doesn't seem like it would be the most straight forward way to accomplish this.

So my question is - how would you write this query?!?

thanks again

sb

Lowell SSC Guru Points: 323518 More actions · Answer 3

something with row number will probably get you pretty close to what you are after, so you can still visualize the data:

anything that is a duplicate (based on what you said: location,cardnumber,amount will have a value where RW > 1:

are you sure amount is supposed to be int eh grouping for duplicates logic?

SELECT ROW_NUMBER() OVER (PARTITION BY location,cardnumber,amount ORDER BY location,cardnumber,amount) AS RW,

id,

date,

time,

location,

cardnumber,

amount,

fee,

cardtype

FROM TheSourceTable

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

stephenmbell SSC Eights! Points: 926 More actions · Answer 4

Thanks for the reply.

I will give this a shot in my development environment and see how it works out.

As far as duplicate criteria - yes amount should be included. Basically - we get a file on a daily basis that contains transactional information from our transaction processor. Because of bugs in the software, we see instances where customers are charged more than 1 time for a transaction (it shows up 1x in our software, but the network actually charged them 3x)?!?!

This is what we are trying to view so that we can properly reconcile.

Thanks again - I will post back if this accomplishes the goal.

sb

drew.allen SSC Guru Points: 76996 More actions · Answer 5

There are several possibilities, but without sample data, it's difficult to determine which is best. Here are just two options.

WITH DupeCheck AS (

SELECT id, date, time, location, cardnumber, amount, fee, cardtype

, Count(*) OVER( PARTITION BY Location, CardNumber, Amount) - 1 AS DupeCount

FROM Transactions AS t

)

SELECT *

FROM DupeCheck

WHERE DupeCount > 0

SELECT id, date, time, location, cardnumber, amount, fee, cardtype

FROM Transactions AS t

WHERE EXISTS (

SELECT *

FROM Transactions AS t2

WHERE t.Location = t2.Location

AND t.CardNumber = t2.CardNumber

AND t.Amount = t2.Amount

AND t.[ID] <> t2.[ID]

)

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA