December 9, 2011 at 10:41 am
I have a table full of customer transactions. I need to dump "possible" duplicates into a different table. The fields in my table are:
id, date, time, location, cardnumber, amount, fee, cardtype.
I need to put transactions with the same location / card number / amount into this table. It does not matter if they happened 12 hours apart, my users need to see this information.
This process is done during a daily import, when we are really only dealing with 1 days transactions.
I have given this some thought from multiple directions and I am not sure that I have been able to see it all the way through.
Any guidance would be appreciated.
Thanks
sb
December 9, 2011 at 10:50 am
I believe i understand what you typed, but I'm not sure if you are asking a question here.
--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
December 9, 2011 at 10:57 am
Sorry about that - my brain is in a cloud trying to figure this one out --
How would I write this query? I have thought of using multiple aggregates on the columns but I don't think that will show me the correct information. I have also thought of building an alias column comprised of a string of all 3 column values - but that may involve some conversions of numbers to strings and that doesn't seem like it would be the most straight forward way to accomplish this.
So my question is - how would you write this query?!?
thanks again
sb
December 9, 2011 at 11:13 am
something with row number will probably get you pretty close to what you are after, so you can still visualize the data:
anything that is a duplicate (based on what you said: location,cardnumber,amount will have a value where RW > 1:
are you sure amount is supposed to be int eh grouping for duplicates logic?
SELECT ROW_NUMBER() OVER (PARTITION BY location,cardnumber,amount ORDER BY location,cardnumber,amount) AS RW,
id,
date,
time,
location,
cardnumber,
amount,
fee,
cardtype
FROM TheSourceTable
Lowell
December 9, 2011 at 11:18 am
Thanks for the reply.
I will give this a shot in my development environment and see how it works out.
As far as duplicate criteria - yes amount should be included. Basically - we get a file on a daily basis that contains transactional information from our transaction processor. Because of bugs in the software, we see instances where customers are charged more than 1 time for a transaction (it shows up 1x in our software, but the network actually charged them 3x)?!?!
This is what we are trying to view so that we can properly reconcile.
Thanks again - I will post back if this accomplishes the goal.
sb
December 9, 2011 at 11:22 am
There are several possibilities, but without sample data, it's difficult to determine which is best. Here are just two options.
WITH DupeCheck AS (
SELECT id, date, time, location, cardnumber, amount, fee, cardtype
, Count(*) OVER( PARTITION BY Location, CardNumber, Amount) - 1 AS DupeCount
FROM Transactions AS t
)
SELECT *
FROM DupeCheck
WHERE DupeCount > 0
SELECT id, date, time, location, cardnumber, amount, fee, cardtype
FROM Transactions AS t
WHERE EXISTS (
SELECT *
FROM Transactions AS t2
WHERE t.Location = t2.Location
AND t.CardNumber = t2.CardNumber
AND t.Amount = t2.Amount
AND t.[ID] <> t2.[ID]
)
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply