Finding "sort of" duplicate records - multiple column aggregate??.. HELP

  • I have a table full of customer transactions. I need to dump "possible" duplicates into a different table. The fields in my table are:

    id, date, time, location, cardnumber, amount, fee, cardtype.

    I need to put transactions with the same location / card number / amount into this table. It does not matter if they happened 12 hours apart, my users need to see this information.

    This process is done during a daily import, when we are really only dealing with 1 days transactions.

    I have given this some thought from multiple directions and I am not sure that I have been able to see it all the way through.

    Any guidance would be appreciated.

    Thanks

    sb

  • I believe i understand what you typed, but I'm not sure if you are asking a question here.

    --------------------------------------------------------------------------
    When you realize you've dug yourself into a hole....Step 1...stop digging.

  • Sorry about that - my brain is in a cloud trying to figure this one out --

    How would I write this query? I have thought of using multiple aggregates on the columns but I don't think that will show me the correct information. I have also thought of building an alias column comprised of a string of all 3 column values - but that may involve some conversions of numbers to strings and that doesn't seem like it would be the most straight forward way to accomplish this.

    So my question is - how would you write this query?!?

    thanks again

    sb

  • something with row number will probably get you pretty close to what you are after, so you can still visualize the data:

    anything that is a duplicate (based on what you said: location,cardnumber,amount will have a value where RW > 1:

    are you sure amount is supposed to be int eh grouping for duplicates logic?

    SELECT ROW_NUMBER() OVER (PARTITION BY location,cardnumber,amount ORDER BY location,cardnumber,amount) AS RW,

    id,

    date,

    time,

    location,

    cardnumber,

    amount,

    fee,

    cardtype

    FROM TheSourceTable

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks for the reply.

    I will give this a shot in my development environment and see how it works out.

    As far as duplicate criteria - yes amount should be included. Basically - we get a file on a daily basis that contains transactional information from our transaction processor. Because of bugs in the software, we see instances where customers are charged more than 1 time for a transaction (it shows up 1x in our software, but the network actually charged them 3x)?!?!

    This is what we are trying to view so that we can properly reconcile.

    Thanks again - I will post back if this accomplishes the goal.

    sb

  • There are several possibilities, but without sample data, it's difficult to determine which is best. Here are just two options.

    WITH DupeCheck AS (

    SELECT id, date, time, location, cardnumber, amount, fee, cardtype

    , Count(*) OVER( PARTITION BY Location, CardNumber, Amount) - 1 AS DupeCount

    FROM Transactions AS t

    )

    SELECT *

    FROM DupeCheck

    WHERE DupeCount > 0

    SELECT id, date, time, location, cardnumber, amount, fee, cardtype

    FROM Transactions AS t

    WHERE EXISTS (

    SELECT *

    FROM Transactions AS t2

    WHERE t.Location = t2.Location

    AND t.CardNumber = t2.CardNumber

    AND t.Amount = t2.Amount

    AND t.[ID] <> t2.[ID]

    )

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply