Finding transposed data

  • How would I go about finding certain records with transposed data (typeos):

    Recordset

    Custid Id

    500 1234

    500 1234

    600 1233

    600 1232

    700 1000

    700 1000

    800 2001

    800 2002

    900 1500

    900 1500

    In this case I would want the custids of 600 and 800 returned because these maybe typos

    Make sense?

    Warm Regards,

    Arthur Lorenzini
    Sioux Falls, SD

  • Arthur, try this

    CREATE TABLE #test

    (

    CustID int,

    id int

    )

    INSERT INTO #test

    SELECT 500,1234 UNION ALL

    SELECT 500,1234 UNION ALL

    SELECT 600,1233 UNION ALL

    SELECT 600,1232 UNION ALL

    SELECT 700,1000 UNION ALL

    SELECT 700,1000 UNION ALL

    SELECT 800,2001 UNION ALL

    SELECT 800,2002 UNION ALL

    SELECT 900,1500 UNION ALL

    SELECT 900,1500

    SELECT

    t1.CustID,

    t1.ID

    FROM #test t1,

    #test t2

    WHERE t1.CustID = t2.CustID

    AND t1.id <> t2.id

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • That's not quite what I am after but close. The problem with this is if I have the following record:

    Custid id

    1000 5000

    1000 6500

    This will get picked up by the query but this is probably not a typo so I wouldn't not want it returned. I just need the records that may have been (fat fingered if you will), and are maybe just one number off. (Eg. 2002, 2001)

    Warm Regards,

    Arthur Lorenzini
    Sioux Falls, SD

  • While I really think that you need to clarify your definition a little better Arthur, here is a hack at it, using a modification of Greg's code:

    SELECT

    t1.CustID,

    t1.ID

    FROM #test t1,

    #test t2

    WHERE t1.CustID = t2.CustID

    AND (t1.id - t2.id) IN (1,-1,10,-10,100,-100,1000,-1000)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I think that will work except for a small issue with a conversion. In my actual table, the ID column is a CHAR(9)

    it actually holds SSN without the dashes. I get the following error:

    Msg 403, Level 16, State 1, Line 1

    Invalid operator for data type. Operator equals subtract, type equals char.

    Warm Regards,

    Arthur Lorenzini
    Sioux Falls, SD

  • Barry, what is the significance of the numbers in the IN group? If the ids are random, then couldn't the possibilies be endless? In light of the fact that we now know they are SSN's, I am still curious.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • Arthur.Lorenzini (9/17/2008)


    I think that will work except for a small issue with a conversion.

    Arthur, I think there is another issue. If you have a customer id, say 600, showing up with two different id's, which we now know are SSNs, then you need to be able to determine which id(SSN) is the correct one. And for the one that is determined to be incorrect for the customer id in question, what is the corrrect corresponding customer id. Just a thought.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • This is what the code is turn out to be:

    SELECT

    t1.PrimaryCustID,

    t1.PrimarySSN

    FROM CBDTransposeData t1,

    CBDTransposeData t2

    WHERE t1.PrimaryCustID = t2.PrimaryCustID

    AND (CONVERT(INT,t1.PrimarySSN) - CONVERT(INT,t2.PrimarySSN))

    IN (1,-1,10,-10,100,-100,1000,-1000,10000,-10000,100000,-100000,1000000,-1000000,10000000,-1000000,10000000,-10000000,100000000,-100000000)

    The second SSN could be a co-customer (joint applications) put we go from the thought the first entry is the Customer SSN ( this comes from the customer table).

    Warm Regards,

    Arthur Lorenzini
    Sioux Falls, SD

  • What is someone fat fingers the first digit instead of the last, will it still work?

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

  • yes , I just tested it and it works.

    Warm Regards,

    Arthur Lorenzini
    Sioux Falls, SD

  • Greg Snidow (9/17/2008)


    Barry, what is the significance of the numbers in the IN group? If the ids are random, then couldn't the possibilies be endless? In light of the fact that we now know they are SSN's, I am still curious.

    The significance is this spec from the OP:

    I just need the records that may have been (fat fingered if you will), and are maybe just one number off. (Eg. 2002, 2001)

    Mind you, I don't really trust that analysis either, which is why I said that he needs to think this through a little bit more.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Arthur.Lorenzini (9/17/2008)


    I think that will work except for a small issue with a conversion. In my actual table, the ID column is a CHAR(9)

    it actually holds SSN without the dashes. I get the following error:

    Msg 403, Level 16, State 1, Line 1

    Invalid operator for data type. Operator equals subtract, type equals char.

    Cast them as INT's:

    AND (Cast(t1.id as int) - Cast(t2.id as int)) IN (1,-1,10,-10,100,-100,1000,-1000)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • ... CONVERT() works as well. 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • rbarryyoung (9/17/2008)


    Greg Snidow (9/17/2008)


    Barry, what is the significance of the numbers in the IN group? If the ids are random, then couldn't the possibilies be endless? In light of the fact that we now know they are SSN's, I am still curious.

    The significance is this spec from the OP:

    I just need the records that may have been (fat fingered if you will), and are maybe just one number off. (Eg. 2002, 2001)

    I may not have explained this as well as I could have... This list (1,-1,10,-10,...) works because if you take any whole positive number and change just one digit by only +1 or -1, then the resulting difference between the correct value and the modified value is:

    (+1/-1)*10^n where "n" is what digit-place was changed (counting from the right, starting at 0).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Yep, I finally got it. I was thinking that it would only work if the last digit were off by one, but I see now it will work regardless of which place it is. As long as it is only off by one.

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply