Parse CC in String

  • Hi Guys,

    I have a table with a column which is VARCHAR. This column has text and potentially Credit Card # and also has couple of dates. What I need to do is find all the records that may potentially have CC# in this column... I was thinking of using something like

    WHERE patindex('%[0-9][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -]%',Memo) > 0

    patindex('%[0-9][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -]%',Memo) > 0

    Is there a better way to look for numbers in the string that may potentially be CC#. It is fine if some of them are not CC#. Maybe someone has already worked on a function like this where you find CC# from a string... Any help is appreciated. I am not looking for CLR function though as I have to do this using Query Analyzer.

    Thanks,

    Laura

  • Didn't you post something just like this a day or two ago? I know I saw some other thread trying to do this exact same thing. I can't however find it now. :hehe:

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Laura i think it would be easier to delete dashes, and maybe also spaces from the string, and then look for 16 [0-9] digits in a row in the remaining string. (or 15 amex number?);

    i saw your previous post, but now that it has percolated a bit, i'm thinking you need to manip the comment bit first to make the search easier

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I saw something similar also, but I'm not sure it was from this OP. I can't find the post either.

  • That seems like a reasonable approach. Using REPLICATE() will make it easier to see how many numbers are in the pattern and/or adjust it later:

    WHERE PATINDEX('%[0-9]' + REPLICATE('[0-9 -]', 12) + '%', Memo) ...

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Laura_SqlNovice (3/29/2013)


    Hi Guys,

    I have a table with a column which is VARCHAR. This column has text and potentially Credit Card # and also has couple of dates. What I need to do is find all the records that may potentially have CC# in this column... I was thinking of using something like

    WHERE patindex('%[0-9][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -]%',Memo) > 0

    Is there a better way to look for numbers in the string that may potentially be CC#. It is fine if some of them are not CC#. Maybe someone has already worked on a function like this where you find CC# from a string... Any help is appreciated. I am not looking for CLR function though as I have to do this using Query Analyzer.

    Thanks,

    Laura

    Do you actually want to extract the numbers (even if more than one in a row) or are you just trying to isolate the rows using the WHERE clause?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • i'm afraid this task isn't as easy as it may sound. The difficulty comes from the complexity of credit card number formulation. Now let's assume we just want to find the most common cards AMEX, MasterCard, Visa, Discover, Carte Blanche/Diners Club and perhaps a few others. Even among just these major carriers the rules are different. For example, AMEX numbers are 15 chars, MasterCard 16, Visa 13 or 16, Discover 16, and CBDC is 14.

    Then there is the Issuer Identification Number (IIN) which varies from one digit (Visa) to as many as 6 digits (Discover). These IINs have changed over the years so there are many different ranges of digits for just about every type of card. These ranges can be looked up and put into a table to be used for CC validation.

    And there's more...the last number of every card number is a checksum digit calculated by the Luhn Algorithm which is public domain. Virtually all cards use the Luhn Algorithm for the checksum.

    So...just finding sets of 15 or 16 digits won't tell you whether it's likely to be a credit card number at all, much less if it's might be a valid card. A number can be well-formed and have a proper checksum and still not be valid if it's been retired or never issues for example. So all we can do is weed out numbers that break the rules. The rest can only be confirmed by the card issuer and that's another can of worms.

    I've dug out some functions I had for validating card numbers and made a few adaptations to make them suitable for using as an example.

    Function 1 is just a version of DelimitedSplit8K that splits EVERY character. Jeff Moden deserves most of the credit.

    CREATE FUNCTION [dbo].[DelimitedSplit8KByChar]

    (@pString VARCHAR(8000))

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    cteTally(N) AS (SELECT 0 UNION ALL

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E2

    )

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY t.N),

    Item = SUBSTRING(@pString,t.N+1,1)

    FROM cteTally t

    WHERE NULLIF(SUBSTRING(@pString,t.N+1,1),'') IS NOT NULL

    GO

    Function 2 parses the CC number and compares it to a table of Issuer Identification Number ranges.

    THIS IS ONLY A SAMPLE AND MUST BE UPDATED WITH OFFICIAL ISSUER DATA BEFORE USING IN PRODUCTION!!

    CREATE FUNCTION [dbo].[itvfGetCCIIN]

    (

    @CCNum VARCHAR(50),

    @CCLen INT

    )

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    WITH

    cteIINValue(ID,CCType,StartIIN,EndIIN,CCLen) AS (

    SELECT ID,CCType,StartIIN,EndIIN,CCLen FROM

    (VALUES

    (1,'American Express',34,34,15)

    ,(2,'American Express',37,37,15)

    ,(3,'Diners Club',300,305,14)

    ,(4,'Carte Blanche',300,305,14)

    ,(5,'enRoute',2014,2014,15)

    ,(6,'enRoute',2149,2149,15)

    ,(7,'MasterCard',51,55,16)

    ,(8,'Visa',4,4,13)

    ,(9,'Visa',4,4,16)

    ,(10,'Discover',6011,6011,16)

    ,(11,'Discover',622126,622925,16)

    ,(12,'Discover',644,649,16)

    ,(13,'Discover',65,65,16)

    ,(14,'JCB',3528,3589,16)

    ) AS Data (ID,CCType,StartIIN,EndIIN,CCLen)

    ),

    E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+6 or 100,000,000 rows max

    cteTally(N) AS (SELECT 0 UNION ALL

    SELECT TOP (SELECT ISNULL(MAX(EndIIN),10000) FROM cteIINValue)

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8

    )

    SELECT

    ID

    ,IINType

    ,CCIIN

    ,IINLen

    ,StartIIN

    ,EndIIN

    ,CCLen

    ,Prefix

    FROM

    (

    SELECT

    ROW_NUMBER() OVER (ORDER BY t.N) AS ID

    ,cte.ID AS IINID

    ,CCType AS IINType

    ,N AS CCIIN

    ,CCLen AS IINLen

    ,StartIIN

    ,EndIIN

    ,@CCLen AS CCLen

    ,(CASE

    WHEN ISNUMERIC(@CCNum) = 1

    AND CONVERT(BIGINT,LEFT(@CCNum,LEN(t.N))) BETWEEN StartIIN and EndIIN

    THEN CONVERT(BIGINT,LEFT(@CCNum,LEN(t.N)))

    ELSE 0

    END)

    AS Prefix

    FROM

    cteIINValue cte

    CROSS APPLY

    cteTally t

    WHERE

    t.N BETWEEN StartIIN and EndIIN

    ) r

    WHERE

    1=1

    AND r.CCIIN = r.Prefix

    AND r.Prefix > 0

    AND r.IINLen = r.CCLen

    GO

    Function 3 is the Luhn Algorithm for getting the checksum.

    CREATE FUNCTION [dbo].[tvfLuhnValidation]

    (

    @CCStr VARCHAR(100)

    )

    RETURNS

    @CheckSumValidation TABLE

    (

    ID INT IDENTITY(1,1) NOT NULL,

    CCNum VARCHAR(20) NULL,

    CkSumRemainder INT NULL,

    PRIMARY KEY (ID)

    )

    WITH SCHEMABINDING

    AS

    BEGIN

    DECLARE

    @CCNum BIGINT

    ,@CheckIIN BIT

    ,@AllDigits BIGINT

    ,@ReverseDigits BIGINT

    ,@CheckSum BIGINT

    ,@CheckSumRemainder INT

    SET @CCStr = REPLACE(REPLACE(@CCStr,' ',''),'-','')

    IF PATINDEX('%[^0-9]%',@CCStr) > 0

    BEGIN

    INSERT INTO @CheckSumValidation

    (CCNum,CkSumRemainder)

    SELECT 0,99

    END

    ELSE

    BEGIN

    SET @CCNum = CONVERT(BIGINT,@CCStr)

    SET @AllDigits = @CCNum

    SET @checksum = CAST(RIGHT(@AllDigits,1) AS BIGINT)

    SET @ReverseDigits = RIGHT(REVERSE(@AllDigits),LEN(@AllDigits)-1)

    /* Get the check digit using the Luhn Algorithm */

    ;WITH cteCheckSum

    AS

    (

    SELECT

    (SUM(Item)+@CheckSum)%10 AS CheckSumRemainder

    FROM

    (

    SELECT

    s2.ItemNumber

    ,CAST(s2.Item AS BIGINT) AS Item

    FROM

    dbo.DelimitedSplit8KByChar(@ReverseDigits) AS s1

    OUTER APPLY

    dbo.DelimitedSplit8KByChar(s1.Item*2) AS s2

    WHERE

    s1.ItemNumber%2 <> 0

    UNION ALL

    SELECT

    ItemNumber

    ,CAST(Item AS BIGINT) AS Item

    FROM

    dbo.DelimitedSplit8KByChar(@ReverseDigits)

    WHERE

    ItemNumber%2 = 0

    ) d

    )

    INSERT INTO @CheckSumValidation

    (CCNum,CkSumRemainder)

    SELECT

    @CCNum AS CCNum

    ,CheckSumRemainder

    FROM

    cteCheckSum cs

    END

    RETURN

    END

    GO

    Last but not least is some script to tie all these functions together. It's not the swiftest code because a lot is going on internally and I'm sure someone (as always!) will be able to offer improvements.

    WITH cteSampleData --replace this with your real data

    AS

    (

    SELECT * FROM

    (VALUES

    (1,'VISA','4012888888881881','10/14')

    ,(2,'MasterCard','5269924854210552','06/15')

    ,(3,'Voyager','869994992762272','08/14')

    ,(4,'VISA','4539390243132435','12/15')

    ,(5,'enRoute','214992938007085','09/13')

    ,(6,'VISA','4485983356242218','11/14')

    ,(7,'JCB','3088518677707770','01/14')

    ,(8,'VISA','4532254137583730','07/14')

    ,(9,'JCB','3560777438925512','12/15')

    ,(10,'Discover','6011618612311087','11/14')

    ,(11,'VISA','4417123456789113','07/15')

    ,(12,'Diners Club','3022329080952x','12/13')

    ) AS Data (ID,CCType,CCNum,CCExp)

    )

    --SELECT * FROM cteSampleData

    SELECT

    r2.ID

    ,(CASE

    WHEN r2.CCNum IN (0,1) OR r2.CCNum IS NULL

    THEN CAST(r2.OrigCCNum AS VARCHAR(50))

    ELSE CAST(r2.CCNum AS VARCHAR(50))

    END) AS CCNum

    ,r2.IINType AS ProbableCardType

    ,(CASE

    WHEN v.CkSumRemainder = 0 THEN 'OK'

    WHEN r2.CCNum IN (0,1) OR r2.CCNum IS NULL THEN 'Invalid Number'

    ELSE 'Invalid CheckSum'

    END) AS CardNumberStatus

    FROM

    (

    SELECT

    cte1.ID

    ,(CASE

    WHEN cte1.CCNum IS NULL THEN 0

    WHEN PATINDEX('%[^0-9]%',cte1.CCNum) > 0 THEN 1

    ELSE CAST(cte1.CCNum AS BIGINT)

    END)

    AS CCNum

    ,cte1.CCNum AS OrigCCNum

    ,(CASE

    WHEN r1.IINType IS NOT NULL THEN r1.IINType

    ELSE 'Unknown'

    END)

    AS IINType

    FROM

    cteSampleData cte1

    LEFT OUTER JOIN

    (

    SELECT DISTINCT

    ROW_NUMBER() OVER (PARTITION BY r.ID ORDER BY r.ID) AS CCNumGroup

    ,r.ID

    ,r.CCNum

    ,iin.IINType

    FROM

    (

    SELECT

    cte.ID

    ,cte.CCNum

    ,LEN(cte.CCNum) AS CCLen

    FROM

    cteSampleData cte

    ) r

    CROSS APPLY

    dbo.itvfGetCCIIN(r.CCNum,r.CCLen) iin

    ) r1

    ON cte1.ID = r1.ID

    WHERE

    CCNumGroup = 1 OR CCNumGroup IS NULL

    ) r2

    CROSS APPLY

    dbo.tvfLuhnValidation(r2.CCNum) AS v

    ORDER BY

    ID

     

  • Yes Jeff I just need to isolate those row... I do not need to find the numbers. I will go through the response from everyone now. Thanks a lot to everyone in taking to respond to this. Yeah somehow my earlier post was deleted... may be they thought I had put actual CC# in the script I had... Or I might have done something wrong while creating the post. Thanks Steven for the scripts.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply