Is there a way to extract only credit card numbers from text

  • Solomon Rutzky (12/15/2016)


    And I just thought of another variation of what I proposed above, taking into account that the goal is to replace the offending card numbers and transaction IDs with "x"s. The following example is mostly the same as before, but shows using RegEx_Replace in two ways to facilitate an UPDATE statement: the first way, ReplaceExtractedCC, is a simple replace (using RegEx_Replace instead of RegEx_MatchSimple) that replaces whatever value is matched with a static string of 16 "x"s; the second way, ReplaceEachDigit, is a two-step approach that first uses RegEx_MatchSimple to extract the offending value, and then runs that through RegEx_Replace to replace each digit with an "x". The first approach is obviously simpler, but it can't make the "masked" value match the format of the original value. The second approach, by replacing each individual digit in the original number, matches the original format in terms of both number of digits and embedded spaces and/or dashes, if there are any. Enjoy...

    (frowny face in the RegEx expression is a colon followed by a left-parenthesis)

    DECLARE @test-2 TABLE ([memo] VARCHAR(400) );

    INSERT INTO @test-2 ([memo])

    SELECT 'abc this account used visa :1234567890123456' UNION ALL

    SELECT 'this account visa 1234567890123456 request for receipt' UNION ALL

    SELECT 'this account visa 123456789012345612/13 exp' UNION ALL

    SELECT 'invoice card used 4123 4567 8901 2345 some text' UNION ALL

    SELECT 'paid using 4123 4567 8901 2345 thank you' UNION ALL

    SELECT 'Transaction code 123456454636667445 copy invoice'UNION ALL -- this is 18 digits, not 16!

    SELECT 'mastercard cc#5987-6543-2109-8765. this order will be processed' UNION ALL

    SELECT 'mastercard 5987-6543-2109-8765, thank you' union all

    SELECT 'AMEX code 123454636667445 copy invoice'UNION ALL -- AMEX with no spacers

    SELECT 'AMEX two 1234 546366 67445 copy invoice'UNION ALL -- AMEX with spacers

    SELECT 'AMEX with exp date 12345463666744599/12 copy invoice'UNION ALL -- AMEX with no spacers + exp

    SELECT 'another AMEX with exp date 1234 546366 6744599/12 copy invoice'UNION ALL -- AMEX with spacers + exp

    SELECT 'AMEX missing 1 digit 1234567890123412/13 exp' UNION ALL -- INVALID, but 16 digits before the "/"

    SELECT 'tran code 1234567890123412234234213' UNION ALL

    SELECT 'any card 123456789012341223423421345/45 exp' UNION ALL -- INVALID, too long

    SELECT ' period 1999-10-18 - -2000-10-17 ';

    DECLARE @Pattern NVARCHAR(120);

    SET @Pattern = N'\b(?:(?:\d{4}(?:[ -]?\d{4}){3}|\d{4}[ -]?\d{6}[ -]?\d{5})(?=\d{2}/\d{2}|[^/0-9]|$)|\d{16,}(?=[^/0-9]|$))';

    ;WITH cte AS

    (

    SELECT [memo],

    SQL#.RegEx_MatchSimple4k([memo], @Pattern, 1, NULL) AS [ExtractedCC]

    FROM @test-2

    )

    SELECT [memo],

    [ExtractedCC],

    SQL#.RegEx_Replace4k([memo], @Pattern, 'xxxxxxxxxxxxxxxx', -1, 1, NULL) AS [ReplaceExtractedCC],

    REPLACE([memo],

    [ExtractedCC],

    SQL#.RegEx_Replace4k([ExtractedCC], N'\d', 'x', -1, 1, NULL)) AS [ReplaceEachDigit]

    FROM cte;

    The following is one line of the output so it will be clearer for those who don't run the code:

    memo mastercard cc#5987-6543-2109-8765. this order will be processed

    ExtractedCC 5987-6543-2109-8765

    ReplaceExtractedCC mastercard cc#xxxxxxxxxxxxxxxx. this order will be processed

    ReplaceEachDigit mastercard cc#xxxx-xxxx-xxxx-xxxx. this order will be processed

    Take care,

    Solomon..

    Thank G-d for copy/paste!! That sh-t makes my head explode! 😀

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing post 31 (of 30 total)

You must be logged in to reply to this topic. Login to reply