SSN MAsking - Performance Issue

  • Actually we have an entire email content stored in a single column in a table. Our aim here is mask the SSN numbers in the email content like (123456789 to XXXXXXXXX) OR (123-12-1234 like xxx-xx-xxxx). The table has two million records, we have 10 different formats occuring in the email content. for example the SSN pattern can be like this.

    [1-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]

    [ ][1-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][ ]

    We have used PATINDEX to do this function. But what we found that it was taking very long time to execute. I have provided the script below.

    SELECT DISTINCT MLIEEID, EECLIENT, EEEMPIDCD, PATINDEX('%[0-9][0-9][0-9][./,_^*+\][0-9][0-9][./,_^*+\][0-9][0-9][0-9][0-9]%', [EEEMPIDCD])

    FROM [EEISTAT]

    WHERE PATINDEX('%[0-9][0-9][0-9][./,_^*+\][0-9][0-9][./,_^*+\][0-9][0-9][0-9][0-9]%', [EEEMPIDCD]) > 0

    The above script is taking a long time to execute. Can you please guide us in reframing the above query so that it takes a minimum amount of time to execute.

    Thanks in advance.

  • If you don't need to do this on the SQL side you could run a regex on the client side (meaning the calling process) before displaying/emailing the data. If you need mask on SQL then you may want to try a CLR function, as regex is one of the areas in which they perform much better than TSQL.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply