August 17, 2009 at 11:33 pm
Actually we have an entire email content stored in a single column in a table. Our aim here is mask the SSN numbers in the email content like (123456789 to XXXXXXXXX) OR (123-12-1234 like xxx-xx-xxxx). The table has two million records, we have 10 different formats occuring in the email content. for example the SSN pattern can be like this.
[1-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
[ ][1-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][ ]
We have used PATINDEX to do this function. But what we found that it was taking very long time to execute. I have provided the script below.
SELECT DISTINCT MLIEEID, EECLIENT, EEEMPIDCD, PATINDEX('%[0-9][0-9][0-9][./,_^*+\][0-9][0-9][./,_^*+\][0-9][0-9][0-9][0-9]%', [EEEMPIDCD])
FROM [EEISTAT]
WHERE PATINDEX('%[0-9][0-9][0-9][./,_^*+\][0-9][0-9][./,_^*+\][0-9][0-9][0-9][0-9]%', [EEEMPIDCD]) > 0
The above script is taking a long time to execute. Can you please guide us in reframing the above query so that it takes a minimum amount of time to execute.
Thanks in advance.
August 18, 2009 at 3:27 pm
If you don't need to do this on the SQL side you could run a regex on the client side (meaning the calling process) before displaying/emailing the data. If you need mask on SQL then you may want to try a CLR function, as regex is one of the areas in which they perform much better than TSQL.
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply