Data privacy and protection have become hot topics. Over the last few years, as I've worked with customers that deal with the GDPR or other laws, I've seen them become more concerned and careful with their use of data. I encounter less and less resistance from developers who want to use sensitive production data in development environments, though there is still too much.
Using data that's stored in databases or text files is one thing. What about data in less structured forms? Over the years I've dealt with a few customers who recorded customer interactions for various purposes. Call centers and financial organizations commonly do this, and those recordings sometimes contain sensitive information. I know I've given my date of birth, credit card number, bank account number, and other data to representatives at various times.
Those calls are often recorded, and IT staff have often had to apply extra security to these files. Access to listen is restricted for various reasons, and certainly when there is sensitive information inside the audio (or video), that data needs the same protection we'd apply to data in other forms. Providing that protection, or redacting the information, isn't an easy task.
I saw recently that the Amazon Transcribe service will now redact some PII (personally identifiable information) automatically. The service can be configured to remove this information from the text transcripts it produces, which is fantastic. This is a great way to start using technology safely and to reduce data leakage when we reuse data. Certainly people might look at these transcripts to better train reps, but it's also likely someone would look through them to determine why customers are calling in and use that information to design better applications. In either case, there's no need to expose PII to them.
This doesn't protect the data inside the audio, but with redacted transcripts available, perhaps companies can delete those recordings sooner and more quickly reduce their potential attack surface. We'll always have some liability, but reducing it and not creating unnecessary issues is part of what we want to do when protecting data.