There is an introduction to differential privacy you can read, though the math gets fairly dense. The essential idea is to share data in aggregate while preventing the kind of de-anonymization Netflix ran into after releasing its Prize dataset in 2006, when researchers re-identified subscribers by correlating the "anonymous" ratings with public IMDb reviews.
It's an interesting idea, but I'm not convinced it works well in practice, even if you limit queries from a client doing machine learning or open research. The big problem is that once you allow open access, limiting any single user's queries doesn't tell you whether several different users are pooling what they learn. Even if you only allowed students to access your database, could you be sure they weren't collaborating and sharing the results of their queries?
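To make the collusion worry concrete, here is a minimal sketch, not any particular vendor's implementation: a counting query answered with the standard Laplace mechanism, where each client is held to a per-user privacy budget, and a group of colluding clients simply averages its answers to wash the noise out. The counts and budget values are invented for illustration.

```python
import numpy as np

# Hypothetical illustration: a counting query protected with the Laplace
# mechanism, and colluding clients who each stay inside their own per-user
# budget but share and average their answers.

rng = np.random.default_rng(42)
true_count = 1234          # the sensitive statistic we want to hide
sensitivity = 1            # a counting query changes by at most 1 per person
epsilon_per_user = 0.1     # privacy budget enforced for each individual client

def noisy_count(eps):
    """Answer the count query with Laplace noise calibrated to eps."""
    return true_count + rng.laplace(scale=sensitivity / eps)

# One client, one query: the answer is quite noisy.
print("single query:", noisy_count(epsilon_per_user))

# 500 colluding clients each ask the same query once and share answers.
answers = [noisy_count(epsilon_per_user) for _ in range(500)]
print("colluders' average:", np.mean(answers))   # converges toward 1234
```

A per-user budget stops any one account from repeating the query until the noise averages out, but it does nothing about many accounts asking once each and comparing notes.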
The vast amount of data available on many people makes privacy hard to guarantee, especially once third-party datasets enter the picture. This is one reason I use a different random birthday on different sites: I assume my data will eventually leak, and I'd rather it not correlate cleanly with other breaches. It's not much, but it's something.
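Why does a fake birthday help at all? Here is a toy illustration with invented records: two unrelated breach dumps, neither of which is very sensitive alone, joined on quasi-identifiers like birthday and ZIP code.

```python
# Toy illustration (invented records): two breach dumps can be joined on
# quasi-identifiers, re-identifying people neither dataset names on its own.

shopping_breach = [
    {"birthday": "1984-03-11", "zip": "97201", "purchases": ["espresso machine"]},
    {"birthday": "1990-07-02", "zip": "10023", "purchases": ["running shoes"]},
]

forum_breach = [
    {"birthday": "1984-03-11", "zip": "97201", "email": "alice@example.com"},
    {"birthday": "1975-12-30", "zip": "60614", "email": "bob@example.com"},
]

# Index one dump by the quasi-identifier pair, then look the other one up.
index = {(r["birthday"], r["zip"]): r for r in forum_breach}
for record in shopping_breach:
    match = index.get((record["birthday"], record["zip"]))
    if match:
        print(f'{match["email"]} linked to purchases {record["purchases"]}')
```

If the birthday you gave the shopping site was random, that join quietly fails.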
There are companies that seem to think differential privacy and a little noise in a dataset is enough to ensure privacy. I think they're naive, or perhaps disingenuous; either way, privacy remains a serious problem in development work of all sorts, especially since the controls around development data are often much poorer than those around production data, and even production data (arguably) isn't well secured.
There aren't great solutions, but I do think most companies should maintain some sort of curated dataset for development purposes: fake data that lets developers build software, but isn't likely to cause an issue for any actual human if it's exposed. Unfortunately, I'm not hopeful that any significant number of companies will actually go down this path.
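If you want to experiment with that approach, here is a minimal sketch using the open-source Python faker package; the table shape and field names are my own assumptions, not anyone's schema. The point is a development "customers" table that looks like production but contains no real people.

```python
from faker import Faker

# Minimal sketch: generate a synthetic customers table for development use.
fake = Faker()
Faker.seed(1234)   # deterministic output so every developer gets the same data

def fake_customer(customer_id):
    return {
        "id": customer_id,
        "name": fake.name(),
        "email": fake.email(),
        "birthdate": fake.date_of_birth(minimum_age=18, maximum_age=90).isoformat(),
        "address": fake.address().replace("\n", ", "),
        "signup": fake.date_time_this_decade().isoformat(),
    }

dev_customers = [fake_customer(i) for i in range(1, 1001)]
print(dev_customers[0])
```

Seeding the generator matters more than it looks: a reproducible fake dataset means a bug report can reference record 37 and every developer sees the same record 37.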