This editorial was originally published on Jan 9, 2020. It is being re-run as Steve is out of town.
Test data is hard to come by, and I agree with Brent Ozar: " I get so frustrated when I hear trainers/presenters/bloggers/idealists talk about how developers should be using purpose-built-from-scratch data sets with no real customer data in ’em. The real world just rarely works that way."
I agree, and I know this is a hard problem. I'm one of those that tries to get people to use non-production data. I know there isn't an easy solution, and there isn't a one-off solution here. Really, my frustration is with the fact that many development people just don't see the problem as one that requires ongoing effort. However, as I've seen unit testing grow and become commonplace, I do expect that eventually we will start to understand that it is worth some effort to produce test data.
Often we want a lot of things. We want large data sets, similar to production. We need some randomness in there because we won't think of every edge case. We need some consistent values because we will use them to initially evaluate our logic and finding new values all the time is frustrating.
Is there a good way to do this? I've seen some good, creative ways of building test data, such as Andy Mallon's post from last year. I do think routines like this are good, but they need to be a) shared with other developers, b) easy for other developers to find, and c) incorporated into some developer database creation process.
To me, this ought to be a process that is maintained by all developers and DBAs, and one that ensures a new development environment can be created in seconds, or possibly minutes. This often means some pre-staging work to ensure developers can quickly and easily hydrate or reset a new database. Certainly products like SQL Provision can help, but these still require some work to ensure that developers have the test data that is useful for them.
Ultimately, I think maintaining test data is part of being a developer. Having a central, consistent set (or two), is a part of working in a software team. When we start to accept this as part of our work, we can build more consistent and reliable processes.