I had never heard of data debt until I saw this article on the topic. In reading it, I couldn't help thinking that most everyone has data debt, it creates inefficiencies, and it's unlikely we'll get rid of it. And by the way, it's too late to get this under control. I somewhat dismissed the article when I saw this: "addressing data debt in its early stages is crucial to ensure that it does not become an overwhelming barrier to progress." I know it's a barrier, as I assume most of you also know, but it's also not stopping us. We keep building more apps, databases, and systems, and accruing more data debt. Somehow, most organizations keep running.
The description of debt might help here. How many of you have inconsistent data standards, where you might define a data element differently in different databases? Maybe you have duplicated data that is slow to update (think ETL/warehouses). Maybe you have different ways of tracking a completed sale in different systems. Maybe you even store dates in different formats (int, string, or something weirder). How many of you lack some documentation on what the columns in your databases mean? Maybe I should ask the reverse, where the few of you who have complete data dictionaries can raise your hands.
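To make the date example concrete, here's a minimal sketch in Python of the kind of cleanup this debt forces on someone downstream. Everything in it is an assumption for illustration: the `normalize_date` name, the idea that integer dates are encoded as YYYYMMDD, and the short list of string formats. Your systems will have their own (probably longer) list.

```python
from datetime import date, datetime

# Hypothetical list of string formats seen across source systems.
# Note that "%m/%d/%Y" assumes US month-first order; a value like
# 03/04/2024 is ambiguous without knowing which system produced it.
STRING_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_date(value):
    """Convert an int, string, date, or datetime into a datetime.date."""
    # datetime is a subclass of date, so check it first.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, int):
        # Assumption: integer dates are encoded as YYYYMMDD.
        return datetime.strptime(str(value), "%Y%m%d").date()
    if isinstance(value, str):
        text = value.strip()
        for fmt in STRING_FORMATS:
            try:
                return datetime.strptime(text, fmt).date()
            except ValueError:
                continue  # try the next known format
    # Fail loudly rather than guess at an unknown format.
    raise ValueError(f"Unrecognized date value: {value!r}")


# The same day, stored three different ways in three different systems.
raw_values = [20240315, "2024-03-15", "03/15/2024"]
cleaned = [normalize_date(v) for v in raw_values]
```

The function raises an error on anything it doesn't recognize instead of guessing, since a silently wrong date is just more debt.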
For most of my career I've heard a couple of terms that I've never really seen implemented. There's the famous "single version of the truth" for a system, which seems to break down whenever we add a reporting or warehousing system. Even inside a single database, often an OLTP one, it's hard to get a truth because values are changing so fast. The other term is MDM (master data management), which promises to ensure that every element is tracked and tagged the same way. No misspelled customer names or outdated addresses. There has been no shortage of products I've seen to help people tackle this problem, but ultimately I think the amount of data debt is too high. By the time we realize we need MDM, we'll never pay down that debt, mostly because too many developers have too many habits and legacy ways of capturing data that will never get integrated into any MDM dictionary.
The article seems like a great academic set of principles. Make sure you label all your data. Put governance in place, with good access controls. Train workers and establish accountability for properly managing data. Invest in scalable architectures. How many of you can add scale to your system easily? It's always taken me jumping through a variety of hoops to do that. The cloud makes it easy.
For a month. Then when the bill comes, you'll be scaling back down.
Really, we live in the chaos of the real world, where an organization is not one thing but a large number of people and groups, each with their own goals and processes, just trying to get enough done to keep the organization moving forward. There's no real time to deal with data debt.
Except if you're the ETL person. We mostly pay you to move data around and clean it as best you can. At least then the problem remains hidden from the report readers, who trust you've actually done the T portion of ETL correctly.