January 31, 2025 at 8:16 am
At my last job, my primary responsibility was identifying and correcting years of data debt. Company management initiated the process and demanded a thorough cleanup. Years of data definition and SQL coding with few or no standards had created a scene from the Wild West. The cleanup involved repeatedly identifying, redefining, renaming, and describing database column names and attributes so that they were consistent across the entire collection of systems on SQL Server, which consisted of more than 1,600 tables containing 26 million+ records. I worked side by side with the founder and CEO of the company, who supplied the background for each data element and a clear description of each element's purpose and interaction with other data elements. His remarkable memory and attention to detail made this work.
When we first started, it felt like trying to chew through granite. However, as time went on, our methods were refined and improved, which reduced the time required to make the necessary changes. My "technical partners" were Microsoft Team Foundation Server (TFS) for configuration management, the Perl programming language, the Vim editor, and of course SQL.
The payoff has been huge. Standards are now in place with appropriate oversight to ensure the standards are followed. Testing new features has become easier and more seamless. Responding to bug reports no longer involves fighting to understand the code base. I am retired now, but what a great way to end an IT career that started in 1969!
Roy Fulbright
Computer Consultant
January 31, 2025 at 10:34 am
I worked for a company that had several different lines of business, all acting independently. This meant that recording the customer contact details was done in ways that each line of business had come up with. Some of those lines of business had come together to share a common approach because there were shared interests between them and an opportunity for cost savings. Standardising on a single vendor API for address validation made sense.
Over time more lines of business standardised on that single vendor API. Someone realised that a competing vendor API was close enough that putting an internal wrapper around the two would give us some advantages.
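To make the wrapper idea concrete, here is a minimal sketch in Python. It is not the code we actually ran: the class names, the response shapes of the two vendors, and the fall-back-to-the-second-vendor behaviour are all assumptions made up for illustration. The point is only that callers depend on one internal interface and never see which vendor answered.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class Address:
    """Structured address returned to callers; field names are illustrative."""
    line1: str
    town: str
    postcode: str


class AddressValidator(Protocol):
    """The single internal interface every line of business codes against."""
    def validate(self, raw: str) -> Optional[Address]: ...


class VendorAAdapter:
    """Adapts a hypothetical vendor A, assumed to return a dict or None."""
    def __init__(self, lookup: Callable[[str], Optional[dict]]):
        self._lookup = lookup  # stands in for the vendor's real SDK or HTTP call

    def validate(self, raw: str) -> Optional[Address]:
        hit = self._lookup(raw)
        if hit is None:
            return None
        return Address(hit["line1"], hit["town"], hit["postcode"])


class VendorBAdapter:
    """Adapts a hypothetical vendor B, assumed to return a 3-tuple or None."""
    def __init__(self, lookup: Callable[[str], Optional[tuple]]):
        self._lookup = lookup

    def validate(self, raw: str) -> Optional[Address]:
        hit = self._lookup(raw)
        if hit is None:
            return None
        street, city, postcode = hit
        return Address(street, city, postcode)


class FallbackValidator:
    """Tries each wrapped validator in order and returns the first match."""
    def __init__(self, *validators: AddressValidator):
        self._validators = validators

    def validate(self, raw: str) -> Optional[Address]:
        for validator in self._validators:
            result = validator.validate(raw)
            if result is not None:
                return result
        return None


# Fake lookups standing in for the two vendors' real calls.
def fake_vendor_a(raw: str) -> Optional[dict]:
    known = {
        "10 downing street": {
            "line1": "10 Downing Street", "town": "London", "postcode": "SW1A 2AA",
        }
    }
    return known.get(raw.strip().lower())


def fake_vendor_b(raw: str) -> Optional[tuple]:
    known = {"buckingham palace": ("Buckingham Palace", "London", "SW1A 1AA")}
    return known.get(raw.strip().lower())


validator = FallbackValidator(VendorAAdapter(fake_vendor_a), VendorBAdapter(fake_vendor_b))
print(validator.validate("Buckingham Palace"))
```

With this shape, swapping or adding a vendor means writing one more adapter; nothing that calls `validate` has to change.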
We also started to look at our reference data. In the UK insurance industry there are a number of standard reference data sets. Some of these have applicability to lines of business that weren't insurance.
This all sounds well and good, and it would be but for human fallibility. There are always people who push against standards and norms. Occasionally the reasons are good, because natural evolution applies to standards too if they are to remain relevant. Mostly it's contrariness. One line of business decided that a UK address would be best represented as a single string/varchar column. That broke the geodemographic analysis and recommendation systems that relied on the different address parts. As one exasperated friend put it, "Is it really necessary for them to shoot themselves in both feet?"
One of the things I am looking into at the moment is bi-directional data contracts.
We know that we have complete freedom to change the contract for the 15 attributes no-one is using.
We also know that we need robust data tests for the 35 attributes that people are using if we change the mechanisms and data flows that supply the Data Product.
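A rough sketch of how I picture the bi-directional part, in Python. Everything here is hypothetical (the class, the attribute names, the consumer names); it only illustrates the bookkeeping: the producer publishes the attributes, consumers declare which ones they rely on, and the difference tells you what is free to change and what needs data tests.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductContract:
    """Producer-side attribute list plus consumer-declared usage (illustrative)."""
    attributes: set[str]
    consumers: dict[str, set[str]] = field(default_factory=dict)

    def register_consumer(self, name: str, used: set[str]) -> None:
        """Record which attributes a consumer depends on."""
        unknown = set(used) - self.attributes
        if unknown:
            raise ValueError(f"{name} declares unknown attributes: {sorted(unknown)}")
        self.consumers[name] = set(used)

    def in_use(self) -> set[str]:
        """Attributes at least one consumer depends on; these need data tests."""
        return set().union(*self.consumers.values())

    def free_to_change(self) -> set[str]:
        """Attributes no consumer has declared; the producer may change these."""
        return self.attributes - self.in_use()


# Hypothetical data product with five attributes and two consumers.
contract = DataProductContract(
    attributes={"customer_id", "postcode", "town", "fax_number", "legacy_code"}
)
contract.register_consumer("pricing", {"customer_id", "postcode"})
contract.register_consumer("marketing", {"postcode", "town"})

print("needs tests:   ", sorted(contract.in_use()))
print("free to change:", sorted(contract.free_to_change()))
```

In this toy case `fax_number` and `legacy_code` are free to change, while the other three need tests before the supplying data flows are altered.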
January 31, 2025 at 6:31 pm
First, "data debt" smells a lot like "technical debt".
Regarding data warehouses (not to be confused with data lakes), if one is architected using something like the Kimball dimensional method, then the design specifically addresses issues like integration of multiple source systems, de-duplication, and labeling.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho