October 8, 2018 at 9:08 pm
Comments posted to this topic are about the item Single Source of Truth
October 9, 2018 at 1:15 am
Single source of truth in one place is a paradigm that has scuppered many an EDW project. Once an organisation gets beyond a certain scale, you are on a hiding to nothing.
To start with, define "single source of truth". Is there a universally agreed truth? Would the finance department agree with the sales department on what constitutes truth? Does what constitutes the truth change over time?
Does all the data within your organisation originate in your organisation? Do you have control over all data in your organisation?
I'm finding that the source of truth for "products" resides in system 'x', the source of truth for "customer" details resides in system 'y' and the source of truth for a large chunk of reference data resides externally.
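One way to picture this federated arrangement is a small registry that records, for each entity, which system is authoritative, so that each datapoint is read from exactly one place. A minimal sketch, assuming the hypothetical system names `system_x`, `system_y` and `external_reference_feed`; none of these are real products or APIs.

```python
from typing import Dict

# Hypothetical registry: each entity type maps to the one system treated
# as its source of truth. The names are illustrative placeholders only.
SYSTEM_OF_RECORD: Dict[str, str] = {
    "product": "system_x",
    "customer": "system_y",
    "reference_data": "external_reference_feed",
}


def authoritative_source(entity: str) -> str:
    """Return the system that owns `entity`, or fail loudly if nobody does."""
    try:
        return SYSTEM_OF_RECORD[entity]
    except KeyError:
        raise KeyError(f"no system of record registered for {entity!r}") from None


print(authoritative_source("customer"))  # system_y
```

Failing loudly on an unregistered entity is a deliberate choice: an entity with no agreed owner is exactly the case where competing "truths" tend to appear.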
October 9, 2018 at 1:53 am
Nice article, thanks.
In my opinion there's not really any such thing as the "truth", because data changes all the time. The focus should really be on consistency. The problem is that most clients hate inconsistency (yes, really hate it): it makes them look bad to their internal clients and undermines confidence in their data or project. Most clients seem OK with things changing over time, but they absolutely want reports to be consistent, at least within a batch.
If data changes frequently, I find it's best to generate some reporting tables or a cache of some sort containing a consistent set of data to produce all the reports from. This has the added advantage that you can compare your cache with future caches to see what has changed or caused changes. Clients love to know what causes change so they can explain it to the end user. In my experience, the quicker you can explain any changes, the happier the client is.
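The batch-cache idea can be sketched as follows: freeze a copy of the data once per batch so every report in that batch reads the same values, then diff successive snapshots to explain what changed. This is a minimal in-memory sketch, assuming rows keyed by a business key; in practice the snapshots would be reporting tables, and `take_snapshot` and `diff_snapshots` are hypothetical helpers, not something from the post.

```python
from typing import Dict, Set, Tuple

Row = Tuple[object, ...]  # a row's column values, e.g. (name, region, amount)


def take_snapshot(rows: Dict[str, Row]) -> Dict[str, Row]:
    """Freeze a consistent copy of the source at batch time.

    Every report in the batch reads from this copy, not the live data.
    """
    return dict(rows)


def diff_snapshots(old: Dict[str, Row], new: Dict[str, Row]) -> Dict[str, Set[str]]:
    """Classify keys as added, removed or changed between two snapshots."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": new_keys - old_keys,
        "removed": old_keys - new_keys,
        "changed": {k for k in old_keys & new_keys if old[k] != new[k]},
    }


live = {"C1": ("Acme", "North", 100), "C2": ("Bolt", "South", 50)}
monday = take_snapshot(live)
live["C2"] = ("Bolt", "West", 50)  # source data changes after the batch ran
live["C3"] = ("Core", "East", 75)
tuesday = take_snapshot(live)
print(diff_snapshots(monday, tuesday))
# {'added': {'C3'}, 'removed': set(), 'changed': {'C2'}}
```

The diff gives the "what caused the change" answer directly: one new customer and one customer that moved region.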
October 9, 2018 at 6:16 am
This is why I feel the data lake is an important component of the data warehouse. Often, the data warehouse is not the raw source of truth; it's the processed source of truth, which changes as the business requirements change. This means we take raw data from a source system and conform it to a processed state for the data warehouse. Then we lock down that processed state with limited access and restrictions on what we can do with it (i.e., no human intervention).
A copy of the entire raw state of every source, kept in a cheaper and more fault-tolerant system upstream of the warehouse, seems to be the real source of truth. It's everything as it was before it entered the processed state in the warehouse: the dirty, unfiltered data that all users can access to redefine the business requirements of the warehouse without actually impacting the data warehouse. It's also the one location where data can be explored and prototyped, either to further enhance the data warehouse or to give the data warehouse team time to catch up on their own work before finally productionizing what you developed.
I think in time this will become the source of truth for many organizations, simply because it's so much easier than trying to develop a schema-on-write database, which is harder to maintain as the raw source of truth in whatever database system you use.
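The raw-zone pattern described above can be sketched as an append-only store of untouched source records, with the processed layer derived from it by a conforming function (schema-on-read). Changing business requirements then means writing a new conforming function and rebuilding, while the raw copy stays as it was. A minimal in-memory sketch; the record layout, the field names `cust` and `amt`, and both conforming rules are hypothetical, and a real data lake would use files or object storage rather than a Python list.

```python
import json
from typing import Callable, List

# Raw zone: records land exactly as the source sent them and are only
# ever appended, never edited.
raw_zone: List[str] = []


def land(raw_record: str) -> None:
    raw_zone.append(raw_record)


def conform_v1(rec: dict) -> dict:
    # Original (hypothetical) business rule: amounts in whole currency units.
    return {"customer": rec["cust"].strip().upper(), "amount": float(rec["amt"])}


def conform_v2(rec: dict) -> dict:
    # Revised (hypothetical) rule: amounts reported in cents.
    return {"customer": rec["cust"].strip().upper(),
            "amount_cents": round(float(rec["amt"]) * 100)}


def build_processed(conform: Callable[[dict], dict]) -> List[dict]:
    # Schema-on-read: structure is applied only when the raw data is read,
    # so the processed layer can be rebuilt under new rules at any time.
    return [conform(json.loads(r)) for r in raw_zone]


land('{"cust": " acme ", "amt": "12.50"}')
land('{"cust": "bolt", "amt": "3"}')
print(build_processed(conform_v1))
# [{'customer': 'ACME', 'amount': 12.5}, {'customer': 'BOLT', 'amount': 3.0}]
print(build_processed(conform_v2))
# [{'customer': 'ACME', 'amount_cents': 1250}, {'customer': 'BOLT', 'amount_cents': 300}]
```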
October 9, 2018 at 9:19 am
.. The goal is to have a single truth for each datapoint, if we can achieve that then we can make decisions about whether to copy that data somewhere to simplify usage or to mash it all up at a layer above the database ..
The CAP theorem applies to attempts at using a central EDW as a Single Source Of Truth. You can't reliably aggregate point-in-time metrics from multiple distributed data sources without tolerating a certain degree of latency, occasional unavailability, and margin for error. What you're really providing is an [Official Version Of Truth]. For a governmental or corporate enterprise, it's less important that everyone is operating with the most accurate ideal of truth than that everyone, at any moment in time, is operating with the same good-enough version of truth. For example, it's important that society as a whole accepts the official outcome of a political election, even though many folks would argue on procedural or philosophical grounds that the outcome was incorrect. Settling for a margin of error is better than remaining in a constant state of disagreement.
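The trade-off in the CAP paragraph can be made concrete with a watermark rule: the official figure for a point in time is published only once every source confirms it has delivered all data up to that time. It therefore arrives late (or not at all while a source lags), but it is identical for everyone who reads it. A minimal sketch, assuming each source exposes a completeness watermark; `Source`, `official_total` and the sample data are hypothetical and not from the post.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Source:
    # Hypothetical distributed source: reported events plus a watermark,
    # the latest timestamp up to which it claims its data is complete.
    watermark: int
    events: List[Tuple[int, float]] = field(default_factory=list)  # (timestamp, value)


def official_total(sources: Dict[str, Source], cutoff: int) -> Optional[float]:
    """Publish one agreed total as of `cutoff`, or None if not ready yet.

    Waiting until every watermark has passed the cutoff trades latency
    (and occasional unavailability) for a figure everyone reads identically.
    """
    if any(s.watermark < cutoff for s in sources.values()):
        return None  # at least one source has not caught up; do not publish
    return sum(v for s in sources.values() for ts, v in s.events if ts <= cutoff)


sources = {
    "sales": Source(watermark=100, events=[(90, 10.0), (105, 5.0)]),
    "finance": Source(watermark=95, events=[(80, 2.5)]),
}
print(official_total(sources, cutoff=100))  # None: finance lags behind
sources["finance"].watermark = 110
print(official_total(sources, cutoff=100))  # 12.5
```

Events after the cutoff (the `(105, 5.0)` sale) are excluded rather than half-counted, which is what keeps the published version stable once released.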
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho