G’day,
I was asked recently what a modern data warehouse is – and that’s a vey thought provoking question.
First, there’s the tag “Modern” – I think what’s modern now may not be quite so modern in a year or twos time – maybe sooner, judging by the speed of some technology changes at present.
My initial response – jokingly – was ‘it depends who you ask’, which is a pretty useless – but true – answer.
Firstly let me put some context around my answer. I’ve built my career around Microsoft technologies – so my answer is going to centre around their latest technologies – which are going to be all cloud based, and essentially incorporate Platform-as-a-Service (PaaS) functionality. Your experience may be different, you might be in a different cloud, you might have different ideas – but for me it’s PaaS Services in the Azure cloud. I’ll expand on this shortly.
But let’s (quickly) look back on how I’ve seen data warehouses develop – in my Microsoft world – over the years. They used to be simply built on SQL Server, incorporating fact and dimension tables. There’d be debate about star and snowflake schemas, about kimball or Inmon. More recently people talk about Data Vault. Essentially, people pulled data from their operational (transactional) system into their data warehouse (decision support system or Enterprise data warehouse or whatever they chose to call it).
They did this using some Extract, Transform and Load (ETL) tool (commonly called a data integration tool) like SQL Server integration services.
They might have had a single data source (one transactional system) or they may have had more data sources – so firstly staging the data made sense, then cleaning it and then putting it into the main data warehouse.
We’d have to deal with things like slowly changing dimensions and very often we may not have modelled data exactly as it is in the real world.
We might put analytical models over the data warehouse, initially these were multi-dimensional (MDX) but later Tabular. Then we’d either report straight off the data warehouse or the analytical model – using tools like SQL Server Reporting Services or – lately – Power BI.
For me, things have moved to the cloud and our options have expanded (or should I say exploded)
We can take data from literally any data source and land it in our Azure data lake. This area can be segregated, as it likely contains our raw data. we can now keep this – untouched – as space is literally no longer an impediment (ok, it still costs money, but it’s cheap, and we can have different degrees of storage (hot / cold /archive) and we have to trade off access times with cost. We also don’t have to worry about upgrading and migrating data at any time – this is all part of the service provide by a modern cloud.
We can then start segregating the data into different areas (of the data lake), say a staging area, a clean area, and maybe an area full of curated data sets.
We can pull in different types of data in different formats – CSV, JSON, AVRO, Parquet, and more.
Once we have the data in out desired area we can then pull it into a traditional data warehouse – say in Azure SQL Database for instance (although there maybe better cloud solution). We can use Polybase to virtualise the data.
We could also use Synapse, CosmosDB or maybe databricks.
We can build Machine Learning Models and enable Artificial Intelligence services.
Basically, I think the ‘modern’ data warehouse is about cloud based data services and simply adds to the traditional models of data warehousing.
Whereas the traditional data warehouse is about historical reporting, such as diagnostic and descriptive analytics – the new tools enable predictive and prescriptive analytics along with cognitive capabilities.
So, basically the answer to ‘what is the modern data warehouse?’ could be a lot of things, pieced together in many different ways. and we’re only going to get more as time marches on.
So maybe we should start talking about the changing data warehouse, or the evolving data warehouse or the cloud data warehouse – but not the modern data warehouse, as what’s modern today is sure to change tomorrow.
Anyway, these are just my thoughts. Some will agree, some won’t and they’ll be plenty of diverse ideas – that’s great for innovation. Um, maybe the innovative data warehouse is a better name
Anyway, have a great day
Cheers
Martin.