The Load of Real Time Data Warehouses

  • Comments posted to this topic are about the item The Load of Real Time Data Warehouses

  • Good question.

    I had a storage problem with a data warehouse and needed to remove some data.  I went to the various users of the warehouse to find out what could be got rid of.

    For Data Scientists the answer I got equated to "We need all data forever".  My experience in other organisations is that this answer is a typical Data Scientist answer.

    For other roles I have found that few people know how long data is needed and they are paranoid about allowing data to be deleted even though they can't tell you the purpose to which it might be put.

    I expected the READ:WRITE ratio to be heavily READ biased but it was near 1:1 and I put that down to a few things

    • Use of ELT tooling.  By definition these do a lot of reading & writing
    • Data warehouse projects are not necessarily driven by end-user needs.
    • Data lifecycle not understood or specified
    • People don't know what data is available to them or even what the various tables/columns are.  They don't read data because they don't know it is there.

    I have also observed that queries tend to be written in a way that does not get the best out of the data warehouse platform.  This means that reads, low though they may be, are artificially high.

  • When I think of a data-warehouse, I'm thinking OLAP, meaning data is updated on a schedule, contains versioned history of changes, and probably summarized as well for fast reporting and dashboards.

    However, if the database is expected to provide real-time updates to end user queries, and there is 1:1 ratio between reads and writes, then it's probably more like a domain specific data-mart providing applications with fast querying, especially if it gets event driven updates from other OLTP databases. This (and other data-marts) could then serve as ETL on an hourly or 15 minute schedule for the enterprise level data-warehouse.

    Ralph Kimball data modeling is still relevant in the era of big data and NoSQL.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • That's essentially what we had set up at my last job, a nightly load of the entire data warehouse with a specific set of jobs that ran pretty much continuously throughout the day targeting some very specific metrics.  The reason being that they wanted to be able to use the same reports during the day and for historical reporting so there was never a discrepancy in those metrics if someone pulled up those stats later.

  • Unfortunately there are no data-mart products or cloud services. Everyone sells a data warehouse and people are loading them constantly

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply