December 20, 2019 at 1:02 pm
Recently, our senior management attended a seminar on data lakes at an event. They came back very excited to implement one, for the following reasons:
A few important facts worth sharing:
This is a very interesting topic, but I have no experience with either data lakes or Hadoop, so my fingers are crossed. Any feedback on whether we are heading in the right direction would be really appreciated. If we are, what would be the correct approach?
December 21, 2019 at 1:10 pm
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
December 23, 2019 at 4:28 am
Any suggestions would be much appreciated!
December 24, 2019 at 8:57 am
I'm hoping to get some useful suggestions on this topic.
April 14, 2020 at 12:24 pm
Hi,
Bit late to the party, so hopefully you still read this.
Using a document store or some form of data lake alongside an RDBMS is a good step. I've been using a managed data lake solution, Azure Data Lake Store with Azure Data Lake Analytics, within Azure. Choosing this option means I don't have to manage the full Hadoop ecosystem (Sqoop, ZooKeeper, etc.), and I can fully integrate Active Directory across the stack from the data lake -> data warehouse -> data mart.
Leveraging the data lake gives me an accessible location to store all my raw data, which end users can query in its raw format BEFORE it's processed by whatever ETL or ELT you may use. It also lets me offload, as you are looking to do, a lot of that processing into another system, either for real-time loading or for batch processing. This means fewer stored procedures in SQL Server and, most importantly, less work.
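To make the raw-zone idea above a bit more concrete, here is a minimal Python sketch. It is not the setup described in this post: a local folder stands in for the lake, and the `raw/<source>/<yyyy>/<mm>/<dd>` layout, the function names, and the `order_date` / `amount` fields are all assumptions made up for illustration. It shows data landing untouched, being read in raw form, and a batch step building a curated file outside the database.

```python
import csv
import json
import tempfile
from collections import defaultdict
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, day: date, records: list) -> Path:
    """Write records unchanged as JSON lines under raw/<source>/<yyyy>/<mm>/<dd>/."""
    target_dir = lake_root / "raw" / source / f"{day:%Y/%m/%d}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "part-0000.jsonl"  # hypothetical file name
    with target.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return target


def read_raw(lake_root: Path, source: str):
    """Yield every raw record for a source, before any ETL/ELT has touched it."""
    for path in sorted((lake_root / "raw" / source).rglob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)


def build_daily_totals(lake_root: Path, source: str) -> Path:
    """Batch step run outside the database: aggregate raw rows into a curated CSV."""
    totals = defaultdict(float)
    for record in read_raw(lake_root, source):
        totals[record["order_date"]] += float(record["amount"])
    out_dir = lake_root / "curated" / source
    out_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / "daily_totals.csv"
    with out.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["order_date", "total_amount"])
        for day in sorted(totals):
            writer.writerow([day, f"{totals[day]:.2f}"])
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        land_raw(lake, "orders", date(2019, 12, 20),
                 [{"order_date": "2019-12-20", "amount": "10.50"}])
        print(build_daily_totals(lake, "orders").read_text(encoding="utf-8"))
```

In a managed service the folder would be a lake path and the batch step a job in whatever engine you choose; the point is only that the aggregation no longer has to live in a stored procedure.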
If SQL Server were a wheelbarrow, it was constantly carrying all the rocks. Leveraging a data lake means we have more than one wheelbarrow to help distribute those rocks, so SQL Server can focus on more important things, like serving and securing the data.
So yes, it's good thinking. Again, I would just lean on managed services rather than trying to manage Hadoop yourself. Newer versions of SQL Server (2016 onward) and Azure SQL Data Warehouse include PolyBase, which plays a role similar to Hive on Hadoop: it lets you query files in external storage with plain T-SQL. You can connect it to Azure Data Lake Store or to S3-like object storage such as Azure Blob storage. It's really nice.