March 29, 2018 at 7:30 am
If you want connectivity to this from SSMS, the best approach is likely to use SQL Server or Azure SQL Data Warehouse with PolyBase enabled. Then you can set up External Tables that point to the collection of documents in your data store. This lets you log into the database from SSMS and query that External Table directly against the documents themselves in your Azure Data Lake Store (or Blob Storage, if you don't use Data Lake Store). This does not require you to have Azure Data Lake Analytics enabled.
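For anyone who wants to try it, the PolyBase setup looks roughly like this. Treat it as a minimal sketch: the account, credential values, paths and columns below are all hypothetical placeholders.

```sql
-- Minimal PolyBase sketch (T-SQL, run from SSMS against Azure SQL DW).
-- Every name, path and secret here is a placeholder.
CREATE MASTER KEY; -- once per database, if one doesn't already exist

CREATE DATABASE SCOPED CREDENTIAL AdlsCredential
WITH IDENTITY = '<client_id>@https://login.microsoftonline.com/<tenant_id>/oauth2/token',
     SECRET = '<client_secret>';

CREATE EXTERNAL DATA SOURCE AdlsDataSource
WITH (TYPE = HADOOP,
      LOCATION = 'adl://<account>.azuredatalakestore.net',
      CREDENTIAL = AdlsCredential);

CREATE EXTERNAL FILE FORMAT CsvFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

-- The external table acts like a view over the files in the folder:
-- query dbo.SalesExternal from SSMS like any other table.
CREATE EXTERNAL TABLE dbo.SalesExternal
(
    OrderId INT,
    OrderDate VARCHAR(10),
    Amount DECIMAL(18, 2)
)
WITH (LOCATION = '/data/sales/',
      DATA_SOURCE = AdlsDataSource,
      FILE_FORMAT = CsvFileFormat);
```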
I have yet to use tables in ADLA, for the simple reason that you can just use U-SQL to output the structure you want to a new document and then feed that new document into a database. That doesn't mean it isn't useful to set up a table object so you can just read the table instead. It's essentially like a SQL view over the data store, or over your database if you're using External Tables with PolyBase.
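To make that output-to-a-new-document pattern concrete, here's a minimal U-SQL sketch; the paths and columns are invented for the example.

```usql
// Hypothetical input file and schema - extract, reshape, write a new document.
@sales =
    EXTRACT OrderId int,
            OrderDate string,
            Amount decimal
    FROM "/input/sales.csv"
    USING Extractors.Csv(skipFirstNRows : 1);

// Shape the rowset into the structure the target database wants.
@summary =
    SELECT OrderDate,
           SUM(Amount) AS TotalAmount
    FROM @sales
    GROUP BY OrderDate;

// The new document can now be fed into the database.
OUTPUT @summary
TO "/output/sales_summary.csv"
USING Outputters.Csv(outputHeader : true);
```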
June 11, 2018 at 1:53 pm
Hey Mike, tried to use the files hosted here http://mcqtech.com/articles/usql/files/usql_files.zip - they look like HTML inside the CSVs in the zip. Are these files broken?
NVM - saw the post about GitHub. Thanks!
June 11, 2018 at 3:02 pm
Hi MP
Glad you managed to grab the files. Sounds like I need to update my website! I might just update the article to point at GitHub. Thanks for taking the time to post the message.
Regards,
Mike.
June 12, 2018 at 6:44 am
Thanks, yeah, a link back to GitHub couldn't hurt. Anyways, great articles - they're helping me catch up. Keep them coming.
June 12, 2018 at 7:24 am
naunihal - Tuesday, March 13, 2018 11:38 PM
How does the Database in ADLA compare to ADW? ADW is limited to 240 TB; does the ADLA DB have any limitations?
It's been a while since I last commented on this article, and since then I've been using ADLA more and more. I'll toss some feedback your way if you're still around.
I use an ADW that has data sourced from both Blob Storage and Azure Data Lake Store. The data is processed using Azure Data Lake Analytics (U-SQL).
The speed of ADLA compared to ADW is pretty amazing. I can take 3-billion-row files and JOIN them to related dimensional files extremely fast in ADLA without having to worry about indexes, statistics or anything of that nature. I'm pretty amazed by the speed compared to doing the same in ADW, where I have to scale up to, say, 20+ USD an hour just to run the query.
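(For anyone who hasn't seen U-SQL, a join like that is just a plain JOIN between two extracted rowsets - no indexes or statistics in sight. A minimal sketch, with hypothetical files and columns:)

```usql
// Hedged sketch: a fact file joined to a dimension file; paths/columns made up.
@facts =
    EXTRACT CustomerId int, Amount decimal
    FROM "/data/facts.csv"
    USING Extractors.Csv();

@dim =
    EXTRACT CustomerId int, Region string
    FROM "/data/customers.csv"
    USING Extractors.Csv();

// Note the C#-style == in the join condition.
@byRegion =
    SELECT d.Region, SUM(f.Amount) AS TotalAmount
    FROM @facts AS f
    INNER JOIN @dim AS d ON f.CustomerId == d.CustomerId
    GROUP BY d.Region;

OUTPUT @byRegion TO "/output/by_region.csv" USING Outputters.Csv();
```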
However, for me at least, this is not a replacement for ADW. The benefit here is that computation is super fast. But when you need to secure, maintain and expose the final output, that's where ADW comes into play the most. This is why I feed all my final data to ADW: I aggregate it down using ADLA and then expose it through ADW, where it can be fully managed as well as scaled if needed. It works out great when you then need to start sourcing, say, Power BI, cubes, or data marts located in Elastic SQL instances.
The other thing to mention here is that ADLA, much like Spark and much like MapReduce, relies on object-oriented programming to really shine: Java for MapReduce, Python for Spark, and .NET (C#) for U-SQL. It becomes a real pain when you have to convert a simple date string or handle records with double quotes (although ADLA has extractors that handle quoting, it still sometimes fails once it splits the data up). Then there are things as simple as range joins (JOIN with >=), which aren't supported, among other basic approaches you might take in T-SQL. You'll often spend more time figuring out the .NET approach, or writing custom .NET functions to handle what you need, than anything else, which is a pain for most DBA types. The sketch below shows the usual workarounds.
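By way of illustration, workarounds for those two pain points look something like this (a hedged sketch - every file and column name is hypothetical):

```usql
// Date strings have to be converted with .NET (C#) expressions.
@raw =
    EXTRACT OrderId int, OrderDateString string
    FROM "/data/orders.csv"
    USING Extractors.Csv();

@orders =
    SELECT OrderId,
           DateTime.ParseExact(OrderDateString, "yyyyMMdd", null) AS OrderDate
    FROM @raw;

@rates =
    EXTRACT EffectiveDate DateTime, Rate decimal
    FROM "/data/rates.csv"
    USING Extractors.Csv();

// U-SQL joins must be equality joins, so a ">=" join becomes a
// CROSS JOIN with the range predicate pushed into the WHERE clause.
@withRates =
    SELECT o.OrderId, o.OrderDate, r.Rate
    FROM @orders AS o
    CROSS JOIN @rates AS r
    WHERE o.OrderDate >= r.EffectiveDate;

OUTPUT @withRates TO "/output/orders_with_rates.csv" USING Outputters.Csv();
```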
But hey, you really can't beat not having to define a schema, normalize/denormalize, index, add statistics or any of that jive to get to the data. You upload, you compute, you're done. You can do this on insanely large datasets with ease. It's freaking amazing. I'm really surprised when I talk to DBAs who trash data lakes and their usage. It's insanely fast and can save you a lot of time IF USED RIGHT. It doesn't replace the data warehouse, but it surely helps get that data into its final resting place pretty damn fast, and if you take a simplistic approach to U-SQL like I have, pretty damn easy too.
June 12, 2018 at 11:23 am
mpsmith 63921 - Monday, June 11, 2018 1:53 PM
I was able to get the files a few months back by choosing the raw option and then saving them down.
[font="Tahoma"]Cheers,
Sean :-D[/font]
June 12, 2018 at 2:24 pm
xsevensinzx - Tuesday, June 12, 2018 7:24 AM
Hi xsevensinzx
Good to hear from you again! I actually agree with you - ADLA isn't a replacement for a warehouse. But it's a brilliant assistant! You're using it in the same way I've seen it used everywhere else so far - as a supplement to a relational or data warehouse system.
I think it's a great bit of technology, and I agree with you about people who diss the Data Lake. It's just another tool in the data arsenal, and when a quick conversion of a large data set is needed, there aren't many things out there that can touch it.
Hope you carry on enjoying U-SQL and the Data Lake! Thanks for taking the time to report back.
Regards,
Mike.