June 14, 2016 at 9:55 pm
Comments posted to this topic are about the item Stairway to U-SQL Level 1: Introduction to U-SQL and Azure Data Lakes
June 15, 2016 at 1:46 am
Great article, but I have to disagree with this:
"In the classic SQL Server stack, Analysis Services (SSAS) would be used to house the Data Warehouse"
A data warehouse is a database with a specific design based on a methodology. SSAS is a presentation layer above this (the DW can also be seen as a presentation layer) and should in no way "house" a DW.
I know it's picky but it bothers me. I won't sleep tonight now. Thanks. :crying:
June 15, 2016 at 1:53 am
Hi there PB_BI
Just trying to make the article fairly general for readers who don't know the stack inside out. I do agree with your point.
Hope I don't ruin your sleep too much!
Cheers,
Mike.
June 15, 2016 at 6:39 am
Good article Mike. Informative, precise and to the point.
-- Itzik Ben-Gan 2001
June 15, 2016 at 7:42 am
Thanks Alan, glad you liked it!
Mike.
June 15, 2016 at 4:30 pm
Hi Mike,
Great article, these were things that I knew zero about before now, but I have 2 questions about your article:
1. "The challenge this approach doesn’t resolve is: what happens if the questions the users are asking change?"
Isn't this the inherent challenge of Data Warehousing? Isn't that the thing that separates the men/women from the boys/girls? I am not a seasoned expert by any means, but I hope to be one day. Heck, I'm only taking my first swing at designing a data warehouse with the BI team that I'm on, but, that seems to be the elephant in the room, that you are attempting to (at the end of a rigorous process) create a system that will "be able to answer the questions that haven't been thought of yet". This question is not in an argumentative tone, but more to make sure that I haven't missed something. If we had all decided that the changing questions in the future would be unanswerable once we built a DW, then maybe I am not pursuing the most effective solution.
2. Isn't the Big Data arena (including this data lakes concept) really more suited for unstructured or less structured data? I thought that was the main benefit, or, so to say, that whether you put highly structured data into an RDBMS or a Big Data Apparatus, there wouldn't be that much difference in what you could or couldn't do. However, if you have less structured data to deal with, you would be basically crippled by trying to handle that in an RDBMS, but the advantage of using Big Data for structured data would be negligible.
Once again, both of these are not meant as critical of your article, just want to see if I can confirm my own understanding. Your walkthrough of the Azure Data Lakes product is exceptional, and I know it took you a lot of your own personal time to put that together. You should know that your effort is appreciated. Thanks!
Clint
June 16, 2016 at 12:33 am
Great article! Thanks for taking the time to write it.
Shifting gears and without having anything to do with the article, I think U-SQL only being available to the cloud is a real shame. It's what some of us have been asking for in the local instance world for a long time and it would really be cool if they pushed it down from the cloud to us lowly Earthers that are grounded by necessary requirements.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 16, 2016 at 1:53 am
Hi Clint
Thanks for the kind words, glad you enjoyed the article.
I agree on both your points. I was trying to make the point that it should be easier to respond to the changing user questions in a Big Data area, precisely because the data is unstructured. You don't need to spend time modifying cubes, dimensions etc - you can "just" change the query.
So you definitely haven't misunderstood anything (in my view), I fully agree with both of your points.
Regards,
Mike.
June 16, 2016 at 1:55 am
Hi Jeff
Glad you liked the article, thanks for the kind words.
It is a shame U-SQL isn't available locally, although who knows what Microsoft will do in the future. There may be some possibilities on that front; if I come across anything I'll let you know.
Regards,
Mike.
June 16, 2016 at 3:28 am
Maybe I missed something...
How does the U-SQL query know what field to use if there are no headers?
"IMPORTANT NOTE: Before you upload the files, open them in Excel and remove the first row (the header row). U-SQL does not recognise headers at the time of writing."
Great post by the way!
June 16, 2016 at 5:20 am
Hi MCDB
It's up to the developer to know what columns are in the file, and then apply them in the EXTRACT statement. As per this statement:
@results =
    EXTRACT postcode string,
            total int,
            males int,
            females int,
            numberofhouseholds int
    FROM "/Postcode_Estimates_1_M_R.csv"
    USING Extractors.Csv();
You have to specify all columns in the file in the EXTRACT statement; you can then filter out unwanted columns in a later SELECT statement. This is discussed in more detail in the second part of the series.
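As a rough sketch of that two-step pattern: the EXTRACT below names every column in the header-less file, and a later SELECT keeps only the ones that are wanted. The @filtered rowset name, the choice of columns to keep, and the output path are illustrative assumptions, not taken from the article.

```usql
// Step 1: name every column in the file, in file order (there is no header row).
@results =
    EXTRACT postcode string,
            total int,
            males int,
            females int,
            numberofhouseholds int
    FROM "/Postcode_Estimates_1_M_R.csv"
    USING Extractors.Csv();

// Step 2: keep only the columns of interest.
// @filtered and the chosen columns are hypothetical.
@filtered =
    SELECT postcode,
           total
    FROM @results;

// Write the reduced rowset back to the lake (hypothetical output path).
OUTPUT @filtered
TO "/output/postcode_totals.csv"
USING Outputters.Csv();
```

The column names are just labels applied by position, so they need to be listed in the same order as the fields appear in the file.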
Regards,
Mike.
June 16, 2016 at 5:26 am
Good intro, I learned some stuff!
The lack of header support, or the ability to store/relate stronger metadata, seems like a serious weakness. I'm thinking about a lake with 1000s of files and the plan is to open each up to figure out the structure?
Data lake does sound cooler than "the data file share".
June 16, 2016 at 9:37 am
Hi Andy
A feature is coming called SkipFirstNRows, which will, er, let you skip a number of specified rows. That should sort out the header issue, which is a massive problem at the moment.
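A sketch of how that might look, assuming the feature is exposed as a skipFirstNRows parameter on the built-in CSV extractor (the parameter form is inferred from the feature name, so check it against the current U-SQL documentation before relying on it):

```usql
// Assumption: skipFirstNRows is passed to the built-in CSV extractor.
// Skipping 1 row would leave the header row in the file but ignore it on read.
@results =
    EXTRACT postcode string,
            total int,
            males int,
            females int,
            numberofhouseholds int
    FROM "/Postcode_Estimates_1_M_R.csv"
    USING Extractors.Csv(skipFirstNRows: 1);
```

Even with the header skipped, the column names and types would still need to be declared in the EXTRACT statement.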
It is possible to add better structure to the data, that's all coming soon!
Regards,
Mike.
June 22, 2016 at 6:55 am
Question: is there an on-prem version of this? I'm in banking, which is heavily regulated and generally paranoid (and rightly so!). We have an on-prem cloud for server/database deployments and could use something like data lakes. For us though, any off-prem cloud is automatically off the table.
Gerald Britton, Pluralsight courses