Big Data

  • Dear all,

    I read that SQL Server 2016 now has a tool to integrate with Big Data.

    Before I look into this tool, it probably makes sense to understand the big data concept and to master big data (maybe get certified).

    Can you please let me know what the best path for this is?

    Is there any good certification program to understand big data before I start with the new SQL Server feature?

    Thank you

  • I'm sure others will have a lot more to say about this topic than me, but here is my opinion.

    "Big Data" is a marketing term. Many have defined it by the 3 V's: high data velocity, lots of data variety, and large data volume, to the point where your business cannot manage or support your data anymore and you need new technology that can support it and scale. Blah, blah, blah...

    Realistically, "Big Data" is a very vague term that can mean a lot of things. It can mean all 3 of those elements or maybe just one of them. Regardless of what you run into, complex data problems do exist, big or small. The only thing you should be concerned with is what tools are out there to deal with those problems. SQL Server is just one tool that can deal with those problems or be incorporated into other tools that can deal with them.

    Therefore, I would not worry about trying to "certify" yourself in a marketing trend. Just focus more on certifying yourself in the tools you use that solve the business problems you face and do not get yourself hung up on the next new shiny unless that new shiny makes sense to your business where it's going to solve those real world business problems.

    For example, SQL Server 2016 is a tool that can handle high velocity, large volumes, and lots of variety (variety is what I feel is its weakest link, but there are tools to handle variety). It can handle this up to a certain point, beyond which SQL Server 2016 may not be the best tool. At what point is that? Well, that's the vague part of this "Big Data" trend. It's going to be different for every company and every team behind those tools. I've seen companies say they have a big data problem because their tools can't handle their current business problems. I've seen those same companies solve those problems by simply having a different person look at the problem and come up with a better solution using the tools they already have (e.g., re-modeling, re-structuring, etc.).

  • Thank you very much for your reply. Appreciated.

    This is why I think that knowing, for example, Hadoop may be important. If I understand it well (master it), I will be able to judge whether it is really good for the business.

    There is probably some type of certification out there where I can learn Hadoop and at the same time get an overview of how it can be connected to SQL Server, in case that is of interest.

  • SQL Server's built-in "big data" functionality is actually something called PolyBase.
    PolyBase allows users to interface with HDFS (Hadoop Distributed File System), which is the storage layer for a Hadoop environment.
    The nice thing about PolyBase is that it is basically T-SQL, so you don't need to learn Java in order to create MapReduce jobs on Hadoop.
    In Azure SQL Data Warehouse you can also use PolyBase to access files in an HDFS layer.

    The whole "big data" thing is a lot more involved than that but, in a nutshell, this is where SQL Server fits in.

  • river1 - Tuesday, October 31, 2017 6:36 AM

    Thank you very much for your reply. Appreciated.

    This is why I think that knowing, for example, Hadoop may be important. If I understand it well (master it), I will be able to judge whether it is really good for the business.

    There is probably some type of certification out there where I can learn Hadoop and at the same time get an overview of how it can be connected to SQL Server, in case that is of interest.

    It just depends. Hadoop is a vague term too, because Hadoop is not just one piece of technology. In many cases, it's a collection (or framework) of software working together to form a data ecosystem. For example, you can use Hadoop just for HDFS and store data files there that feed into PolyBase, which in turn feeds SQL Server 2016. Data lands in HDFS in a schema-less environment and then feeds into SQL Server under a defined schema (schema-on-write). The question is whether you need Hadoop at all versus just sending data directly to SQL Server. You may find there is no business case for it.
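
    To make the "lands schema-less, then loads into a defined schema" idea concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: an in-memory list of JSON lines stands in for files in HDFS, SQLite stands in for SQL Server, and the names RAW_LINES, load_orders, and the orders table are made up for the example. It is not PolyBase and not how PolyBase works internally.

```python
import json
import sqlite3

# Hypothetical raw records as they might land in a schema-less store:
# fields vary from line to line and nothing is validated on arrival.
RAW_LINES = [
    '{"order_id": 1, "customer": "alice", "amount": "19.99"}',
    '{"order_id": 2, "customer": "bob", "amount": "5.00", "coupon": "FALL"}',
    '{"order_id": 3, "customer": "carol"}',
]


def load_orders(raw_lines, conn):
    """Parse raw JSON lines and insert them into a table with a fixed schema.

    Returns the number of rows loaded and the list of rejected records.
    """
    # The schema is enforced here, at write time (SQLite is a stand-in only).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    loaded, rejected = 0, []
    for line in raw_lines:
        record = json.loads(line)
        try:
            # Keep only the columns the schema defines; extras are dropped.
            row = (
                int(record["order_id"]),
                str(record["customer"]),
                float(record["amount"]),
            )
        except (KeyError, ValueError, TypeError):
            # Records that do not fit the schema are set aside, not loaded.
            rejected.append(record)
            continue
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
        loaded += 1
    conn.commit()
    return loaded, rejected


conn = sqlite3.connect(":memory:")
loaded, rejected = load_orders(RAW_LINES, conn)
total = conn.execute("SELECT ROUND(SUM(amount), 2) FROM orders").fetchone()[0]
```

    The point of the sketch is the trade-off: the landing zone accepts anything, while the relational side decides what fits. If every record already fits the schema, the landing zone may not be buying you much.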

    I won't say learning Hadoop would be a waste; it's not. Just don't get caught up in thinking any big data problem is only solvable by Hadoop.

  • I've found that a lot of people want to get on the "Big Data" bandwagon because there's some pretty good money to be had there, but they try to do so without actually understanding or being able to handle "normal" or even "small" data. 😉

    And, yeah... I agree that "Big Data" has become mostly a marketing term, much like DevOps has become a marketing term. There are places where "Big Data" actually is big data, but I think the opportunities there are far fewer than the opportunities for someone who has become a "Ninja" on the normal stuff.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


