Becoming a Data Scientist

  • Rod at work - Thursday, April 27, 2017 8:25 AM

    Tony++ (love that "++", btw), your post makes me wonder: are certain places better for data scientists than others? For example, when I was unemployed a few years ago, I went to the unemployment office several times seeking help and direction. During one of those visits, the counselor mentioned that I might have a harder time finding a new job in this state because, for the most part, developer positions here are for what he called "lunchbox programmers". I didn't fit in with that.

    So perhaps the same is true of data scientists? Are there more jobs for them in places like Silicon Valley, NYC, or other major centers of commerce?

    What is meant by the term "lunchbox programmer"? I googled it and found nothing specific, except for an IT staffing agency called "LunchBox". Perhaps lunch-boxing refers to younger folks who pack their lunch for work and eat together as a group?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • With the inclusion of Python alongside R in SQL Server 2017, are we not all heading in a similar direction? Whatever the title, I think we're all data-related professionals, and pretty soon even we hardcore DBAs will need a serious background in data analytics. Small companies cannot afford all the roles, so they will want an Analytical DBA who can use Python, R, and Power BI, in the same way they now require a BI/Developer/DBA.

    I have also been working through the MPP Data Science course, loved the introduction to Power BI, and have now brought it into my current environment in a big way. I'm looking forward to getting more involved with Python and R once we upgrade to a version of SQL Server where we can use them.

  • Eric M Russell - Thursday, April 27, 2017 8:38 AM

    What is meant by the term "lunchbox programmer"? I googled it and found nothing specific, except for an IT staffing agency called "LunchBox". Perhaps lunch-boxing refers to younger folks who pack their lunch for work and eat together as a group?

    Sorry, it was a phrase the counselor at the unemployment office used. By "lunchbox programmer" he meant a developer who basically just writes simpler apps and isn't working at the latest level. Think of someone who writes simple queries and produces reports in MS Access or Crystal Reports, sort of like someone who punches in and out each day. I know most programmers, DBAs, and the like aren't like that, even in this state, but that's the idea. They stick with what was proven to work 5 to 10 years ago and basically always do that, with little interest in learning anything new. I hate to say it about the state I work in, but that describes the majority of developers here. There are exceptions, and truly cutting-edge development does occur here, but it is the rare exception rather than the rule.

    Kindest Regards, Rod

  • RandomEvent - Thursday, April 27, 2017 8:43 AM

    With the inclusion of Python alongside R in SQL Server 2017, are we not all heading in a similar direction? Whatever the title, I think we're all data-related professionals, and pretty soon even we hardcore DBAs will need a serious background in data analytics. Small companies cannot afford all the roles, so they will want an Analytical DBA who can use Python, R, and Power BI, in the same way they now require a BI/Developer/DBA.

    I have also been working through the MPP Data Science course, loved the introduction to Power BI, and have now brought it into my current environment in a big way. I'm looking forward to getting more involved with Python and R once we upgrade to a version of SQL Server where we can use them.

    I've just taken a look at the MPP Data Science course... Did you pay to get the certifications, and if so, do you think they were worth the money?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Rod at work - Thursday, April 27, 2017 8:50 AM

    Sorry, it was a phrase the counselor at the unemployment office used. By "lunchbox programmer" he meant a developer who basically just writes simpler apps and isn't working at the latest level. Think of someone who writes simple queries and produces reports in MS Access or Crystal Reports, sort of like someone who punches in and out each day. I know most programmers, DBAs, and the like aren't like that, even in this state, but that's the idea. They stick with what was proven to work 5 to 10 years ago and basically always do that, with little interest in learning anything new. I hate to say it about the state I work in, but that describes the majority of developers here. There are exceptions, and truly cutting-edge development does occur here, but it is the rare exception rather than the rule.

    Like I said earlier, there is some need for data science within the industry, even in the "fly-over" states, but I believe that even the most cutting-edge IT organizations on the East and West Coasts are probably not really looking to add a Data Scientist to their full-time staff. The way I see it, Data Science (both the infrastructure and the skill set) is something most companies will want to spin up as needed, sort of like marketing campaigns. So, if you want a career in data science, you'll probably need to be a consultant or contractor willing to travel, or work for a company that specializes in providing data science as a service.

    For example, back in the '90s "webmaster" was a hot job title, and every mid- to large-sized company had one on staff. Today, however, it's a commodity, and most organizations outsource it.

  • Aaron N. Cutshall - Wednesday, April 26, 2017 6:04 AM

    To be honest, I'm rather suspicious of the whole "data science" aspect. In my humble opinion, much of what I've done over my career could be considered "data science": advanced data analysis, result aggregations and projections, comparative analysis, and even "what-if" scenarios. All of that was done with familiar tools such as T-SQL, SSRS, and occasionally Excel.

    Lately, I've seen multiple examples of queries from "data scientists" that were simply abysmal. Either they were written by someone who clearly lacks even a basic grasp of writing moderately efficient queries, of relational concepts, and of how the data is organized, or they were generated by a tool from obscure criteria supplied by someone who likewise doesn't understand how the data is organized. Not only were the queries horribly inefficient (for some of them I could write a whole paper on everything that was done wrong and why), but they were executed on a production server while the primary daily job was running. This impacted not only the system in general but specifically the very tables the primary job needed to update. As expected, this had a severe negative impact on the Service Level Agreement (SLA) the customer expected of the primary job. The fact that the SLA makes no allowance for whatever other customer activity may occur on their production server is a topic for another discussion.

    I'm sure the "data scientist" had no inkling of the impact he or she had on the system and merely wanted to pull the data into another location for further analysis. Yet it seems prudent to hold a "data scientist" to a higher standard than a junior-level database developer. While statistics are often employed in data science, it is much more than merely applying statistical models to data to see what the outcome may be. Much of it is truly understanding the data: reviewing it for patterns (or the lack thereof), identifying trends, and determining how to turn that knowledge into useful activity that supports the business. In that very sense, many of us are already data scientists.

    Data scientist or not, we all have a responsibility to be prudent in how we perform our duties, such as performing data analysis on a copy of production rather than on production itself, to avoid an adverse impact on production operations. We need to be mindful of others and not operate in isolation. In truth, we need to always consider the impact we have on each other and on the business. After all, if we're not making a positive impact, why are we there?

    Let's not mistake acquiring data for the only job a data scientist has. This is by far the weakest area for most data scientists, as they are likely not as technical as a SQL Developer, DBA, BI Developer or, most importantly, the new trending hire for this work, the Data Engineer (the role I fill as the architect on my data science team).

    I think one must understand that simply querying data with SQL, spotting trends, and putting them into a pretty graph in SSRS, Tableau, or Power BI is not what makes someone a data scientist. It's truly analyzing the data with algorithms and models they create or reuse, re-conforming the data based on their hypotheses, and, most importantly, using math and statistics to explain and/or prove the results they come up with.

    As a data architect, I can easily conform the data to anything I want within my data warehouse. I can easily put data into a regression model and generate pretty scatter plots. I can show cool trends using other graphs and whatnot. I can even go as far as creating an SSAS data mining project, feeding it data, and doing forecasting in SSRS with the Time Series algorithm, all automated through the ETL. But I cannot use math and statistics to explain why I chose the data points I used, or why I eliminated the ones I didn't. That is why being able to query data and push it into pretty pictures does not make me a data scientist.

    I have yet to meet a BI Developer, ETL Developer, DBA, or whatever who can. They are all technical experts who know the engines, know how data is structured, and know security, encryption, and much more, none of which explains to me, in terms of math and statistics, why something happened. And to be honest, most don't want to. Their role is to take the spaghetti code the data scientist wrote, along with the work he or she is trying to do, and support it, secure it, and most importantly, automate and scale it. Not complain that they could write it better.

    Finally, one of the other big points to highlight is that most data scientists are domain experts. This is what separates them from a statistician who lives and breathes statistical methodologies. The main reason I will never be a data scientist is that I am not the domain expert on what we are using the data to explain. Almost every data scientist I work with is an expert in the business, or on the way to becoming one. For example, if you're working in advertising, they understand better than I do what the client is looking for and how the client is applying it.

  • I work in an analytics department. We have two levels of modelers: Predictive Modelers (working on or holding a master's degree in the relevant field) and Data Scientists (working on or holding a PhD in the relevant field). These are some serious number crunchers.

    For them it isn't so much about going out and getting data as figuring out what data will be applicable to the outcome they need to model. So while they're writing queries and such, they're also writing R scripts for regression using modeling techniques I can't fathom. I tried looking them up. The math barely has numbers.

    I'm learning basic R so that I have a leg up when SQL Server 2016 arrives here, and so that I have some idea of what the scripts do. I'll be doing the same with Python.

    But true data scientists make data jump through a lot of hoops to come back with meaningful numbers.
