Why no posts?

  • I've heard Microsoft promote their concept of the "modern data warehouse" and their tools to support machine learning.  In my mind, there are at least 2 big components embedded in what they say: marketing to promote adoption of their cloud services and genuine opportunities to add value to the community.

    When I hear the word "modern" in "modern data warehouse", I also think about the words "traditional" and "old-school" when considering the on-prem warehouse workloads I currently support.  My question is why is there significantly less activity in the Cloud Computing forums on this site compared to the SQL Server 20xx forums?  I think Microsoft would like us to feel pressure to quickly move forward with the cool kids and adopt all of their services, but I wonder what everyone else is doing?  Is your workload mostly on-prem or cloud?  Are you using Azure Machine Learning?

  • To answer both of your questions for what I'm currently doing... 100% on-prem for SQL Server.

    As for the MS versions of machine learning in general, the only thing I've done with it is watch a couple of really lame demos by MS people and others alike.  When I went back through the code of one individual that claimed strong ties with MS, I found that it wasn't actually a reasonable example because that person made no real attempt at using the "machine learning" part of the demo and it came up with wrong answers both for history and for the predictions.  A simple GROUP BY did a better job in the area of history and a simple use of a Tally Table and a single formula did a much more accurate job of predicting.

    I'm not saying that "Machine Learning" isn't a good thing.  In fact, I embrace it.  I've just not seen anyone do it right yet and the lame demos are leaving a really bad taste in my mouth.  Full disclosure:  I've been too busy to give it a fair chance by teaching myself from the documentation and I've not looked much online for good examples either because I also don't need it right now.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff!

  • Where I'm at we do about 80% of our data warehouse work on premises and 20% in the cloud.  I've worked in IT for 25 years, and it's important to be able to separate the true trends from the hype.  Yes there are cases where cloud makes sense, depending on what the original source of the data is and who needs to access it, but honestly to me it seems the more things change the more they stay the same.  What's important is to measure the pros and cons of each method and find what works best for the situation you are in and problem you're trying to solve and separate facts from the hype.

  • I actually would love to play around with Machine Learning, but not on Azure (unless it's about training very big models), as that would make me poor rather quickly.

    The issue for me is that I haven't figured out yet how to use it in a sensible way, because if I had, I would skip all this Node Red / Apache Spark / Flink / Kafka stuff and point some sensors directly to MSSQL Server. I do have a potential use case, but at the current stage of the project and SQL Server version I wouldn't want to invest the time into this. SQL 2019 Big Data Cluster might change that a bit, but right now I'm not exactly holding my breath about eventually running a big fat SQL Server with GPGPUs in it.

    It looks more like using Single Board Computers (like Raspberry Pi, Nvidia AGX Xavier) and TPUs (like Intel Neural Compute Stick, Google Coral USB Accelerator) is the way to go, which leaves not many things you might want to process somewhere else, at least in the case of sensor data, I believe. A GIS where your trucks' route to the next manufacturing plant could be changed in real time to avoid traffic jams is something I think Machine Learning on R & MSSQL Services is suitable for.

    My workload is 100% on-prem too; well, mostly. There is a Power BI project coming up which brings in the possibility of Azure, even though we're definitely going to deploy local Reporting Services.

  • Our warehouse is in the cloud, but really in an IaaS VM that runs a warehouse database.

    ML? It's hard, and as Jeff mentioned, often T-SQL finds similar results more easily. I think ML works well in some domains, like imaging and speech, but in data warehouse analysis I'm not sold yet that it is better, mostly because I think we, or data scientist (ish) people, struggle to know how to frame a question that isn't easily answered with traditional analysis.

  • I've been highly inactive for like a year it seems, but here is my input.

    I am still 100% invested in Azure Data Warehouse and Azure Data Lake Store with Azure Data Lake Analytics.

    Data -> Store -> Analytic Engine -> Warehouse -> SQL DB -> Power BI

    When it comes to activity, you've got to remember that a lot of what applies to Azure Data Warehouse also applies to SQL Server. There are a few differences in what is available between cloud and on-prem, but there are many things in common too. Thus, you see a lot of related questions that may be tied to on-prem but actually apply to both.

    For Azure Machine Learning, I have used it a lot. The main benefit of Azure Machine Learning is taking the ML out of your application (e.g.: hard coded) and putting it somewhere else where your app can interact with it via API's (or embedding it). When I went down the path of exploring Azure Machine Learning, I quickly realized I am not developing applications for my data. It's mostly for analytical and operational reporting use cases.

    Many of the tools we use have ML features now. Power BI for example has plenty of ML features that do not require Azure ML to thrive on. Azure Data Lake Analytics (the analytics engine in my flow) also has ML features such as a couple of options to wrap or upload full ML modules/code as part of the U-SQL jobs. Again, not needing Azure ML to function.

    Outside of that, I do love Azure ML. It allows for approaches similar to the ones you may take when using stored procs versus other approaches with your apps. Having the ML completely separate, outside of the raw code, allows the data scientist to update and maintain that ML package more easily. It's a really nice feature if you really want to enable your applications to have supervised or unsupervised learning on top of just providing ML to Excel and other apps your team may be using.

    The only downside is that extracting the coefficients seems non-existent with Azure ML, which can be a pain for the DS teams.

  • It's nice to see someone that's eyeballs deep in it in a good way.  With that in mind, I'm incredibly curious as to what questions you're using ML to answer.

    --Jeff Moden



  • Well, I work in advertising. There are plenty of use cases for ML in that industry. The first is forecasting how ads will perform. Others include forecasting spend-to-conversions and trying to find the data points that drive sales up or down.

    Other use cases are using ML to analyze creatives. Looking at RGB, brand detection, landmarks, if a photo is racy or not, etc to understand what is driving interaction or sales.

    Another more common one is just to help classify new datasets. For example, you can use K-Means Clustering to identify a combination of data points that have some type of relationship, which may allow you as the database developer to create a new classification based on that output. That classification can then be used to make decisions.

    Let's not forget the targeting, audience segmentation, bidding, etc. that has to happen in real time based on all this data. ML is used to compute, build, and automate these datasets, whereas AI is used to make the decision with the result set.
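
As a rough illustration of the K-Means idea mentioned above, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the `k_means` helper, the `campaigns` rows (daily spend, conversions), and the choice of the first `k` points as starting centroids are hypothetical and are not taken from the poster's actual pipeline or from Azure ML.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, iterations=20):
    """Tiny k-means sketch: returns (labels, centroids)."""
    # Assumption: deterministic start, using the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: leave centroid in place
        if new_centroids == centroids:
            break  # converged, nothing moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical rows: (daily spend, conversions) per campaign.
campaigns = [(10, 1), (200, 40), (12, 2), (210, 38), (9, 1), (190, 42)]
labels, centroids = k_means(campaigns, k=2)
```

The resulting `labels` list is the kind of output that could be written back to a table as a new classification column; a real workload would more likely use a library implementation with better initialization and scaled features.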

     

  • Awesome!  I think you're the first person I've talked to that's actually using it for something that I think it's useful for.  As soon as you said that you work in advertising, I knew what was coming.

    You should pick one of the many things you're using it for and write an article about how you did it.  I know that I'd be a reader of it.

    --Jeff Moden



  • I mean, there are many other uses in other industries for sure. I had an interview with a wood pellet company for alternative fuel. They wanted to explore using ML on the data gathered from their many plants and equipment. The data, if used correctly, could help them make their products more efficiently.

    I mean, in theory, that's all we are doing in advertising. The machine is Google and the product is the ad. We use things like attribution to see how that ad is working across all channels, not just Google. Then optimize according to what ML is telling us.

  • We used to use ML type systems a decade ago to read marks on wood from a scanner. These were used to grade the wood and help workers reduce the manual work they needed to do. This was rudimentary by today's standards, but I could see this working much better now with better equipment.

    The best use I've seen is the jet engine twins from GE. The twins run models based on telemetry from the engines, including inspection videos, to help determine which ones need service now. It's both improved efficiency and also better targeted maintenance by catching items that inspectors sometimes miss.

