Can Data Save the World?

  • Good article, Steve. Got a question and a comment.

     First the question. What did the authors of Freakonomics mean by the "... data as an important part of analyzing data"? That's a very meta statement. I don't see how data can achieve consciousness to analyze itself. Sorry for being so obtuse about this.

     Second, I've made suggestions in the past to managers and co-workers (not other developers or DBAs) that we should analyze the data we had for insights. Working for non-profits as I have for so long, the motive isn't necessarily to build a better widget, because we didn't build widgets. Perhaps that's why no one ever fully took me up on the challenge. But I still felt like we could learn from what the data showed, or at least see trends. At my previous job we had over 15 years' worth of data related to substance abuse. I brought up several times the idea that someone who knew the right questions to ask (I didn't) could probably mine that data for insights. Never happened. Now I work for state government and the same thing happens here. We collect data statewide and report it to the Federal government. The only important considerations are that the web app be up and running, the database be available, the data collected be accurate, and the reporting date to the Feds be met. Once those pass, that data is, for all intents and purposes, irrelevant. There's no interest, as far as I can discern, in seeing what trends there might be. The nature of the data here at this job is so specific that I cannot even begin to draw any conclusions from it. That's a shortcoming. In order to do any analysis, you've got to know what questions to ask. I don't know.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • HighPlainsDBA - Tuesday, December 5, 2017 8:53 AM

    I think it's much easier to teach someone a programming language (it's mostly specific rules) than critical thinking skills (analysis/insight). Data is pretty useless without good analysis. Data scientists are like economists in that they are only as good as the models they build. Perhaps a better comparison is the real scientists at CERN poring over ludicrous amounts of data looking for patterns that describe new particles and trying to separate the noise from the music. I believe analytical skills are far more important to an IT professional than any language skill you will ever know. Languages are merely tools.

    Very good point, HighPlains. Without critical thinking skills, the programming is going to be worthless before and after.  So it takes those skills in all areas of data, from start to finish.  And while some of these thinking skills can be taught, I think there is some innate ability that is very helpful too.  Programming is both an art and a science.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • Rod at work - Tuesday, December 5, 2017 9:23 AM

    Good article, Steve. Got a question and a comment.

     First the question. What did the authors of Freakonomics mean by the "... data as an important part of analyzing data"? That's a very meta statement. I don't see how data can achieve consciousness to analyze itself. Sorry for being so obtuse about this.

     Second, I've made suggestions in the past to managers and co-workers (not other developers or DBAs) that we should analyze the data we had for insights. Working for non-profits as I have for so long, the motive isn't necessarily to build a better widget, because we didn't build widgets. Perhaps that's why no one ever fully took me up on the challenge. But I still felt like we could learn from what the data showed, or at least see trends. At my previous job we had over 15 years' worth of data related to substance abuse. I brought up several times the idea that someone who knew the right questions to ask (I didn't) could probably mine that data for insights. Never happened. Now I work for state government and the same thing happens here. We collect data statewide and report it to the Federal government. The only important considerations are that the web app be up and running, the database be available, the data collected be accurate, and the reporting date to the Feds be met. Once those pass, that data is, for all intents and purposes, irrelevant. There's no interest, as far as I can discern, in seeing what trends there might be. The nature of the data here at this job is so specific that I cannot even begin to draw any conclusions from it. That's a shortcoming. In order to do any analysis, you've got to know what questions to ask. I don't know.

    I didn't get a chance to listen through this particular podcast, but the point of the "data is an important part of data analysis" line is that it's not the ONLY part. In addition to the data itself, you need to be able to describe the actual analysis being done, whether additional data is needed or missing that could influence the analysis, etc... In order for your audience to lend credence to the analysis, it isn't unreasonable for them to expect that we (the data analysis team) can explain how we came to the conclusions we're presenting, in "business terms".

    Good data with bad methods or assumptions will yield bad answers just as fast as bad data will.  It's frankly healthy to have some due process where the user can review and choose to accept your findings or not: blindly accepting data analysis is almost as bad as blindly refusing to accept data analysis IMO.

    It's frankly not unlike the motto we see from any of the denizens here:  we can show you HOW to do something, but if you can't explain or understand what the code does, don't deploy it.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Rod at work - Tuesday, December 5, 2017 9:23 AM

    Good article, Steve. Got a question and a comment.

     First the question. What did the authors of Freakonomics mean by the "... data as an important part of analyzing data"? That's a very meta statement. I don't see how data can achieve consciousness to analyze itself. Sorry for being so obtuse about this.

     Second, I've made suggestions in the past to managers and co-workers (not other developers or DBAs) that we should analyze the data we had for insights.
    ...
    In order to do any analysis, you've got to know what questions to ask. I don't know.

    Re: first : That phrase is mine, but the idea I gleaned from the podcast (there's a transcript as well) is that we need to ensure we have good data, it's intact, it's not been too massaged or altered. We need enough to be representative, and that data is an important part, but not the only part. You can listen/read to see if I've interpreted correctly.

    Second: This is true, which is often why I can't do much more than try to discern some pattern in data. I have no idea if it's relevant, useful, or even correct without domain knowledge. That's why those who analyze the data, technically, often need guidance from those who understand the information.

  • Heh... Granny used to have a great saying that applies to data... "Figures can lie... and liars figure".  A story can be warped to make anything you want out of the data.  Here's one of my favorite examples...
    https://www.youtube.com/watch?v=lzxVyO6cpos
    Of course, there are also interpretations of what the figures say the answer should be that don't use the incorrect math Costello did.
    https://www.youtube.com/watch?v=XM1NNRNmZ6c 
    And, you can justify just about any numbers you want and come to any conclusion you want.
    https://www.youtube.com/watch?v=IQd1oDsHVSc

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's another couple of mistakes that people from "all walks of data involvement" make... making out-of-context assumptions and just being in too much of a hurry.  Take a look at the mistake the instructor makes at time 18:54 in the following movie...
    https://www.youtube.com/watch?v=3c-Bf7F7gnI

    The instructor claims the second query has an "impossible predicate" and the BETWEENs should be separated by an OR instead of the given AND.  It's NOT an impossible predicate and it could be refactored as WHERE val BETWEEN 50 AND 100.  Changing the AND in question to an OR would change the nature of the predicate and possibly cause an incorrect result set to be returned.

    I say "possibly" because the requirements for the second query aren't actually known.  It IS possible that whoever wrote the query made a mistake and it SHOULD have been an OR instead of an AND, especially if the 1, 50, 100, and 150 were variables that defined limits. The AND says to return only the INTERSECTION (all common rows of both sets, in this case, even if duplicates occur) of the two sets of limits and the OR says to return the UNION (ALL of both sets, in this case).  Depending on the requirements, one will be correct but the requirements weren't given.
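
    To make the AND/OR difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The video's actual query isn't reproduced in this thread, so the table `t`, its sample values, and the exact predicate are assumptions reconstructed from the limits 1, 50, 100, and 150 mentioned above.

```python
import sqlite3

# Hypothetical table and sample values; the real query from the video is
# not shown in the thread, so this predicate is only a reconstruction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")
conn.executemany(
    "INSERT INTO t (val) VALUES (?)",
    [(v,) for v in range(0, 201, 25)],  # 0, 25, 50, ..., 200
)


def vals(where):
    """Return the sorted val column for rows matching the WHERE clause."""
    sql = f"SELECT val FROM t WHERE {where} ORDER BY val"
    return [row[0] for row in conn.execute(sql)]


# AND keeps only rows that fall inside BOTH ranges (the intersection).
and_rows = vals("(val BETWEEN 1 AND 100) AND (val BETWEEN 50 AND 150)")

# The single-range refactoring suggested above.
refactored_rows = vals("val BETWEEN 50 AND 100")

# OR keeps rows that fall inside EITHER range (the union).
or_rows = vals("(val BETWEEN 1 AND 100) OR (val BETWEEN 50 AND 150)")

print("AND       :", and_rows)
print("refactored:", refactored_rows)
print("OR        :", or_rows)
```

    With these sample values the AND form and the single BETWEEN return the same rows, while the OR form returns a wider set, so swapping one for the other is only right if the requirements call for it.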

    Shifting gears a bit, there's a super powerful tool that too many people forget to use to always make the requirements understood in code... Comments.  My stance is that if all code were removed, the comments that remain should allow you to build a functional flow chart and I enforce that in the peer reviews I do.  Heh... to paraphrase the old saying, "If you can't explain it simply, then you probably don't know it well enough".

    --Jeff Moden



  • Jeff Moden - Wednesday, December 6, 2017 6:57 AM

    Shifting gears a bit, there's a super powerful tool that too many people forget to use to always make the requirements understood in code... Comments.  ....

    Back in my C coding days (and still), when creating a function, I would comment the definition, expected inputs and outputs before I wrote a single line of code.

    ...

    -- FORTRAN manual for Xerox Computers --

  • Jeff Moden - Tuesday, December 5, 2017 9:57 PM

    Heh... Granny used to have a great saying that applies to data... "Figures can lie... and liars figure".  A story can be warped to make anything you want out of the data.  Here's one of my favorite examples...
    https://www.youtube.com/watch?v=lzxVyO6cpos
    Of course, there are also interpretations of what the figures say the answer should be that don't use the incorrect math Costello did.
    https://www.youtube.com/watch?v=XM1NNRNmZ6c 
    And, you can justify just about any numbers you want and come to any conclusion you want.
    https://www.youtube.com/watch?v=IQd1oDsHVSc

    One of the modules in my first degree was Statistics. In the final lecture the lecturer quoted the phrase "Lies, damn lies and statistics!". The manipulation of data for an organisation's own purposes is disgusting. Financial institutions produce graphs with suppressed zeros, and a newspaper would rather print "Cases of Jungle Green Fever Double!" than report that the number of cases in a population of 100 million increased from 1 to 2, i.e. statistically irrelevant.
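
    As a rough illustration of why that headline misleads, the sketch below works through the numbers given above (1 case rising to 2 in a population of 100 million). The variable names are only illustrative.

```python
# Figures taken from the example above: 1 -> 2 cases in 100 million people.
population = 100_000_000
cases_before, cases_after = 1, 2

# Relative change: what the "Cases Double!" headline reports.
relative_increase = (cases_after - cases_before) / cases_before

# Absolute change: the extra share of the population affected.
absolute_increase = (cases_after - cases_before) / population

# Rates per million people, before and after.
rate_before_per_million = cases_before / population * 1_000_000
rate_after_per_million = cases_after / population * 1_000_000

print(f"Relative increase: {relative_increase:.0%}")
print(f"Absolute increase: {absolute_increase:.8%} of the population")
print(f"Cases per million: {rate_before_per_million} -> {rate_after_per_million}")
```

    Both figures are arithmetically true; the relative one makes the headline, while the absolute one shows how small the change actually is.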

    Data only benefits us if the right data is collected and it is analysed properly. Currently too much data is being collected for data's sake.

  • Is telling a great story important when conveying insights from data? Absolutely. It's one of the most sought-after qualifications for new data professionals who work in some type of data analysis position, such as a data scientist. Actually, most people hiring for data scientist positions, for example, are trying to find more people who can tell a good story along with having the statistical chops to boot.

    But, beyond just telling a great story like a great salesman, the ability to extract meaningful insights is just as important as how you convey them. Anyone can tell a story, but in most cases, we are aiming for a non-fiction story rather than a fictional one.

    From my professional experience working in the BI or data science sector, the problem of putting experience over data is pretty common. I don't think experience is irrelevant, but human factors can skew the data from non-fiction to fiction based on human assumptions or gut feelings.

    At the end of the day, I put a lot of stock in having good data first, the ability to unearth deep insights second, and telling a story third. For me, the key parts of telling a story are not only conveying what you found, but also telling the person(s) on the other end what they should do (recommendations) and how they should feel about those findings.

  • Matt Miller (4) - Tuesday, December 5, 2017 12:59 PM

    I didn't get a chance to listen through this particular podcast, but the point of the "data is an important part of data analysis" line is that it's not the ONLY part. In addition to the data itself, you need to be able to describe the actual analysis being done, whether additional data is needed or missing that could influence the analysis, etc... In order for your audience to lend credence to the analysis, it isn't unreasonable for them to expect that we (the data analysis team) can explain how we came to the conclusions we're presenting, in "business terms".

    Good data with bad methods or assumptions will yield bad answers just as fast as bad data will.  It's frankly healthy to have some due process where the user can review and choose to accept your findings or not: blindly accepting data analysis is almost as bad as blindly refusing to accept data analysis IMO.

    It's frankly not unlike the motto we see from any of the denizens here:  we can show you HOW to do something, but if you can't explain or understand what the code does, don't deploy it.

    Thank you Matt, I appreciate the clarification.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • Steve Jones - SSC Editor - Tuesday, December 5, 2017 4:45 PM

    Re: first : That phrase is mine, but the idea I gleaned from the podcast (there's a transcript as well) is that we need to ensure we have good data, it's intact, it's not been too massaged or altered. We need enough to be representative, and that data is an important part, but not the only part. You can listen/read to see if I've interpreted correctly.

    Second: This is true, which is often why I can't do much more than try to discern some pattern in data. I have no idea if it's relevant, useful, or even correct without domain knowledge. That's why those who analyze the data, technically, often need guidance from those who understand the information.

    Thank you Steve, I appreciate the clarification.

    Kindest Regards, Rod Connect with me on LinkedIn.

  • I think data people get blindsided by cognitive bias and sometimes the cynicism that comes with age helps to predict how people will react to a result from data.
    If the data disproves the pet theory of a senior decision maker, then the data must be wrong.  If it confirms their prejudice, even if woefully sparse and of poor quality, it must be right.
    Sometimes the above has tragic consequences, e.g. death rates at North Staffordshire hospital.  Sometimes the right conclusion is drawn but not welcome.

  • David.Poole - Wednesday, December 6, 2017 12:18 PM

    I think data people get blindsided by cognitive bias and sometimes the cynicism that comes with age helps to predict how people will react to a result from data.
    If the data disproves the pet theory of a senior decision maker, then the data must be wrong.  If it confirms their prejudice, even if woefully sparse and of poor quality, it must be right.
    Sometimes the above has tragic consequences, e.g. death rates at North Staffordshire hospital.  Sometimes the right conclusion is drawn but not welcome.

    Agreed.  To make a very long story much shorter, I was exposed to some data at a Fortune 100 company that I used to work at.  It was my job to put the data into a presentable format way back in the era of 9 pin Dot Matrix printers.  Being of a curious nature, I analyzed each of the 50 major projects (1 per page) and figured that the company would have to lay off half the people about 2 years in the future (I named the year and month in my estimation) unless something changed.  I took it up the chain of command all the way to the GM and, at every level, I was told "You don't have a degree and you're not a CPA and so you don't know what you're talking about.  The people that do have degrees and are CPAs have all stated that everything is good".  I tried to convince them otherwise for more than a year using the very numbers the others had used (the numbers were, in fact, accurate) and got the same line of BS.

    On the month and year that I had predicted, they had to lay off half the people (about 1500 or so).  The really aggravating part is they maintained that "everything is good" even in the company wide meeting just two weeks before the layoffs.

    It hurt a lot of people and I've never forgiven them for their arrogance or their stupidity.  It was a demonstration of a "business as normal" attitude that caused me to seek employment elsewhere, even after 15 years of good service to the company.  It's also why I say that "Business Intelligence" is frequently an oxymoron.  The data was there... their brains were not.

    --Jeff Moden



  • Getting back to the subject at hand... "Can data save the world"?  Yes it can.  We just need to find people that know how to use the data correctly and then stop beating on them just because they're the minority.  A lot of that kind of garbage happens even in our own industry.  If you don't think so, look at some of the things that people have gotten away with calling a "Best Practice" (and, no... I'm NOT saying true best practices are bad) and how many people follow those "Best Practices" simply because they've mistaken the din of the crowd as the wisdom of the group.

    --Jeff Moden



  • Jeff Moden - Wednesday, December 6, 2017 8:30 PM

    We just need to find people that know how to use the data correctly and then stop beating on them just because they're the minority. 

    Do you think that is a realistic proposition?

