Amateur Data Analysis

  • Steve Jones - SSC Editor (8/11/2015)


    I agree you need to know something, but what is that bar? Can I analyze college spending? I think I can, though there are probably things I don't realize about the numbers. However, if someone can point out a problem with my hypothesis and we debate it, I change and improve; that's progress.

    Can I analyze drug test results? No. I have no concept of how to even approach the data, so I'd have to seek some help to understand what the data means. From there, though, I could proceed with some help. Or I could help someone with domain knowledge learn to put together a report or analysis with my skills in working with data.

    I would say there are lots of data sets on which those of us working with data could begin an analysis.

    Well, I think that's really where the trains of thought separate.

    Simple analyses, like spending over time and so forth, are just that: simple. Most analysts, in my experience, are doing the deep dives. They are jumping into areas beyond those simple things anyone can figure out.

    That's when I think business knowledge comes into play the most.

    But I'm not trying to shoot you down here or anything. Having no knowledge of the business can be beneficial too, though not ideally from an amateur standpoint. Someone who is experienced in one industry but has no prior knowledge of the new industry he/she is working in can bring tactics into the mix that maybe have not been used before in that industry. The result would be similar to someone thinking outside the box in a business where all the other analysts are stuck inside that box, always looking at the data in the same way.

    While you can argue that particular analyst has no business knowledge of that particular business, I'd still argue they have general business knowledge from other businesses versus someone completely wet behind the ears. A good comparison would be me switching from SQL Server to Oracle, or from Ruby to Python. Normally some standards do carry over that are relevant to both.

  • xsevensinzx (8/12/2015)


    The issue I have seen is that someone not aware of the data rules can misunderstand the results and make a loud statement of their findings. Depending on the "rock star", this can be a huge problem, costly to the business not only in money but in time, with added friction between groups. There is something to be said for a "fresh pair of eyes" on the results and data. I don't mind the questions and clarifications, but I do mind the rash statements from a "professional rock star", aka bloody idiot!

  • Yet Another DBA (8/12/2015)


    I hear that.

    That's why, when we look for analysts, we look for people who typically work in the same industry and understand something about the data rules or the business. Granted, not everyone has done exactly what we do in our industry, but that's what training is for.

    New analysts who are wet behind the ears are like most new hires: they have their hands held until they gain enough business knowledge and experience to analyse the data correctly.

    Without that, you might as well be throwing random darts at a dartboard. You need others familiar with the data and the business to validate your approach before you move forward.

  • I think we're in close agreement here for the most part. In a business context, or any other context where the analysis informs organizational decisions or recommendations, I do think some knowledge is needed.

    My original piece was on doing this as a hobby, both to bring a new view to a problem and to practice analysis and presentation skills. With some comments and debate, I think one can gain some skill at performing a better analysis.

  • Practice analysis is all well and good. The problem is that putting visualizations around an analysis lends an amateur analysis an air of credibility with a lay audience, just as adding descriptive statistics does. The problem with both is that one can NOT necessarily draw any meaningful conclusions from either without a proper context.

    I've struggled in my position as an analyst to get staff, administrators, management, and senior management to understand how important context is to the work that I do, and to providing a valid interpretation of the data provided: without the context, you don't have information, you just have data. I've struggled to get them to understand that with descriptive statistics you can't infer conclusions without historical data and an analysis of the variance in that data; and even then, you still need a context in which to interpret that variance over time, informed by the external and internal factors that influence the business.
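    A minimal sketch of that point, using entirely made-up monthly figures: a descriptive statistic such as "this month's value is 115" means little until it is placed against the historical mean and spread.

```python
import statistics

# Hypothetical monthly figures for the past two years (made-up data).
history = [102, 98, 110, 95, 105, 99, 101, 104, 97, 103, 100, 106,
           108, 96, 102, 111, 94, 105, 103, 99, 107, 101, 98, 104]

mean = statistics.mean(history)    # 102.0
stdev = statistics.stdev(history)  # roughly 4.6

def within_normal_variation(value, mean, stdev, k=2):
    """A value within k standard deviations of the historical mean is
    unremarkable; outside that band it may actually warrant a closer look."""
    return abs(value - mean) <= k * stdev

# 115 looks "high" on its own, but only the historical variance can say
# whether it is genuinely unusual (here it is: more than 2 stdevs out).
print(within_normal_variation(115, mean, stdev))  # False
```

    The point of the sketch is that the judgment lives in `history`, not in the single number; swap in a flatter or noisier history and the same 115 becomes routine.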

    They have come a long way in the eight years I've been on the job, but staff turnover and management changes provide a constant challenge in an organization that proclaims it wants to be a "data driven organization" -- to the extent that it is building a data warehouse and attempting to provide standard business terms and metadata definitions for the entire enterprise, so that staff at various levels can access data themselves and begin analyzing it within the context of their own job scopes.

    That last, in and of itself, is quite a challenge, because the vast majority of staff are de facto "amateur data analysts." For individual units like ours, there are folks like myself who can help educate staff by providing training, and within the enterprise there are training offerings in the various tools available as well as in analytical techniques. We also have user groups across the enterprise to help raise the standards, and several analytics teams available to assist individual units with training, consulting, or just developing additional tools for their staff to use.

    I'm all for more people learning, but what I fear is people developing skills and then posting materials with dubious conclusions to social media or to "news sites" with a political spin, sowing misinformation rather than information -- as is all too common with amateur statistical analysis, let alone analysis involving more modern and more powerful data mining and data visualization techniques (the latter of which are much more impressive to the lay person).

  • casachs 74147 (8/12/2015)


    Sounds exactly like my situation. But we are pretty big on ensuring the analysts are educating the teams on how to pull and consume the results.

  • Jeff Moden (8/6/2015)


    I've found that many people already have a notion of what they want to find in the data rather than finding what the data is telling them.

    So true.

  • Iwas Bornready (8/18/2015)


    Except some of this is the scientific method: have a hypothesis about what is there, test, and then change. It's the last part some people forget.
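    That loop (hypothesize, test, revise) can be sketched with a simple permutation test. This is only an illustration with made-up samples, not anyone's actual method:

```python
import random

def permutation_test(a, b, n_iter=10000, seed=0):
    """Estimate how often a difference in group means at least as large
    as the observed one would arise by chance if labels were random."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:len(a)]) / len(a) -
                   sum(pooled[len(a):]) / len(b))
        if diff >= observed:
            hits += 1
    return hits / n_iter

# Hypothesis: the two (made-up) groups differ. A large p-value says the
# data do not support that hypothesis -- time to revise it, not to shout.
p = permutation_test([5, 6, 7, 6, 5], [5, 6, 6, 7, 5])
```

    Here the group means are identical, so the test comes back with no evidence of a difference; the honest response is the third step of the loop, changing the hypothesis.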

  • Steve Jones - SSC Editor (8/18/2015)


    Some people, especially those in upper management or in certain brands of "think tank", have preconceived notions of what they want the data to say and look for data to tell the story they want to tell, not the story the data itself tells. They toss anything akin to the "scientific method" right out the window, unfortunately.

    Such is the way of the world... as the saying attributed to Benjamin Disraeli goes, "There are three kinds of lies -- lies, damned lies, and statistics," with which I agree when statistics are misrepresented and misused. But this needs to be brought up to date: "There are many kinds of lies -- lies, damned lies, statistics, and data visualizations." :w00t:

  • I'm going to suggest that the example website of amateur data analysis (http://devnambi.com/2015/uc-analysis/) is a good example of well-intentioned BI gone off the rails.

    1. This Smacks of Rhetorical Axe-Grinding

    The author comes across as analytical because of his use of public data sets and graphs, but the rhetoric of his write-up makes it clear he is anything but objective about his subject. Phrases like these suggest bias and call into question the author's intent: "it's clear that all non-teaching activities are the tail that wags the dog", "the teachers and researchers are there as window dressing", "Administrators have no incentive to cut their own budgets". This last one is a conclusion that absolutely cannot be drawn from the data set: maybe there are incentives, but they aren't working? Maybe there are external forces, like new regulatory requirements or state policies, beyond administrators' control?

    But the average reader will nod and agree that spending is out of control and there aren't any measures in place to stop it.

    An honest researcher/analyst will not mix agenda with analysis and will, in fact, bend over backwards looking for ways to invalidate his/her own conclusions.

    2. Missing Essential Data

    Hands up: how many of you read this post and noticed that he presents no data to indicate how fast, or even whether, tuition at UC is increasing?

    Yup, that's right. He shows a shocking graph at the beginning depicting massive increases in college tuition over time in the United States. Nothing specific to UC.

    Then the rest of the article presents cost breakdowns by group, as a percentage of total costs at UC. What happened to the initial premise that tuition was too high? Go ahead, search the page: the word "tuition" doesn't even appear anywhere in the body of the presentation. "Tuition" appears only in the lead-in hook paragraph and in his wrap-up broadside "Implications" section.

    It's implied but never stated that the average, overall increase in tuition for the country applies equally well to UC. Maybe so, maybe not, but that's a sloppy (or possibly deliberate) omission.
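    The omission matters: percentage breakdowns of a total can be identical across years even while the total itself doubles, so a shares-only presentation cannot say whether tuition rose. A toy sketch with entirely made-up budget figures:

```python
# Hypothetical budgets (made-up numbers, not actual UC data) showing why
# cost shares as a percentage of total say nothing about absolute cost.
year1 = {"instruction": 40, "administration": 30, "facilities": 30}   # total 100
year2 = {"instruction": 80, "administration": 60, "facilities": 60}   # total 200

def shares(budget):
    """Express each budget line as a fraction of the total."""
    total = sum(budget.values())
    return {k: v / total for k, v in budget.items()}

# The shares are identical even though total spending doubled; a chart of
# shares alone would hide the change that the reader actually cares about.
print(shares(year1) == shares(year2))                 # True
print(sum(year2.values()) / sum(year1.values()))      # 2.0
```

    Any honest presentation of the breakdown therefore has to carry the totals (or tuition itself) alongside the percentages.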

    I could continue, but I think I've made my point. Mind you, I don't actually think this is a particularly bad presentation, because at least he's transparent in his biases, and an astute reader will see this quickly.

    This, to me, highlights the delight and danger of the web: data sets are ubiquitous, but so are soapboxes.

    Rich

  • No one is born an expert, so amateur analysis is a great idea (especially when it is debated). Yet we must carefully control how data is analysed and how findings are communicated (to avoid the "Rock Star" issue).

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

Viewing 11 posts - 16 through 25 (of 25 total)
