Too Good at Data Analysis

  • Comments posted to this topic are about the item Too Good at Data Analysis

  • I've come across a parallel example which also gives an interesting insight into unexpected results of data analysis. My wife, as I think I've said before, is a statistician. She works for an insurance company, specifically on their models for predicting risk related to motor insurance policies. Obviously, they have huge amounts of historical data on which to base their models. They have the ability to build VERY accurate models, albeit ones taking in so many variables as to make them operationally unviable. However, the work has also highlighted the interesting fact that models which are too accurate can be commercially disadvantageous.

    People buy insurance as a bet against the unknown. Insurance companies take on that bet since, just like bookmakers, they can work the odds to lie in their favour overall. As the predictions get better, insurance companies can identify the bets that are most likely to cost rather than make money, and avoid them. However, once predictions become accurate enough, there's little genuine uncertainty left and so no unknown to bet against, leaving the company with no real room to manoeuvre. In short, you've removed the uncertainty that prompts people to buy insurance in the first place (although the legal requirement for a driver to have insurance blurs the line slightly).

    So can you over-analyse? Can you have too good a picture? Yes.

    Semper in excretia, suus solum profundum variat

  • Often when I find myself struggling with inconsistent data, the best solution is to reduce granularity until the inconsistencies drop below the threshold. So less data can be more, if chosen wisely. Choosing the less accurate but consistent model saves a lot of work and is often all the business really needs. Over-engineering is much too common, unfortunately. KISS 😉
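    A minimal sketch of what "reducing granularity" can look like in practice, assuming a hypothetical daily sales table in pandas (the column names and figures are my own invention, not anything from the post above): noisy day-level detail is rolled up to monthly totals, trading precision for consistency.

        import pandas as pd

        # Hypothetical daily sales data with noisy day-level detail.
        daily = pd.DataFrame({
            "date": pd.date_range("2008-01-01", periods=90, freq="D"),
            "region": ["North", "South", "East"] * 30,
            "sales": [100 + (i % 7) * 3 for i in range(90)],
        })

        # Reduce granularity: aggregate the daily noise up to monthly totals.
        monthly = (
            daily
            .assign(month=daily["date"].dt.to_period("M"))
            .groupby(["month", "region"], as_index=False)["sales"]
            .sum()
        )
        print(monthly)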

  • Personally, I don't think you can blame technology for the current global financial crisis. I'm sure the data analysis undertaken by banks throughout the world identified, to a certain degree, the risk involved in the sub-prime and 100+% mortgage markets. I'm sure the same analysts identified the potential profit and revenue that could be made from such products, and then some suits chasing a bonus made decisions based on 'the facts'. In hindsight, I don't think anyone's analysis identified the global death spiral that followed, no matter what technology was used. Do you think someone said, "well, if the sub-primes don't get repaid, the world economy will shatter into little pieces"?... I don't think one bank or analyst made that prediction, at least not until it was too late.

    For me this proves two things: free-market economics doesn't work unregulated in certain industries, and you can BI all you like, but if the risk is acceptable to the ultimate decision maker then the risk gets taken.

    Gethyn Ellis
    www.gethynellis.com

  • Let's see...I think it was in Hitchhiker's Guide to the Galaxy...they took a galactic census but had to toss out the results when they discovered that the average galactic citizen has three legs and owns a goat.

    I don't think the amount of data is necessarily the problem. Seems like it's the conclusions that you draw from the data that are usually the problem. I think people tend to see what they want to see and disregard the rest, and tend to build models to support what they want to see.

  • It's not exactly a technological issue, but it is related. One discussion suggested that they were treating mortgage defaults as unrelated variables (which they normally are) without allowing for the fact that they can suddenly become related (by a bump in the economy, perhaps the oil run-up). It's a case of symmetry breaking in a spin glass: random failures are no longer random.

    One big problem is that over-engineering models makes them *appear* far more accurate and reliable, but (as is usually the case) there are significant weaknesses in the underlying assumptions which irretrievably but silently damage the model's reliability. Sometimes a number of 'competing' models can predict the same result (which gives a feeling of confidence), but if the models, at some point, share underlying assumptions, they can all be comparably wrong.
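    A toy illustration of that failure mode (entirely my own sketch, with made-up probabilities, not the models being discussed): simulate a pool of mortgages once with independent defaults and once with a rare shared shock that makes them correlated, and compare the chance of a large loss.

        import random

        random.seed(42)
        N_LOANS, BASE_P, TRIALS = 1_000, 0.02, 2_000

        def default_counts(correlated):
            """Number of defaults per trial for a pool of N_LOANS mortgages."""
            counts = []
            for _ in range(TRIALS):
                # With correlation, a rare shared shock (an economic downturn)
                # raises every loan's default probability at the same time.
                shock = correlated and random.random() < 0.05
                p = BASE_P + (0.10 if shock else 0.0)
                counts.append(sum(random.random() < p for _ in range(N_LOANS)))
            return counts

        for label, corr in (("independent ", False), ("shared shock", True)):
            counts = default_counts(corr)
            tail = sum(c > 50 for c in counts) / TRIALS
            print(f"{label}: P(more than 50 defaults) ~ {tail:.3f}")

    Under independence the 50-default tail is effectively zero; with the shared shock it shows up in roughly the 5% of trials where the shock hits, which is exactly the kind of risk a model built on the independence assumption silently ignores.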

    ...

    -- FORTRAN manual for Xerox Computers --

    As undergrad engineering students, we were drilled on the uncertainty principle: a measurement can never be known precisely. We talked about the true value of a thing lying within a "band" at certain odds or statistical probability, usually 20:1. In physics and chemistry we were hammered about "significant digits". And then there is that great little book, How to Lie with Statistics.

    Your point is well taken. We think we have certainty, but no part of the measuring system is actually certain. It would be an interesting exercise to calculate the uncertainty band around many of our conclusions.
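    A small illustration of that exercise (my own sketch, with invented measurements): propagating each input's uncertainty through even a simple calculation shows how wide the band around a precise-looking answer really is.

        import math

        # Invented measurements: (value, uncertainty), e.g. a 95% band.
        length = (12.5, 0.2)    # metres
        width = (4.80, 0.15)    # metres

        def multiply(a, b):
            """Propagate uncertainty through a product: relative errors add in quadrature."""
            value = a[0] * b[0]
            rel = math.sqrt((a[1] / a[0]) ** 2 + (b[1] / b[0]) ** 2)
            return value, value * rel

        area, band = multiply(length, width)
        print(f"area = {area:.1f} +/- {band:.1f} square metres "
              f"({100 * band / area:.1f}% uncertainty)")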

    Robin Hood

  • Over-engineering can definitely come into play. Several times I have seen simple models degenerate into a sick game played by the executives. "You've proven that Widget C is unprofitable in the New England States, but I'd like you to redo this. Divide these numbers by the number of babies born in the Pacific Rim and then multiply by the hour of the day, and we can prove that Widget C is doing fine."

    The problem is that the work of people who understand the models and how to do such an analysis often gets trumped by people in power...usually the same people who ordered the work to begin with. And they're not looking to glean knowledge from the numbers. Rather, they set out to prove their self-supporting hypothesis at any cost, regardless of how absurd the model becomes.

    In my opinion, that's how we found low interest mortgages given to high risk people.

  • The problem I see with the models they used is that, as is so often the case, their precision exceeded their accuracy. It's one of the most common mistakes in data analysis of all sorts. About the only people who avoid it are well-trained, experienced scientists and engineers.

    As soon as precision exceeds accuracy, it's all wasted effort. Useless. But people like to rely on it and make decisions based on it, even if it's essentially luck at that point.
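    A small, made-up illustration of the distinction: a model whose repeated outputs cluster tightly to four decimal places is precise, but if those outputs sit well away from the true value it isn't accurate, and the extra digits only lend false confidence.

        from statistics import mean, pstdev

        true_value = 100.0
        # Made-up model outputs: tightly clustered (precise) but biased (inaccurate).
        predictions = [112.4371, 112.4402, 112.4389, 112.4415, 112.4398]

        precision = pstdev(predictions)                   # spread of repeated results
        accuracy_error = mean(predictions) - true_value   # distance from the truth

        print(f"precision (spread): {precision:.4f}")
        print(f"accuracy error:     {accuracy_error:+.4f}")
        # Quoting four decimal places here implies a certainty the model doesn't have.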

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • bmcnamara (10/14/2008)


    ... such an analysis often gets trumped by people in power...usually the same people who ordered the work to begin with. And they're not looking to glean knowledge from the numbers. Rather, they set out to prove their self-supporting hypothesis at any cost, regardless of how absurd the model becomes.

    ...

    Oh yes! I also see that in ROI calculations (even by us IT people): a little bit of data selection goes a long way.

    ...

    -- FORTRAN manual for Xerox Computers --

  • A few weeks ago I read a similar item in 'New Scientist' magazine.

    From what I understand, various financial institutions have been using mathematical models to calculate risk which they knew weren't accurate, but which they used because they didn't have anything better.

    Apparently, as was known beforehand, a major failing was that 'the market' was treated as a single black box which could absorb and dilute risk without being noticeably affected. So a mortgage lender would take on a bad debt (e.g. sub-prime) and then 'spread it around' to reduce the risk. Of course, in hindsight, it's obvious that 'the market' is finite, so with enough bad debt the spreading doesn't reduce the risk, it just infects other institutions. This was exacerbated by the fact that many institutions were being sold a 'pig in a poke', i.e. they didn't really know what they were buying.

    I think this is one of those cases where more data wouldn't help, as it requires a restructuring of the analysis rather than more information. What is needed is to model 'the market' as multiple institutions, which vastly increases the complexity but doesn't actually need much more data.
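    A toy sketch of that structural difference (entirely my own illustration, with invented numbers): once 'the market' is modelled as a finite set of institutions, each with a limited capital buffer, spreading a fixed pile of bad debt stops looking like dilution and starts looking like contagion.

        N_INSTITUTIONS = 20
        CAPITAL_EACH = 50.0    # invented capital buffer per institution

        for total_bad_debt in (400.0, 1200.0):    # modest vs. excessive bad debt
            share = total_bad_debt / N_INSTITUTIONS
            over_buffer = sum(share > CAPITAL_EACH for _ in range(N_INSTITUTIONS))
            print(f"bad debt {total_bad_debt:6.0f}: each holds {share:5.1f}, "
                  f"{over_buffer:2d} of {N_INSTITUTIONS} institutions over their buffer")

    Spreading the smaller pile works; spreading the larger one doesn't shrink it at all, it just puts every balance sheet over its limit at the same time.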

    As an aside, I gather a large part of the current market fall is due to 'short' selling. Can someone explain to me how this isn't theft? In summary, a 'short' seller borrows shares (sometimes paying a fee), sells at a high price, waits for a fall and buys at a low price, pocketing the difference (less fee) and returning the shares. The argument is that the share price was going to fall anyway, but recent trends seem to indicate that excessive 'short' selling can actually drive the price down. As I see it, it's the shareholder taking the loss and in many situations the 'short' seller does not even inform the shareholder that their shares have been sold 'short', so all they see is that their share value has fallen. And this is legal?!

    Derek

  • It comes down to the first acronym I ever learned having to do with computers and computing: GIGO -- Garbage In, Garbage Out.

    True for both data and code, which always embodies the biases of its writers.

  • GSquared (10/14/2008)


    As soon as precision exceeds accuracy, it's all wasted effort. Useless. But people like to rely on it and make decisions based on it, even if it's essentially luck at that point.

    Of course, as well as people believing precise numbers which may be wildly inaccurate...

    88.23% of all statistics are made up on the spot [Poll of 6,302 statisticians]

    ...the other effect is where a number of dubious value gets manipulated to appear to be more accurate than it really is. If I ever see a situation like this, I tend to run a mile (1,609.344 meters)! 🙂

    Derek

  • Short selling isn't stealing, and actually short sellers can help ensure the market doesn't grow too wildly. (http://blogmaverick.com/2008/09/30/the-stock-market-is-not-a-barometer/)

    The problem is that we were allowing people to borrow on stock they didn't own to sell short. Or at least that's one problem I know about. Short sales are fine, but you're on the hook for the repurchase at some point. That means if you short Apple at $100, thinking it will fall, and it instead goes to $500, you owe $400 (per share). Essentially you are carrying a loan, and at some price that loan has to be called. The danger is that you have unlimited liability.
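    To make that arithmetic concrete (using the prices from the example above, which are of course illustrative): a short seller's gain is capped at the sale price, but the loss has no ceiling because there is no upper limit on the repurchase price.

        def short_sale_pnl(sell_price, buy_back_price, shares=1):
            """Profit (or loss) on a short: sell borrowed shares high, buy them back later."""
            return (sell_price - buy_back_price) * shares

        print(short_sale_pnl(100, 60))     #  +40 per share if the price falls to $60
        print(short_sale_pnl(100, 500))    # -400 per share if it rises to $500 instead
        # Best case: the stock goes to $0 and the gain is capped at +100 per share.
        # Worst case: there is no cap on the price, so the potential loss is unlimited.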

    Like many of those loans, short selling isn't regulated enough. Short sellers shouldn't be allowed to short more than they can cover in cash.

    The models being used were supposed to be proven by the market, which is just crazy. If you want to do that, then they have to be "test" models, not models on which you can base your business.

  • It is important to discover the reason why all these financial institutions got into trouble at the same time. Certainly, one company's portfolio could affect another's, but why did so many portfolios tank at once?

    I think the economist Jesus Huerta de Soto makes a great case for why regulation and government intervention in the market is the reason why we are in this situation: http://mises.org/story/3138

    Data mining is fine, and as some in this thread have suggested, the individual analyzing the data must look at it with the proper context. Unless you have a background in Austrian Economics, it's difficult to understand that Fed-induced easy-money policies, mixed with misguided political decisions like making home ownership "easy" for everyone via Fannie Mae & Freddie Mac, will eventually lead to market corrections.

    Keith Wiggans

