Is Big Data good for Data Professionals?

  • Here is a question for everyone. Is big data simply large amounts of data, or is it large amounts of intelligent data? By intelligent data, I mean data that is strategic and actually helps a company make better decisions. Usually this would be data sets that not only contain transactions but also contain elements about the people involved in the transactions, like buying customer information or mapping census data to a set of transactions to see what the customer or potential customer base really looks like. That way a company can look for common elements and then target its marketing differently, advertise more efficiently, build strategic partnerships with other companies to generate more revenue, etc.

    Or is it simply that you have a really large data set? Like having 5 TB worth of call segments that don't really tell you anything special other than who called where and when?

    I tend to think of big data as the latter, so that it really isn't about size as much as it is about "big picture" data. Granted, you could have a fairly small data set of intelligent data, but it seems that the data collection processes of websites and companies are what started this whole big data conversation in the first place. And that data collection was mostly in order to gain as much information about users/consumers as possible in order to generate more revenue.

    That's just my view.

  • Michael Valentine Jones (1/9/2013)


    Or at a lower level:

    We have "Big Data" = We have "Big D***s"

    I'm not sure that "lower level description" was really necessary to get your point across here. 😀

    "Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ...:-D"

  • Donald Burr (1/9/2013)


    If a new term helps to create dialog in the C-Suite as to what the REAL needs for data are and what practical and competitive purpose they serve, then fine; but otherwise it is once again marketing driving technology needs.

    I would argue it's up to technology pros to help facilitate this dialog and warn people about unnecessary spending.

  • KWymore (1/9/2013)


    Here is a question for everyone. Is big data simply large amounts of data or is it large amounts of intelligent data?

    ....

    I tend to think of big data as the latter, so that it really isn't about size as much as it is about "big picture" data. Granted, you could have a fairly small data set of intelligent data, but it seems that the data collection processes of websites and companies are what started this whole big data conversation in the first place. And that data collection was mostly in order to gain as much information about users/consumers as possible in order to generate more revenue.

    That's just my view.

    I think it's the former: just lots of data. Is it intelligent? Sometimes we don't know, perhaps much of the time. I think it's the analysis and work that needs to be done on the data sets to understand whether there is real information in there, or whether it is just data points.

  • Donald Burr (1/9/2013)


    "What data professionals have been trying to do for years" is exactly where the problem lies (IMHO). Way too many IT departments try to drive the ship instead of the business leaders.

    IT and technology are ALWAYS enablers; we need to understand the levers of the business and provide correctly implemented, right-sized solutions. Technology for technology's sake is nearly always a disaster.

    The business needs to understand what it needs and how the data will be used before it is pushed into the ecosystem.

    Donald - Complete agreement! However, we have to be at the shoulder of the driver to tell them if what they are looking into doing is technically possible and financially within grasp of the company. But they drive, as they listen.

    M.

    Not all gray hairs are Dinosaurs!

  • If big data is just large amounts of data, then the entire big data conversation isn't anything new. We have been running into sizing issues for years. I think the change in data collection processes to encompass every touch and every connection between items is what has bloated our data sizes. If big data is simply big data, then it is all hype and marketing to get people to focus on upgrading storage.

  • TravisDBA (1/9/2013)


    Michael Valentine Jones (1/9/2013)


    Or at a lower level:

    We have "Big Data" = We have "Big D***s"

    I'm not sure that "lower level description" was really necessary to get your point across here. 😀

    Well said Travis!

    Not all gray hairs are Dinosaurs!

  • My understanding of the term 'Big Data' is data at a scale where the inferences drawn from it are what matter, rather than the day-to-day reporting. I understand that one applies complex statistical algorithms to it in order to derive meaning from it. It is something that one would relish data-mining.
    This is in contrast to Very Large Databases, which are just ordinary databases scaled out. We have a 100 GB DB, which would have been a VLDB in 2000 *, but had we been able to afford then the hardware that we have now, we surely would have had a 100 GB DB then too. We are doing much the same things now as then, just with more data.

    * Oh, when a 4-processor server with 4 GB of RAM was considered quite impressive...

  • I have had the (mis)fortune to work with 2-3 huge databases; however, to me Big Data implies the scale of data capture that IoT and the like require, without the data being filtered to a level where one could call it information. For me, Big Data is raw data, akin to a Data Warehouse (DW), except that usually a lot of work has been put into the design and implementation of a DW project.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • There are certain circumstances when 'big data' can be useful. But too often it's a marketing shtick that can sell services and equipment to managers desperate to 'keep up'.

    The size of the data is much less important than its quality; larger amounts of poorly selected or poorly understood data are as bad as (or worse than) smaller amounts. Unfortunately, getting and understanding quality data is hard; getting more is easy.

    ...

    -- FORTRAN manual for Xerox Computers --

  • ...I hope that our managers start to think that when our systems run slower that we're dealing with big data, and we need more resources. The whole big data phenomenon could be a way for data professionals to start a new hardware renaissance, where hardware budgets grow and we begin to replace our current systems with bigger, faster servers...

    The problem is when an organization believes that the solution to any database performance problem is to throw more computing resources at it. Many organizations don't have the database engineering know-how to properly design databases or troubleshoot root causes, so they treat every issue as one of scaling up or scaling out the platform. However, a single strategically placed index can save $100,000 in new hardware or cloud hosting.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I'm going to say yes to this.

    My reasoning is that it gets people thinking about data in new ways even if it does produce a circular story.  It stops you wearing holes in the same old path.  

    The reality is that many organisations only scratch the surface of the potential of "small data".  The message "you don't need Big Data" is as popular as Piers Morgan at an NRA shooting party.  What you need to do is embrace the tech, get it on your CV, and then, when the C-levels have realised that the cost of a Big Data solution is vast, especially compared to its likely ROI, you will look like a forward-thinking superhero for being able to cope for only a "slight increment" on your current budget.
    Play the game.  Don't stress out over the futility of it all; just milk it for the amusement factor.

  • If you don't really know T-SQL programming or database design and tuning, then even a moderately sized database (i.e., 100 GB) is going to seem like Big Data (at least to you).

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Big Data is a business.

    Business of selling software solutions.

    To sell them a solution you need them to have a problem first.

    So, you're running around the marketplace yelling about how cool "unstructured data" is, how convenient it is to store the data as a raw document.

    Sure enough, there are plenty of IT professionals who hate the word "normalization", so they are easily sold on that.

    And then - here you are, offering your product to solve the problem you've just convinced them to create!

    Just like MS SQL Server: to sell the new version, you need them to have a problem that the new version helps to resolve. And it had better be a problem you created, so you'd know how to resolve it.

    So, you first convince your customers that denormalized data storage is the best option for data warehouses, and then you offer them a columnstore index, which is - yep - normalizing the data by column.

    It's not as convenient and effective as ordinary normalized data storage would be, and it requires offline time to refresh the index, but normalization is so last decade! It's embarrassing even to mention it at a business summit.

    They pretty much have no choice other than to buy your solution to the problem you convinced them to create.

    _____________
    Code for TallyGenerator

  • Eric M Russell - Monday, March 27, 2017 3:45 PM

    If you don't really know T-SQL programming or database design and tuning, then even a moderately sized database (i.e., 100 GB) is going to seem like Big Data (at least to you).

    If we take a database made of 100 GB of unique bigint values (pretty much the worst-case scenario in terms of indexing), then the leaf level of the index will hold about 1k records per page, a page at the next level up will cover 1M records, the 3rd level will cover 1B records (8 GB), and the 4th level will cover the whole database.
    So, here we are: to find any record among 100 GB of unique numbers, SQL Server needs to read 4 index pages.
    That's 32 KB to read from disk.

    Even 20 years ago, 32 KB took no time to read for a desktop computer.
    That's how big 100 GB of data really is.
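    The arithmetic above can be sketched in a few lines. This is a rough estimate only: it assumes an 8 KB page holds about 1,024 eight-byte keys and ignores row and page-header overhead, so real SQL Server fan-out would be somewhat lower.

    ```python
    import math

    # Rough B-tree depth estimate for an index over unique bigint keys.
    # Assumption (round numbers, as in the post): an 8 KB page holds
    # PAGE_SIZE // KEY_SIZE keys, ignoring row and page-header overhead.
    PAGE_SIZE = 8 * 1024            # bytes per page
    KEY_SIZE = 8                    # bytes per bigint key
    FANOUT = PAGE_SIZE // KEY_SIZE  # ~1,024 keys per page

    db_bytes = 100 * 1024**3                     # 100 GB of key data
    n_keys = db_bytes // KEY_SIZE                # ~13.4 billion keys
    depth = math.ceil(math.log(n_keys, FANOUT))  # index levels needed
    io_bytes = depth * PAGE_SIZE                 # bytes read per key lookup

    print(f"{depth} index levels, {io_bytes} bytes read per lookup")
    # → 4 index levels, 32768 bytes read per lookup
    ```

    Each level multiplies the reachable key count by the fan-out (1k, 1M, 1B, ...), which is why even 13+ billion keys need only four page reads.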

    _____________
    Code for TallyGenerator

Viewing 15 posts - 16 through 30 (of 49 total)
