Most of us know that there is a cost associated with the management and storage of data. After all, it's part of the reason many of us have jobs as data professionals. Our salaries are a cost, though we also realize there are real costs in electricity, online storage, backup storage, and more that relate to the stewardship of data.
Many US government agencies are being asked to share more of their data publicly, but without additional funding. That could mean less research is funded as the cost of opening their databases rises. How this should be handled is open to debate, but Vint Cerf and others have noted that someone has to pay to manage data, and public/private partnerships are a potential solution. However, the fate and cost of public data aren't the core issue.
The core issue is that data sets are growing, and in some cases, growing at phenomenal rates. We see companies getting caught up in the Big Data hype, often gathering and storing data just because they have the ability to capture more of it. However, as our data volumes and rates of growth increase, the performance, and perhaps availability, of our systems is affected. Perhaps more to the point for many of us, all this data affects our performance at our jobs. Systems might run slower, and we have more work to do to analyze data, write reports, or deal with the ever-widening array of tools that people want to use to analyze the bits and bytes we manage.
The costs of data storage, of tracking and managing systems, of administering a multitude of environments, and of supporting new tools for analysis are going to grow. Most likely that means that each of us will be asked to handle more data, with few staff increases. We will be more valuable, but more will be demanded of us. The physical costs will be easy to measure, but I do worry about the personal costs, in stress and pressure, as we each continue to do more and more every year.