I started in the early days of SQL Server, when having a gigabyte of disk storage was unheard of, much less a gigabyte of RAM. My watch has more storage space than the mainframe we replaced with an early version of SQL Server years ago. The technical possibilities, and the amounts of data we are capable of storing, have grown rapidly every year of my professional life. When I got started, we focused strictly on automating processes that had previously been paper-based. Once those processes were automated, organizations realized they could analyze the resulting data and discover interesting things far faster than they ever could with paper spreadsheets.
With basic processes automated, we started attempting to store not only transactional data but data about every movement a customer makes on our websites and social media, along with recording and transcribing phone calls (with permission, obviously), all in the hope of finding details that can increase a company's income without increasing its spending. One of the first databases I designed as a lead data architect captured web browsing data from students served by a school system's ISP. We captured a lot of data but never did much with it because, as we now realize, relational databases are not great for capturing and analyzing unstructured, or even semi-structured, data, especially when you have to adapt the structures over time for new needs and requirements.
New, fail-fast development methodologies, coupled with the desire to capture more and more data, are equal parts amazing and terrifying. With the advent of unstructured data storage, there is a feeling that we can simply store the data now and go back and analyze it later, once we know what we are looking for. There is nothing inherently wrong with this concept, but there is an old saying that the devil mixes truth in with the lies to make them sound believable. The problem lies in dealing with the vast amounts of data being stored: if the data is not designed and catalogued as well as possible from the start, the burden of analyzing it later will be too great.
Let’s make the data of today into the answers to the questions of tomorrow. Just by spending a reasonable amount of time on design, ideally when data structures are new, we can make it that much easier for future data scientists to figure out what we were doing and to come up with a realistic understanding of the past that can help predict and shape the future.