This is part of a series on observability, a concept taking hold in modern software engineering.
One of the interesting things I saw in an engineering presentation on observability from Chick-fil-A was that their remote sites are sometimes bandwidth constrained. In an early version of their platform, they sent logs back to HQ, and the logs consumed all the available bandwidth, leaving the restaurants unable to process credit card transactions.
While most of us don't deal with lots of remote offices sending data back to a central data warehouse, we do often work in distributed environments, and we may send data to/from a cloud or even employees' remote offices. Bandwidth is very good in many parts of the world, but it isn't infinite.
In the presentation, they talked about a tool called Vector that can take in lots of data, slice, dice, aggregate, sample, or otherwise reshape it, and then send the results on to a sink. It works like many other ETL tools: there is a source and a sink, with various transforms operating on the data in between.
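As a rough sketch of how that looks (the paths and component names below are placeholders of mine, not anything from the presentation), a Vector pipeline is just a config file describing sources, transforms, and sinks:

```toml
# Hypothetical Vector config: tail local app logs, keep roughly 1 in 10
# events, and print the result as JSON. In practice the sink would point
# at a central store (Elasticsearch, S3, a metrics backend, etc.).

[sources.app_logs]                    # source: where data comes from
type = "file"
include = ["/var/log/myapp/*.log"]    # placeholder path

[transforms.sampled]                  # transform: thin the stream before it leaves the site
type = "sample"
inputs = ["app_logs"]
rate = 10                             # keep about one in every ten events

[sinks.out]                           # sink: where the results go
type = "console"
inputs = ["sampled"]
encoding.codec = "json"
```

You'd run this with something like `vector --config vector.toml`, which means the whole pipeline is plain text you can keep in version control.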
It's an interesting philosophy: rather than shipping everything, send back the metrics that are actually useful to developers or operations staff in understanding how their system is performing. Sending only metrics reduces the load on downstream systems. It also lets us store less data and read a metric right away, rather than storing all the raw data and reprocessing it every time someone needs that metric.
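Vector has a transform aimed at exactly this idea, turning log events into metrics before they ever leave the site. A minimal, hypothetical example (the field and metric names here are mine, not from any real system) might count requests by status code:

```toml
# Hypothetical: turn parsed request logs into a counter, so only the
# aggregated metric is shipped rather than every raw log line.

[transforms.request_metrics]
type = "log_to_metric"
inputs = ["app_logs"]                # assumes the source from the earlier sketch

  [[transforms.request_metrics.metrics]]
  type = "counter"
  field = "status"                   # placeholder log field
  name = "http_responses_total"      # placeholder metric name
  tags.status = "{{status}}"         # label the counter by status code
```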
The flip side of this approach is that the consumers of the metrics need to ensure they are getting useful, actionable information. Determining what is needed will be like any development project: build something, test it, iterate, and repeat. This might even be an ongoing part of building software, as new features and new logging are added to your system.
In general, I prefer to have more data rather than less, but the volume of logging and instrumentation data has grown dramatically. Some systems produce more log data than actual business data on a daily basis. As with audit data, we likely need to limit how much of it we store long term, while still keeping the data we find genuinely useful.
I am looking forward to trying out Vector and seeing what's possible. Good CLI-based tools that can work with data are becoming more important all the time, especially as more of us move to DevOps flows: describing our systems' operation in text, storing it in version control, and deploying on demand.
If you've used Vector, let us know what you think, and if you prefer another tool, share why today.