This editorial was originally published on Jan 20, 2009. It is being re-run as Steve is at SQL Bits.
I'm not sure how I ran across Splunk, but it intrigued me. It's a data management application, similar to something I've thought about building myself (snap fingers, darn, wish I'd done it first) when I was managing a lot of servers.
There's no denying that IT infrastructures are becoming incredibly complex. Think about some of your mission-critical applications and how many different pieces of equipment or software have to interact to make them work. At Peoplesoft we had a failure one Thanksgiving, and as I listened to people talking about the various things that seemed to be working, I was amazed to find out we had over a dozen different servers all running parts of this one piece of our ERP infrastructure. And that's without counting network devices!
Tracking down issues becomes hard when you need to correlate the logs from all these devices and match things up by time. Having some type of system that can bring multiple logs together, similar to the Log Viewer in SQL Server 2005 that lets you view several logs at once, is important. Splunk seems like a great type of application to help with this.
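To make the idea of time-based correlation concrete, here is a minimal sketch of merging several logs into one stream ordered by timestamp. The file names and the log line format are assumptions for illustration; this is not how Splunk or the SQL Server Log Viewer actually works under the hood.

```python
import heapq
from datetime import datetime

def read_log(path):
    """Yield (timestamp, source, message) tuples from a simple text log.
    Assumes each line starts with a timestamp like '2009-01-20 14:02:31'
    followed by the message -- a made-up format for illustration only."""
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
            yield ts, path, line[20:]

def merge_logs(paths):
    """Merge several individually time-ordered logs into one stream, oldest first."""
    return heapq.merge(*(read_log(p) for p in paths), key=lambda rec: rec[0])

# Hypothetical file names -- substitute whatever logs you actually collect.
for ts, source, message in merge_logs(["web.log", "app.log", "db.log"]):
    print(ts, source, message)
```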
As I was reading about this product, I wondered how the data is stored. After all, you'd think a database would be natural for indexing this type of material. I thought it likely that MySQL was being used, but as I dug in I found some interesting facts. First, Splunk stores compressed copies of the various log data it reads, which can be a lot of data. However, they use flat files and then create indexes on those files for quick querying. They say their index size is about 30% of the data size, which is reasonable, though I would have assumed they'd need multiple indexes to handle different queries.
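The flat-file approach is easy to picture: keep a compressed copy of the raw log alongside a small index that maps terms to positions in it. The sketch below is a toy version of that idea, with made-up file names and a naive keyword index; Splunk's actual storage format is certainly far more sophisticated.

```python
import gzip
import json
import re
from collections import defaultdict

def archive_and_index(log_path, archive_path, index_path):
    """Keep a compressed copy of the raw log plus a small keyword -> line-number
    index stored next to it. A toy sketch of the flat-file idea, not Splunk's format."""
    index = defaultdict(list)
    with open(log_path, "rb") as src, gzip.open(archive_path, "wb") as dst:
        for lineno, raw in enumerate(src):
            dst.write(raw)  # the compressed copy of the original data
            words = re.findall(r"[a-z0-9_]+", raw.decode("utf-8", "replace").lower())
            for word in set(words):
                index[word].append(lineno)
    with open(index_path, "w") as f:
        json.dump(index, f)

def search(index_path, term):
    """Return the line numbers in the archived log that contain the term."""
    with open(index_path) as f:
        index = json.load(f)
    return index.get(term.lower(), [])

# Hypothetical paths for illustration.
archive_and_index("app.log", "app.log.gz", "app.idx.json")
print(search("app.idx.json", "error"))
```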
In many ways this seems like a specialized search engine, with bots sent out to read not web pages but log data. Archiving, scaling out to multiple machines, and helping you answer questions about your environment are all search-engine-type features.
I'm still wondering how much data they produce and how easy this is to manage in the real world. I have friends running OpenView and Unicenter, and both of those systems are cumbersome and complicated, not to mention expensive. If anyone's using Splunk, I'd be interested to hear how easy it is to use and how well it manages what must be a tremendous amount of data.
And if anyone is using it to solve SQL Server issues or answer BI-type questions about their environment, please write us an article.
Steve Jones
The Voice of the DBA Podcasts
The podcast feeds are now available at sqlservercentral.mevio.com and you can see more great shows there. I've linked the feeds below.
Overall RSS Feed: or now on iTunes!
- Windows Media Podcast - 26.8MB WMV
- iPod Video Podcast - 23.5MB MP4
- MP3 Audio Podcast - 4.8MB
Today's podcast features music by Everyday Jones. No relation, but I stumbled onto them and really like the music. Support this great duo at www.everydayjones.com.
I really appreciate and value feedback on the podcasts. Let us know what you like, don't like, or even send in ideas for the show. If you'd like to comment, post something here. The boss will be sure to read it.