At first glance, this seemed to be quite a statistic: most network data sits untouched. NetApp did a study of traffic patterns on their file servers and found that 90% of the data that was written was never accessed, and 65% of the data was accessed once. The study was relatively short term, only three months, but it's still a very interesting trend to be aware of. One recommendation from the study was that most data should be offloaded to slower media.
I'm not quite sure what slower media would be. Is it tape with some HSM system? The old SAN? Perhaps optical media? I think this is one area that IT hasn't matured very well.
And it got me thinking about whether or not there is some similar pattern for database data. After all, we deal in tremendous amounts of data that are often stored away and then not used, or perhaps not used very much.
My guess is that this doesn't hold true for lots of data stored in databases since we often aggregate or compile sets of information. But it does bring up an interesting thought in that perhaps we should consider archiving older data and maintaining some amount of summary information in current tables.
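To make the idea concrete, here is a minimal sketch of "archive the detail, keep a summary" using Python's built-in sqlite3 module rather than SQL Server. The table names (sales, sales_archive, sales_summary), the columns, and the cutoff date are all hypothetical, made up purely for illustration; a real system would need its own schema, batching, and error handling.

```python
import sqlite3


def archive_and_summarize(conn, cutoff):
    """Move rows older than `cutoff` out of the current table, keeping daily totals.

    Dates are assumed to be ISO-8601 strings (YYYY-MM-DD), which sort correctly
    as plain text. All table and column names here are hypothetical.
    """
    with conn:  # one transaction: commits on success, rolls back on error
        # Roll the old detail rows up into one summary row per day.
        conn.execute(
            """
            INSERT INTO sales_summary (sale_date, order_count, total_amount)
            SELECT sale_date, COUNT(*), SUM(amount)
            FROM sales
            WHERE sale_date < ?
            GROUP BY sale_date
            """,
            (cutoff,),
        )
        # Copy the detail rows to the archive table...
        conn.execute(
            "INSERT INTO sales_archive SELECT * FROM sales WHERE sale_date < ?",
            (cutoff,),
        )
        # ...then remove them from the current table.
        cur = conn.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff,))
    return cur.rowcount


def demo():
    """Build a tiny in-memory database and archive everything before 2008."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL);
        CREATE TABLE sales_archive (id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL);
        CREATE TABLE sales_summary (
            sale_date TEXT PRIMARY KEY, order_count INTEGER, total_amount REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO sales (sale_date, amount) VALUES (?, ?)",
        [
            ("2007-01-05", 10.0),
            ("2007-01-05", 15.0),
            ("2007-02-10", 20.0),
            ("2008-03-01", 30.0),
        ],
    )
    conn.commit()
    moved = archive_and_summarize(conn, "2008-01-01")
    return conn, moved
```

Calling demo() leaves only the 2008 row in the current table, with the older detail sitting in the archive table and one summary row per day left behind. In this sketch the archive lives in the same database file; in practice it might be a separate filegroup, instance, or slower storage tier.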
SQL Server 2005 has some great partitioning functions. Unfortunately most of these are Enterprise Edition only features, but they greatly ease the burden of archiving off data to separate filegroups or even instances of SQL Server. I have not had the need to work with these new functions, but I have tried to archive off data in previous versions and it was always a cumbersome task.
It makes me wonder if there is any research going on in the database world to try to increase the level of automation here. Perhaps there is some way to automatically "summarize" data and store off the details in less accessed storage. Is there any way to handle slower streams of data from other types of media, perhaps some form of optical media? Are there new methods that let the DBA easily separate out sets of media, perhaps with partially read-only tables that could have very high fill factors?
I think that the storage industry really needs to do a lot of work here, building some medium-speed, high-capacity storage that sits between tape and disk instead of only trying to create faster and faster types of high-end storage. Once we have something like this, perhaps some evolution of the old optical jukeboxes, I'm sure we'll see databases taking advantage of it.
Steve Jones