Last night’s Oregon SQL user group meeting was very enlightening. Mark Simms from the SQLCAT team came down from Redmond to talk about StreamInsight, a substantial new feature in SQL Server 2008 R2 that allows data to be analyzed and aggregated in-memory as it flows through the server. I’ve read a little about this new capability and have seen some short demos, but Mark gave us a comprehensive look under the hood after walking the group through a few scenarios that made the technology feel very relevant.
StreamInsight is part of the SQL Server product suite, which makes sense because it is a platform for working with and analyzing data. However, it has very little to do with traditional database storage or with relational and analytical query paradigms. It looks more like a programming tool than a database technology.
StreamInsight sits on top of .NET and uses data adapters to receive and send streams of data that travel through memory – a little bit like SQL Server Integration Services (SSIS). Queries are written in LINQ (Language-Integrated Query), which is actually a set of .NET extension libraries, typically used with the C# programming language.
The tenets of StreamInsight are events, windows and various operators such as aggregates. As in .NET object-oriented method coding, nested commands and operators can be composed into named code constructs called macros, but this code doesn’t have to be pre-compiled into assemblies. In fact, the programming and debugging experience feels a lot like writing and testing SQL using rich programming tools.
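To make those tenets concrete, here is a minimal sketch of what a windowed aggregate looks like in StreamInsight’s LINQ surface. The SensorReading payload type, the five-second window size, and the adapter wiring are my own inventions for illustration, and the type and method names (CepStream, TumblingWindow, HoppingWindowOutputPolicy) are from my memory of the 1.x API, so treat the details as approximate:

```csharp
using System;
using Microsoft.ComplexEventProcessing;        // core StreamInsight types (1.x API, from memory)
using Microsoft.ComplexEventProcessing.Linq;   // LINQ extension methods over CepStream<T>

// Hypothetical payload type for illustration: one reading from a flow sensor.
public class SensorReading
{
    public string SensorId { get; set; }
    public double FlowRate { get; set; }
}

public static class FlowQueries
{
    // 'input' would be bound to an input adapter elsewhere; that wiring is omitted here.
    public static CepStream<double> AverageFlow(CepStream<SensorReading> input)
    {
        // Carve the unbounded event stream into five-second tumbling windows
        // and compute an aggregate over each window, entirely in memory.
        return from win in input.TumblingWindow(
                       TimeSpan.FromSeconds(5),
                       HoppingWindowOutputPolicy.ClipToWindowEnd)
               select win.Avg(e => e.FlowRate);
    }
}
```

The query itself reads a lot like SQL over a table, except that the “table” never sits still: events flow through the window, the aggregate fires as each window closes, and nothing touches disk.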
Mark did a great job of demonstrating some very meaningful scenarios. By itself, StreamInsight has nothing to do with data storage. The data flowing through an application can originate as live transactions, streaming video, messages, or chat and traffic on a social network. Most applications use it to analyze data in flight without the latency and overhead of disk I/O, but all or part of this data could also be read from persistent storage, or written to a database after it’s been processed. It would normally run on a dedicated server that just pipes the data through for analysis and to control other components of a larger solution.
He gave examples of reading flow information from oil pipelines, monitoring large-scale streaming media events like the Olympics or a championship game, analyzing active stock trading, and monitoring traffic patterns on a highway system. There’s little value in storing all of these events, but it’s critically important to monitor them in real time and to capture samples and states from time to time. I think I get StreamInsight now, and I can envision a variety of useful applications.
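The highway example, for instance, boils down to a standing filter over the stream: the bulk of the events are simply discarded, and only the interesting states surface for monitoring. Here’s a hypothetical sketch in the same StreamInsight LINQ style; the TrafficReading payload and the congestion thresholds are invented for illustration:

```csharp
using Microsoft.ComplexEventProcessing;
using Microsoft.ComplexEventProcessing.Linq;

// Hypothetical payload: an aggregated reading from one highway segment.
public class TrafficReading
{
    public string SegmentId { get; set; }
    public double AverageSpeedKmh { get; set; }
    public int VehicleCount { get; set; }
}

public static class TrafficQueries
{
    // Standing query: let congestion events through and drop everything else,
    // rather than persisting the full event stream to disk.
    public static CepStream<TrafficReading> CongestionAlerts(CepStream<TrafficReading> input)
    {
        return from e in input
               where e.AverageSpeedKmh < 30 && e.VehicleCount > 100
               select e;
    }
}
```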
One of my takeaways as a BI practitioner is this: In our practice, we are often
asked to work with a consulting client to build business dashboards. It’s common
to sit down with a business user who says something like “I would like to see a dashboard that shows me a timeline of historical KPIs, measures and dimensions” – that’s typical BI analysis that we would read from a data warehouse – “…then I’d like to see the continuum of that data as it relates to our daily production activity.” This is operational data, stored in a live, transactional system. The volume of data is low, but the data can change; it comes from a system that stores records as they occur.
“Finally, I want to see what’s going on in our live processes, how our production
equipment is operating and what components are performing above or below capacity.”
That’s not necessarily information they need to keep around, but it is certainly useful data during, and shortly after, the moments when thousands of events are taking place.
A recording of Mark’s session will be available soon, along with samples and code. I’ll
update this post accordingly.
Weblog by Paul Turley and SQL Server BI Blog.