By Steve Bolton
…………As mentioned previously in this amateur self-tutorial series on the most neglected component of Microsoft’s leading database server software, SQL Server Data Mining (SSDM) can be extended through many means, such as Analysis Services stored procedures, CLR functionality, custom mining functions and plug-in algorithms. I had originally planned to close out this series with at least two separate articles on the subject of writing custom means of data visualization, but these plug-in viewers turned out to be simpler to implement than I expected. The deployment aspects are naturally the most difficult aspects, but the code behind custom data mining viewers is actually much less challenging than that for plug-in algorithms, which we already discussed in depth in previous articles. They’re by no means trivial, but programmers familiar with the .Net framework shouldn’t find them too difficult, especially if they’re already experienced with Window Forms user interface elements that were popular in the second half of the last decade. Throughout this series I have tried to provide Visual Basic code samples because most of the SSDM tutorials available on the Web are written in C#, the other major .Net language. Likewise, I originally intended to write a viewer using a more modern Windows Presentation Foundation (WPF) architecture, since all of the tutorials I’m aware of are for Windows Forms. Even though it would be child’s play to use a WPF datagrid in a WPF container control in Windows Forms, I decided it against it in order to simplify the instructions and avoid another layer of controls to debug. I also decided to adhere to the Keep It Simple Stupid (KISS) design principle by writing a viewer that performed a very basic task with easily decipherable output; in this case, we’ll simply be using a datagrid to display the output of a simple Data Mining Expressions (DMX) query, which selects some familiar metadata from the PluginModel.CONTENT we’ve used throughout this series, like ATTRIBUTE_NAME, MSOLAP_NODE_SCORE and 17 other columns.
…………Despite the fact that it was written in August of 2004, nearly a decade ago, the original tutorial provided by Microsoft for plug-in viewers, by Jesper Lind and Scott Oveson is still quite valuable. There is no custom viewer tutorial in the SDK and the only other resource I know of on the topic is a four-page discussion in Data Mining with Microsoft SQL Server 2008, the indispensable book by former members of Microsoft’s Data Mining Team. In order to make use of the Lind-Oveson tutorial, however, you’ll need to download the original SQL Server 2005 samples for plug-in algorithms and custom viewers from the respective links at Codeplex. As I did with Bogdan Crivat’s sample code in the from software development kit (SDK) for custom algorithms, I’ll start off with their codebase, but it will look nothing like the original after I get done converting the C# to Visual Basic, renaming the variables to my liking and repurposing the methods to create a completely different kind of viewer (and perhaps mangling it beyond recognition in the course of adding my own bugs). Most of the code is fairly straightforward, so there are only so many ways you can write it anyways, although there are some methods like LoadViewerData that provide more wiggle room for creativity.
…………Writing a viewer is not really much harder than using ADOMD in an ordinary Windows Forms or WPF application. The only substantial difference is that we’ll compile it in a class library that can run inside Visual Studio or SQL Server Management Studio (SSMS), like other data viewers do. The process is a little counterintuitive at first, in that you being by creating a .Net class project, but instead of using a class object (the one automatically generated by Visual Studio can be deleted), you use Add…New Item from the context menu to create a Windows Forms User Control class. This is not a tutorial on basic Visual Studio functionality, so to avoid the common affliction of “tutorial clutter” I’ll leave out the details on how to work with user controls, just as I have left the common Windows Forms code and user interface declarations out of the code in Figure 1. Next you’ll need to add a reference to Microsoft.DataWarehouse.Interfaces, which should be listed on the .Net tab of Visual Studio’s Add Reference dialog once you right-click the class and select Add Reference… on the context menu. The ViewerUserControl class also Inherits System.Windows.Forms.UserControl and Implements IMiningModelViewerControl2, an upgrade available in SQL Server 2008 from the original plug-in viewer interface, which introduces further slight differences between my class and Lind and Oveson’s example. One of the crucial mistakes I made at this juncture was failing to coordinate my assembly, class name and namespace, which led to a lot of weeping, wailing and gnashing of teeth when I tried to register them during deployment. The winning combination for me was “PluginViewer.PluginViewer.ViewerUserControl.”
Figure 1: Partial Code for the ViewerUserControl Class
Imports System.Collections
Imports System.ComponentModel
Imports System.Drawing
Imports System.Data
Imports System.Data.OleDb
Imports System.Windows.Forms
Imports Microsoft.DataWarehouse.Interfaces
Imports Microsoft.AnalysisServices.AdomdClient
Namespace PluginViewer
Public Class ViewerUserControl
Inherits System.Windows.Forms.UserControl
Implements IMiningModelViewerControl
Private ClassConnectionString As String = String.Empty
Private ClassMiningModelName As String = String.Empty
Private ClassServiceProvider As IServiceProvider
Private ClassConnection As System.Data.IDbConnection
Private ClassCommand As New Microsoft.AnalysisServices.AdomdClient.AdomdCommand
Private ClassAdapter As New Microsoft.AnalysisServices.AdomdClient.AdomdDataAdapter
Private Sub ViewerUserControl_Load(sender As System.Object, e As System.EventArgs)
‘this procedure was automatically added by Visual Studio, not in the original sample code
End Sub
Public Shared ReadOnly Property DisplayName() As String
Get
Return “My PluginViewer”
End Get
End Property
Public Property ServiceProvider() As IServiceProvider Implements IMiningModelViewerControl.ServiceProvider
Get
Return Me.ClassServiceProvider
End Get
Set(Value As IServiceProvider)
If Me.ClassServiceProvider IsNot Value Then
Me.ClassServiceProvider = Value
End If
End Set
End Property
Public Property ConnectionString() As String Implements IMiningModelViewerControl.ConnectionString
Get
Return Me.ClassConnectionString
End Get
Set(value As String)
If Me.ClassConnectionString <> value Then
Me.ClassConnectionString = value
End If
End Set
End Property
Public Property MiningModelName() As String Implements IMiningModelViewerControl.MiningModelName
Get
Return Me.ClassMiningModelName
End Get
Set(value As String)
If Me.ClassMiningModelName <> value Then
Me.ClassMiningModelName = value
End If
End Set
End Property
Public Sub ViewerActivated(IsActivated As Boolean) Implements IMiningModelViewerControl.ViewerActivated
End Sub
Public Function LoadViewerData(LocalContext As Object) As Boolean Implements IMiningModelViewerControl.LoadViewerData
ClassConnection = New AdomdConnection(Me.ConnectionString)
ClassConnection.Open()
‘I’m going to use a Data Adapter instead of an ADOMD command
ClassCommand = ClassConnection.CreateCommand()
ClassCommand.CommandText = String.Format(“SELECT
MODEL_CATALOG, MODEL_NAME, ATTRIBUTE_NAME, NODE_NAME, NODE_UNIQUE_NAME,
NODE_TYPE, NODE_DESCRIPTION, NODE_CAPTION, NODE_RULE, MARGINAL_RULE,
[CHILDREN_CARDINALITY], [PARENT_UNIQUE_NAME], NODE_SUPPORT,
NODE_PROBABILITY,MARGINAL_PROBABILITY, NODE_DISTRIBUTION,
MSOLAP_MODEL_COLUMN,MSOLAP_NODE_SCORE, MSOLAP_NODE_SHORT_CAPTION FROM [PluginModel].CONTENT”, Me.MiningModelName)
ClassAdapter.SelectCommand = ClassCommand
ClassAdapter.Fill(Me.PluginViewerDataset1, “PluginViewerDataTable“)
ClassConnection.Close()
Return True
End Function
End Class
…………Most of the code above consists of simple Get and Set statements for viewer properties, like DisplayName, ServiceProvider, ConnectionString and MiningModelName. The DisplayName is a description of your choice which will be displayed in the Viewer dropdown box in SSDM projects (which does not necessarily have to be the class name) while the other properties are mostly taken care of by SQL Server. In my class, these property statements are used to populate the class-scoped ClassConnectionString, ClassMiningModelName and ClassServiceProvider variables. The trickiest part of this for me was using the right variable types in my Implements statements. The IMiningModelViewerControl2 class also has a property that wasn’t in the previous interface version, DBConnection, of type System.Data.IDbConnection; I’m unfamiliar with it and the code works fine as is, so I’ll omit discussion of it.
…………When loading a viewer, the ViewerActivated function will also be called; during testing I wrote messages to the Event log in the procedure just for tracking purposes, but I’ve left it blank here in order to simplify the tutorial. Suffice it to say that you can include your own initialization code here for events that occur after the viewer is activated. Lind and Oveson state that “If the viewer has another window open such as a legend, it may want to bring that window to the user’s attention when the view is activated (that is, clicked on, or selected using the tabs in the viewer or by selecting it using the dropdown box).” The meat and potatoes are found in the LoadViewerData routine, where the connection is opened and a command is issued to load the data. Much of the action takes place off-screen in my example because I used a data adapter directly bound to a Windows Forms datagrid, which hides a lot of the code-behind from view. I took a cue from A Rickety Stairway to SQL Server Data Mining, Part 12: Accessing SSDM with AMO, ADOMD, SSIS, XMLA and RS and used an ADO.Net dataset rather than the old OLEDB interface in the Lind-Oveson tutorial, or the ActiveX Data Objects Multi-Dimensional for .Net (ADOMD.Net) objects cited in MacLennan et al. If you’re using ADOMD, you may have to download and install the right package for the version of SQL Server you’re working with, which for SQL Server 2012 are the SQL_AS_AMO.msi and SQL_AS_ADOMD.msi installers on the SQL Server 2012 Feature Pack webpage. MacLennan et al. also use a DataReader object and Lind and Oveson use an OLEDB command to access the data in their respective tutorials, so I opted for using an ADO command object just to add a little contrast; the DM Team’s book also discusses some nuances about the proper use of connection objects in LoadViewerData, which may be helpful in circumstances that I didn’t encounter in my brief foray into writing viewers.[ii] The most helpful suggestion I can provide for this procedure is to test out any DMX query you call here in a SQL Server Management Studio (SSMS) window first, because it’s much more cumbersome to troubleshoot incorrect syntax from within tests of a custom viewer. Also keep in mind that if you execute a simple metadata query like mine, the PARENT_UNIQUE_NAME and CHILDREN_CARDINALITY column names must be in brackets. I don’t know the reason for this strange limitation, but it’s a consistent syntax requirement of any DMX query.
…………The most challenging aspects of writing a plug-in viewer are the setup and testing, but the process is not much different than the one described for custom algorithms in A Rickety Stairway to SQL Server Data Mining, Part 14.3: Debugging and Deployment. In some ways it is actually much easier, because we don’t need to reference a wrapper like DMPluginWrapper.dll and coordinate such things as .Net and bit versions between them our classes. We can also repeat some of the procedures discussed there, such as creating a Strong Name Key File (.snk) for our project; in this case, however, we will need to find the public key token and include it in the registry. It is possible to run sn.exe from a command prompt to extract the public key token, but I added a command to the Visual Studio tools menu in order to reuse it in future projects, by following the directions at the MSDN posts Getting Public Key Token of Assembly Within Visual Studio, Visual Studio Tip: Get Public Key Token for a Strong Named Assembly and Visual Studio 2010 Get Public Key Token. The gist of it is that you assign the Title of the command as it will appear in your Tools menu, then the full path to the command (which in my case was C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin\NETFX 4.0 Tools\sn.exe) and -Tp $(TargetPath) in the Arguments section. The Use Output Window checkbox should also be selected. Some of the sources I read said to use quote marks around $(TargetPath) but that produced an error message saying that the file could not be read because it was missing. Once the command completely successfully, copy the 16-digit code after the phrase “Public key token,” which in my case was “b51c4a48a2da4bbd.” To register the viewer, we will have to create a registry key with the name of our viewer and four entries within that key. The Algorithms entry must be to a value equal to the name of the plug-in algorithm we will use it with, which in the case of the practice project we’ve been using in the Rickety series would be Internal_Name_For_My_Algorithm. We also need to create an entry named Assembly with our namespace, followed by a comma and the public key value we just extracted. In this case, the registry entry value would be “PluginViewer, PublicKeyToken= b51c4a48a2da4bbd” sans the quote marks. It is also necessary to create a Class key set to our project’s namespace plus the name of the class, which corresponds to ‘PluginViewer.ViewerUserControl” and a DisplayName key with the string you want to appear in the Viewer dropdown on the Mining Model Viewer tab.
…………The tricky part is to get add the subkey to the right registry keys. The Lind-Oveson tutorial says that it must be added to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\Tools\BIShell\Packages\{4a0c6509-bf90-43da-abee-0aba3a8527f1}\Settings\Designers\DataMiningViewers to make a custom viewer work with Visual Studio and HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\Tools\Shell\Packages\{4a0c6509-bf90-43da-abee-0aba3a8527f1}\Settings\Designers\DataMiningViewers for SQL Server Management Studio (SSMS). That tutorial was designed with SQL Server 2005 and Visual Studio 2005 in mind though. On my development machine, I had to install them to the HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\VisualStudio\10.0\Packages\{4a0c6509-bf90-43da-abee-0aba3a8527f1}\Settings\Designers\DataMiningViewers key for Visual Studio 2010 and HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Microsoft SQL Server\110\Tools\Shell\Packages\{4a0c6509-bf90-43da-abee-0aba3a8527f1}\Settings\Designers\DataMiningViewers for the SQL Server 2012 version of SSMS. There are also DataMiningViewers keys in HKEY_CURRENT_USER, but these are unimportant for our purpose. Note that the GUIDs are the same as in the Lind-Oveson tutorial, but that there are some subtle differences, such as the SQL Server version number (90 vs. 110), the name of the tool (“BIShell” vs. Visual Studio 10) and the fact that in my case, they occurred under the Wow6432Node (which is mystifying, because this key is for 32-bit compatibility, but I’m running Analysis Services 64-bit on a 64-bit version of Windows). There is probably some shortcut to determining the key locations that would make writing setup programs for custom mining viewers much easier, but so far I’ve merely done a search for the DataMiningViewers key in Regedit. Once that determination is made, you can add the registry entries in whatever way you prefer, such as reg.exe at a command prompt, merging .reg files or editing the registry manually with Regedit. In order to familiarize myself with the process of writing setup programs for custom viewers, I opted to add a deployment project to the MonitorMiningSolution that all of the projects in this series belong to. In Figure 3 you can see an example of my registry settings; note that I used the Software key instead of explicitly specifying the Wow6432Node, which will assign it the latter through reflection when the 32-bit registry is viewed in the 64-bit version of Regedit.
…………This is not a lesson in how to work with Visual Studio 2010 deployment projects, for which there are certainly many freely available tutorials; the aim here is to point out nuances to it that are relevant to the writers of custom mining viewers. One thing such developers need to be aware of is that the Lind-Oveson tutorial advises to copy our PluginViewer.dll file to the same location as the out-of-the-box viewers, which in the case of SQL Server 2005 is apparently C:\Program Files\Microsoft SQL Server\90\Tools\Binn\VSShell\Common7\IDE\. I can’t find any path named “VSShell” on my machine and no folder that would be roughly equivalent, even after substituting the 110 version for 90 and using a Program Files (x86) location. I did, however, find Microsoft.AnalysisServices.Viewers.dll in two locations in a hard drive search, C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio and C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies. After moving the DLL to the latter folder and restarting Visual Studio the viewer loaded correctly. The Lind-Oveson tutorial also advises to use the gacutil utility to register the assembly, like so: gacutil /i PluginViewer.dll, which should add it to the Global Assembly Cache (GAC). Figures 2 and 3 depict the settings I used for the deployment project in the Visual Studio 2010 File System Editor and Registry Editor.
Figure 2: Visual Studio 2010 File System Editor Example for the PluginViewer Project
Figure 3: Visual Studio 2010 Registry Editor Example for the PluginViewer Project
…………The directions Lind and Oveson give for associating a particular algorithm with a new viewer seem to be applicable only to COM projects written in C++. These days, plug-in developers are probably going to use the .Net platform exclusively, so my segment of the Rickety series on custom algorithms was strictly limited to that architecture. My sample code was written in Visual Basic, whereas other SSDM tutorial writers have opted for C#, but both compile down to the same base .Net language. The solution they provide involves adding a variable to the GetViewerType method of a C++ file that doesn’t even exist in the .Net wrapper DMPluginWrapper.dll we had to compile back in A Rickety Stairway to SQL Server Data Mining, Part 14.1: An Introduction to Plug-In Algorithms, so that we could reference the plug-in architecture in the .dll of our custom algorithm. As discussed in A Rickety Stairway to SQL Server Data Mining, Part 14.2: Writing a Bare Bones Plugin Algorithm, the GetViewerType method of AlgorithmMetadataBase is ultimately responsible for associating a custom algorithm with specific viewers. Normally you’d return at least one of the seven values of the MiningViewerType class, but since the return value type is just a string, we can substitute any value we want – including the name of our PluginViewer, sans the UserControl portion of the class name. Simply edit it, then recompile that class and perform all of the other steps discussed in Post 14.3 for redeploying a custom algorithm.[iii] If redeployment of the algorithm is successful, the new data mining viewer should be listed in the VIEWER_TYPE column of the DMSCHEMA_MINING_SERVICES rowset in the same row as the custom algorithm. As Lind and Oveson mention, you can associate multiple viewers with an algorithm by separating their names with semi-colons in the algorithm code, which in our case would be in the string returned by AlgorithmMetadataBase.GetViewerType. For example, we could specify our custom mining viewer in GetViewerType together with the one of the enumeration values for the standard viewers like so: Return “PluginViewer;” + MiningViewerType.MicrosoftCluster. Since PluginViewer is listed first it would be listed first in the VIEWER_TYPE column and at the top of the dropdown menus in SSMS and Visual Studio. I’m not aware of any means of enabling one of the nine out-of-the-box algorithms we discussed much earlier in this series to use a custom viewer (even by editing the msmdsrv.ini file by hand), but that does not indicate that they do not already exist or cannot be invented.
…………Be aware that when a custom viewer fails to load, no messages are recorded in msmdsrv.log or the event logs. If an error occurs too early in the loading process, the viewer simply won’t be listed in the VIEWER_TYPE column of the DMSCHEMA_MINING_SERVICES rowset, but if it occurs late enough you may see the non-specific message like “An error occurred attempting to create the viewer MyCustomMiningViewerName.” So far, I have been unable to debug custom mining viewers in Visual Studio because I invariably receive messages on every breakpoint that, “The breakpoint will not currently be hit. No symbols have been loaded for this document.” This is true whether I attach to msmdsrv.exe or to the SSMS process that the viewer is supposed to run in. Perhaps there is a more straightforward way of doing this by debugging through the same means that are used with ordinary Windows Forms user controls, which I am more than a little rusty on. The path of least resistance for me was to simply write messages to the Event log that tracked exceptions and the routines they occurred in, which helped expose the call stack. From what I can gather, the viewer loading sequence begins with DisplayName and is followed by calls to Initialize Component, Set ServiceProvider, Set ConnectionString, Set MiningModelName, LoadViewerData, Get ConnectionString and Get MiningModelName, in that order. Once I debugged my viewer sufficiently well to get through this sequence, I was able to visualize the PluginModel’s metadata in a Windows Forms datagrid, as depicted in the figure below.
Figure 4: The PluginViewer in Use in a Visual Studio Data Mining Project
…………Now that we’ve gotten through the most difficult aspects of custom algorithms and viewers, we can touch on the subject of GetNodeIDsForCase, a function in AlgorithmBase that I neglected till now because it is only required for plug-ins that make use of data mining dimensions and drillthrough. The former is a feature that we skipped over because it is of its limited utility, whereas drillthrough can’t be implemented in the Generic Content Viewer and thus is more applicable when custom algorithms are used with custom viewers. Returning a DrillThroughNodeFlags.Content value in GetNavigator rather than DrillThroughNodeFlags.DimensionContent will direct SSDM to apply it to drillthrough rather than mining dimensions. It is called internally by SSDM (usually at least once after InsertCases and possibly before SaveContent) so the developer has little direct control over it. Basically, it returns the NodeUniqueNames of content nodes that are associated with a particular case as an array of strings; according to Books Online, it “Maps cases to node IDs, which are simply labels for nodes in a tree layout of your model content.” Crivat also says in the tutorial in the SDK “that if you are implementing a decision tree algorithm, it is likely that more nodes (the full path from root to leaf) will contain any given case. You can push all the node unique identifiers for those nodes in the string array returned by this function.” When the CaseID is -1 and there is no InputCase, the root node will be returned, but otherwise the cases associated with particular nodes are returned. I added the code in Figure 5 to the custom algorithms I posted a few articles ago and verified that it worked correctly, although I found no immediate use for it. When debugging code like this, note how CHILDREN_CARDINALITY changes for the root node.
Figure 5: An Example of the GetNodeIDsForCase Method
Protected Overrides Function GetNodeIDsForCase(nodeFlags As DrillThroughNodeFlags, CaseID As Long, inputCase As MiningCase, maxNodes As ULong) As String()
Dim ReturnString() As String
Select Case CaseID ‘I implemented this as a Select in case we want to add custom logic later to make the type of node added conditional on the case
Case -1 ‘the flag for the root node at the top of the hierarchy, containing all cases
ReDim ReturnString(0)
ReturnString(0) = “Root Node”
Case Else
ReDim ReturnString(AttributeSet.GetAttributeCount)
For I As UInteger = 0 To AttributeSet.GetAttributeCount
‘in this model, we have 20 rows/input cases, with two columns; I want each row to be associated with both columns
‘normally you’d associate the NODE_UNIQUE_NAME with it
ReturnString(I) = “This case is associated with the “ + AttributeSet.GetAttributeDisplayName(I, True) + “column.”
Next I
End Select
End Function
…………Another SSDM feature I neglected to delve into was Classified columns, which seem to be beneficial only in custom algorithms and then in certain narrow use cases. I also neglected some broad topics that might have certain nuances in SSDM, but are generally taken care of by other components of SQL Server, such as security. This is handled by the ordinary Analysis Services mechanisms, which have already been fully fleshed out elsewhere. I don’t mean to relegate such an important topic to an afterthought, but I’m not conscious of any earthshaking distinctions between the ways SQL Server handles dimension and model permissions. Setup programs are a topic I’ve dealt with on an ongoing basis as part of other tutorials; there really isn’t any separate mechanism specific just to data mining projects, aside from the tricky business of deploying custom viewers and algorithms. The latter was tackled in Post 14.3, which really represented the pinnacle of difficulty in this series. As expected, deployment and debugging were also the biggest hurdles with custom viewers, but I seriously overestimated how difficult it would be to write them. I knew from the beginning of this series that I would make a lot of mistakes because I was in over my head, but decided that the best way to learn was to dive in – especially when I could provide some badly needed free press to the most underrated feature of SQL Server. I could probably make a further contribution by writing on the topic of data mining performance, but that will have to wait until I get a much firmer grasp of SSDM internals. A year and a half ago I won a copy of Kalen Delaney’s incredibly useful book on SQL Server 2008 R2 internals, but there is no such guide for SSDM. There isn’t even an equivalent to the Analysis Services Performance Guide, which nevertheless has some hints about how SSDM data might be stored to disk in a manner similar to SSAS dimensions. I’m still new to this, so my experience is only as valuable as the pitfalls a rookie may encounter. I really ought to thank SQLServerCentral for their support in publishing more than a dozen of the articles in this series despite my inexperience, which was of immense help.
…………At present, the best way I can probably make a worthwhile contribution while still acquiring some of the data mining skills I need is to evaluate SSDM’s competitors, and explain how to use them in conjunction with SQL Server. Some of the tools I may survey include Weka, RapidMiner, Autobox and Predixion Software, the last of which was established by former members of Microsoft’s Data Mining Team. This will be my second attempt at this upcoming series, which never even got off the ground when I tried to launch it in 2012 with evaluations of IBM and Oracle’s data mining products. As explained in Thank God I Chose SQL Server part I: The Tribulations of a DB2 Trial and Thank God I Chose SQL Server part II: How to Improperly Install Oracle 11gR2, I gained new respect for SQL Server after encountering numerous usability problems that made even the setup processes for their competitors a real challenge. SQL Server simply has its competitors beat hands down when it comes to usability and user interface design, which aren’t trivial matters; just think of the many ways a defective user interface might burden a DBA whose trying to troubleshoot a production server that’s crashed, for example. And as I discovered a few days ago when reading up on Oracle the other day, I discovered that their data mining offerings are light years behind SSDM. There is so little content there that the 2007 edition of Oracle Data Mining: Mining Hold from Your Warehouse consists of little except giant screenshots and white space. At the time, Oracle only supported two kinds of clustering, adaptive Naïve Bayes networks and support vector machines (SVMs, which overlap Linear and Logistic Regression). Two years earlier, at the time the Lind-Oveson tutorial was written, SSDM was already almost in its present state, with additional algorithms like Association Rules, Sequence Clustering, Time Series and Neural Networks that Oracle simply couldn’t compete with. To illustrate just how great of a lead the SSDM user interface enjoyed, consider that Oracle didn’t have any graphical means of viewing Decision Trees yet.[iv] Thanks to years of neglect of data mining by the top brass at Microsoft, Oracle and other competitors may well have slowly inched their way towards competitiveness with SSDM. Yet it is impossible to determine just how good Oracle’s data mining capabilities are because the user interface and setup procedures have regressed since then, to the point that it’s practically unusable. Installing DB2 is likewise a chore, but their Windows GUI is currently the equivalent of the junkyard truck in Sanford and Son. That is why I believed it was entirely reasonable to refer to decade-old instructional materials on SSDM throughout this series, like the Lind-Oveson tutorial. In my next series, I’ll investigate whether or not SQL Server’s other competitors have succeeded in surpassing SSDM’s capabilities over the last decade of neglect, much like the tortoise and the hare. There’ll be a long learning curve, but hopefully along the way I’ll be able to provide some practical advice on how to mining SQL Server data with these tools, as well as some guidelines for when they would be appropriate to use rather than SSDM.
see pp. 558-562, MacLennan, Jamie; Tang, ZhaoHui and Crivat, Bogdan, 2009, Data Mining with Microsoft SQL Server 2008. Wiley Publishing: Indianapolis.
[ii] IBID., pp. 558-559.
[iii] When writing this, I originally thought that MiningViewerType was an enumeration and that I was therefore limited to using only the seven values predefined by Microsoft. Then I discovered that you can override predefined enums using the Shadows keyword, as described in the user “Boop Boop”’s response in the MSDN thread [RESOLVED] Extending an Enum. This would have been an ironic solution to keep us from wandering in the obscurity of the underlying C++ code, down in the murky depths a couple of assemblies below our custom algorithm code.
[iv] p. 96, Hamm, Carolyn K. 2007, Oracle Data Mining: Mining Hold from Your Warehouse. Rampant TechPress: Kittrell, North Carolina.