Introduction
So, for whatever reason you are fed up with your marriage to XML and you want
to go back to storing your data in (gasp!) a database. This article explains
one approach for doing just that. In the example, we are using Windows
Scripting Host to iterate through a few thousand XML files that will be
imported into a SQL Server table. There is an added level of complexity in that
the content of the XML files is formatted in HTML.
The Tools
The Situation
You are in the situation of working with thousands of XML files used to show FAQs on a web site. The
structure of the XML files are as such:
<?xml version="1.0" encoding="utf-8" ?>
<FAQ ID="020001" LanguageCode="EN" Priority="1">
<Question><!-- How can I solve my problem? --></Question>
<Summary><!-- Run the hardware utility. --></Summary>
<Content>
<!--
<div id="contentcenter">
<p>Perform a <b>hardware</b> check with the utility to assure
the quality of the system driver, etc., etc.</p>
</div>
-->
</Content>
</FAQ>
If your content is not stored as HTML or you already
have it enclosed in CDATA tags, then you can skip directly to the section titled "Importing the XML
Files."
In order to use SQLXML to import the data, you need to get
rid of the comment tags. There are probably ways to create an XML Schema
that will ignore the comment tags, but I chose instead to find and replace the
comment tags with CDATA tags. CDATA is used to tell the XML
parser to ignore anything between the tags. For reasons that I
won't get into, I chose to use Perl to handle the text manipulation.
(NOTE: I know many developers would suggest using the REPLACE function,
or .NET XmlTextReader with an
INSERT statement to import to the database. As I mentioned, this is just
ONE way to import the data. An example from a guy using
VBScript is
here.)
Open Notepad and save the code below as a file called
Replace.pl. After installing
ActivePerl, you open the file from a command
prompt and run it. You will be prompted for what you want to replace. The
find and replace process is twofold: replacing the beginning comment
(<!--) with the beginning CDATA tag (<![CDATA[) and the end comment
(-->) with the ending CDATA tag (]]>).
#!/usr/bin/perl # # NOTE: the find-and-replace IS case-sensitive. $dir = $ARGV[0] || '.'; $!=1; chdir($dir) or die $!; opendir(DIR,$dir) or die $!; my $changes=0; print "\n :: find :. "; $find=<STDIN>; chop($find); print " :: replace :. "; $replace=<STDIN>; chop($replace); $/=undef; foreach my $file (readdir(DIR)) { if($file=~/^.+\.xml$/) # adjust pattern to your needs { print "parsing file $file..\n"; open(FILE,"<$file") or die $!; my $file_data=<FILE>; close(FILE); $changes += $file_data =~ s/$find/$replace/gme; open(FILE,">$file") or die $!; print FILE $file_data; close(FILE); } } print "Voila. ($changes changes)\n";
The data are now ready to be imported.
Importing the XML Files
You will need to create a table that has the same format as the
XML schema used to import the data. The XML Schema is below;
the table name is referenced as sql:relation="<tablename>".
The following is saved as a file called FAQschema.xml.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
<xsd:element name="FAQ" sql:relation="FAQ" >
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Question" type="xsd:string" sql:use-cdata="1" />
<xsd:element name="Summary" type="xsd:string" sql:use-cdata="1" />
<xsd:element name="Content" type="xsd:string" sql:use-cdata="1" />
</xsd:sequence>
<xsd:attribute name="ID" type="xsd:string" />
<xsd:attribute name="LanguageCode" type="xsd:string" />
</xsd:complexType>
</xsd:element>
</xsd:schema>
Once the table and the schema are ready, you can use the
SQLXMLBulkLoad object and the File System Object to iterate through each
file and insert the data into the table. The following is a VBScript
file used to do the import and write success/errors to a file. Save the
file with a .vbs extension and run from a command prompt in the same
directory as the XML files.
set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.3.0") objBL.ConnectionString = "provider=SQLOLEDB;datasource=localhost;database=TEST;integrated security=SSPI" objBL.ErrorLogFile = "C:\FAQImport\error.log" ' Here is the path to your XML files Const path = "C:\FAQImport\FAQs\" Dim Text, Title, oFile Dim fso, oFolder, oFiles, wsh ' Object variables Text = "Folder " Title = "XML Files" Set wsh = WScript.CreateObject("WScript.Shell") ' Create FileSystemObject object to access the file system. Set fso = CreateObject("Scripting.FileSystemObject") ' Get Folder object. Set oFolder = fso.GetFolder(wsh.ExpandEnvironmentStrings(path)) ' Get All Files Set oFiles = oFolder.Files For Each oFile In oFiles If oFile.Type = "XML Document" Then Text = Text & oFolder & vbCrLf & vbCrLf Text = Text & "Name" & vbTab & vbTab & "Size" & vbCrLf Text = Text & oFile.Name & vbTab Text = Text & FormatNumber(oFile.Size, 0) & vbCrLf objBL.Execute "C:\FAQImport\FAQschema.xml", oFile.Name End If Next Dim IFile Set IFile = fso.CreateTextFile("C:\FAQImport\results.log", True) IFile.WriteLine(Text) IFile.Close MsgBox "FAQ import successful!", vbOKOnly + vbInformation, Title Set objBL = Nothing Set fso = Nothing
Conclusions
You can usually find many examples about how to
consume XML data in SQL Server or inversely how to import/export a single XML file. This article looks at how to import multiple XML files.
Personally, I couldn't find much on this particular topic so I decided
to write this up. I hope you find it useful!
Credits
1) VBScript: The basis of the file/folder iteration
come from a script developed by
Günter Born.
2) XSD and SQLXML examples are derived from the Help files in
the SQLXML download.
3) Perl Script: This is something I got off of
CPAN a few years back and I've found
it to be invaluable. If you know the source PLEASE contact me so I
can give him or her proper credit.