SSIS with vbscript - Code Sample

  • I have never used the scripting feature of SSIS.

    I have added a code sample, which works outside of ssis, but dont know why i am getting errors.

    Thanks

  • Looks like you are putting vbscript into a script task. Script task is written in VB.NET in SSIS 2005. Use the ActiveX Script Task for this

  • Apologies on not noticing your header, "Write scripts using Microsoft Visual C# 2008."

    Still the same answer but wanted to correct my version inaccuracy in my last reply 🙂

  • Thanks for your reply !

    Is it possible to convert VBscript into MS Visual C# 2008 ?

  • I don't know of any tools to do so. There are some for VB.NET to C# but most always require some "help". I may have time to post a conversion for you later if no one else does. Otherwise you may be better off sense you know vbscript to convert to VB.NET. Is lessons the learning curve but be warned not to write VB.NET as if it was VB6 🙂

  • Thanks for your help Ted!

  • Ted Krueger (5/14/2009)


    I don't know of any tools to do so. There are some for VB.NET to C# but most always require some "help". I may have time to post a conversion for you later if no one else does. Otherwise you may be better off sense you know vbscript to convert to VB.NET. Is lessons the learning curve but be warned not to write VB.NET as if it was VB6 🙂

    Ted if you had time to convert this to Visual C# 2008, that would be great ! I am clueless regarding VC# 🙁

  • Sense this is SSIS I figured I'd show how I would do it as a conversion. Looking at the code I'm guessing you are preventing none email formatted strings into the data pump. In SSIS you can do this pretty easily and quickly with a DataFlow Task and a Script Component acting as a transform. To do this create your dataflow task. Add the reader and destination. Now bring over a script component into the data flow panel. You should see a popup asking what the type of component you want. Connect you reader (source) and then connect the script component to the destination. Select transform and go into the properties. In the properties select the column(s) from your reader. Create an output column then in the I\O tab. Make sure the types are the same and you need to change while selecting the root level "Output 0" None for SychronousInputID. In the design of your script you can use the following. It's pretty straight forward.

    using System;

    using System.Data;

    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

    using Microsoft.SqlServer.Dts.Runtime.Wrapper;

    using System.Text.RegularExpressions;

    [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]

    public class ScriptMain : UserComponent

    {

    public override void PreExecute()

    {

    base.PreExecute();

    }

    public override void PostExecute()

    {

    base.PostExecute();

    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)

    {

    if(ValidateEmailRow(Row.EMAILADDRESSTXT.ToString()))

    {

    Output0Buffer.AddRow();

    Output0Buffer.emailout = Row.EMAILADDRESSTXT;

    }

    }

    public Boolean ValidateEmailRow(string email)

    {

    Regex reg = new Regex(@"\w(\w|\.)*@(\w|\.)*\w");

    Match validate = reg.Match(email);

    return validate.Success;

    }

    }

    Save that and give it a try. I've attached a example project that I tested this out for you. To make it work as is, create a DB "DBA05" and two table as

    CREATE TABLE [dbo].[EMAILADDRESS](

    [EMAILADDRESSTXT] [nvarchar](50)

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[emails_temp](

    [EMAILADDRESSTXT] [nvarchar](50)

    ) ON [PRIMARY]

    All on a local instance

  • Ted,

    Thanks very much for your efforts on this:-D

    I have run into a couple of issues:

    I created the new DBA05 database and two tables. My input text resumes are in a flat folder.

    My first data flow is a flat file pointing to my desktop folder.

    When i add the script box and select source i get an error.

    With Transformation i dont get an error.

    What input source are you using ?

    You also suggest bringing the source and destination data flow boxes over prior to the script. When i do this, it wont let me connect the script box in between ?

    Apologies for being so dumb, but i have about 16 hours of SSIS experience 🙁

  • I finally got your package to run but it keeps locking up SSIS.

    I dont have local setup,(which maybe why SSIS is locking up) so i changed the Input file to my text resume, with one big long row of 30k.

    I also changed the destination DB to be a SQL instance that i commonly use.

    The project runs thru ok, but i dont get any writes to the DB ?

    How do you view the VC# script in the script component? The edit feature is gone ?

    Thanks

  • Update...I finally got your package working after i removed 'local' and changed to my instance and removed 'foo' from the script.

    The downside is that it outputs exactly what you input ?

    If the input record is 'cvxcvxcv bsugar@xyz.com dasdadasd'

    The output record is exactly the same, instead of just 'bsugar@xyz.com'

  • Can you give me an example string before and then after the validation/formating that needs to be performed?

  • Ted,

    May i send you a direct email as it may be quicker ?

    Regards

    Ian

  • Ted,

    My goal is to read a folder of text resumes and extract all the emails for EACH resume.

    If you look at the resume in the attachment, it has a generic email at the top and one embedded in the body of the resume.

    With my current package i can read each resume in a loop and extract the email, BUT it only grabs the first email, which in this case is invalid!

    It does NOT find the second email which IS valid.

    The VBscript fixes that problem and grabs BOTH emails, which is what i want.

    In summary:

    Read input resumes from flat file folder in a loop

    Extract ALL emails

    Write to DB all emails AND original resume for searching using FTI

    Regards

    Ian

  • If I had to do this entire process I would

    Step 1) bulk insert the text into a table with a primary key being a seed or unique value identitfying the individual.

    Step 2) OLEDB source in the resume text and the keys. use the script component to get your emails out of the column and then add the rows to another table with the details and foreign relating back to the resume. You know more details will want to be extracted at some point so keep it normalized. This is only if I had to store the files in the DB. Rather use pointers and bring the files in with System.IO and get the contents or use a flat file source. The task would be much faster.

    Step 3) Here is code to grab all the emails per the package I wrote as an example

    public override void Input0_ProcessInputRow(Input0Buffer Row)

    {

    string m = ValidateEmailRow(Row.RESUMETEXT.ToString());

    if (m.Length > 0)

    {

    Output0Buffer.AddRow();

    Output0Buffer.emailout = m;

    }

    else

    {

    Output0Buffer.AddRow();

    Output0Buffer.emailout = "No emails found";

    }

    }

    public string ValidateEmailRow(string email)

    {

    string emails = "";

    Regex reg = new Regex(@"\b([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})\b");

    Match validate = reg.Match(email);

    while (validate.Success)

    {

    emails += validate.Value.ToString() + ";";

    validate = validate.NextMatch();

    }

    return emails;

    }

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply