Request for Help w/ SSIS: Import & Edit Many XML Files

  • Hi all,

    I am new to SQL Server and thus my question may be simple to many of you. Regardless, I appreciate any help that you may provide and/or resources (e.g., online tutorials, etc.) that you can point me to.

    In my online searches and subscription to Lynda.com, I haven't been able to figure out how to do this explicitly using SQL Server 2012. (I am using the 180 day free trial Enterprise Edition.)

    I have many xml files (>100) in a folder that all have the same structure but different number of entries. I'd like to create a SSIS package that enables me to upload them all, edit some of the information, look for and delete replicate entries, and then plop the data into a single table in a new database.

    Data in the xml files would ideally be reorganized. Here are some examples of the restructuring that I'll need:

    -Data field has many spaces throughout the text string that need to be deleted;

    -Data field has multiple names that are separated by the pipe symbol; ideally, these names would be catalogued individually with the record number rather than listed in a single cell -- perhaps this data should be placed in a separate table (?)

    -Data field for name is: Last First; ideally, column would be split into two columns (or 3 because of middle initials) based on text recognizer of a space between the Last and First

    -Data field for another name is: Last, First; ideally, same as Last First except using the delimiter of a comma

    -Data field has a lot of repetitive information (e.g., many records have the same information that I'd like to code: "University of California" replaced with "UC" and simultaneous creation of additional table containing the lookup codes--UC--and text--University of California)

    There are several more things that I'd like help with related to the specifics of editing fields prior to creation of a table in a new database, but I'll wait for those questions until I get a handle on the basics.

    Thanks in advance for any help!

    -K

  • The first part is easy. SSIS can do a For Each loop over the files in a directory, and it's easy to get it to import all of them into a staging table. If you Bing/Google "ssis import multiple files", you'll find numerous tutorials, walk-throughs, articles, discussions, etc., on that subject. Start with MSDN. Shouldn't take long to get that up and running.

    If you run into bugs/questions on that stage, feel free to ask here. SSIS is harder to debug on a forum than T-SQL, because you can't just copy-and-paste code, but we can usually help anyway.

    On the data cleanup, is it necessary to do all that inside the XML? It will be much, much easier to stage the XML into a table with an XML datatype column, then extract the data into relational format, clean it up in tables, and then rebuild it into XML if you need it back in that format. Will that work for you?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply