urgently need advice: mapping XML data to destination columns

  • This is not just an academic exercise. Before me is a the task of mapping all columns ensconced within multiple (25 or so) XML documents, and mapping them to their destination columns in a data model, basically, to create a data dictionary. I need direction, or links for tutorials, that will help me decipher what are the tables and columns within these XML sources.

    My hopes were high when I thought I could create a table containing an XML data type field, and then load that table using an external data import function called OPENROWSET, to view the source of information in a relational database format. But once I tried it my hopes were dashed. This method just imports the XML file and a select against it just opens a page like you find in a browser.

    At this point I just need to be able to quickly ascertain what are the tables and columns within these XML files, so that I can deliver a complete mapping for a new data model.

    Thank you!!!

  • For example, is there a tool that will convert an xml document into an entity relationship diagram? I need to identify the tables and columns.....

  • XML documents use an xml schema to define the content architecture.

    An xml schema (XSD) can be "inline" - contained in the xml - or a seperate file.

    IF you have xml but no xsd, read the following link but be aware that XSD inferrence can lead to incorrect data typing

    http://msdn.microsoft.com/library/x6c1kb0s.aspx

    If you already have all the XSD files, you can use SSIS to create the tables I believe from the XSD.

    Try this link for starters - it has a very brief guide to how to do that:

    http://www.fryan0911.com/2008/10/create-sql-table-script-from-xml-schema.html

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Thanks for replying. I was aware of this capability in ssis but I considered it a workaround. In my case, when I use the xml transformation it creates an XSD for me however it then interprets my first xml source as having 31 tables, some of them nothing more than a table with a date column.

    1. Do I trust ssis to be interpreting the xml source correctly, or is this the kind of xsd inference you made reference to, that is potentially erroneous?

    2. Is this an efficient method for determining what the columns of the xml source are for the purposes of building a data dictionary of column mappings?

    3. Is this how the expert SSIS developers (who have not XML experience) do it?

    Thanks, hxkresl

  • Since there are multiple way to present data in an XML format based on the same relational schema I don't think the tools provided by SSSIS are sufficient enough to "automagically reverse-engineer" an ERM based on the XML structure.

    From my point of view the format itself doesn't matter. Regardless of XML, txt, CSV or whatever format the data are in, whoever is responsible to design the table structure needs to understand the data and the relationship between.

    To answer your questions:

    I do not trust SSIS in terms of suggesting the correct table structure and I don't think it's an efficient way to do it. SSIS is of a great help to shred data. But not for DB modelling.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • I have been supplied with XML files in place of direct access to the source database, for extracting the source columns containing the data that needs to be ETL'd over to the new data model. Therefore, I am not asking SSIS to do modeling, per se, but to reveal the underlying structure of the XML source.

    I understood shredding and reverse engineering the underlying data structure to be one and the same...

    My task is to create a data dictionary mapping source to destination columns. What is the best way to get at the underlying data structure of an XML source that has been supplied in place of direct access to the source database?

  • Here's an example:

    DECLARE @xml XML

    DECLARE @xml2 XML

    SELECT @xml='

    <Person>

    <Name ID="1" First="John" Last="Doe">

    <email>1@a.com</email>

    <email>2@b.com</email>

    </Name>

    </Person>'

    SELECT @xml2='

    <Person>

    <Name ID="1" First="John" Last="Doe">

    <email>1@a.com</email>

    </Name>

    <Name ID="1" First="John" Last="Doe">

    <email>2@b.com</email>

    </Name>

    </Person>'

    SELECT @xml,@xml2

    You can shred it so you have one table with on or two rows or two separate tables, or even three tables. How do you know the original structure (assuming the table design can be denormalized...)??



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • hxkresl (1/25/2011)


    I have been supplied with XML files in place of direct access to the source database, for extracting the source columns containing the data that needs to be ETL'd over to the new data model. Therefore, I am not asking SSIS to do modeling, per se, but to reveal the underlying structure of the XML source.

    I understood shredding and reverse engineering the underlying data structure to be one and the same...

    My task is to create a data dictionary mapping source to destination columns. What is the best way to get at the underlying data structure of an XML source that has been supplied in place of direct access to the source database?

    I suggest what you do is look at one of the files. Does it contain an <xs:schema> node?

    Inside the xs:schema node is the definition for the "tables" within the document.

    They may make sense in terms of a SQL database or not. You need to review them to see if they make sense. If they do not, you will need to build the SQL schema yourself.

    If you don't have the XSD either inline (as xs:schema) or seperately as an "xsd" document, then I suggest you request them from the supplier of the xml. They should not have a problem doing that for you.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I suggest what you do is look at one of the files. Does it contain an <xs:schema> node?

    They don't contain an xs:schema node.

    Inside the xs:schema node is the definition for the "tables" within the document.

    They may make sense in terms of a SQL database or not. You need to review them to see if they make sense. If they do not, you will need to build the SQL schema yourself.

    Worried

    If you don't have the XSD either inline (as xs:schema) or seperately as an "xsd" document, then I suggest you request them from the supplier of the xml. They should not have a problem doing that for you.

    If they provide one then I understand that I trust SSIS's SQL CREATE TABLE scripts, right?

  • hxkresl (1/25/2011)


    If they provide one then I can trust SSIS's SQL CREATE TABLE scripts?

    If they provide one you stand a much better chance, but you should still review the tables it suggests/creates to see if they make sense for your requirements.

    At least SSIS would not be "guessing" the schema, so it would hopefully work, but there is always the possibility that the XML files are not a straight dump of the original tables , so the schema could be less than optimal.

    For example, I might have a table called Roles

    CREATE TABLE Roles(

    Role int not null primary key,

    RoleName varchar(50) not null,

    RoleActive char(1) not null

    )

    Even if the xml contains a straight dump of the table, you will not have information about Role being the Primary Key and you will have no information regarding indexes or constraints.

    I can't remember if you would get Foreign Key information.

    Also, there is the possiblity that the XML contains a subset of the data or a mixture of data from various tables.

    For example, SELECT EmployeeName,RoleName,RoleActive FROM Employees JOIN Roles ON Employees.Role = Roles.Role

    This would come with a schema that did not relate to the original tables at all.

    This is why you must review the schema.

    Perhaps the other option is to ask the XML supplier to provide CREATE TABLE scripts for you?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • LutzM, I looked at a applying Data Binding to these xml document using instructions at the below site, but found that the tutorial offered a how-to only for XML files that didn't contain attributes. eg. http://www.devguru.com/features/tutorials/XML/beginning_xml_part_two.html

    Your example SQL Queries are the same. They don't work with XML elements that have many attributes...like the XML files I'm working with :w00t:

  • Even if the xml contains a straight dump of the table, you will not have information about Role being the Primary Key and you will have no information regarding indexes or constraints.

    The new model only needs to know the name of the table and column as it isn't importing anything beyond the value in the column. I think the constraints in the source are immaterial since the new model is completely redesigned. If my thinking is suspect, please tell me.

    Also, there is the possiblity that the XML contains a subset of the data or a mixture of data from various tables.

    For example, SELECT EmployeeName,RoleName,RoleActive FROM Employees JOIN Roles ON Employees.Role = Roles.Role

    Since the XML files will be the only delivery mechanism for the source data, and the data model will never connect to the source database, does this still matter, ie. the possibility that the data in the XML is a product of JOINS? Could I eliminate this worry by confirming with customer that the structure of the XML files will never change?

    Perhaps the other option is to ask the XML supplier to provide CREATE TABLE scripts for you?

    Is this customary to ask? I would like to but even the client for whom the data model is being built is not the same entity as that which is delivering the XML files. I will be frank, I do not know what to say if they push back because I don't have the experience to know if I am asking for something quite normal.

  • If you are not concerned about modelling the source database, then things are certainly better.

    Ask for the XSD documents so that there is not guesswork involved and then use SSIS to create a test database and just check that you are happy with the tables.

    It is perfectly normal to ask for the XSD schema documentation.

    Good luck.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • hxkresl (1/24/2011)


    For example, is there a tool that will convert an xml document into an entity relationship diagram? I need to identify the tables and columns.....

    Look at the Altova MissionKit. It's very good at dealing with this kind of challenge. Between MapForce, XMLSpy and UModel, you should have pretty much all of the tools you need to model this out, and operationalize it (MapFoce will actually operationalize the mappings through generating XSLT, Java or C# code)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Look at the Altova MissionKit. It's very good at dealing with this kind of challenge.

    Altova MissionKit for once I have the XSDs, right?

    Thanks to you both for your direct individualized help.

  • Viewing 15 posts - 1 through 15 (of 16 total)

    You must be logged in to reply to this topic. Login to reply