XML Inserts

Question

XML Inserts

nileshsane

Default port

Points: 1465
More actions
June 14, 2006 at 11:45 pm

#366469
I am doing some investigations to find out if inserts happen in a more efficient way using XML.
It needs to be seen as to how the XML can be supplied for this type of operation. Can it work based on a file on disk? Can it work with an object that is present in memory?
- Is it possible to insert multiple rows at the same time?
- Is it possible to insert rows into multiple tables by supplying XML?
- It needs to be seen if there can be a control over the granularity of the transactions involved in this type of data insertion.
The structure of the XML would be
<root>
<Customer>
   <Cust_id>101</Cust_id>
   <Cust_fname>"George"</Custf_name>
   <Cust_lname>"Spencer"</Cust_lname>
   <Cust_address>"Some Address"</Cust_address>
   <Cust_dob>
      <day>2</day>
      <month>10</month>
     <year>1963</year>
   </Cust_dob>
   <Cust_Spends>
      <cust_currency>"USD"</cust_currency>
      <cust_spendamount>1000</cust_spendamount>
   </Cust_Spends>
<Customer>
</root>
The table structure is
Cust_id
Cust_fname
Cust_lname
Cust_address
Cust_dob
cust_currency
cust_spendamount
I know that the XML is not in the format which it ideally should be in for me to use open XML
EG:
INSERT CUSTOMERS
SELECT *
FROM OPENXML(@hDoc, N'/ROOT/Customer',2)
     WITH CUSTOMERS
The Problem is that the app is designed to return that XML in the XML format specified, and I need to insert it in the table. Additionally, I need to know about the transaction support provided, if any, and if I have 10000 customer records along with some records which have to be inserted into another related table, can I do it as a batch job.
Any help would be appreciated.
What I hear I forget, what I see I remember, what I do I understand
Is it possible to insert multiple rows at the same time?
Yes - see above example.
Is it possible to insert rows into multiple tables by supplying XML?
Aside from multiple openxml statements, I guess you could insert into a view with some instead of triggers to process and route it to your tables as you require.

You may well be best using a staging table and processing from there (as Peter suggested) - I just wanted to point out that you can do it without.

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 1

What you just selected is XML

<Cust_id>101</Cust_id>

<Cust_fname>"George"</Custf_name>

<Cust_lname>"Spencer"</Cust_lname>

<Cust_address>"Some Address"</Cust_address>

<Cust_dob>

</Cust_dob>

<Cust_Spends>

<cust_currency>"USD"</cust_currency>

<cust_spendamount>1000</cust_spendamount>

</Cust_Spends>

You will have to create a staging table, inserting CUSTID together with all attributes and afterwards, selecting from the staging table and insert into customers table.

stage table

-----------------------------

custid int

attributename varchar

attributevalue varchar

like

101 fname george

101 lname spencer

101 address some address

101 dobday 2

101 dobmonth 10

101 dobyear 1963

101 currency usd

101 spendamount 1000

and then select from insert table into customer table

insert customers

select custid,

max(case when attributename = 'fname' then fname else null end) fname,

max(case when attributename = 'lname' then fname else null end) lname,

max(case when attributename = 'address' then fname else null end) address

from stagetable

group by custid

N 56°04'39.16"
E 12°55'05.25"

RyanRandall SSChampion Points: 13623 More actions · Answer 2

Staging table? Peter - do you know we can just do this?

--data

declare @doc varchar(8000)

set @doc = '

<root>

<Cust_id>101</Cust_id>

<Cust_fname>"George"</Cust_fname>

<Cust_lname>"Spencer"</Cust_lname>

<Cust_address>"Some Address"</Cust_address>

<Cust_dob>

</Cust_dob>

<Cust_Spends>

<cust_currency>"USD"</cust_currency>

<cust_spendamount>1000</cust_spendamount>

</Cust_Spends>

</Customer>

<Cust_id>102</Cust_id>

<Cust_fname>Fred</Cust_fname>

<Cust_lname>Bloggs</Cust_lname>

<Cust_address>Somewhere</Cust_address>

<Cust_dob>

</Cust_dob>

<Cust_Spends>

<cust_currency>GBP</cust_currency>

<cust_spendamount>12345</cust_spendamount>

</Cust_Spends>

</Customer>

</root>'

--calculation

declare @CUSTOMERS table (

Cust_id int,

Cust_fname varchar(10),

Cust_lname varchar(10),

Cust_address varchar(30),

Cust_dob datetime,

cust_currency varchar(5),

cust_spendamount int)

declare @hDoc int

EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc

INSERT @CUSTOMERS

select Cust_id, Cust_fname, Cust_lname, Cust_address,

dateadd(day, Cust_dob_day - 1, dateadd(month, (Cust_dob_year - 1900) * 12 + Cust_dob_month - 1, 0)),

cust_currency, cust_spendamount

from (

SELECT * FROM OPENXML(@hDoc, N'/root/Customer', 2)

WITH (Cust_id int 'Cust_id',

Cust_fname varchar(10) 'Cust_fname',

Cust_lname varchar(10) 'Cust_lname',

Cust_address varchar(30) 'Cust_address',

Cust_dob_day int 'Cust_dob/day',

Cust_dob_month int 'Cust_dob/month',

Cust_dob_year int 'Cust_dob/year',

cust_currency varchar(5) 'Cust_Spends/cust_currency',

cust_spendamount int 'Cust_Spends/cust_spendamount')) a

EXEC sp_xml_removedocument @hdoc

select * from @CUSTOMERS

/*results

Cust_id Cust_fname Cust_lname Cust_address Cust_dob cust_currency cust_spendamount

----------- ---------- ---------- ------------------------------ ------------------------------------------------------ ------------- ----------------

101 "George" "Spencer" "Some Address" 1963-10-02 00:00:00.000 "USD" 1000

102 Fred Bloggs Somewhere 1974-12-03 00:00:00.000 GBP 12345

*/

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

RyanRandall SSChampion Points: 13623 More actions · Answer 3

RyanRandall

SSChampion

Points: 13623

June 15, 2006 at 4:50 am

#643909

nileshsane - re a few of your questions:

nileshsane Default port Points: 1465 More actions · Answer 4

Thanks for responding guys, I am going to try out the solutions which you'll posted and get back here with the results.

What I hear I forget, what I see I remember, what I do I understand

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 5

Yes. I did not understand he was knowledgeable about XML otherwise, since he wrote only for customer node.

And I also know that the technique you are using does not work when having for example two addresses. You end up with just one if writing as above. You must "start" at the address depth and dig yourself down to root level to get all addresses, for example.

N 56°04'39.16"
E 12°55'05.25"

RyanRandall SSChampion Points: 13623 More actions · Answer 6

Peter - Could you please show me an example of what can't be done? I'd like to understand.

Here's one example of dealing with multiple addresses, but I'm guessing this isn't what you had in mind.

--data

declare @doc varchar(8000)

set @doc = '

<root>

<Cust_id>101</Cust_id>

<Cust_fname>"George"</Cust_fname>

<Cust_lname>"Spencer"</Cust_lname>

<Cust_address>"Some Address"</Cust_address>

<Cust_address>"Some other Address"</Cust_address>

<Cust_dob>

</Cust_dob>

<Cust_Spends>

<cust_currency>"USD"</cust_currency>

<cust_spendamount>1000</cust_spendamount>

</Cust_Spends>

</Customer>

<Cust_id>102</Cust_id>

<Cust_fname>Fred</Cust_fname>

<Cust_lname>Bloggs</Cust_lname>

<Cust_address>Somewhere</Cust_address>

<Cust_dob>

</Cust_dob>

<Cust_Spends>

<cust_currency>GBP</cust_currency>

<cust_spendamount>12345</cust_spendamount>

</Cust_Spends>

</Customer>

</root>'

--calculation

declare @CUSTOMERS table (

Cust_id int,

Cust_fname varchar(10),

Cust_lname varchar(10),

Cust_address varchar(30),

Cust_address2 varchar(30),

Cust_dob datetime,

cust_currency varchar(5),

cust_spendamount int)

declare @hDoc int

EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc

INSERT @CUSTOMERS

select Cust_id, Cust_fname, Cust_lname, Cust_address, Cust_address2,

dateadd(day, Cust_dob_day - 1, dateadd(month, (Cust_dob_year - 1900) * 12 + Cust_dob_month - 1, 0)),

cust_currency, cust_spendamount

from (

SELECT * FROM OPENXML(@hDoc, N'/root/Customer', 2)

WITH (Cust_id int 'Cust_id',

Cust_fname varchar(10) 'Cust_fname',

Cust_lname varchar(10) 'Cust_lname',

Cust_address varchar(30) 'Cust_address',

Cust_address2 varchar(30) 'Cust_address[2]',

Cust_dob_day int 'Cust_dob/day',

Cust_dob_month int 'Cust_dob/month',

Cust_dob_year int 'Cust_dob/year',

cust_currency varchar(5) 'Cust_Spends/cust_currency',

cust_spendamount int 'Cust_Spends/cust_spendamount')) a

EXEC sp_xml_removedocument @hdoc

select * from @CUSTOMERS

/*results

Cust_id Cust_fname Cust_lname Cust_address Cust_address2 Cust_dob cust_currency cust_spendamount

----------- ---------- ---------- ------------------------------ ------------------------------ ------------------------------------------------------ ------------- ----------------

101 "George" "Spencer" "Some Address" "Some other Address" 1963-10-02 00:00:00.000 "USD" 1000

102 Fred Bloggs Somewhere NULL 1974-12-03 00:00:00.000 GBP 12345

*/

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 7

You have indexed the addresses right now with [2]. What if there are more addresses? Or a better example, several orders for a customer.

This is because you set "root" to be OPENXML(@hDoc, N'/root/Customer', 2)

If you set root to be N'/root/Customer/Cust_address'

and use

WITH (Cust_id int '..\Cust_id',

Cust_fname varchar(10) '..\Cust_fname',

for example, you get all addresses without having to index them. it works much like a join with "no-expanding-join behaviour".

If one cust_id and three addresses, beginning with cust_id and drilling down to addresses return just one row because that is what cust_id is.

But, starting with addresses and "anti-drilling" up to cust_id gives three rows, since there are three addresses and just one cust_id.

In original posting, cust_id versus cust_spends is a better example if there were more than one cust_spend.

N 56°04'39.16"
E 12°55'05.25"

RyanRandall SSChampion Points: 13623 More actions · Answer 8

Thanks Peter. I think we both understand what can be done here and the implications. I just wanted to check there weren't any gaps in my (or your) knowledge

I'm still not sure I like the structure of that stage table though. I don't understand why you would want one with that structure.

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 9

Four years of practice

We (insurance broker company) imported around 400 XML files every day, each of them between 2 and 10 mb.

The XML files could have as much as 12 levels of tags and everything, I really mean EVERYTHING, had to be searchable and filterable. Below are tables somewhat simplified for proof of concept only.

I created a lookup table looking like this

Lookup table

---------------

FieldID INT

FieldName VARCHAR

FieldDataType VARCHAR

CustomersInsurances table

----------------------------

CustInsID INT IDENTITY

CustomerID INT

InsuranceNumber VARCHAR

And I staged everything in a table

Stage table

-----------------

InsuranceNumber VARCHAR

FieldID INT

FieldValue VARCHAR

Because the files we got from the insurance companies did not have SSN or any other id for customers than name, address and telephone number due to legal implications. But they had insurance numbers. CustomersInsurances table were filled from accounting department with their systems when we got our commission. This information is always received two weeks later than the XML-file. The XML-file arrived within 36 hours after customer acceptence and signing but commission arrived about two weeks after that.

This way I could import the XML-files anyway and all data were stored and ready to be used and bound to customer whenever accounting department inserted the binding information in the CustomersInsurances table.

This way we could bind everything together! I built a search engine with GUI that allowed the end-user to filter for every field in the XML file, no matter how many value instances for the field, and also allowing any field to be output, columnwise. You could search for families having at least two children, one under 10 and the other over 15. Also they must have had their insurance for a minimum of 2 years and not having bought another insurance the last 45 days. Output should be Office who sold first and last insurance to the family, accumulated commission for insurance and gender of sales agent.

N 56°04'39.16"
E 12°55'05.25"

RyanRandall SSChampion Points: 13623 More actions · Answer 10

Nice to get an insight into someone's life. Many thanks!

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 11

You're welcome.

They are still using my design for the XML file import, SEVEN years after implementation... That's solid!

Do I need to mention that the schema for XML file has been rewritten several times by the insurance companies?

But my import mechanism doesn't care. If new attribute found, it just add the new one to the lookup table and continue to parse file. It does a near recursive iteration through all file so it finds all attributes no matter what they are called and where they are. InsuranceNumber is practically the only attribute that hasn't changed place. It is still a 'root/insurance' object.

I love self-learning systems...

N 56°04'39.16"
E 12°55'05.25"

RyanRandall SSChampion Points: 13623 More actions · Answer 12

Maybe they just don't know any better!

Ryan Randall

Solutions are easy. Understanding the problem, now, that's the hard part.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 13

SwePeso

SSC-Dedicated

Points: 39757

June 15, 2006 at 8:34 am

#643984

Hey! Be nice

N 56°04'39.16"
E 12°55'05.25"

nileshsane Default port Points: 1465 More actions · Answer 14

Alrighty. I have tried the code with the different formats which we have on here, and not surprisingly it was worked like silk.

My concern now is, the size of @doc varchar(8000). I have XML's which would exceed 30K, and I believe the only option I have is text, but text datatype is not supported for local variables.

An option which I am considering is passing the entire XML to OPENXML as it is without assigning it to @doc. Dont know how this translates to good practise, but it works for me.

Thanks for all the inputs guys.

What I hear I forget, what I see I remember, what I do I understand