June 14, 2006 at 11:45 pm
I am doing some investigations to find out if inserts happen in a more efficient way using XML.
It needs to be seen as to how the XML can be supplied for this type of operation. Can it work based on a file on disk? Can it work with an object that is present in memory?
The structure of the XML would be
<root>
<Customer>
<Cust_id>101</Cust_id>
<Cust_fname>"George"</Custf_name>
<Cust_lname>"Spencer"</Cust_lname>
<Cust_address>"Some Address"</Cust_address>
<Cust_dob>
<day>2</day>
<month>10</month>
<year>1963</year>
</Cust_dob>
<Cust_Spends>
<cust_currency>"USD"</cust_currency>
<cust_spendamount>1000</cust_spendamount>
</Cust_Spends>
<Customer>
</root>
The table structure is
Cust_id
Cust_fname
Cust_lname
Cust_address
Cust_dob
cust_currency
cust_spendamount
I know that the XML is not in the format which it ideally should be in for me to use open XML
EG:
INSERT CUSTOMERS
SELECT *
FROM OPENXML(@hDoc, N'/ROOT/Customer',2)
WITH CUSTOMERS
The Problem is that the app is designed to return that XML in the XML format specified, and I need to insert it in the table. Additionally, I need to know about the transaction support provided, if any, and if I have 10000 customer records along with some records which have to be inserted into another related table, can I do it as a batch job.
Any help would be appreciated.
June 15, 2006 at 3:56 am
What you just selected is XML
<Cust_id>101</Cust_id>
<Cust_fname>"George"</Custf_name>
<Cust_lname>"Spencer"</Cust_lname>
<Cust_address>"Some Address"</Cust_address>
<Cust_dob>
<day>2</day>
<month>10</month>
<year>1963</year>
</Cust_dob>
<Cust_Spends>
<cust_currency>"USD"</cust_currency>
<cust_spendamount>1000</cust_spendamount>
</Cust_Spends>
You will have to create a staging table, inserting CUSTID together with all attributes and afterwards, selecting from the staging table and insert into customers table.
stage table
-----------------------------
custid int
attributename varchar
attributevalue varchar
like
101 fname george
101 lname spencer
101 address some address
101 dobday 2
101 dobmonth 10
101 dobyear 1963
101 currency usd
101 spendamount 1000
and then select from insert table into customer table
insert customers
select custid,
max(case when attributename = 'fname' then fname else null end) fname,
max(case when attributename = 'lname' then fname else null end) lname,
max(case when attributename = 'address' then fname else null end) address
from stagetable
group by custid
N 56°04'39.16"
E 12°55'05.25"
June 15, 2006 at 4:40 am
Staging table? Peter - do you know we can just do this?
--data
declare @doc varchar(8000)
set @doc = '
<root>
<Customer>
<Cust_id>101</Cust_id>
<Cust_fname>"George"</Cust_fname>
<Cust_lname>"Spencer"</Cust_lname>
<Cust_address>"Some Address"</Cust_address>
<Cust_dob>
<day>2</day>
<month>10</month>
<year>1963</year>
</Cust_dob>
<Cust_Spends>
<cust_currency>"USD"</cust_currency>
<cust_spendamount>1000</cust_spendamount>
</Cust_Spends>
</Customer>
<Customer>
<Cust_id>102</Cust_id>
<Cust_fname>Fred</Cust_fname>
<Cust_lname>Bloggs</Cust_lname>
<Cust_address>Somewhere</Cust_address>
<Cust_dob>
<day>3</day>
<month>12</month>
<year>1974</year>
</Cust_dob>
<Cust_Spends>
<cust_currency>GBP</cust_currency>
<cust_spendamount>12345</cust_spendamount>
</Cust_Spends>
</Customer>
</root>'
--calculation
declare @CUSTOMERS table (
Cust_id int,
Cust_fname varchar(10),
Cust_lname varchar(10),
Cust_address varchar(30),
Cust_dob datetime,
cust_currency varchar(5),
cust_spendamount int)
declare @hDoc int
EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc
INSERT @CUSTOMERS
select Cust_id, Cust_fname, Cust_lname, Cust_address,
dateadd(day, Cust_dob_day - 1, dateadd(month, (Cust_dob_year - 1900) * 12 + Cust_dob_month - 1, 0)),
cust_currency, cust_spendamount
from (
SELECT * FROM OPENXML(@hDoc, N'/root/Customer', 2)
WITH (Cust_id int 'Cust_id',
Cust_fname varchar(10) 'Cust_fname',
Cust_lname varchar(10) 'Cust_lname',
Cust_address varchar(30) 'Cust_address',
Cust_dob_day int 'Cust_dob/day',
Cust_dob_month int 'Cust_dob/month',
Cust_dob_year int 'Cust_dob/year',
cust_currency varchar(5) 'Cust_Spends/cust_currency',
cust_spendamount int 'Cust_Spends/cust_spendamount')) a
EXEC sp_xml_removedocument @hdoc
select * from @CUSTOMERS
/*results
Cust_id Cust_fname Cust_lname Cust_address Cust_dob cust_currency cust_spendamount
----------- ---------- ---------- ------------------------------ ------------------------------------------------------ ------------- ----------------
101 "George" "Spencer" "Some Address" 1963-10-02 00:00:00.000 "USD" 1000
102 Fred Bloggs Somewhere 1974-12-03 00:00:00.000 GBP 12345
*/
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
June 15, 2006 at 4:50 am
nileshsane - re a few of your questions:
Yes - see above example.
Aside from multiple openxml statements, I guess you could insert into a view with some instead of triggers to process and route it to your tables as you require.
You may well be best using a staging table and processing from there (as Peter suggested) - I just wanted to point out that you can do it without.
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
June 15, 2006 at 5:08 am
Thanks for responding guys, I am going to try out the solutions which you'll posted and get back here with the results.
June 15, 2006 at 5:34 am
Yes. I did not understand he was knowledgeable about XML otherwise, since he wrote only for customer node.
And I also know that the technique you are using does not work when having for example two addresses. You end up with just one if writing as above. You must "start" at the address depth and dig yourself down to root level to get all addresses, for example.
N 56°04'39.16"
E 12°55'05.25"
June 15, 2006 at 5:44 am
Peter - Could you please show me an example of what can't be done? I'd like to understand.
Here's one example of dealing with multiple addresses, but I'm guessing this isn't what you had in mind.
--data
declare @doc varchar(8000)
set @doc = '
<root>
<Customer>
<Cust_id>101</Cust_id>
<Cust_fname>"George"</Cust_fname>
<Cust_lname>"Spencer"</Cust_lname>
<Cust_address>"Some Address"</Cust_address>
<Cust_address>"Some other Address"</Cust_address>
<Cust_dob>
<day>2</day>
<month>10</month>
<year>1963</year>
</Cust_dob>
<Cust_Spends>
<cust_currency>"USD"</cust_currency>
<cust_spendamount>1000</cust_spendamount>
</Cust_Spends>
</Customer>
<Customer>
<Cust_id>102</Cust_id>
<Cust_fname>Fred</Cust_fname>
<Cust_lname>Bloggs</Cust_lname>
<Cust_address>Somewhere</Cust_address>
<Cust_dob>
<day>3</day>
<month>12</month>
<year>1974</year>
</Cust_dob>
<Cust_Spends>
<cust_currency>GBP</cust_currency>
<cust_spendamount>12345</cust_spendamount>
</Cust_Spends>
</Customer>
</root>'
--calculation
declare @CUSTOMERS table (
Cust_id int,
Cust_fname varchar(10),
Cust_lname varchar(10),
Cust_address varchar(30),
Cust_address2 varchar(30),
Cust_dob datetime,
cust_currency varchar(5),
cust_spendamount int)
declare @hDoc int
EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc
INSERT @CUSTOMERS
select Cust_id, Cust_fname, Cust_lname, Cust_address, Cust_address2,
dateadd(day, Cust_dob_day - 1, dateadd(month, (Cust_dob_year - 1900) * 12 + Cust_dob_month - 1, 0)),
cust_currency, cust_spendamount
from (
SELECT * FROM OPENXML(@hDoc, N'/root/Customer', 2)
WITH (Cust_id int 'Cust_id',
Cust_fname varchar(10) 'Cust_fname',
Cust_lname varchar(10) 'Cust_lname',
Cust_address varchar(30) 'Cust_address',
Cust_address2 varchar(30) 'Cust_address[2]',
Cust_dob_day int 'Cust_dob/day',
Cust_dob_month int 'Cust_dob/month',
Cust_dob_year int 'Cust_dob/year',
cust_currency varchar(5) 'Cust_Spends/cust_currency',
cust_spendamount int 'Cust_Spends/cust_spendamount')) a
EXEC sp_xml_removedocument @hdoc
select * from @CUSTOMERS
/*results
Cust_id Cust_fname Cust_lname Cust_address Cust_address2 Cust_dob cust_currency cust_spendamount
----------- ---------- ---------- ------------------------------ ------------------------------ ------------------------------------------------------ ------------- ----------------
101 "George" "Spencer" "Some Address" "Some other Address" 1963-10-02 00:00:00.000 "USD" 1000
102 Fred Bloggs Somewhere NULL 1974-12-03 00:00:00.000 GBP 12345
*/
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
June 15, 2006 at 6:16 am
You have indexed the addresses right now with [2]. What if there are more addresses? Or a better example, several orders for a customer.
This is because you set "root" to be OPENXML(@hDoc, N'/root/Customer', 2)
If you set root to be N'/root/Customer/Cust_address'
and use
WITH (Cust_id int '..\Cust_id',
Cust_fname varchar(10) '..\Cust_fname',
for example, you get all addresses without having to index them. it works much like a join with "no-expanding-join behaviour".
If one cust_id and three addresses, beginning with cust_id and drilling down to addresses return just one row because that is what cust_id is.
But, starting with addresses and "anti-drilling" up to cust_id gives three rows, since there are three addresses and just one cust_id.
In original posting, cust_id versus cust_spends is a better example if there were more than one cust_spend.
N 56°04'39.16"
E 12°55'05.25"
June 15, 2006 at 7:37 am
Thanks Peter. I think we both understand what can be done here and the implications. I just wanted to check there weren't any gaps in my (or your) knowledge
I'm still not sure I like the structure of that stage table though. I don't understand why you would want one with that structure.
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
June 15, 2006 at 8:01 am
Four years of practice
We (insurance broker company) imported around 400 XML files every day, each of them between 2 and 10 mb.
The XML files could have as much as 12 levels of tags and everything, I really mean EVERYTHING, had to be searchable and filterable. Below are tables somewhat simplified for proof of concept only.
I created a lookup table looking like this
Lookup table
---------------
FieldID INT
FieldName VARCHAR
FieldDataType VARCHAR
CustomersInsurances table
----------------------------
CustInsID INT IDENTITY
CustomerID INT
InsuranceNumber VARCHAR
And I staged everything in a table
Stage table
-----------------
InsuranceNumber VARCHAR
FieldID INT
FieldValue VARCHAR
Because the files we got from the insurance companies did not have SSN or any other id for customers than name, address and telephone number due to legal implications. But they had insurance numbers. CustomersInsurances table were filled from accounting department with their systems when we got our commission. This information is always received two weeks later than the XML-file. The XML-file arrived within 36 hours after customer acceptence and signing but commission arrived about two weeks after that.
This way I could import the XML-files anyway and all data were stored and ready to be used and bound to customer whenever accounting department inserted the binding information in the CustomersInsurances table.
This way we could bind everything together! I built a search engine with GUI that allowed the end-user to filter for every field in the XML file, no matter how many value instances for the field, and also allowing any field to be output, columnwise. You could search for families having at least two children, one under 10 and the other over 15. Also they must have had their insurance for a minimum of 2 years and not having bought another insurance the last 45 days. Output should be Office who sold first and last insurance to the family, accumulated commission for insurance and gender of sales agent.
N 56°04'39.16"
E 12°55'05.25"
June 15, 2006 at 8:11 am
Nice to get an insight into someone's life. Many thanks!
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
June 15, 2006 at 8:25 am
You're welcome.
They are still using my design for the XML file import, SEVEN years after implementation... That's solid!
Do I need to mention that the schema for XML file has been rewritten several times by the insurance companies?
But my import mechanism doesn't care. If new attribute found, it just add the new one to the lookup table and continue to parse file. It does a near recursive iteration through all file so it finds all attributes no matter what they are called and where they are. InsuranceNumber is practically the only attribute that hasn't changed place. It is still a 'root/insurance' object.
I love self-learning systems...
N 56°04'39.16"
E 12°55'05.25"
June 15, 2006 at 8:30 am
Maybe they just don't know any better!
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
June 15, 2006 at 8:34 am
Hey! Be nice
N 56°04'39.16"
E 12°55'05.25"
June 16, 2006 at 1:22 am
Alrighty. I have tried the code with the different formats which we have on here, and not surprisingly it was worked like silk.
My concern now is, the size of @doc varchar(8000). I have XML's which would exceed 30K, and I believe the only option I have is text, but text datatype is not supported for local variables.
An option which I am considering is passing the entire XML to OPENXML as it is without assigning it to @doc. Dont know how this translates to good practise, but it works for me.
Thanks for all the inputs guys.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply