XML Import into multiple tables with data translation

Question

XML Import into multiple tables with data translation

pixelwiz

Right there with Babe

Points: 738
More actions
July 17, 2014 at 12:48 pm

#288688

Hi All, I've never had to do this before, so forgive me if I'm not explaining the question well, I'll try.
I need to consume a live data feed from a golf tournament. And by consume, I really mean insert (merge) into our own SQL Server database on a regular intervals as a tournament progresses.
This site didn't let me upload an XML file, but you can see a sample of the data feed here:
http://images.pgalinks.com/scoring/TourFeedSmallSample.xml
I need to insert this data into 2 tables, Player_Holes and Player_Shots. But while doing the insert, I need to lookup several things such as our player ID match to theirs on an external_id against the players table. The shot types translation, and some other logic about the process overall.
The columns in my player_holes tables are: id, player_id, hole_id, round, shots (this is a total # of strokes) and date_created/date_modified.
Shots table is similar: id, player_id, hole_id, round, shot_number, shot_type_id, club, distance, date_created/date_modified.
The only way I know how to do it, is inefficient. I would parse the XML in ColdFusion (please no comments on ColdFusion, that's what we use for webdev), and then loop over it and do inserts for each player, each hole for each round, and the shots would probably be separate for each hole.
It would be so much better and more efficient if I could do it in SQL directly. I've done some research and SQL Server Data Tools looks promising. I've never used it, so would have to learn, but also I'm not sure if that'd work in this application when we want to run is as a scheduled task every few minutes.
I'm really hoping there is an efficient way to just do it with T-SQL.
Any advice greatly appreciated.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63552 More actions · Answer 1

Sounds like a good time to use Integration Services. If you created a package, you could then use SQL Agent to run the package you created on some interval (every X minutes, for example). The package would consist of several steps... something like this:

1. an FTP command to get the file(s).

2. XQuery to query the XML file so that you can select the data to insert.

3. With 2, use some INSERT statements to write the parsed data from step 2 into the tables you need.

Eirikur Eiriksson SSC Guru Points: 182943 More actions · Answer 2

Here is a quick suggestion on how to use a blob table and views to parse the xml

USE tempdb;

GO

/* Create the import table */

CREATE TABLE dbo.TBL_XML_DOC

( XML_DOC_ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED NOT NULL

,XML_DOCUMENT XML NOT NULL

);

GO

/* Load the xml */

INSERT INTO dbo.TBL_XML_DOC(XML_DOCUMENT)

SELECT CAST(BulkColumn AS XML) AS XML_DATA FROM OPENROWSET(

BULK 'C:\IMPORT\TourFeedSmallSample.xml',

SINGLE_BLOB

) AS X

/* Create HEADER view */

CREATE VIEW dbo.XML_HEADER

AS

SELECT

TX.XML_DOC_ID AS XML_DOC_ID

,IMP.DOC.value('@T_ID' ,'VARCHAR(12)') AS T_ID

,IMP.DOC.value('@eventId' ,'INT') AS eventId

,IMP.DOC.value('@generated','DATETIME2(0)') AS generated

,IMP.DOC.value('@source' ,'VARCHAR(12)') AS [source]

,IMP.DOC.value('msg[1]/@msgId','INT') AS msgId

,IMP.DOC.query('(.)') AS DOC_XML

FROM dbo.TBL_XML_DOC TX

OUTER APPLY TX.XML_document.nodes('ConsolidatedPlayers') AS IMP(DOC)

GO

/* Create ROUNDS view */

CREATE VIEW dbo.XML_ROUNDS

AS

SELECT

HEADER.XML_DOC_ID

,IMP.DOC.value('@T_ID','VARCHAR(12)') AS T_ID

,P.DOC.value('@id' ,'INT') AS ROUNDS_id

,RND.DOC.value('@n' ,'INT') AS ROUNDS_n

,RND.DOC.query('(.)') AS ROUND_XML

FROM dbo.XML_HEADER HEADER

OUTER APPLY HEADER.DOC_XML.nodes('ConsolidatedPlayers') AS IMP(DOC)

OUTER APPLY IMP.DOC.nodes('p') AS P (DOC)

OUTER APPLY P.DOC.nodes('rnd') AS RND(DOC)

/* Create HOLES view */

CREATE VIEW dbo.XML_HOLES

AS

SELECT

ROUNDS.XML_DOC_ID

,ROUNDS.T_ID

,ROUNDS.ROUNDS_id

,ROUNDS.ROUNDS_n

,HOLES.DOC.value('@n' ,'INT') AS HOLES_n

,HOLES.DOC.value('@cNum','INT') AS HOLES_cNum

,HOLES.DOC.value('@sc' ,'INT') AS HOLES_sc

,HOLES.DOC.value('@pDay','INT') AS HOLES_pDay

,HOLES.DOC.value('@pTot','INT') AS HOLES_pTot

,HOLES.DOC.query('(.)') AS HOLES_XML

FROM dbo.XML_ROUNDS ROUNDS

OUTER APPLY ROUNDS.ROUND_XML.nodes('rnd/holes/h') AS HOLES(DOC)

/* Create SHOT view */

CREATE VIEW dbo.XML_SHOT

AS

SELECT

HOLES.XML_DOC_ID

,HOLES.T_ID

,HOLES.ROUNDS_id

,HOLES.ROUNDS_n

,HOLES.HOLES_n

,SHOT.DOC.value('@n' ,'INT') AS SHOT_n

,SHOT.DOC.value('@pid' ,'INT') AS SHOT_pid

,SHOT.DOC.value('@time','INT') AS SHOT_time

,SHOT.DOC.value('@putt','CHAR(1)') AS SHOT_putt

,SHOT.DOC.value('@t' ,'CHAR(1)') AS SHOT_t

,SHOT.DOC.value('@prv' ,'CHAR(1)') AS SHOT_prv

,SHOT.DOC.value('@tee' ,'CHAR(1)') AS SHOT_tee

,SHOT.DOC.value('@cup' ,'CHAR(1)') AS SHOT_cup

,SHOT.DOC.value('@from','VARCHAR(10)') AS SHOT_from

,SHOT.DOC.value('@to' ,'VARCHAR(10)') AS SHOT_to

,SHOT.DOC.value('@asc' ,'INT') AS SHOT_asc

,SHOT.DOC.value('@dist','INT') AS SHOT_dist

,SHOT.DOC.value('@left','INT') AS SHOT_left

,SHOT.DOC.value('@x' ,'FLOAT') AS SHOT_x

,SHOT.DOC.value('@y' ,'FLOAT') AS SHOT_y

,SHOT.DOC.value('@z' ,'FLOAT') AS SHOT_z

,SHOT.DOC.value('@club','VARCHAR(10)') AS SHOT_club

,SHOT.DOC.value('@con' ,'VARCHAR(50)') AS SHOT_con

,SHOT.DOC.value('shottext[1]' ,'VARCHAR(250)') AS SHOT_shottext

FROM dbo.XML_HOLES HOLES

OUTER APPLY HOLES.HOLES_XML.nodes('h/shot') AS SHOT(DOC)

Umer Akram SSC-Insane Points: 22209 More actions · Answer 3

Umer Akram

SSC-Insane

Points: 22209

July 21, 2014 at 2:16 am

#1730923

Check This

pixelwiz Right there with Babe Points: 738 More actions · Answer 4

Wow, this didn't even cross my mind at all. Could you explain some of the thinking behind it?

I spent quite some time on it last week, first trying to work with .nodes(), but it turned out really slow for some reason when I had to get elements from parent nodes. Maybe I don't quite understand it, and the whole point of doing the APPLY there.

Then I switch to just using OPENXML, which worked a lot faster, 3 seconds to deal with 100k rows of XML data and 8500 that actually would come back from the query. But it's just a select of the scores for now, nothing for the shots:

DECLARE @xmlText varchar(max)

SET @xmlText = (SELECT ConsolidatedPlayersCombinedText FROM tour_feeds WHERE id = 1)

DECLARE @hDoc int

EXEC sp_xml_preparedocument @hDoc output, @xmlText

SELECT

player_id, round_num, hole_number, shots, last_shot_time

FROM OPENXML(@hdoc, '/ConsolidatedPlayers/p/rnd/holes/h/shot[last()]', 1)

WITH (

player_idvarchar(16) '../../../../@id',

round_numsmallint '../../../@n',

hole_numbersmallint '../@cNum',

shots smallint '../@sc',

last_shot_timevarchar(6) '@time'

)

EXEC sp_xml_removedocument @hDoc

wBob Right there with Babe Points: 715 More actions · Answer 5

There is a well-known issue with the xml datatype and methods and parent axis.

Try this instead:

IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp

GO

SELECT CAST( ConsolidatedPlayersCombinedText AS XML ) ConsolidatedPlayersCombinedText

INTO #tmp

FROM tour_feeds WHERE id = 1

SELECT

p.c.value('@id', 'INT') player_id,

r.c.value('@n', 'INT') round_num,

h.c.value('@cNum', 'INT') hole_number,

h.c.value('@sc', 'INT') shots,

s.c.value('@time', 'VARCHAR(6)') last_shot_time

FROM #tmp t

CROSS APPLY t.ConsolidatedPlayersCombinedText.nodes('ConsolidatedPlayers/p') p(c)

CROSS APPLY p.c.nodes('rnd') r(c)

CROSS APPLY r.c.nodes('holes/h') h(c)

CROSS APPLY h.c.nodes('shot[last()]') s(c)

XML Import into multiple tables with data translation

Cookies on SQLServerCentral