July 17, 2014 at 12:48 pm
Hi All, I've never had to do this before, so forgive me if I'm not explaining the question well, I'll try.
I need to consume a live data feed from a golf tournament. And by consume, I really mean insert (merge) into our own SQL Server database on a regular intervals as a tournament progresses.
This site didn't let me upload an XML file, but you can see a sample of the data feed here:
http://images.pgalinks.com/scoring/TourFeedSmallSample.xml
I need to insert this data into 2 tables, Player_Holes and Player_Shots. But while doing the insert, I need to lookup several things such as our player ID match to theirs on an external_id against the players table. The shot types translation, and some other logic about the process overall.
The columns in my player_holes tables are: id, player_id, hole_id, round, shots (this is a total # of strokes) and date_created/date_modified.
Shots table is similar: id, player_id, hole_id, round, shot_number, shot_type_id, club, distance, date_created/date_modified.
The only way I know how to do it, is inefficient. I would parse the XML in ColdFusion (please no comments on ColdFusion, that's what we use for webdev), and then loop over it and do inserts for each player, each hole for each round, and the shots would probably be separate for each hole.
It would be so much better and more efficient if I could do it in SQL directly. I've done some research and SQL Server Data Tools looks promising. I've never used it, so would have to learn, but also I'm not sure if that'd work in this application when we want to run is as a scheduled task every few minutes.
I'm really hoping there is an efficient way to just do it with T-SQL.
Any advice greatly appreciated.
July 17, 2014 at 7:49 pm
Sounds like a good time to use Integration Services. If you created a package, you could then use SQL Agent to run the package you created on some interval (every X minutes, for example). The package would consist of several steps... something like this:
1. an FTP command to get the file(s).
2. XQuery to query the XML file so that you can select the data to insert.
3. With 2, use some INSERT statements to write the parsed data from step 2 into the tables you need.
July 21, 2014 at 1:07 am
Here is a quick suggestion on how to use a blob table and views to parse the xml
😎
USE tempdb;
GO
/* Create the import table */
CREATE TABLE dbo.TBL_XML_DOC
( XML_DOC_ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED NOT NULL
,XML_DOCUMENT XML NOT NULL
);
GO
/* Load the xml */
INSERT INTO dbo.TBL_XML_DOC(XML_DOCUMENT)
SELECT CAST(BulkColumn AS XML) AS XML_DATA FROM OPENROWSET(
BULK 'C:\IMPORT\TourFeedSmallSample.xml',
SINGLE_BLOB
) AS X
/* Create HEADER view */
CREATE VIEW dbo.XML_HEADER
AS
SELECT
TX.XML_DOC_ID AS XML_DOC_ID
,IMP.DOC.value('@T_ID' ,'VARCHAR(12)') AS T_ID
,IMP.DOC.value('@eventId' ,'INT') AS eventId
,IMP.DOC.value('@generated','DATETIME2(0)') AS generated
,IMP.DOC.value('@source' ,'VARCHAR(12)') AS [source]
,IMP.DOC.value('msg[1]/@msgId','INT') AS msgId
,IMP.DOC.query('(.)') AS DOC_XML
FROM dbo.TBL_XML_DOC TX
OUTER APPLY TX.XML_document.nodes('ConsolidatedPlayers') AS IMP(DOC)
GO
/* Create ROUNDS view */
CREATE VIEW dbo.XML_ROUNDS
AS
SELECT
HEADER.XML_DOC_ID
,IMP.DOC.value('@T_ID','VARCHAR(12)') AS T_ID
,P.DOC.value('@id' ,'INT') AS ROUNDS_id
,RND.DOC.value('@n' ,'INT') AS ROUNDS_n
,RND.DOC.query('(.)') AS ROUND_XML
FROM dbo.XML_HEADER HEADER
OUTER APPLY HEADER.DOC_XML.nodes('ConsolidatedPlayers') AS IMP(DOC)
OUTER APPLY IMP.DOC.nodes('p') AS P (DOC)
OUTER APPLY P.DOC.nodes('rnd') AS RND(DOC)
/* Create HOLES view */
CREATE VIEW dbo.XML_HOLES
AS
SELECT
ROUNDS.XML_DOC_ID
,ROUNDS.T_ID
,ROUNDS.ROUNDS_id
,ROUNDS.ROUNDS_n
,HOLES.DOC.value('@n' ,'INT') AS HOLES_n
,HOLES.DOC.value('@cNum','INT') AS HOLES_cNum
,HOLES.DOC.value('@sc' ,'INT') AS HOLES_sc
,HOLES.DOC.value('@pDay','INT') AS HOLES_pDay
,HOLES.DOC.value('@pTot','INT') AS HOLES_pTot
,HOLES.DOC.query('(.)') AS HOLES_XML
FROM dbo.XML_ROUNDS ROUNDS
OUTER APPLY ROUNDS.ROUND_XML.nodes('rnd/holes/h') AS HOLES(DOC)
/* Create SHOT view */
CREATE VIEW dbo.XML_SHOT
AS
SELECT
HOLES.XML_DOC_ID
,HOLES.T_ID
,HOLES.ROUNDS_id
,HOLES.ROUNDS_n
,HOLES.HOLES_n
,SHOT.DOC.value('@n' ,'INT') AS SHOT_n
,SHOT.DOC.value('@pid' ,'INT') AS SHOT_pid
,SHOT.DOC.value('@time','INT') AS SHOT_time
,SHOT.DOC.value('@putt','CHAR(1)') AS SHOT_putt
,SHOT.DOC.value('@t' ,'CHAR(1)') AS SHOT_t
,SHOT.DOC.value('@prv' ,'CHAR(1)') AS SHOT_prv
,SHOT.DOC.value('@tee' ,'CHAR(1)') AS SHOT_tee
,SHOT.DOC.value('@cup' ,'CHAR(1)') AS SHOT_cup
,SHOT.DOC.value('@from','VARCHAR(10)') AS SHOT_from
,SHOT.DOC.value('@to' ,'VARCHAR(10)') AS SHOT_to
,SHOT.DOC.value('@asc' ,'INT') AS SHOT_asc
,SHOT.DOC.value('@dist','INT') AS SHOT_dist
,SHOT.DOC.value('@left','INT') AS SHOT_left
,SHOT.DOC.value('@x' ,'FLOAT') AS SHOT_x
,SHOT.DOC.value('@y' ,'FLOAT') AS SHOT_y
,SHOT.DOC.value('@z' ,'FLOAT') AS SHOT_z
,SHOT.DOC.value('@club','VARCHAR(10)') AS SHOT_club
,SHOT.DOC.value('@con' ,'VARCHAR(50)') AS SHOT_con
,SHOT.DOC.value('shottext[1]' ,'VARCHAR(250)') AS SHOT_shottext
FROM dbo.XML_HOLES HOLES
OUTER APPLY HOLES.HOLES_XML.nodes('h/shot') AS SHOT(DOC)
July 21, 2014 at 6:23 am
Wow, this didn't even cross my mind at all. Could you explain some of the thinking behind it?
I spent quite some time on it last week, first trying to work with .nodes(), but it turned out really slow for some reason when I had to get elements from parent nodes. Maybe I don't quite understand it, and the whole point of doing the APPLY there.
Then I switch to just using OPENXML, which worked a lot faster, 3 seconds to deal with 100k rows of XML data and 8500 that actually would come back from the query. But it's just a select of the scores for now, nothing for the shots:
DECLARE @xmlText varchar(max)
SET @xmlText = (SELECT ConsolidatedPlayersCombinedText FROM tour_feeds WHERE id = 1)
DECLARE @hDoc int
EXEC sp_xml_preparedocument @hDoc output, @xmlText
SELECT
player_id, round_num, hole_number, shots, last_shot_time
FROM OPENXML(@hdoc, '/ConsolidatedPlayers/p/rnd/holes/h/shot[last()]', 1)
WITH (
player_idvarchar(16) '../../../../@id',
round_numsmallint '../../../@n',
hole_numbersmallint '../@cNum',
shots smallint '../@sc',
last_shot_timevarchar(6) '@time'
)
EXEC sp_xml_removedocument @hDoc
July 24, 2014 at 11:19 am
There is a well-known issue with the xml datatype and methods and parent axis.
Try this instead:
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp
GO
SELECT CAST( ConsolidatedPlayersCombinedText AS XML ) ConsolidatedPlayersCombinedText
INTO #tmp
FROM tour_feeds WHERE id = 1
SELECT
p.c.value('@id', 'INT') player_id,
r.c.value('@n', 'INT') round_num,
h.c.value('@cNum', 'INT') hole_number,
h.c.value('@sc', 'INT') shots,
s.c.value('@time', 'VARCHAR(6)') last_shot_time
FROM #tmp t
CROSS APPLY t.ConsolidatedPlayersCombinedText.nodes('ConsolidatedPlayers/p') p(c)
CROSS APPLY p.c.nodes('rnd') r(c)
CROSS APPLY r.c.nodes('holes/h') h(c)
CROSS APPLY h.c.nodes('shot[last()]') s(c)
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply