November 6, 2017 at 1:01 pm
Very clearly written. Although I share some of the concerns of other commentators about the performance and design of your chosen solution, I thought that overall your description of pros and cons was balanced.
And what really stood out was your exceptionally level-headed replies to negative comments. They are a lesson in how to handle criticism. 😎
November 6, 2017 at 2:44 pm
John Rees - Monday, November 6, 2017 1:01 PM wrote:
Very clearly written. Although I share some of the concerns of other commentators about the performance and design of your chosen solution, I thought that overall your description of pros and cons was balanced. And what really stood out was your exceptionally level-headed replies to negative comments. They are a lesson in how to handle criticism. 😎
Thank you.
November 6, 2017 at 3:59 pm
Although I disagree with your solution, I appreciate that you took the time and spent the effort to write this article.
I do agree with John Rees about how you are handling the flamers - much better than I did. Thanks for leading by example here.
November 7, 2017 at 10:51 am
Have you considered just storing the XML itself (or a standardized version of the incoming XML)? Going through the effort of exploding ALL elements from ALL sources into an EAV structure was required a few years back, because the XML support was rather poor, but that argument doesn't hold true any more.
We've used that kind of technique to buffer the technical team WHILE we determine whether the extra elements need to be added or if they are truly one-offs. A long-term EAV model which just keeps growing tends to become a ticking time bomb that will sink your ERP or reporting solution (or both) if not kept in check. In short, the XML snippet or EAV is used to buy the tech team time to model the data into the correct elements, assuming it's more than just a random attribute that got tossed in.
Our usage required being able to pull lots of these items for a bunch of rows, so the function turned out to be too much of a performance hindrance.
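For anyone wanting to picture it, here is a minimal sketch of that buffering idea - all of the table, column, and element names are made up for illustration, not from any real system:

-- Hypothetical staging table: modeled columns for the fields we already know,
-- plus the raw XML kept as a buffer for elements we haven't modeled yet.
CREATE TABLE dbo.OrderStaging
(
    OrderStagingId int IDENTITY(1,1) PRIMARY KEY,
    SourceSite     varchar(50)   NOT NULL,
    OrderNumber    varchar(50)   NOT NULL,
    CustomerName   nvarchar(200) NULL,
    RawPayload     xml           NOT NULL,  -- everything not yet modeled
    ReceivedAt     datetime2(0)  NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Later, when an extra element proves worth keeping, it can be pulled out ad hoc
-- before being promoted to a proper column:
SELECT  OrderStagingId,
        RawPayload.value('(/Order/GiftMessage)[1]', 'nvarchar(500)') AS GiftMessage
FROM    dbo.OrderStaging
WHERE   RawPayload.exist('/Order/GiftMessage') = 1;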
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
November 7, 2017 at 2:16 pm
Matt Miller (4) - Tuesday, November 7, 2017 10:51 AM wrote:
Have you considered just storing the XML itself (or a standardized version of the incoming XML)? Going through the effort of exploding ALL elements from ALL sources into an EAV structure was required a few years back, because the XML support was rather poor, but that argument doesn't hold true any more. We've used that kind of technique to buffer the technical team WHILE we determine whether the extra elements need to be added or if they are truly one-offs. A long-term EAV model which just keeps growing tends to become a ticking time bomb that will sink your ERP or reporting solution (or both) if not kept in check. In short, the XML snippet or EAV is used to buy the tech team time to model the data into the correct elements, assuming it's more than just a random attribute that got tossed in.
Our usage required being able to pull lots of these items for a bunch of rows, so the function turned out to be too much of a performance hindrance.
I can understand the push against EAV (having to deal with Magento, which is EAV-based...). I appreciate the comments received on my post; some give me pause to think about what could be a better (more relational) solution. There are a few SQL experts here who may have uncovered better ways, and I thank those sharing their views.
November 7, 2017 at 6:43 pm
jcboyer-1091017 - Tuesday, November 7, 2017 2:16 PM wrote:
Ideally we could use ONE template (say XML) and push all the incoming data into it. Currently, the incoming data is JSON (at least 3 different 'structures' and naming like Contact: and contact:) and 5 XML feeds (some close, but with variations like <Contact> and <contact>). We do not know these variations until we start harvesting the new site based on Magento, Pressero, Volusion, Shopify, etc. The objective is to populate our back-end ERP quickly to create new fulfillable orders. And using SELECT INTO and UPDATE FROM tables will make the code way easier to maintain.
We addressed this problem in a number of ways:
a) Massage the data at the source (usually using third-party support) and receive it pre-defined according to our template. Worked for the first Web site. Started to break as soon as Web site 2 was created, as it required additional values.
b) Get the data 'raw' in a format we could transform locally. Each API will need some transformation.
c) Do some PIVOT/UNPIVOT/MATCHING transformations and get all the data we need off normalized tables, with minor changes as the application matures.
I will be testing some code around c) to see if it can be done 'nicely'. I can understand the push against EAV (having to deal with Magento, which is EAV-based...). I appreciate the comments received on my post; some give me pause to think about what could be a better (more relational) solution. There are a few SQL experts here who may have uncovered better ways, and I thank those sharing their views.
It's always a battle. Those darned users always coming up with new things they'd like tracked!!! (Just kidding.) Whichever technique you use is fine as long as you give yourself a way to retire some things from the EAV. The real trail of tears I've seen was where the EAV was THE solution. For everything. For every row. Essentially giving up on any form of modeling.
It was like the Third Manifesto, just horribly implemented on top of all of the bookish crazy baked into that theory. Nothing like every query taking hours upon hours 😛
And by the way - it takes a lot of guts to put your plan on paper as you have here. Hats off for that.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
November 7, 2017 at 8:55 pm
Matt Miller (4) - Tuesday, November 7, 2017 6:43 PM wrote:
It's always a battle. Those darned users always coming up with new things they'd like tracked!!! (Just kidding.) Whichever technique you use is fine as long as you give yourself a way to retire some things from the EAV. The real trail of tears I've seen was where the EAV was THE solution. For everything. For every row. Essentially giving up on any form of modeling. It was like the Third Manifesto, just horribly implemented on top of all of the bookish crazy baked into that theory. Nothing like every query taking hours upon hours 😛
And by the way - it takes a lot of guts to put your plan on paper as you have here. Hats off for that.
Thanks. Guts or innocence!!! But I am pleased to have drawn some punches, pushing me to be more creative.
The comments made me burn some midnight oil. PIVOT/UNPIVOT starts to make more sense. I reconsidered my strategy and came up with:
a) Create a table from the XML elements (looking at JSON later)
b) UNPIVOT that table to create Name/Value pairs where the Name can be mapped to our Standard Table(s) column names
c) Map the pairs
d) PIVOT the results into our Standard Table matching the XML extract root
e) Insert the results into the final ERP tables, after extra transformations when necessary (like mapping California to CA)
a) to d) takes 17 milliseconds on a fair server -- 10 XML elements being parsed, in a script, not a procedure, yet.
e) takes 3 milliseconds (a proc).
I need to repeat a) to e) for 5 tables: 20 ms x 5 rounds up to 100 milliseconds.
One of our goals was extracting and generating new orders in less than 1/2 second per web site. I can see this strategy meeting that goal. And with some In-Memory OLTP processing, we may do way better.
I am not close to the original timings, BUT the code size in e) was reduced by 50%, and a) to d) are similar, if not better than the original code, size-wise.
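For what it's worth, here is a minimal sketch of what a) to d) might look like, collapsing a) and b) into a single shred to Name/Value pairs; the sample XML, the name mapping, and the standard column names are hypothetical, not our actual schema:

-- a)+b): shred every child element of the order into Name/Value pairs
DECLARE @x xml = N'<Order><contact>Jane Doe</contact><State>California</State><OrderNo>123</OrderNo></Order>';

WITH Pairs AS
(
    SELECT  n.value('local-name(.)', 'nvarchar(128)')  AS RawName,
            n.value('(./text())[1]', 'nvarchar(400)')  AS RawValue
    FROM    @x.nodes('/Order/*') AS t(n)
),
-- c): map incoming names to our standard column names
Mapped AS
(
    SELECT  m.StandardName, p.RawValue
    FROM    Pairs AS p
    JOIN   (VALUES ('contact', 'ContactName'),
                   ('state',   'StateCode'),
                   ('orderno', 'OrderNo')) AS m(RawName, StandardName)
             ON m.RawName = LOWER(p.RawName)   -- absorbs the Contact vs contact variations
)
-- d): pivot the pairs into one row shaped like the standard table
SELECT  ContactName, StateCode, OrderNo
FROM    Mapped
PIVOT   (MAX(RawValue) FOR StandardName IN (ContactName, StateCode, OrderNo)) AS pvt;

The case-insensitive mapping step is what soaks up the naming differences between sites, so only the mapping list has to change when a new source shows up.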
Thanks again for all your input.
November 8, 2017 at 6:01 am
Matt Miller (4) - Tuesday, November 7, 2017 10:51 AM wrote:
Have you considered just storing the XML itself (or a standardized version of the incoming XML)? Going through the effort of exploding ALL elements from ALL sources into an EAV structure was required a few years back, because the XML support was rather poor, but that argument doesn't hold true any more. We've used that kind of technique to buffer the technical team WHILE we determine whether the extra elements need to be added or if they are truly one-offs. A long-term EAV model which just keeps growing tends to become a ticking time bomb that will sink your ERP or reporting solution (or both) if not kept in check. In short, the XML snippet or EAV is used to buy the tech team time to model the data into the correct elements, assuming it's more than just a random attribute that got tossed in.
Our usage required being able to pull lots of these items for a bunch of rows, so the function turned out to be too much of a performance hindrance.
Personally, I still think that XML is handled poorly and slowly, not to mention the horrible tag bloat. If it's going to be queried on a regular basis, then I'll shred it and put the resulting data into properly normalized tables. And, no, I don't agree with using an EAV model as the target of shredding, either. That would simply be trading off one hierarchical, denormalized mess with no referential integrity for another.
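To make that concrete, here is a minimal sketch of shredding on arrival - the payload shape and the target column names are illustrative only, not from the article:

DECLARE @inbound xml = N'
<Order OrderNo="123">
  <Line Sku="ABC" Qty="2" Price="9.99"/>
  <Line Sku="DEF" Qty="1" Price="24.50"/>
</Order>';

-- Shred the attributes straight into relational columns; in practice this SELECT
-- would feed an INSERT into a properly keyed, normalized OrderLine table.
SELECT  o.n.value('@OrderNo', 'varchar(50)')   AS OrderNo,
        l.n.value('@Sku',     'varchar(50)')   AS Sku,
        l.n.value('@Qty',     'int')           AS Qty,
        l.n.value('@Price',   'decimal(10,2)') AS Price
FROM    @inbound.nodes('/Order') AS o(n)
CROSS APPLY o.n.nodes('./Line')  AS l(n);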
--Jeff Moden
Change is inevitable... Change for the better is not.
November 8, 2017 at 11:29 am
Jeff Moden - Wednesday, November 8, 2017 6:01 AM wrote:
Personally, I still think that XML is handled poorly and slowly, not to mention the horrible tag bloat. If it's going to be queried on a regular basis, then I'll shred it and put the resulting data into properly normalized tables. And, no, I don't agree with using an EAV model as the target of shredding, either. That would simply be trading off one hierarchical, denormalized mess with no referential integrity for another.
November 8, 2017 at 6:59 pm
Jeff Moden - Wednesday, November 8, 2017 6:01 AM wrote:
Personally, I still think that XML is handled poorly and slowly, not to mention the horrible tag bloat. If it's going to be queried on a regular basis, then I'll shred it and put the resulting data into properly normalized tables. And, no, I don't agree with using an EAV model as the target of shredding, either. That would simply be trading off one hierarchical, denormalized mess with no referential integrity for another.
Oh, agreed on not keeping it in XML permanently. We just use it as temporary storage for extra stuff people are adding in before telling us: when we catch up to them and determine it will be used, we get it properly normalized and then ditch the XML.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
November 9, 2017 at 12:03 am
Matt Miller (4) - Wednesday, November 8, 2017 6:59 PM wrote:
Oh, agreed on not keeping it in XML permanently. We just use it as temporary storage for extra stuff people are adding in before telling us: when we catch up to them and determine it will be used, we get it properly normalized and then ditch the XML.
A bit off topic, IMO. The issue is not keeping the XML or not, but the fact that we receive the data in that format from the API of the Web sites we manage (it could also be JSON). Of course, the first step is parsing the XML document using the native XQuery functions. In our situation, most XML documents are small and XQuery performs quite well to 'shred' the tags and return standard SQL objects. In my original post, I suggested storing the parsed data as EAV (Name/Value pairs) data, as opposed to tabular data. It is easy enough to parse the XML into a table instead.
One day, we may have a simpler API communication system, but today, XML and JSON are in. I wish the APIs could return SQL tables as a serialized binary string. It would save a lot of transformation. 😀
SQL 2016/17 seems to handle XML and JSON data reasonably well.
Some may use XML as a column data type with schemas etc... I plead guilty to ignorance here, as I stayed away from this, assuming it was possibly short-lived.
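For the JSON side, a minimal sketch along the same lines (OPENJSON needs database compatibility level 130 or higher; the field names are illustrative, not our actual feed):

DECLARE @j nvarchar(max) = N'{"contact":"Jane Doe","state":"California","orderNo":"123"}';

SELECT  ContactName, StateName, OrderNo
FROM    OPENJSON(@j)
        WITH (ContactName nvarchar(200) '$.contact',
              StateName   nvarchar(100) '$.state',
              OrderNo     varchar(50)   '$.orderNo');

-- Note: property names in the JSON path are case-sensitive, so the Contact: vs contact:
-- variation still has to be handled upstream or with separate paths.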
November 9, 2017 at 9:02 am
jcboyer-1091017 - Thursday, November 9, 2017 12:03 AM wrote:
A bit off topic, IMO. The issue is not keeping the XML or not, but the fact that we receive the data in that format from the API of the Web sites we manage (it could also be JSON). Of course, the first step is parsing the XML document using the native XQuery functions. In our situation, most XML documents are small and XQuery performs quite well to 'shred' the tags and return standard SQL objects. In my original post, I suggested storing the parsed data as EAV (Name/Value pairs) data, as opposed to tabular data. It is easy enough to parse the XML into a table instead.
One day, we may have a simpler API communication system, but today, XML and JSON are in. I wish the APIs could return SQL tables as a serialized binary string. It would save a lot of transformation. 😀
SQL 2016/17 seems to handle XML and JSON data reasonably well.
Some may use XML as a column data type with schemas etc... I plead guilty to ignorance here, as I stayed away from this, assuming it was possibly short-lived.
It's not actually off topic. While you say your stuff works reasonably well, we had a similar situation at work where the "reasonably well" stuff no longer worked "reasonably well" because of scale and the increase in the number of users hitting the XML. JSON isn't much different, IMHO. The fix was to shred the XML on arrival and put the data into properly normalized tables. YMMV, but it worked very well for us.
--Jeff Moden
Change is inevitable... Change for the better is not.