March 25, 2009 at 3:49 pm
Our ERP data goes to Stage through DataMirror, and we have a bunch of SSIS packages that run nightly to load the data warehouse tables. We now need to implement this in real time; in other words, we need ERP data in our data warehouse tables in near real time (maybe an hour's delay). Can anybody give me some insight into where I should begin?
March 25, 2009 at 5:01 pm
That's a great question, I wish I knew the answer for sure. It is clear to almost everyone that the big web sites are doing exactly this on their live commercial sites (think of how Ebay gives you a live hierarchical breakdown of the number of items in each subcategory from your current location), however, whenever this question is asked, no one from those sites ever answers. From this I infer that everyone who could comment from experience is constrained by NDA's.
However, that doesn't stop me from speculating... And my speculation for a long time has been that they use a "Cube + Log" arrangement. The "log" is the per-table or per-cube live change log from the OLTP system (where the actual transactions occur). The cubes are updated from the change logs on some fixed period (let's say once a day) and then the change logs are reset (cleared/truncated/emptied, etc.).
Interactive queries go first to the cube to return the base numbers which are then modified based on the accumulated changes in the logs to give "almost live" current figures.
With an arrangement like this it should be possible to get cube-like statistical breakdowns interactively that are current to within a minute. There would be no particular need to roll all of the logs into the cubes at the same time, and it could easily be done incrementally. For greater performance and freedom from locking/blocking, cubes could be "mirrored" and then updated in a staggered, alternating pattern. This would allow the logs to be rolled into the cubes incrementally on an almost continuous basis.
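A minimal T-SQL sketch of the "Cube + Log" read path described above. All object names are hypothetical: `CubeCategoryCounts` stands in for the periodically rebuilt base figures, and `CategoryChangeLog` for the deltas accumulated since the last roll-in.

```sql
-- Near-live figures: base numbers from the cube, adjusted by the
-- changes accumulated in the log since the cube was last rebuilt.
SELECT c.CategoryID,
       c.ItemCount + ISNULL(d.Delta, 0) AS LiveItemCount
FROM dbo.CubeCategoryCounts AS c
LEFT JOIN (SELECT CategoryID, SUM(ChangeQty) AS Delta
           FROM dbo.CategoryChangeLog
           GROUP BY CategoryID) AS d
       ON d.CategoryID = c.CategoryID;

-- Periodic roll-in: fold the log into the cube, then reset the log.
BEGIN TRANSACTION;
    UPDATE c
    SET    c.ItemCount = c.ItemCount + d.Delta
    FROM   dbo.CubeCategoryCounts AS c
    JOIN  (SELECT CategoryID, SUM(ChangeQty) AS Delta
           FROM dbo.CategoryChangeLog
           GROUP BY CategoryID) AS d
       ON  d.CategoryID = c.CategoryID;
    DELETE FROM dbo.CategoryChangeLog;
COMMIT TRANSACTION;
```

In practice the roll-in would also need to guard against changes arriving between the UPDATE and the DELETE, e.g. by rolling in only rows up to a captured watermark column.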
Anyway, that's how I would probably do it. 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 26, 2009 at 8:09 am
Any other suggestions/advice please?
March 26, 2009 at 10:45 am
Hi
I think the problem is that nobody can help you unless you describe your business case in a little more detail.
The simplest (and certainly not very helpful) answer would be:
Schedule your SSIS jobs to run every hour or thirty minutes.
... But I think you wouldn't be asking if that were an option. So some more detailed information may enable more people to help you.
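For what it's worth, if the simple approach is enough, the existing nightly job can just be rescheduled in SQL Server Agent; a sketch (the schedule and job names here are made up):

```sql
-- Run the existing warehouse-load job every hour instead of nightly.
EXEC msdb.dbo.sp_add_schedule
     @schedule_name        = N'EveryHour',
     @freq_type            = 4,  -- daily
     @freq_interval        = 1,
     @freq_subday_type     = 8,  -- units of hours
     @freq_subday_interval = 1;  -- every 1 hour

EXEC msdb.dbo.sp_attach_schedule
     @job_name      = N'LoadDataWarehouse',
     @schedule_name = N'EveryHour';
```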
And
Which parts of Barry's reply may be possible for you, and which won't?
Greets
Flo
March 26, 2009 at 2:51 pm
barunpathak (3/26/2009)
Any other suggestions/advice please?
In addition to what Florian said (which I completely agree with), if my response was not what you were looking for, you should explain why not, or better explain what you are looking for.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 26, 2009 at 3:10 pm
Barry's suggestion of cube + log is what I've used for similar things. Real-time reports, with almost the speed of a data warehouse, and as close to real time as your latest transactions. It does have a performance overhead, but that can be worked with if you have powerful enough hardware, a good enough design for the database, and the right type of load distribution.
No, I haven't worked on anything like the scope of eBay. But I have had to work with this kind of thing on a smaller scale, mainly because of hardware constraints.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 26, 2009 at 3:26 pm
Thanks for the confirmation, Gus. And yeah, I figured that it was no cheap approach because it requires a lot of hardware, but since it is so spread out it can scale much more readily.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 27, 2009 at 7:33 am
RBarryYoung (3/26/2009)
Thanks for the confirmation, Gus. And yeah, I figured that it was no cheap approach because it requires a lot of hardware, but since it is so spread out it can scale much more readily.
Actually, in my case, the reason I had to use so many techniques that normally only apply to huge, high-activity databases is that the hardware was far too underpowered for what we wanted to do with it.
I don't think I ever had more than a couple of hundred people accessing the databases at the same time, but all the production databases were on one low-end server, and the needs were intense. So I had to push the limits of SQL 2000 pretty hard to get it to work worth a darn.
The only reason I had to warehouse at all is that the hardware wasn't up to producing a few reports in real time from the OLTP tables, even with only a few hundred thousand rows per table and a few dozen transactions per second against them. So, I set up a warehouse-plus solution. It took the reports from timing out (over the 30-second limit set by the connection) to returning in a fraction of a second.
I had to take a lot of steps with that system that would normally only apply to much bigger, much more heavily trafficked systems, because I needed to get every cycle I could out of those overloaded CPUs.
It's all about load-to-power ratios. Even cheap, low-end servers can benefit from this kind of thing. But only if it's worth the extra work that goes into it. Because implementing this kind of thing is a LOT of work.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 27, 2009 at 9:08 am
GSquared (3/27/2009)
It's all about load-to-power ratios. Even cheap, low-end servers can benefit from this kind of thing. But only if it's worth the extra work that goes into it. Because implementing this kind of thing is a LOT of work.
Yeah, I would think so. Lots of moving parts, etc.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 28, 2009 at 3:49 pm
I thought I explained it in detail (at least I thought I did). Here it is again....
We have an AS/400 ERP system. From the ERP to the stage tables in SQL Server 2005, data replicates through a tool called DataMirror. From the stage tables to the data warehouse tables, we run SQL jobs comprising SSIS/DTS packages that load the data warehouse tables every night.
Our intention is to grab the data from Stage and transfer it to the data warehouse tables in real time instead of waiting a day. Can this be achieved? Does the cube approach involve changing our SSIS packages?
I heard Service Broker can also be used to achieve this. Do we need to change our existing SSIS packages in order to implement Service Broker?
Let me know if this is detailed enough for you....
Do you have an example of where this has been implemented, and how?
Your help would be greatly appreciated.
March 28, 2009 at 6:23 pm
barunpathak (3/28/2009)
I thought I explained it in detail (at least I thought I did). Here it is again.... We have an AS/400 ERP system. From the ERP to the stage tables in SQL Server 2005, data replicates through a tool called DataMirror. From the stage tables to the data warehouse tables, we run SQL jobs comprising SSIS/DTS packages that load the data warehouse tables every night.
Our intention is to grab the data from Stage and transfer it to the data warehouse tables in real time instead of waiting a day. Can this be achieved? Does the cube approach involve changing our SSIS packages?
It is impossible for us to tell whether this can be achieved. You would have to give us a whole lot more information, so much that this would stop being advice and start being a job for both you and us. The easiest way to tell would be to just set up a test and try it.
But in all likelihood, even if you can achieve it, it may still end up being unusable for the intended purpose. Data warehouses are just not normally designed either for "real-time" updating or for supporting users and updating at the same time. Good DWs are typically optimized for two things:
1) Reporting & Analysis, with no changes going on and,
2) Large Batch Updating while it has exclusive access
Even if you can get around the likely simultaneity problems, you still have the resource-blocking issues. Remember, those are the whole reason you moved your reporting and analysis functions to a different database? Well, real-time updating has the effect of trying to merge them back together again.
So the short answer is: you probably cannot just "go real-time" with the updates, unless you never really needed a separate data warehouse database in the first place.
I heard Service Broker can also be used to achieve this. Do we need to change our existing SSIS packages in order to implement Service Broker?
Service Broker and SSIS do not really work together. In this case they are alternate means to accomplish similar things. Service Broker is much better at small/single incremental things than SSIS, however, SB is very much a "roll your own" kind of tool. It's all custom code.
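To give a feel for how "roll your own" Service Broker is, here is a minimal sketch of the plumbing it needs before any data moves. All object names are illustrative, and the activation procedure that would RECEIVE each message and apply the change to the warehouse is omitted.

```sql
-- Service Broker plumbing: message type, contract, queue, and service.
CREATE MESSAGE TYPE [//DW/StageRowChanged]
    VALIDATION = WELL_FORMED_XML;

CREATE CONTRACT [//DW/StageChangeContract]
    ([//DW/StageRowChanged] SENT BY INITIATOR);

CREATE QUEUE dbo.StageChangeQueue;

CREATE SERVICE [//DW/StageChangeService]
    ON QUEUE dbo.StageChangeQueue ([//DW/StageChangeContract]);

-- Sending one change notification (e.g. from a trigger on a stage table).
DECLARE @handle UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//DW/StageChangeService]
    TO SERVICE   '//DW/StageChangeService'
    ON CONTRACT  [//DW/StageChangeContract]
    WITH ENCRYPTION = OFF;

SEND ON CONVERSATION @handle
    MESSAGE TYPE [//DW/StageRowChanged]
    (N'<change table="Orders" key="42"/>');
```

Every piece of that, plus the receiving side, error handling, and poison-message handling, is code you write and maintain yourself, which is what makes it so different from dropping transforms onto an SSIS design surface.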
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]