October 22, 2013 at 7:33 am
Re the server specs, they are referenced as...
RAM: 76GB
OS: Windows Server 2008 R2 - 64 bit
Even without the O/S specified, anything above 3GB will be 64-bit; 76GB of RAM will be nothing but. :-)
October 22, 2013 at 8:54 am
To be fair, it would have been nice if he had had someone as well versed in Talend as he is in SSIS to make it a good competitive test. As simple as his test may have been, he would not know all the right buttons to push. A cross-platform test would be more telling, as would the time and ease of setting up a test.
October 22, 2013 at 9:49 am
Great test and article!
I did similar tests 2 or 3 years ago and was looking at SSIS, Talend and Pentaho. A lot of things have changed since then, but it looks like the same thing is still true for Talend: you need to mess with the JVM and fight out-of-memory issues. This is just not good for a professional ETL tool.
I was amazed by the different possibilities and tons of features, though, but these errors turned me away. Pentaho was great, and while they also use the JVM and Eclipse, I did not have to mess with the JVM.
If you read this article because you are looking for a good ETL tool, check Gartner's Magic Quadrant report - they have some good points there about SSIS, Pentaho and Talend.
If I had to choose between these three, I would probably pick SSIS if I had to work with SQL Server and Pentaho for everything else.
SSIS 2012 is a huge improvement over 2008 R2 but still has a long way to go.
As for the test, it would be nice to see some typical ETL operations as well - try to sort a 20MM file in Talend and then group by a few fields. This is when the nightmares begin 🙂
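To make that concrete, here is a minimal Python sketch of the kind of sort-then-group-by step I mean. It is not what Talend or SSIS do internally; the column names, the `;` delimiter and the `sort_and_group` helper are all made up for illustration. The in-memory `sorted` call is the part that tends to blow up memory on a big file.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def sort_and_group(rows, key_fields, value_field):
    """Sort rows by key_fields, then sum value_field for each key group."""
    key = itemgetter(*key_fields)
    # In-memory sort: this is the step that gets painful on a very large file.
    ordered = sorted(rows, key=key)
    totals = []
    for group_key, group in groupby(ordered, key=key):
        totals.append((group_key, sum(float(r[value_field]) for r in group)))
    return totals


# Hypothetical sample data standing in for a large delimited file.
sample = io.StringIO(
    "region;product;amount\n"
    "EU;widget;10.5\n"
    "US;widget;3\n"
    "EU;widget;4.5\n"
    "US;gadget;7\n"
)
rows = list(csv.DictReader(sample, delimiter=";"))
result = sort_and_group(rows, ["region", "product"], "amount")
```

A real ETL tool would normally spill the sort to disk rather than hold every row in memory, which is exactly where JVM heap settings start to matter.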
October 22, 2013 at 10:55 am
I can understand developers picking up SSIS because it ships with SQL Server and is therefore considered 'free'. It may suffice if you have no comparison.
What I do not understand is why anyone would compare it favorably to Pentaho. PDI ships with at least twice as many features 'out-of-the-box', extremely easy-to-use debugging, logging (server / db) for each and every purpose, a great and stable GUI, a limited set of internal datatypes, etc.
I built the same pilot with both SSIS and Pentaho, and SSIS literally took 3 times as much time. The interface feels like wading through mud, is excruciatingly messy with datatypes, and is even unstable at times. The debugging features are laughable.
My background is INFA's PowerCenter (PC). Though Pentaho is not as complete as PC, for small to medium projects I rate it over PC as well.
October 22, 2013 at 11:36 am
Keep in mind that many people who started working with SSIS have very little or no experience with dedicated ETL tools. In fact, a huge lot of companies still code ETL routines manually, and for them going from T-SQL to SSIS (running T-SQL) is a more natural step than going away entirely from their favorite SQL coding and using something like Pentaho.
Now, if you have used any decent ETL tools, you will definitely miss many features in SSIS, but it is still a good tool and it comes free with SQL Server, which many people forget.
Also, the good (and at the same time bad) part about SSIS is that if you cannot do something, you can script it using VB or C#. I know it is ugly, but many people end up doing that.
October 22, 2013 at 12:09 pm
There is a large secretive enterprise ETL company whose product is quite possibly the most powerful and easy to use tool on the planet.
I find with SSIS that when I come back to it I spend 2 weeks swearing at it before I return to being productive. I achieved more in my first 30 minutes with the enterprise ETL tool than I had ever done with SSIS. I believe that Informatica follows a similar pattern. You get what you pay for.
One of the things that Talend does is to generate Java code from the ETL so you can use it as a RAD tool.
It has already been said that SSIS majors in getting data into SQL Server. It really needs more native connectors, for both sources and destinations.
October 22, 2013 at 1:38 pm
“all ETL tools should be tuned out of the box”
Right. I'm pretty sure a lot of SSIS packages out there use all of the default settings, but still get reasonable performance.
Regarding Talend: boy, and I thought SSIS was ugly 😀
Unfortunately I haven't used any other ETL tool than SSIS yet, so I can't really compare products.
I've seen DataStage once, and the UI was just terrible. It had much better metadata management than SSIS though (which is pretty much non-existent in SSIS).
Regarding the comment saying Talend should run in batch mode to have better performance:
the same goes for SSIS. It runs faster outside of BIDS.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 22, 2013 at 1:44 pm
I agree with the other person who said that the tests would be even more meaningful if run with their .BAT files.
In our experience, it is very difficult to get assistance in Talend's forums. Hopefully this will change as more users adopt their products.
October 22, 2013 at 2:13 pm
Koen,
Download Pentaho Community Edition from sourceforge.net, unzip it to the folder of your choice, create a file or db repository (for example within SQL Server 🙂 ) and start spoon.bat.
This will take you about 10 minutes, without even having to install anything. The download comes with about 200 transformation examples (dataflows in SSIS) and the only other requirement is Java installed.
I've seen people with zero ETL experience up and running in no time; it's that easy to use.
With SQL Server you will use the JDBC drivers for SQL Server, which are slightly less (10%) performant compared to OLE-DB, but AFAIK you do not even need to download these separately.
It's a bit like trading the old DDR Trabant for a new Lexus 😉
In terms of features, the Pentaho team has really thought of just about EVERYTHING. Instead of using VB script you can still resort to some Java coding, but there is hardly any need, because the design stack to choose from is IMMENSE.
Building a Pentaho transformation is a breeze compared to the unwieldy SSIS interface for designing a dataflow.
October 22, 2013 at 2:19 pm
Spoon, this rings a bell. Is there some open-source variant called Kettle?
I remember using that in college (quite some time ago :-))
October 22, 2013 at 2:30 pm
Spoon is the GUI of Kettle, the design tool. Kettle nowadays is called PDI (Pentaho Data Integration).
October 22, 2013 at 4:13 pm
Let’s get something straight: if you’re a Microsoft shop, or you’re loading data to MS SQL – SSIS is the way to go.
I hope you don't mind if I disagree. I'll be happy to compromise and say "It Depends". 😉
In testing and comparing the two ETL tools, I figured it would be unfair to load data to or from Microsoft SQL since SSIS is a Microsoft product. I also didn’t want to compare some of the custom connections that come with the “out of the box” version of Open Studio since there would be some overhead in recreating those connectors in SSIS. The simplest comparison is testing overall ability of both tools to load 1 delimited flat file to another delimited flat file on the same server.
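For reference, the file-to-file test described in that quote boils down to something like the following Python sketch. It is only an illustration under my own assumptions, not the author's SSIS package or Talend job: the `copy_delimited` helper, the `;` delimiter and the sample columns are all hypothetical.

```python
import csv
import io


def copy_delimited(src, dst, delimiter=";", keep=lambda row: True):
    """Stream rows from src to dst, writing only rows for which keep(row) is true."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    written = 0
    for row in reader:
        if keep(row):
            writer.writerow(row)
            written += 1
    return written


# Hypothetical in-memory files; real runs would use open(...) on disk files.
source = io.StringIO("id;state\n1;CA\n2;NY\n3;CA\n")
target = io.StringIO()
count = copy_delimited(source, target, keep=lambda row: row[1] == "CA")
```

With the default `keep` it is a straight copy; passing a predicate turns it into the filtered copy.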
On that note, I'm a bit disappointed. I can't speak for other folks but I'm not interested in what boils down to a file-to-file copy or a filtered copy of the same process. I'm most interested in importing file data to tables and in exporting table data to files. It would have been nice to see such tests.
I do, however, seriously appreciate the time you spent on this article and the time you spent testing. Thanks for writing the article.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 23, 2013 at 8:08 am
Jeff,
It occurs to me that the publication of this article violates the License terms of SQL Server, in particular the "DeWitt clause":
"BENCHMARK TESTING. You must obtain Microsoft's prior written approval to disclose to a third party the results of any benchmark test of the software. However, this does not apply to the Microsoft .NET Framework (see below)."
Happily, the GPL license for Talend contains no such restriction!
October 23, 2013 at 8:22 am
Thankfully, if Microsoft takes away the author's licenses, he will still have Talend to work with, not to mention many wonderful cheaper-to-free databases! 🙂
October 23, 2013 at 12:56 pm
SQLSophist (10/23/2013)
It occurs to me that the publication of this article violates the License terms of SQL Server, in particular the "DeWitt clause": "BENCHMARK TESTING. You must obtain Microsoft's prior written approval to disclose to a third party the results of any benchmark test of the software. However, this does not apply to the Microsoft .NET Framework (see below)."
Wut? I've never even heard of this clause.
Another insanity started by Oracle...