May 30, 2008 at 4:00 am
Hi,
Apologies for the rather broad nature of this post.
I've recently been tasked with investigating the different approaches to a data transfer following a merger and proposed alignment of applications. The source is going to be a mixture of SQL Server (2000, 2005 and 2005 Express), Oracle and Access 2003 (though all with the same tables and relationships) and the destination being any flavour of SQL Server 2005. My initial reaction was to use SSIS with a .NET front end to drive the package. However this is no longer viable as the transfer must be available to run on several hundred client sites many of which will only have SQL Server Express. Opinion seems to be split. One view is to write a .NET application to carry out the transfer and the other to use a 3rd party ETL tool. I also need to consider that there will be several other data transfers to follow. I am currently reviewing some open source solutions - Talend and Apatar[/url] and am waiting to see what sort of budget will be allocated to the project.
Has anyone got strong opinions either way on this topic? Can anyone recommend any tools? As this is my first in depth look in to a data transfer, does anyone have a checklist or process they use in investigation of the task?
Many thanks,
Ed
Ed Phillips
May 30, 2008 at 5:52 am
Hi Ed,
Quite a broad question indeed...
Firstly, coming from a third party ETL tool background (Datastage server as well as Datastage Parallel extender), I fail to understand how a third party ETL tool will perform any differently to SSIS. If you have to run data transfers from your client location, you will experience the same kind of difficulties connecting as you have with SQL server express edition (in other words, are you going to have a third party ETL client running for every location?)
If you have an ETL server centralised, then it comes down to choice of ETL tool versus SSIS basically. You could for example get your SSIS to connect to the SQL server express client site, and you would still be able to do the data transfer.
Secondly, if your sources are ONLY Oracle, SQLn, Access and text files, SSIS will be able to do exactly what your third party ETL will do.
Where the big difference will come into play, and consider very very carefully after tons of research, is in parralel processing. You will find tons of biased white papers leaning towards one vendor or another vendor comparing their tool against another tool. Unless it is reaaaaally millions of rows of data, SSIS will handle data in a very acceptable amount of time. I say reaaaaally, as my scope is limited to one rather large 2.5 billion row fact table and several million row other tables, which I am loading and maintaining in SSIS, therefore I am not qualified to deliver an opinion on terrabytes of data.
I guess you have gathered that I myself am very biasedly in favor of SSIS. For a number of personal reasons, and perhaps that disqualifies my humble opinion from counting.
But from personal experiences, the following makes SSIS the tool of personal choice:
a) Note that I havent had any dealings in Informatica, so it may be unfair to judge them the same. But from a little bit of history, I can recall Informatica splitting from Ascential, with Ascential hosting the Datastage product. IBM then goes and buys Ascential much later, after buying Informatica. Also look at Conchango. Jamie wrote an impressive white paper on the similarities between SSIS and Informatica. So why do I still vote for SSIS? I bet the cheque that you fork out for Informatica will pay for your enitre ETL project. Now what about the consulting fees you going to pay to get these guys to load it for you?
b) DataStage. *sigh*. Certain things those guys reaaaally got right. On other things I can simply shake my head and thank my stars that I will never ever ever in my life look at THAT product again. For starters, the server edition is downright hogwash. I can recall having to do sooooo much outside of the ETL tool because it simply couldnt handle what I wanted to do. For example, where are the joins? Where are the lookups? Dont tell me a hashed file is adequate for lookups. No way no how. Then again, the my exposure to the parallel extender harvested a far better perception. Most of the things that frustrated me on server were present, plus a whole lot more funky things. HOWEVER. The one area where the product seriously comes short is with their watered down hogwash called routines. No way on earth can anyone be expected to use THAT version of '80s basic to create their transformation functions. In comparrison, the fellahs at Microsoft took time and more time and more time (no wonder it was dubbed 2005 but shipped so late) to actually integrate their product with something absolutely supreme called .NET. Never before could I code so sweetly at a row level, and I challenge anyone in the Datastage domain to do something similar.
c) The fact that SSIS ships standard and "free" with most editions of 2005 means that you are not forking out any cash for a special ETL tool. Rather use those pennies to get one or two really good consultants to come upskill your team, and watch your area start to add value within a very limited amount of time.
d) If you were to look at the amount of postings on the web, I bet you that you would find a seriously inbalanced scale in terms of counts on SSIS versus any other ETL tool.
e) The ability to customise your SSIS ETL tool also takes SSIS to a different dimension. Think about it, that reaaaaally special piece of ETL work that you would have had to get a vendor to do for you can be done in house by one of those consultants you are now employing, who no doubt will be only too keen to hand over to one of your in-house team mates.
I have to re-iterate, I am very biased in favor of SSIS (and I say this with all of 5 weeks of experience). Simply put, from a pragmatic perspective, my time to deliver has increased exponentially. Given any tool and the correct amount of training and time, anything is possible. But in terms of ROI, SSIS is going to knock the socks off of anything else.
Good luck on your research (I am sure some fellahs are going to lynch me shortly, but everyone is entitled to their opinion right?)
~PD
May 30, 2008 at 6:29 am
I assume the fact that SSIS would require a SQL license at each site has made this option cost-prohibitive in your case and that is why the option is somewhat out for you.
Unfortunately, this is going to be the case for every ETL tool I have ever worked with (and I have worked with a few of them), so third party tools are probably out of the question. You may find some components you could embed, but these are going to have licensing costs as well. A single SQL Standard license is about $800 - if you have to buy a couple hundred, they may be down to $500 per site. Development and maintenance costs of something custom (depending on the complexity) may easily exceed this cost, so spend the time to do a CBA. Remember, for the cost of a developer's salary for a year, you can buy a lot of SQL licenses.
May 30, 2008 at 7:31 am
PD and Michael,
Many thanks for the response. Lots of things for me to consider. I would much rather go down the SSIS route as I'm a big fan of DTS and from what I have read SSIS offers so much more. As you point out though, the requirement is for a tool we could embed in some sort of package to send out to the hundreds of clients from which they could select a target and destination database. However an excersise to calculate the number of clients already running SQL Server Standard or above plus the costs of the extra licences verus the cost of developing a data transfer application from scratch is going to be well worth it. Particularly as it will be mainly driven by contractors.
Ed Phillips
May 30, 2008 at 8:44 pm
I am an Informatica developer in a small company that has decided to switch to SSIS. Informatica licenses by the number of Informatica repositories and the characteristics of the server it is on. The repository runs Informatica and contains all the code, as well as history about each run. There is client software that resides on each client's machine. I believe there is no extra charge for this.
We have been getting by with one repository on a server that is no longer adequate and looked into switching to a much more powerful server. Informatica informed us that the annual licensing fees would be 8 times as high. Hence the conversion to SSIS.
In other words, Informatica is a very nice tool but can be quite expensive.
Good luck on your project!
July 10, 2008 at 10:05 am
Hi everyone, I am enjoying the conversation. I work for Talend, an Open Source ETL tool that goes head to head with all the other tools mentioned. I will invite you all to see for yourself and download a free copy at http://www.talend.com
Key Technical Differentiators
Code generator
Fully transparent executed code
Use of standards (Eclipse, Java, Perl, SQL)
Easily extensible
Embed existing Perl or Java routines
Centralized metadata and monitoring
-blake
949-365-4457
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply