September 20, 2007 at 1:14 pm
I need to copy data from a case sensitive server to a case insensitive server. Some unique indexes will be duplicates due to the CI. I need to trap these in a log file and I would also like to capture the transfer progress for every 10K rows(most table are in million + rows).
The Transfer SQL Server Objects and Export wizard don't seem to offer much in the logging the information I need. I found that I can capture failures using the data flow task with OLEDB connections but I am not getting the reason for the failure.
Any help will be greatly appreciated.
MG
"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."
Tony Hoare
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
September 21, 2007 at 2:04 am
In the error output of the OLEDB dest, are the error columns not populated? They normally give you the reasons.
If not, the easiest way would be to push all the rows into a table without constraints and join it to itself, deleting the duplicates.
Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!
September 21, 2007 at 5:57 am
The error columns are not being populated. I have tried creating new packages on different servers but get the same result.
The main issue I see with pushing the rows into a table is that I would need a table for each table, unless I am misinterpreting the examples I have found. The server in question has 40 databases and the number of tables in each averages 80.
I'm not "under the gun" to get this done but I would like to figure it out using SSIS.
MG
"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."
Tony Hoare
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
September 21, 2007 at 6:29 am
Doh. it's simple(ish)
Have two sources, both from the same table.
Under them, add a sort comp. For the left hand side, sort by your key, select Ignore Case and remove duplicates.
On the right side, only sort by key.
Below that, have a join, set to left outer. What you end up with is the left overs. Duplicates which would have blown the SQL.
You would have to put a multicast on the ignore case pipe above the join to use for your insert.
Should work...
Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!
September 21, 2007 at 6:35 am
Another way would be to use a script comp. Have two outputs, one for insert rows and one for dups.
In the code, keep a list of rows sent for insert and compare against that. Anything matching, send to the other output. Advantage
Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!
September 21, 2007 at 7:32 am
Thanks again - I am just starting to read up on the script comp. I originally thought of error output because it was simple. I need to identify the duplicates because they really aren't dups in the original data. This system was written by chemists and engineers so a description is the unique key. Example - CHLORIDE BY TITRATION and CHLORIDE by TRITATION - the first has 70 atoms, the second 0. Things like this need to be identified and either adjusted in the original before the final data migration, or do some sort of transformation in the package.
MG
"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."
Tony Hoare
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
September 21, 2007 at 7:39 am
I've actually found a much nicer way of doing it: Fuzzy Grouping.
You specify the columns you want checked. Downstream, you have a conditional split when _key_out = _Key_in, it is the first of a unique group. Everything else is a duplicate to that group.
But, as your previous message said they are in fact not duplicates but identify something totally different, I guess there is more work to be done in sorting them. be sure to save off the fuzzy columns like _score etc so you can check to see why it did or did not match later on.
If it's chemistry you dealing with, getting this wrong could turn a can of hairspray into a nuclear bomb? "Well Bob, I guess we put to many atoms in that batch"
Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!
September 21, 2007 at 7:52 am
Well, there isn't any fissionable material but a large cloud of Chlorine gas would produce similar results with the added benefit of not destroying buildings, infrastrucure etc
MG
"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."
Tony Hoare
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
September 21, 2007 at 10:04 am
Some reading: A remove duplicates custom component and source code.
http://msdn2.microsoft.com/en-us/library/ms160916.aspx
Might be worth looking into.
Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!
September 21, 2007 at 10:18 am
I started playing with the fuzzy grouping. I like it. I put the output in a spread sheet and let the users go through it to make the decisions.
I'll look at this Monday. I'm heading home in a few minutes.
Thanks for the help and have a good weekend
MG
"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."
Tony Hoare
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply