March 7, 2011 at 10:09 pm
So I have to populate a table with an SSIS package. The rows that return an error, go off to another table, but there is also a third table that I need to populate that has a foreign key that points to the second table. Some of the information I get is in those redirected rows.
My question is, will there be too much of a performance difference, if I populate both the error tables within SSIS or if I set up a trigger that populates the third table based on what I enter in the second one?
March 8, 2011 at 12:08 pm
SSIS will be much faster in this case. Use lookup transform to find the referenced table columns you need.
March 8, 2011 at 12:17 pm
I need to insert rows into the table I am referrencing too within the same package.
I need to insert rows into error table. Error table has an identity column primary key called errorid.
After that I need to insert rows into another error table which has errorid as the foreign key.
March 8, 2011 at 2:34 pm
vishal.gamji (3/8/2011)
SSIS will be much faster in this case. Use lookup transform to find the referenced table columns you need.
I'd love to see the test that proves that. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
March 8, 2011 at 2:50 pm
Working on it.
Although - shouldn't ssis be faster as bulk insert would be much faster than single row inserts?
March 8, 2011 at 2:55 pm
Khades (3/8/2011)
I need to insert rows into the table I am referrencing too within the same package.I need to insert rows into error table. Error table has an identity column primary key called errorid.
After that I need to insert rows into another error table which has errorid as the foreign key.
Multi-cast the ErrorRow stream and ship it to both OLEDB destinations. I'm curious, though, why are you doublestoring your errors, especially with FK relations?
Scrap that, brainfart. I quoted that you had an identity column and still fouled it up.
Second try: Personally I'd generate a GUID into a package variable and include that via a derived column into the first errors table. After the first DataFlow completes, startup a Execute SQL task that will either run a single-statement INSERT INTO T-SQL based on your GUID, or another dataflow that will do the pump for you, with a where clause still based on your GUID. I wouldn't drive this by trigger in either case.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 8, 2011 at 3:13 pm
Doesn't a trigger have the same performance of a Stored Procedure though?
And when I can make the trigger fire once after all my insterts is done, wouldn't this make the trigger run much faster?
I don't know how to test the performance of a trigger, but I put the code into a Proc, and first tested it with SSIS, then I tested it with the Proc, and the more I increased the rows, the better the trigger looked.
At around 100,000 rows, the Procedure was beating the package with flying colors (Less than 1/10th the time). But again, I measured this by time, not by logical reads or anything concrete, so I may be testing the wrong way here.
Any testing suggestions will be happily received 😀
March 8, 2011 at 3:20 pm
vishal.gamji (3/8/2011)
Working on it.Although - shouldn't ssis be faster as bulk insert would be much faster than single row inserts?
Except that if you use fast load, and set the maxinsertcommitsize to something really high, or 0, it will put all the rows in as one batch, and fire the trigger once for the whole batch.
You'd have to put FIRE_TRIGGERS in the fastloadoptions though since fastload doesn't normall fire triggers.
March 8, 2011 at 3:27 pm
Khades (3/8/2011)
Doesn't a trigger have the same performance of a Stored Procedure though?
Coded properly, yes. I find it to be a hidden process, however, and I'd have to look at the stream fastload mechanics. I know it holds the transaction/checkpoint open forever if you don't set the batch number, but I'm not sure of trigger firing occurence.
And when I can make the trigger fire once after all my insterts is done, wouldn't this make the trigger run much faster?
Yep.
I don't know how to test the performance of a trigger, but I put the code into a Proc, and first tested it with SSIS, then I tested it with the Proc, and the more I increased the rows, the better the trigger looked.
As you mentioned, the proc and trigger are equivalent in a bulk usage scenario. The trigger gets fired more often, though, depending on settings. The performance of the proc going down shouldn't drive you to a trigger. Under most (note, most) circumstances, the trigger will perform worse.
At around 100,000 rows, the Procedure was beating the package with flying colors (Less than 1/10th the time). But again, I measured this by time, not by logical reads or anything concrete, so I may be testing the wrong way here.
Any testing suggestions will be happily received 😀
I'm confused by this. Right now your procedure does an import, shipping rows to the staging table and error table. Until you involved the secondary update to the ErrorFK table, I'm not sure how the comparison became involved unless it was my proc vs. second dataflow recommendation.
That was primarily because I don't know what else you plan to do with the secondary data going to the ErrorFK table. If you needed runtime information from your SSIS package, or do other data manipulations, you'd probably want to generate a second dataflow stream. If not, I agree, a direct insert using T-SQL in the same DB is probably the best choice.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 8, 2011 at 3:53 pm
As you mentioned, the proc and trigger are equivalent in a bulk usage scenario. The trigger gets fired more often, though, depending on settings. The performance of the proc going down shouldn't drive you to a trigger. Under most (note, most) circumstances, the trigger will perform worse.
I meant the more rows, the better the Procedure looked. And yes I was testing the Procedure vs the second data flow.
If you needed runtime information from your SSIS package, or do other data manipulations, you'd probably want to generate a second dataflow stream. If not, I agree, a direct insert using T-SQL in the same DB is probably the best choice.
No runtime information or data manipulation is needed, thanks! That answers my question.
Except I guess, if I do use a trigger. How do I make it use only those rows that were inserted along with the PK of that table? Do I have to have some other identifying column and join the inserted table with my ErrorPK table?
March 9, 2011 at 10:26 am
Khades (3/8/2011)
Except I guess, if I do use a trigger. How do I make it use only those rows that were inserted along with the PK of that table? Do I have to have some other identifying column and join the inserted table with my ErrorPK table?
Well, the inserted table will only contain the new records, so no secondary join/indentifier would be necessary. You'd only need an indentifying column if you went the proc route.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 9, 2011 at 10:33 am
How do I get the identity column values from the errorPK table?
March 10, 2011 at 12:45 pm
Khades (3/9/2011)
How do I get the identity column values from the errorPK table?
Trigger:
The inserted table will carry the identity when you go to insert into ErrorFK.
Check this code for a working example:
IF OBJECT_ID ('TriggerTest','U') IS NOT NULL
DROP TABLE TriggerTest
GO
CREATE TABLE TriggerTest (ttid INT IDENTITY(1,1) NOT NULL, ttText VARCHAR(50))
GO
CREATE TRIGGER trg_TriggerTest ON TriggerTest
AFTER INSERT, Update, Delete
AS
SELECT * FROM inserted
SELECT * FROM Deleted
GO
INSERT INTO TriggerTest(ttText) VALUES ('Test')
INSERT INTO TriggerTest(ttText) VALUES ('Test2')
Proc with Identifier:
It will be generated when you return to the data on your identifying column, so just pull it naturally from the table.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
March 10, 2011 at 1:00 pm
Oh wow, I did not know the magic tables would have that identity even if you don't directly insert/update/delete it, thanks!
March 10, 2011 at 1:47 pm
Khades (3/10/2011)
Oh wow, I did not know the magic tables would have that identity even if you don't directly insert/update/delete it, thanks!
I couldn't remember myself because I avoid triggers so heavily... thus the test code. I had a backup plan for the instead of trigger ready to go if it didn't do what I remembered. :hehe:
Anyways, you're welcome. Good luck with this. 🙂
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply