August 6, 2009 at 3:38 pm
in brief, what the ssis is doing - reading a flat file source and inserting into sql tables. the flat file have over 8 million rows. the sql tbl structure (tmp_DIS) is
[ID] [int] IDENTITY(1,1) NOT NULL,
[_District] [nchar](4) NULL,
[FILLER] [nchar](3) NULL,
[MODE] [nchar](1) NULL,
[CARD_NUMBER] [nchar](30) NULL,
[DRIVERS_LICENSE] [nchar](30) NULL,
[STATE] [nchar](2) NULL,
[UPDATE_FLAG] [nchar](1) NULL,
[STATUSID] [smallint] NOT NULL,
[ENTRYDATE] [smalldatetime] NOT NULL,
The "first" way of doing things ( this one works fine). But here i am doing 2 reads and 2 writes, which i was trying to avoid. I am just gonna define my dataflow as their is not much going in ssis itself.
Steps -Inside my first dataflow
I have flat file source.
then I use data conversitions to changed into unicode string as my tbl are defined in nchar.
then i have a derived column to add two more columns- statusid and entrydate ( which are in a variable).
A row count
and a sql destination.
Then on the second dataflow task I have a Oledb source which uses tmp_DIS tbl
and a OLEDB destination hist_DIS to load. the tbl structure for this hist tbl is exactly same with one more column of identity filed of its own. hist_DIS
[HISTID] [int] IDENTITY(1,1) NOT NULL,
[ID] [int] NOT NULL,
[_District] [nchar](4) NULL,
[FILLER] [nchar](3) NULL,
[MODE] [nchar](1) NULL,
[CARD_NUMBER] [nchar](30) NULL,
[DRIVERS_LICENSE] [nchar](30) NULL,
[STATE] [nchar](2) NULL,
[UPDATE_FLAG] [nchar](1) NULL,
[STATUSID] [smallint] NOT NULL,
[ENTRYDATE] [smalldatetime] NOT NULL,
The second way of doing above things. which fails ( i tried 3 times today and as soon as it hits over 5 million rows in pipeline, it throws errors. Detail errors is here. got same error all three times
A buffer failed while allocating 3342192 bytes.
The system reports 76 percent memory load. There are 8589070336 bytes of physical memory with 1996615680 bytes free. There are 2147352576 bytes of virtual memory with 82907136 bytes free. The paging file has 16973348864 bytes with 10345119744 bytes free.
The attempt to add a row to the Data Flow task buffer failed with error code 0x8007000E.
SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on component "Flat File Source" (534) returned error code 0xC02020C4. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
Now let me explain steps inside dataflow ( i only have one dataflow here as i was trying to avoid 2 reads)
Steps-
FLat file source
then data convert transformation
Derived column transformation ( a little different here is i add three columns here; the first two like above, and Ident column which is also called from variable)
then i have script component used as transformation (what script component is doing is increaminting the Ident value for each row. so its kind of row id that i generate because i need to insert that into his_dis tbl)
here is the code in scriptmports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain
Inherits UserComponent
Public Overrides Sub PreExecute()
MyBase.PreExecute()
'
' Add your code here for preprocessing or remove if not needed
'
End Sub
Public Overrides Sub PostExecute()
MyBase.PostExecute()
'
' Add your code here for postprocessing or remove if not needed
' You can set read/write variables here, for example:
' Me.Variables.MyIntVar = 100
'
End Sub
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
MyBase.Input0_ProcessInputRow(Row)
Dim vars As IDTSVariables100
Dim Count As Integer
Me.VariableDispenser.LockOneForRead("PipelineRows", vars)
Count = vars(0).Value
vars.Unlock()
Me.VariableDispenser.LockOneForWrite("PipelineRows", vars)
vars(0).Value = Count + 1
Row.Ident = Count + 1
vars.Unlock()
End Sub
End Class
Then a row count
a multicast and
and two oledb destination.
one for tmp_dis and one for hist_dis.
also, it seems to me that the this way of doing is slow than the first one, although i am avoding two reads.
SI there something in the script that i am doing wrong.
When i do the same thing for other loads ( where rows are less than a miilion) it works fine, although i haven't tested which is faster.
I am trying to do this becoz i want to generate a ID for each row and insert into hist tbl in parallel rather than insernting into temp tbl which generates ID and from there insert into hist tbl, which is sequentail load.
ANyone any thoughts where i can start? although it seems to me a memeory issue.
August 7, 2009 at 2:36 pm
Your error is a memory buffer error for SSIS. This is usually because you have a data flow that is handling too much data, your server needs more memory, or you need to tweak the data buffer settings in your data flow.
I haven't seen a really good tutorial on optimizing the buffers, but here's a start:
August 7, 2009 at 2:54 pm
Thanks JOhn.
If u remember, you were the one who helped me to produce that identity property in the dataflow pipelines. But the fact is, the first way of loading is faster than the second one.
I know, I am using two reads and two writes in the first process, but it is still better in 2 ways. first saves half the time for load, and second, ssis package didn't fail.
For 500,00 rows the first way of doing things took less than a minute, while second way of doing took 1 min and 30 secs.
So i decided to make every package like the first process.
Thanks anyway, for all the helps.
August 7, 2009 at 2:59 pm
When you are referring to 'the first process', do you mean reading the file in twice using 2 data flows to load your 2 tables? In other words, the solution that we worked out in the other thread?
August 7, 2009 at 3:05 pm
No, the first process ie reading from flat file and loading into sql tmp tbl and then from tmp tbl to sql hist tbl. So here we have 2 dataflow.
the second process is the one u helped me to solve on another thread, where we read from flat file and in the pipline we generate identity value for rows and then doing multicast to load into 2 sql tbs in parallel. so here just one dataflow.
the first one is better- saves half the time for each ssis and also ssis didn't failed like it does for the second process which is a Buffer/Memory error.
August 7, 2009 at 3:09 pm
Gkhadka (8/7/2009)
No, the first process ie reading from flat file and loading into sql tmp tbl and then from tmp tbl to sql hist tbl. So here we have 2 dataflow.the second process is the one u helped me to solve on another thread, where we read from flat file and in the pipline we generate identity value for rows and then doing multicast to load into 2 sql tbs in parallel. so here just one dataflow.
the first one is better- saves half the time for each ssis and also ssis didn't failed like it does for the second process which is a Buffer/Memory error.
You know, in the other thread, I never suggested using a multi-cast transformation. I suggested using 2 distinct data flows, reading the file in twice and using lookups in your second data flow to get your values. Anyway, as long as you're happy with what you've done. 🙂
Using the Multi-cast is doubleing the memory requirements for that data flow, I'm not surprised that it bombs on a data flow that large.
August 7, 2009 at 3:20 pm
No worries JOHN. I learned how to generate identity value in the pipelines. can be helpful in the future.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply