May 12, 2014 at 8:06 am
I am having trouble with an SSIS 2008 package, specifically with a Lookup transformation in a Data Flow.
The Lookup caches a large volume of data (approximately 400 million rows) but then fails while building its internal hash table.
Here are some details on the Lookup:
The Lookup transformation is set to 'Full cache', with an OLE DB connection manager.
The SQL query in the 'Connection' tab is a very simple SELECT statement that returns just two columns (both bigint data types).
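It looks roughly like this (table and column names are placeholders for this post, not the real ones):

SELECT SurrogateKey,   -- bigint
       BusinessKey     -- bigint
FROM   dbo.ReferenceTable;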
Here are the error details:
The Lookup task fails in the pre-execute phase, with the following error:
“Unable to allocate memory for hash table.
Component "Lookup Task 1" (12417) failed the pre-execute phase and returned error code 0xC0208247.
There are 127 GB of physical memory with 114 GB free.
There are 8,191 GB of virtual memory with 8,187 GB free.
The paging file has 255 GB with 242 GB free.”
Although the Lookup task consumes approximately 10 GB of memory during this pre-execute phase, there is still plenty of RAM available on the server, so I am confused by the 'unable to allocate memory' message.
Prior to the error message, I can see that the Lookup task completes the caching process:
"...component "Lookup Task 1" (12417) has cached a total of 377476666 rows..."
But the Lookup task does not complete the processing phase:
"The component "Lookup Task 1" (12417) failed to process the rows in the cache. The processing time was 494.024 second(s)..."
The following warning is also issued:
"The buffer manager failed a memory allocation call for -1270923128 bytes, but was unable to swap out any buffers to relieve memory pressure. 276542 buffers were considered and 276542 were locked. Either not enough memory is available to the pipeline because not enough are installed, other processes were using it, or too many buffers are locked."
Before I pursue other potential design patterns, I would like to understand the error.
Is it a limitation of the software, perhaps a cap on the size of the hash table that the Lookup task builds?
I also find it strange that a negative integer is referenced (-1270923128 bytes) in the warning.
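One guess on my part: if the requested allocation size is being tracked in a signed 32-bit integer, a request of a bit under 3 GB would wrap around to a negative number. Treating the reported value as a wrapped unsigned 32-bit quantity:

-1270923128 + 4294967296 = 3024044168 bytes (roughly 2.8 GiB)

so the failed request may actually have been for about 3 GB, but I can't confirm that.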
I tried adjusting the DefaultBufferSize and DefaultBufferMaxRows settings, but those did not help.
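For example, I ran the package with overrides along these lines (the package and task names here are placeholders, and as far as I know SSIS 2008 caps DefaultBufferSize at 100 MB):

dtexec /F MyPackage.dtsx /SET \Package\DataFlowTask.Properties[DefaultBufferSize];104857600 /SET \Package\DataFlowTask.Properties[DefaultBufferMaxRows];50000

Neither larger nor smaller values made any difference to the error.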
I also tried the Partial cache and No cache modes, but both performed very poorly and are not viable alternatives.
Any help/insight would be greatly appreciated.
May 13, 2014 at 8:01 am
How many fields are you pulling into the cache? You only need to cache the fields you need to match on and the fields you need to output; anything else is wasted resource.
If you are comparing incoming records with existing records, I would suggest using some sort of checksum algorithm (we use the Konesans Checksum component, but others are available on CodePlex); you will of course need to store the checksum value in the current version of the record. The cache then contains just the business key fields and the checksum result, and a Conditional Split checks whether found records have matching checksums.
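Rough shape of the cache-side query (table and column names are made up, and T-SQL's CHECKSUM stands in here for whichever checksum component you actually use):

SELECT BusinessKey,
       CHECKSUM(Col1, Col2, Col3) AS RowChecksum   -- HASHBYTES is an option if collisions worry you
FROM   dbo.ExistingRecords;

The Conditional Split expression is then just a comparison along the lines of IncomingChecksum != RowChecksum.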
Partial cache and No cache are extremely slow. If you need to compare 400M records, any solution is going to be memory-intensive in SSIS. You could try a Cache Transform, but I think its cache still has to be loaded into RAM before it can be used.
May 23, 2014 at 7:54 am
Thanks for the reply.
We've raised a support ticket with Microsoft, and they have confirmed that there is a memory leak during the hash table build.
We've found other design patterns that solve the problem more efficiently (a LEFT JOIN in the source query, followed by a Conditional Split that checks for NULL values), but at this point we would like some clarity/closure on the issue from Microsoft.
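For anyone who finds this thread later, the source query for that pattern looks roughly like this (names changed):

SELECT s.BusinessKey,
       s.SomeColumn,
       r.SurrogateKey            -- NULL when there is no matching row
FROM   staging.IncomingRows AS s
LEFT JOIN dbo.ReferenceTable AS r
       ON r.BusinessKey = s.BusinessKey;

The Conditional Split then tests ISNULL(SurrogateKey) to route unmatched rows, which avoids building the giant lookup cache altogether.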