Error in Fuzzy Lookup

  • I have been investigating an error in my SSIS package, and have discovered a problem with a Fuzzy Lookup.

    I am performing a Fuzzy Lookup on three columns, and the value in all three is "...".

    This caused the following error when just that single record goes through the dataflow:

    [Fuzzy Lookup [30841]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Communication link failure".

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Named Pipes Provider: No process is on the other end of the pipe.

    ".

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Named Pipes Provider: Could not open a connection to SQL Server [233]. ".

    The error was different when there were a number of records in the dataflow, but I have narrowed the issue down to just a single record, from the orignal 500,000.

    Is this a known issue? I am trying to think of a way to prevent it happening again. The three columns are First_Name, Last_Name and Address_Line_1. All of these could legitimately contain ".", but probably not on its own. Do you think that I need to add some validity checking to inbound records to prevent such entries, as it brought down my entire system (which thankfully is only in testing at this stage.)

    Any thoughts or experience of this issue would be appreciated.

    Paul

  • I realise that this is drifting off topic somewhat, but can anyone help me to get around this?

    I am wondering if there is a way of using the NULLIF function on the orignal select statement whereby these fields can be read as NULL if the do not contain any alphanumeric characters?

    Alternatively I can insert an Execute SQL task to replace fields in the recordset that do not contain alphanumeric characters.

    Can anyone help with the statement?

    Thanks,

    Paul

  • Could you post a small sample file (~100 records) that has the rogue record in it? And perhaps a screenshot of what the package looks like, sources, transforms and destinations, etc. I'm not sure I understand the issue you are having.

    [font="Arial Narrow"]bc[/font]

  • Hi _bc,

    I believe that I was mistaken in my initial diagnosis.

    I am currently testing a SSIS package that is going to be used to clean and deduplicate incoming data before its entry into our data warehouse. Our total dataset is currently around 500,000 records. I have been importing these 100k at a time, and it seems that on the third run these errors start occuring.

    [Fuzzy Lookup [30841]] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Communication link failure".

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Named Pipes Provider: No process is on the other end of the pipe.

    ".

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".

    An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "Named Pipes Provider: Could not open a connection to SQL Server [233]. ".

    Initially I thought that there was one or more rogue records causing the problem, but actually I think now that it is just something that happens once I have more than 200,000 or so records in the dataset. (in fact certain records, such as the one I mentioned in my initial post do make the error occur earlier, presumably becuase the lookup takes longer, but eventually as the dataset fills up any record will cause the problem it seems).

    Is there a limit on the size of dataset that a fuzzy lookup can handle, or is the limitation likely to be on the perfoirmance of our test server? The package seems to pause for quite a long time on the Fuzzy Lookup component, then give the database connection error. I am now thinking that this is some sort of timeout. I wonder if I can increase the timeout?

    The production server for this is not going to be available for some time, so I need to try to establish what the cause of this is. Has anyone else come across this?

  • I would check the Custom Properties of the Fuzzy Lookup. If the Exhaustive property is set to True, I've seen it take forever. The Reference index settings may need to be adjusted. I would remove or reroute the records that will not match (...) before they get to the Fuzzy Lookup, that's wasting time and resources. Have you tried using batches smaller than 100,000 (perhaps 10,000)? How many rows are in the Reference index you are comparing to? There are a lot of variables that could be causing your issues.

    [font="Arial Narrow"]bc[/font]

  • Thanks for the reply.

    With smaller group it generally runs through. I am not likely to have more than 1000 records at a time coming through once this is in production. I just want to be sure what the issue is before release.

    I am looking up against about 500,000 rows ultimately. However as I am in testing I am constantly changing te dataset. The errors generally start to occur once I have over 250,000 records.

    I have already redirected the records that will not be matched in the incoming data. DO you think it would help to create a view in the main dataset (or a temporary table?) so that I only look up AGAINST records that are likely to match too? I had thought that the Fuzzy Lookup would do this itself, but maybe not?

    I appreciate your advice.

    Paul

  • Yes, I generally create a view limited to the records that will likely match. The Fuzzy lookup is very intensive and the more limited you are comparing against, the better.

    I think when you are getting into the 250,000 range you are hitting a memory issue and/or timeouts as you've already surmised.

    [font="Arial Narrow"]bc[/font]

  • Thanks _BC.

    I have reduced the lookup data by creating views that exclude records taht would never match.

    This appears to have both prevents the errors, and reduced the total running time of my package by about 30%.

    I appreciate your help!

  • Glad it was helpful.

    [font="Arial Narrow"]bc[/font]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply