January 6, 2014 at 1:57 pm
I've found that using an Excel Source in the Data Flow of an SSIS package makes the Visual Studio designer unresponsive if the Excel file is relatively large (> 100 MB).
Does anyone know of a property or setting I can adjust so that Visual Studio's SSIS designer won't choke when I'm using an Excel Source connected to a very large Excel file?
I'm using Visual Studio 2012 with version 11.0.3369.0 of Microsoft SQL Server Integration Services. My computer is 64-bit, running Windows 7, with 12 GB of RAM. I'm using Excel 2007 for the Excel Connection Manager, and the resulting ConnectionString is "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\scratch\****.xlsx;Extended Properties="EXCEL 12.0 XML;HDR=YES";".
I've tried setting the property DelayValidation=True for the Data Flow and the property ValidateExternalMetadata=False for the Excel Source component; however, it still seems to be doing some kind of validation of the spreadsheet once the Excel Source is added to the data flow, and Visual Studio becomes unresponsive. I can see in Task Manager that the devenv.exe *32 process is chewing up memory; it grows to about 2 GB before it seems to finally give up. The total size of the Excel spreadsheet is 360 MB, so I'm not sure why it is using so much memory.
The only solution I've found is to open the Excel spreadsheet I want to load, save it as a .csv, and then import the .csv file. This is annoying, though, because there are 160 columns, many of which are IDs with leading zeros that need to be preserved, but when I save the Excel file as .csv, Excel treats those columns as numbers and doesn't keep the leading zeros.
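If anyone else hits the same leading-zero problem, the conversion could be scripted outside of Excel so that every column is read as text. This is just a rough sketch using Python and pandas; the path and sheet name are placeholders for my real file:

```python
# Rough sketch: convert the workbook to CSV while forcing every column to be
# read as text, so the ID columns keep their leading zeros. Requires pandas
# and openpyxl; the file path and sheet name below are placeholders.
import pandas as pd

source = r"C:\scratch\spreadsheet.xlsx"   # placeholder path
target = r"C:\scratch\spreadsheet.csv"

# dtype=str stops the reader from guessing numeric types for the ID columns
df = pd.read_excel(source, sheet_name="Sheet1", dtype=str)
df.to_csv(target, index=False)
```

That still means reading the whole workbook into memory once, but at least the IDs would come out of the .csv intact.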
January 6, 2014 at 2:39 pm
What if you don't use the Excel source, but the OLE DB source?
(you can write a SQL query to get data out of Excel)
January 6, 2014 at 7:42 pm
Thank you for the idea. I wasn't familiar with using other sources to extract data from Excel; that's a very useful approach, especially if you only need a few of the columns from the spreadsheet or need to specify a range of cells to use as a table rather than the whole worksheet. Unfortunately I need all of the columns and rows in the spreadsheet, and Visual Studio still locked up the same way it had when using the Excel Source. After some reading, it sounds like the Excel Source is actually just using the OLE DB provider for the Microsoft Office 12.0 Access Database Engine under the covers when you choose Excel 2007 in the Excel Connection Manager. I also tried an ODBC Source with a connection string like "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=C:\scratch\spreadsheet.xlsx;HDR=Yes;". That worked too, but it also made the Visual Studio designer lock up against a large spreadsheet.
I ended up opening the large Excel spreadsheet, deleting all but the first few rows, and saving that slimmed-down version. I then designed my SSIS package against a connection to the slimmed-down file, and when I was ready to run it I changed the connection to use the large file.
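For reference, that slimming-down step could also be scripted instead of doing it by hand in Excel. A rough sketch with openpyxl (the file names are placeholders, and it still has to open the big workbook once, so memory permitting):

```python
# Rough sketch: save a design-time copy of the workbook that keeps only the
# header row plus a handful of data rows. File names are placeholders.
from openpyxl import load_workbook

wb = load_workbook(r"C:\scratch\spreadsheet.xlsx")
ws = wb.active

keep = 11  # header row + 10 sample rows
if ws.max_row > keep:
    # delete everything below the sample rows in one call
    ws.delete_rows(keep + 1, ws.max_row - keep)

wb.save(r"C:\scratch\spreadsheet_design_copy.xlsx")
```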
I don't generally think people should be using Excel files to transfer large amounts of data; unfortunately, I'm in a position where I commonly get these gigantic Excel files, sometimes 700 MB, and they are really hard to work with. I'm surprised there aren't more complaints about this out there... or maybe there is some really easy way to load gigantic Excel files into SQL Server that I'm unaware of. I had also tried using the Data Import Wizard in SSMS to import this spreadsheet, but after a considerable wait the wizard threw an error... and I believe that wizard just creates the same type of SSIS package under the covers that I was trying to create to import this data.
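If it helps anyone searching later, one non-SSIS route would be a small script that streams the worksheet into a staging table. This is only a rough sketch, not something I've proven against the 700 MB files: the driver, server, table, and file names are placeholders, and it assumes the staging table's columns line up with the spreadsheet's.

```python
# Rough sketch: stream a large worksheet into a SQL Server staging table,
# bypassing the SSIS designer. Connection string, table name, and file path
# are placeholders; the staging table is assumed to have one (n)varchar
# column per spreadsheet column, in the same order.
import pyodbc
from openpyxl import load_workbook

conn = pyodbc.connect(
    "Driver={SQL Server Native Client 11.0};"
    "Server=localhost;Database=Staging;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# read_only mode streams rows instead of loading the whole workbook at once
wb = load_workbook(r"C:\scratch\spreadsheet.xlsx", read_only=True)
ws = wb["Sheet1"]

rows = ws.iter_rows(values_only=True)
header = next(rows)  # skip the header row

insert_sql = (
    "INSERT INTO dbo.StagingTable VALUES ("
    + ",".join("?" * len(header)) + ")"
)

batch = []
for row in rows:
    # keep everything as text so leading zeros survive
    batch.append([None if v is None else str(v) for v in row])
    if len(batch) == 5000:
        cursor.executemany(insert_sql, batch)
        conn.commit()
        batch.clear()

if batch:
    cursor.executemany(insert_sql, batch)
    conn.commit()
```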
January 6, 2014 at 11:53 pm
You're correct, the import wizard uses SSIS behind the scenes.
I'm still quite curious as to why Visual Studio hangs. The ACE OLE DB provider only samples a few rows to determine the metadata of the columns (at most around 65,000 rows; this is determined by a registry setting).
Excel is indeed unfit to act as a vessel for transporting large amounts of data. CSV (or even XML) is preferred.
January 7, 2014 at 10:49 am
+1
No clue as to why VS would hang. How much memory is on the Visual Studio host? Does the package execute when called from DTEXEC.exe or from the SSIS catalog/server?
January 7, 2014 at 2:07 pm
I checked my registry and see that the value for HKEY_LOCAL_MACHINE\Software\Wow6432Node\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows is set to 8.
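In case anyone else wants to check the same value without opening regedit, here's a rough sketch of reading it from Python (same 32-bit key path; purely illustrative):

```python
# Rough sketch: read the ACE provider's TypeGuessRows value (default is 8).
# This is the 32-bit hive path, since the provider in use here is 32-bit.
import winreg

key_path = (r"Software\Wow6432Node\Microsoft\Office\12.0"
            r"\Access Connectivity Engine\Engines\Excel")

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
    value, _ = winreg.QueryValueEx(key, "TypeGuessRows")
    print("TypeGuessRows =", value)
```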
I'm running VS on my work computer with 12 GB of memory (not much of it free), and I have just been executing the package from VS because these packages are typically only used once to load the data into SQL Server. Executing the package isn't really the problem, though; it will execute (although it takes a very long time to spin up and start doing anything). The problem is that VS locks up while I'm just designing the package, so as soon as I add the Excel Source I can't finish designing the package to get it to a point where I'm ready to run it.
January 7, 2014 at 3:11 pm
I don't know if this is useful info, but to recreate I do the following:
1-Create new SSIS package
2-Add Data Flow
3-Add Excel Source component
4-Double click Excel Source to open
6-From the Excel Source Editor, click New... next to the Excel connection manager
6-Choose Excel file path, Excel Version=Microsoft Excel 2007, check First column has column names
7-Click OK
8-From the "Name of the Excel sheet:" dropdown, choose the worksheet from the Excel file I want to load
9-Click OK
10-Right click Excel Source, select "Show Advanced Editor"
Everything works fine until step 9. Then it takes a long time (which might be normal the first time it has to check the Excel file for columns and so on), and I can see in Task Manager that devenv.exe *32 starts consuming more and more memory, up to about 1.6 GB (there is a stair-step pattern as it grows). Finally the Excel Source Editor closes and devenv.exe *32 releases memory back down to about 300 MB, but VS is unresponsive and there is still a red X on the Excel Source component. I then see the same thing happen again: devenv.exe *32 consumes memory up to 1.6 GB (same stair-step pattern), then the red X disappears from the Excel Source component and devenv.exe *32 releases memory again. At this point VS is responsive, but when I move to step 10, devenv.exe *32 starts consuming memory again, and this time VS throws this error:
TITLE: Microsoft Visual Studio
------------------------------
Unable to cast COM object of type 'System.__ComObject' to interface type 'Microsoft.SqlServer.Dts.Pipeline.Wrapper.IDTSCustomProperty100'. This operation failed because the QueryInterface call on the COM component for the interface with IID '{A0836E5E-50CF-4B28-9577-E25173725F0B}' failed due to the following error: The application called an interface that was marshalled for a different thread. (Exception from HRESULT: 0x8001010E (RPC_E_WRONG_THREAD)).
January 7, 2014 at 5:35 pm
To do the row sampling, afaik it still loads the entire sheet into memory, which is why you stall after you select the source sheet. The cache is too small for it and it basically chokes until it does something in the background (I believe it uses the disk swapfile). It takes ages, if it ever completes.
The workaround of using a truncated source as the sample during design/metadata calculation, and then processing the full file at runtime, is what I've done for a while.
That said, I usually try to get everyone to save their excels as CSVs instead of trying to fight with the thing. Clients are particularly fussy when I ask them that, though, so I feel your pain.
January 8, 2014 at 12:11 am
Evil Kraig F (1/7/2014)
Clients are particularly fussy when I ask them that, though, so I feel your pain.
Because God forbid a process would run smoothly once in a while
:unsure:
January 8, 2014 at 9:27 am
I have a personal preference (if possible) for remoting into a dev server that has BIDS installed so that my machine does not get crippled.
Koen/Kraig,
Can this type of external metadata validation behavior (reading the entire worksheet) be short-circuited at design time (as opposed to the runtime behavior with DelayValidation)?
January 8, 2014 at 10:25 am
The first two things I thought to try were to set "DelayValidation=True" for the Data Flow and "ValidateExternalMetadata=False" for the Excel Source component; but neither seemed to help for me.
January 8, 2014 at 10:50 am
I was hoping for some sleazy hack besides those 2 options..... 😎
January 8, 2014 at 10:59 am
sneumersky (1/8/2014)
I have a personal preference (if possible) for remoting into a dev server that has BIDS installed so that my machine does not get crippled.
Koen/Kraig,
Can this type of external metadata validation behavior (reading the entire worksheet) be short-circuited at design time (as opposed to the runtime behavior with DelayValidation)?
You can design your package using only a tiny dataset, so that validation doesn't consume too much time.
January 8, 2014 at 11:28 am
sneumersky (1/8/2014)
Can this type of external metadata validation behavior (reading the entire worksheet) be short-circuited at design time (as opposed to the runtime behavior with DelayValidation)?
The metadata validation options and the like are runtime options; you can't avoid this at design time.