July 27, 2010 at 3:12 pm
I intend to import CSV files into SQL Server. Which would be the better option: BULK INSERT or OPENROWSET? Could anyone elaborate on the advantages and disadvantages of each?
Our current implementation uses BULK INSERT to load files into SQL Server via temp tables before loading them into their actual tables. We do this because the actual tables have additional fields that are system-defined rather than coming from the file. We intend to make the load faster without sacrificing the flexibility to import files of different formats. One option is to use OPENROWSET BULK to bypass the temp tables, with format files defining the fields. Isn't this effectively the same as using BULK INSERT?
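Roughly, our current two-step load looks like this (the table, column, and file names below are placeholders, not our real objects):

CREATE TABLE #staging
(
    Col1 varchar(50),
    Col2 varchar(50)
)

-- Step 1: raw dump of the file into the staging table
BULK INSERT #staging
FROM 'C:\Imports\data.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)

-- Step 2: move to the actual table, adding the system-defined fields
INSERT INTO dbo.ActualTable (Col1, Col2, LoadDate)
SELECT Col1, Col2, GETDATE()   -- GETDATE() stands in for our system-defined values
FROM #staging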
July 30, 2010 at 7:35 am
I'm a big fan of SSIS (SQL Server Integration Services). You'll find that you can do a lot more with the data, in terms of cleansing and logging exactly what happened, than with the features you are currently thinking of using.
To create a basic SSIS package you can use the Import/Export Data wizard. Save the package and then make it as complicated as it needs to be to accomplish your task. For example, you could have it watch a directory so that it picks up a file as soon as the file arrives, delivering the data to the users in the quickest possible time, as opposed to waiting out the polling interval of a scheduled job.
Rich
July 30, 2010 at 4:40 pm
BULK INSERT only really works if you're doing a raw data dump from the CSV file into a table whose structure exactly matches the file's.
OPENROWSET lets you effectively query the CSV file, so you can select specific columns from it and include a WHERE clause. OPENROWSET can be used with a FORMATFILE that tells SQL Server the data types, lengths, and column names, which is useful if you don't want all of the data imported, or want it imported in a specific way; it gives you more control over what you import.
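As a rough sketch (the file path, format file, and column names here are invented for illustration):

SELECT src.OrderID, src.Amount
FROM OPENROWSET(
    BULK 'C:\Imports\orders.csv',
    FORMATFILE = 'C:\Imports\orders.fmt',
    FIRSTROW = 2
) AS src
WHERE src.Amount > 0   -- rows are filtered before they ever land in a table

The format file is what maps the CSV columns to the names and types the SELECT list and WHERE clause refer to; wrap the query in an INSERT ... SELECT to load just those rows.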
August 1, 2010 at 3:37 pm
I'm also a big fan of SSIS. The only worrying thing is that once a package is deployed to production and something goes wrong, the only way I know to debug it is by opening the package in BIDS. Our DBAs are not skilled enough to do this and would need assistance from developers. Applying a fix and redeploying isn't straightforward either, unlike when the logic is implemented as a stored procedure. The package is scheduled to run nightly, and we can't get hold of a developer until morning. Beyond that, I'm not aware of other ways to get the package back up and running sooner.
August 2, 2010 at 8:09 am
<snip>
For example, you could have it watch a directory so that it picks up a file as soon as the file arrives, delivering the data to the users in the quickest possible time, as opposed to waiting out the polling interval of a scheduled job.
Could you explain how you do that in a bit more detail, please? Is the SSIS package set up to run in a perpetual loop until it finds the file? I wasn't aware SSIS had a built-in tool for monitoring folders.
Thanks,
Rich
August 2, 2010 at 8:19 am
I'm interested in this too. I would love to use this for log files.
August 2, 2010 at 1:16 pm
There is a File Watcher Task available from www.SQLIS.com. Some people have reported memory leaks with it, but I'm not sure whether that's true. Another alternative is to write your own Windows service in VB/C#/whatever that watches the folder and kicks off your SSIS package when it finds a new file. I've done that before, and I can say it adds virtually no memory overhead.
August 2, 2010 at 1:24 pm
Thanks for the link!
Rich
August 11, 2010 at 1:14 pm
Couldn't that also be done via a SQL Agent job that runs every, say, 30 minutes or an hour? It would just periodically look to see whether a file is there and, if so, execute the rest of the SSIS package.
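For instance, the first job step could be a T-SQL step along these lines. Treat it as a sketch: xp_fileexist is undocumented, and the path is a placeholder:

-- Step 1 of the job: fail (quietly) when there is no file, so the step's
-- "on failure" action can be set to "quit the job reporting success"
DECLARE @exists int

EXEC master.dbo.xp_fileexist '\\servername\folder\data.csv', @exists OUTPUT

IF @exists = 0
    RAISERROR('No file yet - nothing to do', 16, 1)

With step 2 set to run the SSIS package only when step 1 succeeds, the job simply goes back to sleep until the next poll.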
August 12, 2010 at 6:28 am
This is an 'it depends' answer.
You can use either. If all the CSV files are of the same format, you could BULK INSERT into a staging table before transferring to the output tables; this is what I do.
This is the code I use to process the files (beware, it uses a cursor; it's written to run inside a stored procedure, hence the RETURNs). The staging table name and the BULK INSERT options are placeholders for your own.
CREATE TABLE #files
(
    rowid int IDENTITY(1,1),
    fname varchar(255) NULL
)

DECLARE @filename varchar(255),
    @result int,
    @sql nvarchar(500)

-- Capture the folder's file names (/a-d excludes directories, /b = bare names)
INSERT #files (fname)
EXECUTE @result = master.dbo.xp_cmdshell 'DIR \\servername\folder /a-d /b'

-- DIR reports 'File Not Found' when the folder is empty
IF EXISTS (SELECT * FROM #files WHERE fname = 'File Not Found')
BEGIN
    PRINT 'No Files Found'
    RETURN 0
END

IF @result <> 0
BEGIN
    PRINT 'Command (DIR) Failed. Result = ' + CAST(@result AS varchar)
    RETURN 1
END

IF NOT EXISTS (SELECT fname FROM #files WHERE fname IS NOT NULL)
BEGIN
    PRINT 'Failed to Retrieve Files'
    RETURN 1
END

DECLARE filecursor CURSOR FOR
    SELECT fname
    FROM #files
    WHERE fname IS NOT NULL

OPEN filecursor
FETCH NEXT FROM filecursor INTO @filename

WHILE @@FETCH_STATUS = 0
BEGIN
    -- dbo.Staging and the WITH options are placeholders; substitute your own
    SET @sql = N'BULK INSERT dbo.Staging FROM ''\\servername\folder\'
        + @filename
        + N''' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'')'
    EXECUTE sp_executesql @sql

    FETCH NEXT FROM filecursor INTO @filename
END

CLOSE filecursor
DEALLOCATE filecursor
Far away is close at hand in the images of elsewhere.
Anon.