July 27, 2010 at 3:12 pm
I intend to import CSV files into SQL Server. Which would be the better option: BULK INSERT or OPENROWSET? Could anyone elaborate on the advantages and disadvantages of each?
Our current implementation uses BULK INSERT to load files into SQL Server via temp tables before loading them into their actual tables. We do this because the actual tables have additional fields that are system-defined rather than coming from the file. We intend to make the load faster without sacrificing the flexibility to import files of different formats. One option is to use OPENROWSET BULK to bypass the temp tables, with format files defining the fields. Isn't this effectively the same as using BULK INSERT?
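Roughly, our current two-step load looks like this (the table, column, and file names below are placeholders, not our real objects):

CREATE TABLE #staging
(
    Col1 varchar(50),
    Col2 varchar(50)
)

-- Step 1: raw dump of the file into the staging table
BULK INSERT #staging
FROM 'C:\Imports\data.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)

-- Step 2: move to the actual table, adding the system-defined fields
INSERT INTO dbo.ActualTable (Col1, Col2, LoadDate)
SELECT Col1, Col2, GETDATE()   -- GETDATE() stands in for our system-defined values
FROM #staging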
July 30, 2010 at 7:35 am
I'm a big fan of SSIS (SQL Server Integration Services). You'll find that you can do a lot more with the data, in terms of cleansing and logging exactly what happened, than with the features you are currently thinking of using.
To create a basic SSIS package you can use the Import/Export Data wizard. Save the package and then make it as complicated as it needs to be to accomplish your task. For example, you could have it watch a directory so that it picks up a file as soon as the file arrives, delivering the data to the users in the quickest possible time, as opposed to waiting out the polling interval of a scheduled job.
Rich
July 30, 2010 at 4:40 pm
BULK INSERT only really works if you're doing a raw data dump from the CSV file into a table whose structure exactly matches the file's.
OPENROWSET lets you effectively query the CSV file, so you can select specific columns from it and include a WHERE clause. OPENROWSET can be used with a FORMATFILE that tells SQL Server the data types, lengths, and column names, which is useful if you don't want all of the data imported, or want it imported in a specific way; it gives you more control over what you import.
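As a rough sketch (the file path, format file, and column names here are invented for illustration):

SELECT src.OrderID, src.Amount
FROM OPENROWSET(
    BULK 'C:\Imports\orders.csv',
    FORMATFILE = 'C:\Imports\orders.fmt',
    FIRSTROW = 2
) AS src
WHERE src.Amount > 0   -- rows are filtered before they ever land in a table

The format file is what maps the CSV columns to the names and types the SELECT list and WHERE clause refer to; wrap the query in an INSERT ... SELECT to load just those rows.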
August 1, 2010 at 3:37 pm
I'm also a big fan of SSIS. The only worrying thing is that once a package is deployed to production and something goes wrong, the only way I know to debug it is by opening the package in BIDS. Our DBAs are not skilled enough to do this and would need assistance from developers. Applying a fix and redeploying isn't straightforward either, unlike when the logic is implemented as a stored procedure. The package is scheduled to run nightly, and we can't get hold of a developer until morning. Beyond that, I'm not aware of other ways to get the package back up and running sooner.
August 2, 2010 at 8:09 am
<snip>
For example, you could have it watch a directory so that it picks up a file as soon as the file arrives, delivering the data to the users in the quickest possible time, as opposed to waiting out the polling interval of a scheduled job.
Could you explain how you do that in a bit more detail, please? Is the SSIS package set up to run in a perpetual loop until it finds the file? I wasn't aware SSIS had a built-in tool for monitoring folders.
Thanks,
Rich
August 2, 2010 at 8:19 am
I'm interested in this too. I would love to use this for log files.
August 2, 2010 at 1:16 pm
There is a File Watcher Task available from www.SQLIS.com. Some people have reported memory leaks with it, but I'm not sure whether that's true. Another alternative is to write your own Windows service in VB/C#/whatever that watches the folder and kicks off your SSIS package when it finds a new file. I've done that before, and I can say it adds virtually no memory overhead.
August 2, 2010 at 1:24 pm
Thanks for the link!
Rich
August 11, 2010 at 1:14 pm
Couldn't that also be done via a SQL Agent job that runs every, say, 30 minutes or an hour? It would just periodically look to see whether a file is there and, if so, execute the rest of the SSIS package.
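For instance, the first job step could be a T-SQL step along these lines. Treat it as a sketch: xp_fileexist is undocumented, and the path is a placeholder:

-- Step 1 of the job: fail (quietly) when there is no file, so the step's
-- "on failure" action can be set to "quit the job reporting success"
DECLARE @exists int

EXEC master.dbo.xp_fileexist '\\servername\folder\data.csv', @exists OUTPUT

IF @exists = 0
    RAISERROR('No file yet - nothing to do', 16, 1)

With step 2 set to run the SSIS package only when step 1 succeeds, the job simply goes back to sleep until the next poll.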
August 12, 2010 at 6:28 am
This is an 'it depends' answer.
You can use either. If all the CSV files are of the same format, you could BULK INSERT into a staging table before transferring to the output tables; this is what I do.
This is the code I use to process the files (beware, it uses a cursor; it's written to run inside a stored procedure, hence the RETURNs). The staging table name and the BULK INSERT options are placeholders for your own.
CREATE TABLE #files
(
    rowid int IDENTITY(1,1),
    fname varchar(255) NULL
)

DECLARE @filename varchar(255),
    @result int,
    @sql nvarchar(500)

-- Capture the folder's file names (/a-d excludes directories, /b = bare names)
INSERT #files (fname)
EXECUTE @result = master.dbo.xp_cmdshell 'DIR \\servername\folder /a-d /b'

-- DIR reports 'File Not Found' when the folder is empty
IF EXISTS (SELECT * FROM #files WHERE fname = 'File Not Found')
BEGIN
    PRINT 'No Files Found'
    RETURN 0
END

IF @result <> 0
BEGIN
    PRINT 'Command (DIR) Failed. Result = ' + CAST(@result AS varchar)
    RETURN 1
END

IF NOT EXISTS (SELECT fname FROM #files WHERE fname IS NOT NULL)
BEGIN
    PRINT 'Failed to Retrieve Files'
    RETURN 1
END

DECLARE filecursor CURSOR FOR
    SELECT fname
    FROM #files
    WHERE fname IS NOT NULL

OPEN filecursor
FETCH NEXT FROM filecursor INTO @filename

WHILE @@FETCH_STATUS = 0
BEGIN
    -- dbo.Staging and the WITH options are placeholders; substitute your own
    SET @sql = N'BULK INSERT dbo.Staging FROM ''\\servername\folder\'
        + @filename
        + N''' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'')'
    EXECUTE sp_executesql @sql

    FETCH NEXT FROM filecursor INTO @filename
END

CLOSE filecursor
DEALLOCATE filecursor
Far away is close at hand in the images of elsewhere.
Anon.