August 27, 2008 at 10:49 am
I have 4 very large CSV files (a large number of records and a large number of fields) and need to script the process of importing the data from these CSVs into SQL Server. I gather that the BULK INSERT command should do the job here; however, it appears that this command requires that a suitable table be created first. I don't have the exact schema of the database that the CSVs were exported from and would like to avoid having to go through the CSVs manually, creating a table with the right number and size of fields.
So I thought I would use the Import Data wizard to manually import the CSVs (thus creating a table with the correct fields). Then I'd be able to script the creation of the table and include that part in my BULK INSERT script (just before I call the actual BULK INSERT command itself), as sketched below. I hope that part makes sense. Please let me know if there's a better way of doing this.
The problem is that although I've tried using the "Suggest Types" option, when I actually go to import the file I get truncation warnings, even when I go through and manually increase the size of the fields... Is there an easier way to do this?
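For illustration, the plan described above might look something like this once the CREATE TABLE has been scripted from the wizard-built table (the table name, columns, and file path are all hypothetical):
CREATE TABLE dbo.ImportStage
(
    Col1 VARCHAR(8000),  -- generously wide columns sidestep truncation on the first load
    Col2 VARCHAR(8000),
    Col3 VARCHAR(8000)
)
BULK INSERT dbo.ImportStage
FROM 'C:\Temp\file1.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)  -- FIRSTROW = 2 skips the header row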
August 27, 2008 at 11:40 am
August 27, 2008 at 12:02 pm
here's my suggestion:
add a folder which contains all of your .csv files as a linked server.
a text linked server will show every .txt file and every .csv file as a table...
so you can do something really simple like
SELECT *
INTO NEWTABLE
FROM TxtSvr...sample#csv
which would create the table on sql server with the same structure.
I think the first row of a CSV file is assumed to contain the column names.
here's the syntax:
--===== Create a linked server to the drive and path you desire.
EXEC dbo.sp_AddLinkedServer TxtSvr,
     'Jet 4.0',                  -- product name
     'Microsoft.Jet.OLEDB.4.0',  -- provider
     'C:\',                      -- data source: the folder that holds the files
     NULL,
     'Text'                      -- provider string: treat the files as text
GO
--===== Set up login mappings.
EXEC dbo.sp_AddLinkedSrvLogin TxtSvr, FALSE, NULL, Admin, NULL
GO
--===== List the tables in the linked server, which is really a list of
     -- file names in the directory. Note that the "#" sign in the
     -- Table_Name is where the period in the filename actually goes.
EXEC dbo.sp_Tables_Ex TxtSvr
GO
--===== Query one of the files by using a four-part name.
SELECT *
  FROM TxtSvr...sample#csv
--===== Drop the text server.
EXEC dbo.sp_DropServer 'TxtSvr', 'DropLogins'
GO
Lowell
August 27, 2008 at 8:38 pm
Lowell (8/27/2008)
here's my suggestion: add a folder which contains all of your .csv files as a linked server... [full linked-server script quoted from the post above]
Heh... looks real familiar... 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2008 at 3:16 am
Hi Lowell, Hi Jeff,
Thanks for your suggestion - it sounds just right for my situation. Everything works fine until I get to this bit:
SELECT *
FROM TxtSvr...sample#csv
It says the table isn't there. In fact the text server contains just three tables: debuglog#txt, services#txt and test#txt. None of my CSV files have been picked up. Any ideas on what I need to change?
August 28, 2008 at 3:37 am
OK, apologies for that, guys - I needed to put the CSV files on the SQL Server machine, not my local drive (the folder path given to sp_AddLinkedServer is resolved on the server) :blush:
August 28, 2008 at 5:08 am
Hmmm, sorry that this is dragging on, everyone. Now that I have the files in the right place (;)), the tables are being picked up. However, when I open the tables, there are twice as many rows in the table as in the CSV file, and every value in every row/column is NULL.
I should say that my CSV files are actually TAB-separated. However, I've tried replacing the tabs with commas, and although I then get the correct number of columns, I'm still getting all NULLs. Not sure if it makes any difference, but I've got tons of columns (>50)...
August 28, 2008 at 5:29 am
thomas.lemesurier (8/28/2008)
...now that I have the files in the right place the tables are being picked up. However... every value in every row/column is NULL... my CSV files are actually TAB separated...
I'm pretty sure the first row determines the number of columns... if there's anything preceding the data, that might be an issue.
I could understand getting some rows that are NULL for all fields at the end of the file; every CrLf (Carriage Return + Line Feed, i.e. CHAR(13) + CHAR(10)) is used to determine whether a row exists, so if there are a bunch of CrLfs at the end of the document I'd expect that, and I'd use a WHERE clause to ignore them (WHERE Col1 IS NOT NULL or something), as in the sketch below.
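A minimal sketch of that filter, reusing the earlier four-part name (the column name Col1 is hypothetical):
SELECT *
  INTO NEWTABLE
  FROM TxtSvr...sample#csv
 WHERE Col1 IS NOT NULL -- ignore the all-NULL rows produced by trailing CrLfs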
Sometimes a better text editor can help; I use EditPlus, which has find-and-replace that allows regular expressions and more. If I were to open a file like this in a linked text server, you could see that rows 9 thru 14 would exist and be all NULL.
Note how this text editor displays CrLf with a paragraph symbol, tabs as a double-arrow character, and spaces as a floating period... it helps visualize the data much better.
If ALL of your data is NULL, it might be that the file is Unix-formatted... it has only CHAR(10) as the row terminator and doesn't have the additional CHAR(13) that is expected.
You could find-and-replace the terminators, or, in EditPlus for example, you can open the file and re-save it as a PC-format file via the Document...File Format...PC/UNIX/Mac setting.
Lots of other text editors have the same ability.
Lowell
August 28, 2008 at 6:43 am
thomas.lemesurier (8/28/2008)
...every value in every row/column is NULL... my CSV files are actually TAB separated... I've got tons of columns (>50)...
Time for you to attach a file so we can see what it actually looks like.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2008 at 8:24 am
Hmmm, Lowell hit the nail on the head - it's a Unicode file. But having said that, I've tried saving it out as ANSI (just using Notepad) but it's still giving me nothing but NULLs.
OK, I've uploaded a cut-down version of the file and stuck a .txt extension on the end so it can be uploaded. Thanks,
Tom
August 28, 2008 at 8:46 am
well, your example shows up just fine for me...
I renamed the file to "Artists_mini.csv", as only one period is allowed in the name for a text server to see it.
I selected from the text server and saw everything basically as undelimited.
I replaced all tabs with commas.
I saved as PC format.
I selected from the text server.
I changed the format to UNIX.
I selected from the text server.
I was able to see the data either way.
Lowell
August 28, 2008 at 8:56 am
maybe a comma in the existing data is affecting the real data you are looking at...
Can you search the file for any pre-existing commas before you replace the tabs with commas? Some of the fields, like "Humple And Marlin", might be a company name like "Microsoft,Inc." and mess up your assumptions.
Lowell
August 28, 2008 at 10:25 pm
thomas.lemesurier (8/28/2008)
Hmmm, Lowell hit the nail on the head - it's a Unicode file... I've uploaded a cut-down version of the file and stuck a .txt extension on the end...
Heh... doesn't matter now that we know what the file actually looks like. 😀
Quick! What's the difference between a CSV file and a TSV file? :hehe: Just the delimiter... if we can read a CSV file into a temp table without knowing what's in the CSV file, there's absolutely no reason why we can't do the same with a TSV file... we just gotta let the system know what the delimiter is. Once we're done importing, we should set everything back to the way it was, as well.
As Lowell stated, the system doesn't like file names with two periods in them... so I, too, changed the filename to Artists_mini.txt
Here's the tested code... I've used your "mini" file as is. As usual, the details are in the comments. 😉
/**********************************************************************************************************************
Purpose:
Script to read from a TSV (Tab Separated Values) file having an unknown number of columns of unknown data type.
This demo is setup to read a given file called Artists_mini.txt located in C:\Temp.
Author: Jeff Moden - 28 Aug 2008
**********************************************************************************************************************/
--===== Declare the local variables we'll need. Names are self-documenting
DECLARE @RootKey       SYSNAME,
        @Key           SYSNAME,
        @Value_Name    SYSNAME,
        @Type          SYSNAME,
        @PreviousValue SYSNAME,
        @NewValue      SYSNAME
--===== Preset the "constants". These are self documenting as well
SELECT @RootKey    = N'HKEY_LOCAL_MACHINE',
       @Key        = N'SOFTWARE\Microsoft\Jet\4.0\Engines\Text',
       @Value_Name = N'Format',
       @Type       = N'REG_SZ',
       @NewValue   = N'TabDelimited' --May be any character, using N'Delimited(,)' where the comma is the character
                                     --Original setting is usually N'CSVDelimited'
--===== Remember the previous value so we can set it back
EXEC Master.dbo.xp_Instance_RegRead
     @RootKey    = @RootKey,
     @Key        = @Key,
     @Value_Name = @Value_Name,
     @Value      = @PreviousValue OUTPUT
--===== Show what the original delimiter setting was set to.
-- This, of course, may be commented out
SELECT 'Previous delimiter setting = ' + @PreviousValue
--===== All set... define the new temporary delimiter
EXEC Master.dbo.xp_Instance_RegWrite
     @RootKey    = @RootKey,
     @Key        = @Key,
     @Value_Name = @Value_Name,
     @Type       = @Type,
     @Value      = @NewValue
--===== Read the TAB delimited file without knowing any of the columns
-- Notes: "Database =" identifies the directory the file is in
-- The FROM clause identifies the file name
SELECT *
INTO #MyHead
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Text;Database=C:\Temp;HDR=YES;FMT=Delimited',
'SELECT * FROM Artists_mini.txt')
--===== Show the content of the table we just populated
SELECT * FROM #MyHead
--===== Restore the original delimiter setting
EXEC Master.dbo.xp_Instance_RegWrite
     @RootKey    = @RootKey,
     @Key        = @Key,
     @Value_Name = @Value_Name,
     @Type       = @Type,
     @Value      = @PreviousValue
--===== Cleanup after the demo
DROP TABLE #MyHead
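As a side note (untested here), the Jet text driver will also read a Schema.ini file placed in the same folder as the data file, which can declare the delimiter, header row, and character set per file instead of changing the instance-wide registry value. Something like:
[Artists_mini.txt]
Format=TabDelimited
ColNameHeader=True
CharacterSet=ANSI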
Send pretzels... I already have enough beer! 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
August 29, 2008 at 3:55 am
Thanks to both of you for your help with this. The server went down last night and has stayed down, so currently I can't test it, but it looks good. Thanks again,
Tom
August 29, 2008 at 5:55 am
Thanks for the feedback... and the pretzel!:P
--Jeff Moden
Change is inevitable... Change for the better is not.