May 29, 2013 at 2:45 am
Hi Experts,
i have a text file with 1 -3 lakh records and 5 columns or so.... each column is seperated by tab delimiter.
i need to import this file to SQLSERVER 2008.
while importing this text file into SQL server, the first column in text file has 10 characters string
i need to split this string into 4 separate strings of fixed lengths (like1st string -2char, 2nd string - 2char, 3rd string - 3char, 4th string -- remaining chars) and insert it into 4 different columns in sql server.
example i have a string in first column like INAPHYD00001 when i import this into sql server it should split like
IN ------to be inserted into Country column in sql server.
AP -----to be inserted into State column in sql server.
HYD ----to be inserted into City column in sql server.
00001 --to be inserted into LocalityID column in sql server.
i heard i can use Bulk Copy but i'm not sure how to use it and also how to prepare a format file for this requirement.
please help me in this regard.
Thanks
Kishore
May 29, 2013 at 8:57 am
I'd take the simple approach, use bcp to import into a 5 column staging table, then use T-SQL to select from the 5 column staging table into your destination, that way you don't need bcp to do any splitting of the first column into multiple columns, you can do that in T-SQL.
I would do it in two steps. First I assume you will ALWAYS have 5 columns (disregarding splitting the first column).
Inserting into table TEST that I created like so:
CREATE TABLE TEST
(
COL1 VARCHAR(50),
COL2 VARCHAR(50),
COL3 VARCHAR(50),
COL4 VARCHAR(50),
COL5 VARCHAR(50)
)
[/code]
Create format file. Default delimiter is tab anyways, so I don't have to specify that. Do this at the windows cmd prompt.
bcp MYDATABASENAME.dbo.test format nul -T -c -f MYNEWFORMATFILE.FMT -S "MYSERVER"
Do the bcp import also at the windows cmd prompt.
bcp MYDATABASENAME.dbo.test in importfile.txt -T -f MYNEWFORMATFILE.FMT -S "MYSERVER"
Then I now hopefully have my "staging" table populated. Notice both bcp commands use "-T" implying a trusted connection, ie., windows authentication.
I can then populate my destination table. I have created one for testing like so:
CREATE TABLE TEST2
(
Country VARCHAR(50),
State VARCHAR(50),
City VARCHAR(50),
LocalityID VARCHAR(50),
COL2 VARCHAR(50),
COL3 VARCHAR(50),
COL4 VARCHAR(50),
COL5 VARCHAR(50)
)
Then populate the destination with a select against the "staging" table like so:
INSERT INTO TEST2
(
Country,
State,
City,
LocalityID,
COL2,
COL3,
COL4,
COL5
)
SELECT
SUBSTRING(COL1,1,2),
SUBSTRING(COL1,3,2),
SUBSTRING(COL1,5,3),
SUBSTRING(COL1,8,LEN(COL1) - 7),
COL2,
COL3,
COL4,
COL5
FROM TEST
The downside to this approach is that if your import text file is sloppy in any way, like an unpredictable number of tab delimitted columns, then bcp may not work as expected and you'll have to account for this. But if you've got a solid and reliable format of import table, this is a straightforward method using stock bcp features.
An interesting challenge would be to account for sloppy import files. I already do this with a vb snippet that rewrites the import file to ALWAYS have a fixed number of delimitted columns, and for delimitted columns in excess of the specified number, delimits the extra columns into the last column with a different delimitter. But that solution would be obvious to even a a vb programmer so I won't post that there. How could we do that in T-SQL? I'm guessing with Jeff Moden's splitter!
May 29, 2013 at 2:16 pm
I don't really know the proper use of split8k but I thought I'd open myself up to ridicule anyways!
Load Jeff Moden's function from http://www.sqlservercentral.com/articles/Tally+Table/72993/, understanding it is up to you !
CREATE TABLE TEST
(
LINEIN VARCHAR(1000)
)
windows cmd prompt for format creation:
bcp MYDATABASE.dbo.test format nul -T -c -f TEST_LINE.FMT -S "MYSERVER"
windows cmd prompt import:
bcp MYDATABASE.dbo.test in test_import.txt -T -f TEST_LINE.FMT -S "MYSERVER"
Insert into your destination table from staging table TEST:
INSERT INTO TEST2
(
Country,
State,
City,
LocalityID,
COL2,
COL3,
COL4,
COL5
)
SELECT SUBSTRING(ITEM1,1,2),
SUBSTRING(ITEM1,3,2),
SUBSTRING(ITEM1,5,3),
SUBSTRING(ITEM1,8,LEN(ITEM1) - 7),
ITEM2,
ITEM3,
ITEM4,
SPLIT.ITEM ITEM5
FROM
(
SELECT LINEIN, ITEM1, ITEM2, ITEM3, SPLIT.ITEM ITEM4 FROM
(
SELECT LINEIN, ITEM1, ITEM2, SPLIT.ITEM ITEM3 FROM
(
SELECT LINEIN, ITEM1, SPLIT.ITEM ITEM2 FROM
(
SELECT LINEIN, SPLIT.ITEM ITEM1
FROM TEST
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 1
)
APPLY1
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 2
)
APPLY2
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 3
)
APPLY3
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 4
)
APPLY4
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 5
Probably an abuse of a perfectly good function!
May 30, 2013 at 12:23 am
Thanks a ton patrick,
yes as you said there are sloppy files. i.e., there are few files for which columns are separated by many number of spaces or many tab's and column number also might vary.
im stuck with this,
as splitting the first column went well with ur explanation, how to identify the next columns in order to insert using format file as it uses tab delimiter.
these text files do not have Data Definition(column headings) either.
please guide me with this scenario.
May 30, 2013 at 7:06 am
I know what you mean, imports can be messy and its helpful to brainstorm about the problems.
One method I like to do is to just get the text imported into SQL Server table so I can then start examining the data to design a better import, and design rules that I can then use to refine the import. Since you are ok with how to split a column, lets just assume that you're good to go there, and just produce a generic import with up to 9 columns that also includes the source line.
Again we start with our staging table:
CREATE TABLE TEST
(
LINEIN VARCHAR(1000)
)
And also load Jeffs fine function!
http://www.sqlservercentral.com/articles/Tally+Table/72993/, its very useful to also understand whats going on. Now I know the "stacked applies" method I'm suggesting is a bit wasteful, but right now I can't come up with an alternative (except for what I usually do, a similar method with a vb.net helper program that pretty much does the same thing except BEFORE the bcp).
windows cmd prompt for format creation as we've seen:
bcp MYDATABASE.dbo.test format nul -T -c -f TEST_LINE.FMT -S "MYSERVER"
windows cmd prompt import:
bcp MYDATABASE.dbo.test in test_import.txt -T -f TEST_LINE.FMT -S "MYSERVER"
This time lets create our staging table with the most columns we expect, I'm picking 9 ! Lets also include a column that has our original import data so we can check it out along with the columns that got split out!
CREATE TABLE IMPORT_TO_VIEW
(
LINEIN VARCHAR(4000),
COL1 VARCHAR(200),
COL2 VARCHAR(200),
COL3 VARCHAR(200),
COL4 VARCHAR(200),
COL5 VARCHAR(200),
COL6 VARCHAR(200),
COL7 VARCHAR(200),
COL8 VARCHAR(200),
COL9 VARCHAR(200)
)
Ok, the trick is to import as many columns as we can, but still get the lines that don't have 5 or whatever columns. This one imports up to 9 columns. Note the innermost select appends redundant tab characters to insure we always have enough columns so we don't miss any rows!
INSERT INTO IMPORT_TO_VIEW
(
LINEIN,
COL1,
COL2,
COL3,
COL4,
COL5,
COL6,
COL7,
COL8,
COL9
)
SELECT LINEIN, ITEM1, ITEM2, ITEM3, ITEM4, ITEM5, ITEM6, ITEM7, ITEM8, SPLIT.ITEM ITEM9 FROM
(
SELECT LINEIN, ITEM1, ITEM2, ITEM3, ITEM4, ITEM5, ITEM6, ITEM7, SPLIT.ITEM ITEM8 FROM
(
SELECT LINEIN, ITEM1, ITEM2, ITEM3, ITEM4, ITEM5, ITEM6, SPLIT.ITEM ITEM7 FROM
(
SELECT LINEIN, ITEM1, ITEM2, ITEM3, ITEM4, ITEM5, SPLIT.ITEM ITEM6 FROM
(
SELECT LINEIN, ITEM1, ITEM2, ITEM3, ITEM4, SPLIT.ITEM ITEM5 FROM
(
SELECT LINEIN, ITEM1, ITEM2, ITEM3, SPLIT.ITEM ITEM4 FROM
(
SELECT LINEIN, ITEM1, ITEM2, SPLIT.ITEM ITEM3 FROM
(
SELECT LINEIN, ITEM1, SPLIT.ITEM ITEM2 FROM
(
SELECT LINEIN, SPLIT.ITEM ITEM1
FROM
(
-- APPENDING EXTRA TAB CHARACTERS INSURES YOU ALWAYS HAVE ENOUGH DELIMITTED FIELDS FOR THE
-- CROSS APPLY TO WORK SO YOU DON'T MISS ROWS!
SELECT LINEIN + CHAR(9) + CHAR(9) + CHAR(9) + CHAR(9) + CHAR(9) + CHAR(9) + CHAR(9) + CHAR(9) + CHAR(9) LINEIN FROM TEST
)
TESTIN
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 1
)
APPLY1
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 2
)
APPLY2
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 3
)
APPLY3
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 4
)
APPLY4
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 5
)
APPLY5
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 6
)
APPLY6
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 7
)
APPLY7
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 8
)
APPLY8
CROSS APPLY dbo.DelimitedSplit8k(LINEIN,CHAR(9)) SPLIT
WHERE ITEMNUMBER = 9
Now you have an import split into columns that you can examine and try to develop some rules for!
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply