December 16, 2009 at 5:37 am
Hi to all,
I recently joined a firm as a Data Analyst, and part of my daily job requires me to go through a lot of data. I usually carry out most of my work in Excel, as it's enough for most tasks, and occasionally use SQL if the data is really huge. The issue I face is this: until now I have imported files into SQL (I'm using SQL Server 2008 Management Studio) from Excel sheets with no problem, but now I have to import a text file of about 800 MB in size, and when I run the Import/Export Wizard I'm greeted with an error that says 'Data Conversion failed'. I need help with this. Before I go into the problem itself, please let me tell you: I am a newbie to SQL.
Now, please allow me to describe my predicament in further detail.
I am trying to import a text file of about 800 MB in size that has data in the following format:
(A sample of data from the txt file)
Then I choose the Data Source as 'Flat File' from the drop-down, and everything seems to be fine until I try to run the package, when the following error hits me in the face:
I've tried changing the data types for the source and destination, but to no avail. To be honest, I don't even know which data types to assign to the columns. The values of column 1 in the text file can go up to 1,000,000 or more.
Please try and help me. Without resolving this issue I'm stuck and everything has come to a standstill here, and there aren't many people to approach for help here either. :-(
Thanks in advance to anyone willing to take a look.
December 16, 2009 at 5:48 am
Can you please provide some sample data from the text file? And let me know what code page you are using?
December 16, 2009 at 6:17 am
The data in the text file is something like this:
COLUMN0    COLUMN1
yahoo    510297
kijiji    376608
facebook login    350253
gmail    313800
windows live    292609
img wlxrs    286278
hotmail.com    272570
you tube    263347
http://www.facebook.com    252276
google.ca    237275
yahoo mail    214191
microsoft messenger    190257
nbcsports msnbc    186310
msn    184693
image xboxlive    181117
ebay    175051
yahoo.ca    146628
plenty of fish    134887
google maps    134701
canada 411    134323
craigslist    129121
yahoo.com    128963
google.com    128295
face book    126520
youtube.com    117249
mapquest    116769
plentyoffish    111863
img2 wlxrs    96811
porn    93996
westjet    91128
air canada    88518
mls.ca    88008
As you can see, the data in 'COLUMN1' is strictly numbers (positive integers), while the data in 'COLUMN0' can be anything (text, symbols, numbers, etc.), and the data is tab-delimited.
December 16, 2009 at 8:09 am
Would it be possible to attach an extract from the actual file you're having difficulties with? A hundred or so records in the actual format the import is attempting to handle would be useful.
Typing the data yourself no doubt removes any of the problem data the import is encountering.
BrainDonor
December 16, 2009 at 8:12 am
I had a similar problem. In regard to the truncation errors, what I initially did was create a staging table and set all my character fields to TEXT. That way, I was intentionally making them larger than my data so there'd be no truncation errors. As I gained more experience, I eventually learned how to use BCP to elegantly import text using proper column types and without using a staging table.
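For illustration, that staging-table approach might look something like this (a rough sketch only; the database, table, and file names are placeholders, and I'd use VARCHAR(MAX) these days rather than TEXT):
-- Staging table: oversized character columns so nothing truncates on the way in
CREATE TABLE dbo.SearchTermsStaging (
    SearchTerm  VARCHAR(MAX),
    SearchCount VARCHAR(MAX)
)
-- BCP is run from a command prompt, not from SSMS, along these lines:
--   bcp MyDb.dbo.SearchTermsStaging in "C:\TEMP\terms.txt" -c -T -S MyServer
-- (-c = character mode, tab-delimited by default; -T = Windows authentication; -S = server name)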
As for your issue, I have two thoughts:
Just my $0.02 . . .
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 16, 2009 at 8:36 am
OK, I will provide a sample. Actually, I cannot open the original file because it's about 830 MB in size and Notepad can barely handle it. I have tried opening it, but it takes an eon for Notepad to respond and it still doesn't open the file. So I imported the data into Excel, and though I couldn't import all of it (there's a row limit in Excel), a large amount of it got imported. I'm now pasting over 100 rows from the Excel sheet:
chfi    5985
alc    5971
garth brooks announces he's coming out of retirement thursday    5968
gamespot    5950
osap    5944
canada    5938
go transit    5937
saskatoon star phoenix    5934
192.168.1.1    5920
e-bay    5915
cibc personal banking    5913
starfall    5913
charity village    5909
windows update    5886
google.    5881
kelli carpenter    5875
gogole    5839
649    5831
future shop.ca    5828
680 news    5823
superstore    5811
"berlin heart" + charles tillman + washington    5797
disney    5795
funny games    5790
toyota    5774
rogers yahoo mail    5773
canada.com    5771
job bank canada    5761
750-pound shark    5747
flight status    5742
flyerland.ca    5740
great west life    5732
literotica.com    5729
toronto weather    5723
gthl    5708
yahoo email    5696
avon.ca    5694
cat food diet    5691
yahoo answers    5682
torontostar    5681
allegiant air    5676
no frills    5676
http://www.f    5675
ford canada    5674
faceboo    5672
hoymail    5669
file extension pps    5668
big boobs    5658
scotia online    5650
globeandmail    5640
future shop canada    5638
waiter jon-barrett ingels    5630
cbs    5628
rosie o'donnell kelli carpenter    5628
rogers video    5627
ikea canada    5620
cam4.com    5616
download.com    5602
the province    5593
canadapost    5584
lotteries    5584
kate gosselin    5568
nasza klasa    5555
southwest airlines    5552
days of our lives    5549
mass murderer dies    5548
kate beckinsale    5547
blackberry    5510
facebook applications    5500
halloween costume ideas    5496
perezhilton    5496
plenty of fish in the sea    5495
surf the channel    5487
google news    5486
justintv    5480
rogers.yahoo.com    5427
aol mail    5426
yahoo fantasy hockey    5412
cbc sports    5410
img microsoft    5408
maria shriver apologizes    5408
tim hortons    5402
farmville    5395
party packagers    5395
rogers.ca    5385
toyota canada    5384
royalbank.com    5380
yotube    5376
caa    5362
adam lambert    5357
pbskids    5342
the hun    5338
sex videos    5332
rona.ca    5331
meteomedia    5324
q107    5322
webmail    5314
silverdaddies    5312
home depot.ca    5302
soupy sales    5300
vancity    5300
tv shack    5299
coronation street    5298
dr oz    5297
localtvmatters.ca    5291
ups tracking    5291
servicecanada    5289
i3 microsoft    5283
sarnia observer    5276
shaw cable    5274
slutload    5267
new york times    5254
virgin mobile    5242
om    5241
tdcanadatrust.ca    5241
'jon & kate'    5237
avg free download    5236
hotwire    5214
mail    5212
December 16, 2009 at 9:44 am
Sheez . . . that's quite the variety of data.
Here's how I'd tackle it . . .
One thing to bear in mind: the single quotes (') in the first column will likely cause problems, because T-SQL makes extensive use of them.
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 16, 2009 at 9:49 am
One other thought (speaking of stray characters): data cleansing is always a good thing. For example, since you're importing from Excel, it's probably using the TAB character as a delimiter. If any TAB characters exist in your first column, they're going to halt your import.
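For example, once the data is sitting in a wide character staging column, you can hunt down stray TABs with something like this (a sketch; the table and column names are placeholders):
-- Find rows whose first column still contains a stray TAB (CHAR(9) in T-SQL)
SELECT *
FROM dbo.SearchTermsStaging
WHERE SearchTerm LIKE '%' + CHAR(9) + '%'
-- If appropriate, replace the stray TABs with spaces before loading the real table
UPDATE dbo.SearchTermsStaging
SET SearchTerm = REPLACE(SearchTerm, CHAR(9), ' ')
WHERE SearchTerm LIKE '%' + CHAR(9) + '%'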
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 16, 2009 at 9:54 am
@Ray K
You mean change the column data types for the 'Data Source'? Because I can't... the defaults are as follows:
Column 0 - string [DT_STR]
Column 1 - single-byte signed integer [DT_I1]
...and they remain the same after pressing 'Suggest Types'...
There are no varchar or nvarchar choices available in the drop-down. I am using the Import/Export Wizard, BTW.
December 16, 2009 at 11:51 am
No. What I'm saying is change the destination to VARCHAR(MAX). That would make the column large enough to handle variable-length character data up to whatever the upper limit is (I don't remember the limit off the top of my head). That should be enough to accommodate that column and take care of any truncation issues. As an aside, the DT_I1 type the wizard suggested for Column 1 is a single-byte signed integer, which can only hold values from -128 to 127; counts like 510297 can't fit in it, which on its own would produce a conversion failure, so that column needs a larger integer type (DT_I4, for example).
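For example, the destination table could be created along these lines (the table and column names are just placeholders):
-- Wide text column, plus a four-byte integer with room for counts in the millions
CREATE TABLE dbo.SearchTerms (
    SearchTerm  VARCHAR(MAX),
    SearchCount INT
)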
Also (and if I'm mistaken with this, someone please correct me), I believe VARCHAR(MAX) is flexible enough to be able to handle different types of characters.
At the same time, you also need to make sure your data actually can be imported. I had a situation where I was importing log files, and the format was changed in the middle of a month (specifically, an extra column was added). I had to make an adjustment in order to accommodate the extra column. But because these were log files, I wasn't allowed to make any edits to the data source files themselves.
Your situation, however, looks like a different case. Your column contains inconsistent data, and I'm going to assume that you can safely edit the data source without compromising anything. Strange data characters (like the TAB character I mentioned before, for example) can stop an import in its tracks. That's what data cleansing is: making sure your data is in a consistent format so that it's usable by the database.
I also mentioned the single-quote; that shouldn't be too much of a problem with imports (unless you're using it as a text qualifier), but it could be an issue with INSERTS, UPDATES, and SELECTS. (For example: "select * from TABLE where COLUMN = 'O''Brien'" -- note that you need two single quotes, because one single quote would result in an error.)
Hope this helps. Good luck.
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 16, 2009 at 11:54 am
I just remembered something . . .
VARCHAR(MAX) is equivalent to the TEXT type in SQL Server. I mentioned earlier that I used to set my columns to TEXT; however, Microsoft is deprecating that data type, and they suggest you use VARCHAR(MAX) instead.
+--------------------------------------------------------------------------------------+
Check out my blog at https://pianorayk.wordpress.com/
December 18, 2009 at 3:54 am
I've taken the data you posted and turned it into something that may actually resemble your data. Because you pasted it straight into your message, the original tab-delimited layout wasn't preserved, so I have attached a spreadsheet here that has two columns.
In future, try to include the precise data in the correct format so we're all working from the same starting point. It isn't always easy, but I also find that going through that process sometimes helps me find the answer before I complete the post.
There's an article about this at: http://www.sqlservercentral.com/articles/Best+Practices/61537/.
The code I have pasted below is a very basic import from an Excel spreadsheet into a temporary table. When you run this, I would like to know if it gives you any errors. We should then be able to work from this starting point.
I understand that you have very little SQL knowledge, so we'll keep it as simple as possible.
The core of it creates a temporary table with two varchar columns of the maximum size that SS 2008 can create.
Then it uses the OPENROWSET command to import from the spreadsheet (you can change the filename and location to whatever you require). This should import the data from the first sheet of that spreadsheet into the temporary table.
The SELECT statement that follows is just to show whatever is in that temp table.
With the number of records in your spreadsheet it may take a while; I've never tried this method for anything other than curiosity, so I couldn't begin to guess how long.
When you run this code you may get a message along the lines of:
"SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', see "Surface Area Configuration" in SQL Server Books Online. "
If you do see this, you also need to run the code that is commented out: remove the '--' from the start of each line, keeping the GO lines so that each sp_configure call runs in its own batch. This changes the SQL Server configuration to allow the OPENROWSET command to run, and then removes the permission again at the end.
The temporary table is just while we're messing about. You can create a permanent table instead of course.
--drop table #ExcelImport
--EXEC sp_configure 'show advanced options', 1
--RECONFIGURE
--GO
--EXEC sp_configure 'Ad Hoc Distributed Queries', 1
--RECONFIGURE
--GO
-- Staging table: two varchar columns at the maximum size SQL Server 2008 allows
CREATE TABLE #ExcelImport(
ImportText VARCHAR(MAX),
ImportNumber VARCHAR(MAX))
INSERT INTO #ExcelImport(ImportText, ImportNumber)
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\TEMP\Imports\TestImport.xls;HDR=YES',
'SELECT * FROM [Sheet1$]')
SELECT * FROM #ExcelImport
--EXEC sp_configure 'Ad Hoc Distributed Queries', 0
--RECONFIGURE
--GO
--EXEC sp_configure 'show advanced options', 0
--RECONFIGURE
December 21, 2009 at 2:35 am
>RayK
Hi,
Thank you very much for your suggestions. :-)
>BrainDonor
Hi,
Firstly, thank you very much for your interest and patience. One couldn't have approached my problem in a better way, I suppose, and I appreciate it.
Secondly, I've read the forum posting etiquette article and I believe I now have a clue as to how to ask for help on a professional forum. Thanks, I'll make sure I follow it in the future. :-)
Now: I tried the code you provided and was greeted by the error you predicted, so I removed the comments from the code, tried to execute it, and encountered the following error:
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'sp_configure'.
Msg 102, Level 15, State 1, Line 19
Incorrect syntax near 'show advanced options'.
Msg 15123, Level 16, State 1, Procedure sp_configure, Line 51
The configuration option 'Ad Hoc Distributed Queries' does not exist,
or it may be an advanced option.
So I also made a few searches for info about Surface Area Configuration and found that I can tweak the settings here: Server --> Facets --> Surface Area Configuration. While I couldn't find the 'Ad Hoc Distributed Queries' option, there was one that read 'AdHocRemoteQueriesEnabled', which was set to 'False'. I changed it to 'True' and tried to execute the code again, and it popped up the following error message:
OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "(null)"
returned message "Unspecified error".
Msg 7303, Level 16, State 1, Line 14
Cannot initialize the data source object of OLE DB provider
"Microsoft.Jet.OLEDB.4.0" for linked server "(null)".
I guess those two options aren't interchangeable after all...
So, after that, I searched for how to enable 'Ad Hoc Distributed Queries' and found some info here:
http://msdn.microsoft.com/en-us/library/ms187569.aspx
However, it was the same as what you had written in the code you provided. So I don't understand:
1. Why it throws up a syntax error when I try to execute your code.
2. Why I can't see the 'Ad Hoc Distributed Queries' option itself in the Surface Area Configuration window.
December 21, 2009 at 3:39 am
You were right - AdHocRemoteQueriesEnabled is the option you wanted. To see what the settings are for the various options you can run
SELECT * FROM sys.configurations
ORDER BY name
and you should now see 'Ad Hoc Distributed Queries' with a value of 1.
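If you just want to check that one option, a quick way (note that value_in_use only changes once RECONFIGURE has run) is:
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'Ad Hoc Distributed Queries'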
The only reference I've found for the Jet error message is at http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/9605444f-a6e1-40ab-be9f-f40362672039, which then points to http://support.microsoft.com/default.aspx?scid=kb;en-us;239114 for the latest version of the Jet database engine.
I would suggest taking the steps listed in it to check you have the latest files installed. Be careful though - there are different downloads depending upon which version of Windows you are using. If in doubt and you have PC support staff where you are, ask them.
BrainDonor
December 22, 2009 at 11:06 pm
Hi folks,
>BrainDonor: Thank you, BrainDonor :-)
I have finally got around the problem of importing the data from the text file into the SQL database. Here's what I did: I split the 800 MB file into 30 smaller text files and tried to upload each one. When they failed to import again, I did a little fiddling around and found that the whole problem was with the text file ENCODING. All of the files (including the actual 800 MB original) were in ANSI text format, and the SQL Import/Export Wizard accepted only UNICODE, so I was able to import some files after changing their encoding from ANSI to UNICODE.
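Incidentally, while searching I also saw mention of the BULK INSERT command, which can apparently be told what encoding the file is in. Would something along these lines (a sketch with a made-up path and table name) have avoided the conversion step altogether?
-- DATAFILETYPE = 'char' reads an ANSI file; 'widechar' would read Unicode
BULK INSERT dbo.SearchTerms
FROM 'C:\TEMP\terms.txt'
WITH (
    DATAFILETYPE = 'char',
    CODEPAGE = 'ACP',           -- interpret the file using the system ANSI code page
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
)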
Now I've got two simple questions:
1.) I know there are a few editors out there that can open very large text files (PFE, Gvim, etc.), but is there one that would let me open my 800 MB file (ANSI) and save it as a UNICODE text file? I haven't seen that option in any of the text editors I've used so far. :-(
or
2.) I have 30 text files (about 25 MB each), and each time I import one of them a NEW table is created in the SQL database. So my question is: is there any way to append/combine the tables before or after importing, so I would have one big table instead of 30 smaller ones?
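Something along these lines is what I mean (a sketch with made-up table names, assuming each import table has the same two columns):
-- Append the rows of each smaller import table into one combined table
INSERT INTO dbo.AllSearchTerms (SearchTerm, SearchCount)
SELECT SearchTerm, SearchCount FROM dbo.Import01
INSERT INTO dbo.AllSearchTerms (SearchTerm, SearchCount)
SELECT SearchTerm, SearchCount FROM dbo.Import02
-- ...and so on for the remaining files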
Can someone help me? Any help is appreciated.
Thanks in Advance.