January 6, 2022 at 4:54 pm
Thanks Frederico_Fonseca - can you do one small test for me and set LASTROW to 1001? You will see why in my update below. If the first file fails, it will confirm that the problem is something at the end of the file. That first file has 1,000 rows, the second has 1,001, and the other files that are failing have around 600 and 800.
UPDATE - It is clearly something at the very end of the file. If I set LASTROW = 500, it will import all of the files (only 500 rows each, of course). Looking at the files, I can see that the ones that work go to 1,001 rows (I skip the first row). The others have fewer than 1,001 rows.
So clearly it is something at the very end of the file that is causing it to fail.
January 6, 2022 at 5:12 pm
Your issue is the blank lines at the end of the file. If this is always how you get them, you have 3 options (in order of preference):
1 - process the file onto a staging/temp table with a single column - no delimiters -
then use a splitter like this one https://www.sqlservercentral.com/forums/topic/tally-oh-an-improved-sql-8k-%E2%80%9Ccsv-splitter%E2%80%9D-function/page/66 to split the columns, ignoring the header and the empty records (see the sketch after this list)
2 - process the file onto a staging table and count the valid rows - then do the same bulk insert but specifying the correct max row number (feasible if the file is not GIGANTIC)
3 - before loading the file, use any type of (fast) utility to strip the empty lines from the file - then do the bulk insert
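For option 1, a minimal sketch, untested, assuming the DelimitedSplit8K function from the linked thread is installed and using a hypothetical file name:

-- load every raw line of the file into a single-column staging table
IF OBJECT_ID('tempdb..#raw') IS NOT NULL DROP TABLE #raw;
CREATE TABLE #raw (line varchar(8000));

BULK INSERT #raw
FROM 'D:\TestTracingImport\somefile.csv'   -- hypothetical file name
WITH (ROWTERMINATOR = '\n');               -- no FIELDTERMINATOR: one record per line
                                           -- adjust to '\r' or '0x0a' to match the file's actual line ending

-- split the non-empty lines into columns with the linked DelimitedSplit8K
-- function (ItemNumber = column position, Item = column value)
SELECT s.ItemNumber, s.Item
FROM #raw AS r
CROSS APPLY dbo.DelimitedSplit8K(r.line, ',') AS s
WHERE LTRIM(RTRIM(r.line)) <> '';          -- the blank trailing lines drop out here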
January 6, 2022 at 5:22 pm
If I use LASTROW = 1001 and make sure that each file has at least 1,001 rows (by putting in fake data), everything works fine. The problem is I can't put in fake data and I can't make sure each file has 1,001 rows. I could do it for the first five files but not the last one.
I have a stored procedure that takes all the files in a directory, imports them, and puts each imported file name in another table. I followed an example online and can kind of follow along with what it is doing. However, I do not have enough skill to know if there is a way to import each file into a simple temp table first, get the max row count, and then use that as a variable. Something like that seems like it might work.
====================
Just saw your response, Federico - I see I am not too far off. I am thinking maybe #2, because I have tried deleting the empty lines at the end of the file and it still does not import properly unless I specify the last row and there are at least that many rows. I opened each file in Notepad++ and deleted every line after the last data line, and it would still not import (unless I specified the number of rows).
Here is my stored procedure. (I have not found an alternative to xp_cmdshell, but I have heard it is a security concern.) It seems that if I can get the max count of rows with data from the temp table #y and add it to my parameters, it should work.
USE [TestTracingImport]
GO
/****** Object: StoredProcedure [dbo].[import_dailytracing] Script Date: 1/6/2022 11:17:24 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER Procedure [dbo].[import_dailytracing] @filepath varchar(500), @pattern varchar(100), @tablename varchar(100)
As
Begin
    Declare @Query varchar(1000), @moveProcessed varchar(1000)

    -- list the matching files in the directory (dir /b = bare file names)
    Create TABLE #x (name varchar(200))
    set @Query = 'master.dbo.xp_cmdshell "dir '+@filepath+@pattern+' /b"'
    insert #x exec(@query)
    Delete from #x where name is Null

    -- number the file names so they can be processed one at a time
    Select identity(int,1,1) as ID, name into #y from #x
    drop table #x

    Declare @max1 int, @count1 int, @filename varchar(200)
    set @max1 = (select max(ID) from #y)
    set @count1 = 0

    While @count1 < @max1
    BEGIN
        SET @count1 = @count1 + 1
        Set @filename = (select name from #y where [id] = @count1)

        -- import the file, skipping the header row
        set @query = 'BULK INSERT '+@tablename+' FROM "'+@filepath+@filename+'" WITH (FIELDTERMINATOR = '','',FIRSTROW=2,LASTROW=1001,ROWTERMINATOR=''\r'')'
        Exec (@query)

        -- log the file name and move the processed file to the archive folder
        -- (space and quote placement fixed so Move sees two separate arguments)
        insert into Tracingfiles_logtable(filename) select @filename
        set @moveProcessed = 'Move "'+@filepath+@filename+'" "D:\TestTracingImport\Archive\'+SUBSTRING(@filename, 1, LEN(@filename) - 4)+'_'+cast(Format(getdate(), 'yyyyMMddHHmmss') as varchar)+'.csv"'
        Exec master..xp_cmdshell @moveProcessed
    End
End
Then I run it by using
Exec import_dailytracing 'D:\TestTracingImport\', '*.csv', 'TestTracingImportSP'
I may need to do a bit of both 1 and 2 to get the count. I am happy to dig and look for a way to do it, but would appreciate some guidance on which method I should be DuckDuckGoing.
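On the xp_cmdshell side note: one commonly used alternative for the directory listing is xp_dirtree, which avoids xp_cmdshell for that step. It is undocumented, so treat this as a version-dependent sketch against the path used above:

-- list files (not just folders) under the import directory without xp_cmdshell
-- parameters: path, depth (1 = top level only), include files (1 = yes)
CREATE TABLE #files (name varchar(260), depth int, isfile bit);

INSERT INTO #files (name, depth, isfile)
EXEC master.sys.xp_dirtree 'D:\TestTracingImport', 1, 1;

SELECT name
FROM #files
WHERE isfile = 1
  AND name LIKE '%.csv';   -- apply the *.csv pattern here instead of in dir /b

Note this only replaces the listing; the Move to the archive folder would still need xp_cmdshell or some other mechanism.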
January 6, 2022 at 6:13 pm
try the following - untested
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER Procedure [dbo].[import_dailytracing] @filepath varchar(500), @pattern varchar(100), @tablename varchar(100)
As
Begin
    -- these 2 should always be set in ALL sp's
    -- set nocount on;    -- do not output rowcounts
    -- set xact_abort on; -- if it fails on one step it does not continue executing

    Declare @Query varchar(1000), @moveProcessed varchar(1000)

    Create TABLE #x (name varchar(200))
    set @Query = 'master.dbo.xp_cmdshell "dir '+@filepath+@pattern+' /b"' -- possibly change this so that the path+pattern are within double quotes, in case the path ever contains spaces
    insert #x exec(@query)
    Delete from #x where name is Null

    Select identity(int,1,1) as ID, name into #y from #x
    drop table #x

    if object_id('tempdb..#staging') is not null
        drop table #staging
    create table #staging
    (
        record varchar(8000) -- change as required to be a bit over the maximum line length
    )

    Declare @max1 int, @count1 int, @filename varchar(200), @validrows int
    set @max1 = (select max(ID) from #y)
    set @count1 = 0

    While @count1 < @max1
    BEGIN
        SET @count1 = @count1 + 1
        Set @filename = (select name from #y where [id] = @count1)

        -- truncate the staging table and load the file onto it so we can count non-empty lines
        -- note that this will still fail if there is an empty line in the middle of the file
        truncate table #staging
        set @query = 'BULK INSERT #staging FROM "'+@filepath+@filename+'" WITH (FIRSTROW=1, ROWTERMINATOR=''\r'')'
        Exec (@query)

        select @validrows = count(*)
        from #staging
        where record <> ''

        if @validrows > 1 -- only load onto the main table if there are valid records in the file
        begin
            set @query = 'BULK INSERT '+@tablename+' FROM "'+@filepath+@filename+'" WITH (FIELDTERMINATOR = '','',FIRSTROW=2,LASTROW=' + convert(varchar(20), @validrows) + ',ROWTERMINATOR=''\r'')'
            Exec (@query)
        end

        insert into Tracingfiles_logtable(filename) select @filename
        set @moveProcessed = 'Move "'+@filepath+@filename+'" "D:\TestTracingImport\Archive\'+SUBSTRING(@filename, 1, LEN(@filename) - 4)+'_'+cast(Format(getdate(), 'yyyyMMddHHmmss') as varchar)+'.csv"'
        Exec master..xp_cmdshell @moveProcessed
    End
End
January 6, 2022 at 8:28 pm
Hello, thank you very much for the help. I implemented the change, and as long as I don't have xact_abort on (I guess I'll set it later once everything is working), it looks like it properly processes the first file but then stops. However, when you check the database table, there is actually nothing there.
(7 rows affected)
(1 row affected)
(6 rows affected)
(1004 rows affected)
Msg 4832, Level 16, State 1, Line 1
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
Completion time: 2022-01-06T14:27:44.4339455-06:00
January 6, 2022 at 9:14 pm
I've added some print statements to the following - run it again, and the filename printed before the error will be the one failing. We would need to look at that file to identify what could be causing the issue.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER Procedure [dbo].[import_dailytracing] @filepath varchar(500), @pattern varchar(100), @tablename varchar(100)
As
Begin
    -- these 2 should always be set in ALL sp's
    -- set nocount on;    -- do not output rowcounts
    -- set xact_abort on; -- if it fails on one step it does not continue executing
    set xact_abort on;

    Declare @Query varchar(1000), @moveProcessed varchar(1000)

    Create TABLE #x (name varchar(200))
    set @Query = 'master.dbo.xp_cmdshell "dir '+@filepath+@pattern+' /b"' -- possibly change this so that the path+pattern are within double quotes, in case the path ever contains spaces
    insert #x exec(@query)
    Delete from #x where name is Null

    Select identity(int,1,1) as ID, name into #y from #x
    drop table #x

    if object_id('tempdb..#staging') is not null
        drop table #staging
    create table #staging
    (
        record varchar(8000) -- change as required to be a bit over the maximum line length
    )

    Declare @max1 int, @count1 int, @filename varchar(200), @validrows int
    set @max1 = (select max(ID) from #y)
    set @count1 = 0

    While @count1 < @max1
    BEGIN
        SET @count1 = @count1 + 1
        Set @filename = (select name from #y where [id] = @count1)
        print 'processing filename ' + @filename

        -- truncate the staging table and load the file onto it so we can count non-empty lines
        -- note that this will still fail if there is an empty line in the middle of the file
        truncate table #staging
        set @query = 'BULK INSERT #staging FROM "'+@filepath+@filename+'" WITH (FIRSTROW=1, ROWTERMINATOR=''\r'')'
        Exec (@query)

        set @validrows = 0
        select @validrows = count(*)
        from #staging
        where record <> ''

        print 'processing filename ' + @filename + ' row count = ' + convert(varchar(20), coalesce(@validrows, -1))
        if @validrows > 1 -- only load onto the main table if there are valid records in the file
        begin
            set @query = 'BULK INSERT '+@tablename+' FROM "'+@filepath+@filename+'" WITH (FIELDTERMINATOR = '','',FIRSTROW=2,LASTROW=' + convert(varchar(20), @validrows) + ',ROWTERMINATOR=''\r'')'
            Exec (@query)
        end

        insert into Tracingfiles_logtable(filename) select @filename
        set @moveProcessed = 'Move "'+@filepath+@filename+'" "D:\TestTracingImport\Archive\'+SUBSTRING(@filename, 1, LEN(@filename) - 4)+'_'+cast(Format(getdate(), 'yyyyMMddHHmmss') as varchar)+'.csv"'
        Exec master..xp_cmdshell @moveProcessed
    End
End
January 6, 2022 at 9:17 pm
First of all, please know I really appreciate the help I am being given on this. Not only am I getting a problem worked on but I am learning a lot.
Here is the output. If I am not mistaken, it is actually still getting hung up on the end of the first file, and that is why it is not actually inserting anything into the table.
(7 rows affected)
(1 row affected)
(6 rows affected)
processing filename TCIX-_Trace_1070220210700.csv
(1005 rows affected)
processing filename TCIX-_Trace_1070220210700.csv row count = 1005
Msg 4832, Level 16, State 1, Line 1
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
Completion time: 2022-01-06T15:16:06.5059141-06:00
January 6, 2022 at 10:33 pm
OK, the issue is the row separator.
You defined it as \r, but it is really \r\n, so with the code as it stands there will be a few staging records containing just the leftover hex 0A character.
Either change the separator to \n, or change the count check to be
select count(*) from #staging
where record <> ''
and record <> 0x0A (zero-x-zero-A)
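Put in context, a minimal sketch of the two variants, untested, using the #staging table and @validrows variable from the procedure above:

-- option A: keep ROWTERMINATOR = '\r' and exclude the stray line-feed records
select @validrows = count(*)
from #staging
where record <> ''
and record <> 0x0A   -- rows holding only the leftover \n from each \r\n pair

-- option B: terminate on the line feed instead, so no stray byte is left behind
-- set @query = 'BULK INSERT #staging FROM "'+@filepath+@filename+'" WITH (FIRSTROW=1, ROWTERMINATOR=''\n'')'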
January 6, 2022 at 10:53 pm
I don't know if this matters, but I deleted all the extra lines at the end of the file except the last CRLF, so there was one blank line after, and it still did not work. However, once I deleted the last CRLF so that the last line of the file showed no EOL character in Notepad++, it imported all of the files correctly.
January 6, 2022 at 11:09 pm
Genius - thanks so much, Federico_Fonseca. That last edit did the trick. I took 6 files from today, to which I made no edits, ran the import, and it imported all of them.
You have gone above and beyond to be helpful; not only does my SP work now, I also have some cool stuff to look at for learning purposes.
A great end to a long day - thanks so much for sticking with it.
January 6, 2022 at 11:49 pm
You're most welcome, and it was no bother at all to help on this.
And we all learn something every day, mainly when we help others.
January 7, 2022 at 11:20 am
Yep, both of the above worked for me too.
August 4, 2023 at 4:40 pm
Hi there,
I had the same thing happen to me, and the following worked for me.
The line endings in the .csv file might be "\n" when they need to be "\r\n".
Open the .csv file in Notepad++
(hint: View > Show Symbol > Show All Characters),
press Ctrl+H and replace all "\n" with "\r\n" (use the Extended search mode so the escapes are interpreted).
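Alternatively, instead of rewriting the files, BULK INSERT can read LF-only files directly if the row terminator is given in hex notation. A minimal sketch with placeholder table and file names:

-- import a file whose lines end in a bare \n (0x0A) without editing it first
BULK INSERT TestTracingImportSP
FROM 'D:\TestTracingImport\somefile.csv'   -- placeholder file name
WITH (FIELDTERMINATOR = ',',
      FIRSTROW = 2,
      ROWTERMINATOR = '0x0a');             -- hex notation for the line feed character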