How to read first line of each text file in a folder and validate

  • This approach will need to be refined to suit your requirements but it gives you the gist of a possible solution.

    You can identify the files you need using an initial CTE based on xp_dirtree undocumented stored procedure. Using OPENROWSET allows you to convert xp_dirtree into a SELECT statement so you can use WHERE [File] = 1 to ensure that only files are selected. You could apply an additional condition to this filter to narrow down the files you want, if necessary. Now that you have the list of files in the cte, you can build up a UNION query string by using OPENROWSET BULK and FOR XML PATH. You will need a simple format file BulkFile.fmt which has a single field called BulkColumn defined as SQLCHAR 8000. By using LASTROW = 1, you will only get the header information selected in the OPENROWSET BULK statement. In this way, you obtain a dataset containing your filenames and the corresponding headers. You can then insert this data into a table, @t in this example. By iterating through this table based on your validation criteria, files which meet your criteria are copied to your validation folder. Since each copied file is sequentially removed from @t, this leaves behind a list of the failed files in @t for review.

    DECLARE @SQL VARCHAR(MAX)

    DECLARE @t TABLE (FileName VARCHAR(100), Header VARCHAR(8000))

    DECLARE @FileName VARCHAR(100)

    ;WITH cte AS

    (

    SELECT subdirectory FROM OPENROWSET('SQLNCLI','SERVER=.;TRUSTED_CONNECTION=YES','SET FMTONLY OFF EXEC xp_dirtree ''C:\YourFolder'', 1, 1') WHERE [File] = 1

    )

    SELECT @SQL = STUFF((SELECT 'UNION ALL SELECT ''' + subdirectory + ''' AS FileName, BulkColumn AS Header FROM OPENROWSET (BULK ''C:\YourFolder\' + subdirectory + ''', FORMATFILE = ''C:\BulkFile.fmt'', LASTROW = 1) AS Z' + CHAR(10) FROM cte FOR XML PATH(''), TYPE).value('.[1]', 'varchar(MAX)'), 1, 10, '')

    INSERT @t EXEC(@SQL)

    WHILE EXISTS (SELECT * FROM @t WHERE Header = 'Title Forename Surname FullName JobTitle Department Organisation Address1')

    BEGIN

    SELECT TOP 1 @FileName = FileName FROM @t WHERE Header = 'Title Forename Surname FullName JobTitle Department Organisation Address1'

    EXEC('xp_cmdshell ''Copy "C:\YourFolder\' + @FileName + '" "C:\YourFolder\YourValidatedFolder"''')

    DELETE @t WHERE FileName = @FileName

    END

    SELECT FileName, Header FROM @t

  • thisisdeepthy (8/20/2010)


    cAN YOU PLEASE ALSO ADD THE SCRIPT WITH THIS TEXT VALIDATION IN THE CURSOR ? AND ALSO SLIGHT CHANGES IN THE REQ, NEED TO MOVE THE VALID FILES TO THE DB TABLE AND THE REJECTED TO LEAVE IN THE SAME FOLDER

    What is the reason for yelling at us in general and Eralper specifically?

    This is not a "spoon-feeding-solve-my-issue-immediately-for-free" site. It's a community of professionals helping each other to understand how coding SQL Server can be done efficiently.

    If you want a script as per your need without doing much by yourself, get a consultant and pay him for as long as it takes until you're satisfied.

    But stop yelling at volunteers, especially if you already got a solution that meets your requirement approx. 90%. You should be able to do the remaining part.

    @Eralper: great job, given the limited information provided...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thank you SSC Veteran. This works. Thanks a lot

  • steve-893342 (8/20/2010)


    This approach will need to be refined to suit your requirements but it gives you the gist of a possible solution.

    You can identify the files you need using an initial CTE based on xp_dirtree undocumented stored procedure. Using OPENROWSET allows you to convert xp_dirtree into a SELECT statement so you can use WHERE [File] = 1 to ensure that only files are selected. You could apply an additional condition to this filter to narrow down the files you want, if necessary. Now that you have the list of files in the cte, you can build up a UNION query string by using OPENROWSET BULK and FOR XML PATH. You will need a simple format file BulkFile.fmt which has a single field called BulkColumn defined as SQLCHAR 8000. By using LASTROW = 1, you will only get the header information selected in the OPENROWSET BULK statement. In this way, you obtain a dataset containing your filenames and the corresponding headers. You can then insert this data into a table, @t in this example. By iterating through this table based on your validation criteria, files which meet your criteria are copied to your validation folder. Since each copied file is sequentially removed from @t, this leaves behind a list of the failed files in @t for review.

    DECLARE @SQL VARCHAR(MAX)

    DECLARE @t TABLE (FileName VARCHAR(100), Header VARCHAR(8000))

    DECLARE @FileName VARCHAR(100)

    ;WITH cte AS

    (

    SELECT subdirectory FROM OPENROWSET('SQLNCLI','SERVER=.;TRUSTED_CONNECTION=YES','SET FMTONLY OFF EXEC xp_dirtree ''C:\YourFolder'', 1, 1') WHERE [File] = 1

    )

    SELECT @SQL = STUFF((SELECT 'UNION ALL SELECT ''' + subdirectory + ''' AS FileName, BulkColumn AS Header FROM OPENROWSET (BULK ''C:\YourFolder\' + subdirectory + ''', FORMATFILE = ''C:\BulkFile.fmt'', LASTROW = 1) AS Z' + CHAR(10) FROM cte FOR XML PATH(''), TYPE).value('.[1]', 'varchar(MAX)'), 1, 10, '')

    INSERT @t EXEC(@SQL)

    WHILE EXISTS (SELECT * FROM @t WHERE Header = 'Title Forename Surname FullName JobTitle Department Organisation Address1')

    BEGIN

    SELECT TOP 1 @FileName = FileName FROM @t WHERE Header = 'Title Forename Surname FullName JobTitle Department Organisation Address1'

    EXEC('xp_cmdshell ''Copy "C:\YourFolder\' + @FileName + '" "C:\YourFolder\YourValidatedFolder"''')

    DELETE @t WHERE FileName = @FileName

    END

    SELECT FileName, Header FROM @t

    NICELY DONE! Very nice!

    Just a couple of tips (or thoughts or whatever you want to call them)...

    Instead of using OPENROWSET, you could use BULK INSERT with no field delimiter (just a row terminator) to eliminate the need for a format file.

    Also, instead of a DOS COPY and DELETE combination, you could just use MOVE instead.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff, yes it is rather annoying having to create a format file for OPENROWSET BULK, albeit a very simple one in this case. I wonder why that is when you don't necessarily have to for BULK INSERT.

  • I did mock up a BULK INSERT variation but, whilst eliminating the need for a format file, at the same time the specter of RBAR 🙁 was introduced. Or was it RBR since it wasn't too agonizing? I'm not sure :unsure:

  • I tried solving using OLE objects this time.

    Created a procedure called validateheader to validate the header of each file with a template file.

    create table validate_table (validate varchar(max), FileName1 varchar(100),FileName2 varchar(100) )

    select * from validate_table

    --- Now Create the procedure

    ALTER PROCEDURE ValidateHeader

    @FileName1 VARCHAR (1024),

    @FileName2 VARCHAR (1024)

    AS

    DECLARE @OLEResult INT

    DECLARE @FS INT

    DECLARE @FileID INT

    DECLARE @Message VARCHAR (8000)

    DECLARE @out_msg varchar (8000)

    DECLARE @OLEResult2 INT

    DECLARE @FS2 INT

    DECLARE @FileID2 INT

    DECLARE @Message2 VARCHAR (8000)

    DECLARE @out_msg2 varchar (8000)

    -- Create an instance of the file system object

    EXECUTE @OLEResult = sp_OACreate 'Scripting.FileSystemObject', @FS OUT

    IF @OLEResult <> 0

    BEGIN

    PRINT 'Scripting.FileSystemObject'

    PRINT 'Error code: ' + CONVERT (VARCHAR, @OLEResult)

    END

    -- Open the text file for reading

    EXEC @OLEResult = sp_OAMethod @FS, 'OpenTextFile', @FileID OUT, @FileName1, 1, 1

    IF @OLEResult <> 0

    BEGIN

    PRINT 'OpenTextFile'

    PRINT 'Error code: ' + CONVERT (VARCHAR, @OLEResult)

    END

    -- Read the first line into the @Message variable

    EXECUTE @OLEResult = sp_OAMethod @FileID, 'ReadLine', @Message OUT

    ---------------

    -- Create an instance of the file system object

    EXECUTE @OLEResult2 = sp_OACreate 'Scripting.FileSystemObject', @FS2 OUT

    IF @OLEResult2 <> 0

    BEGIN

    PRINT 'Scripting.FileSystemObject'

    PRINT 'Error code: ' + CONVERT (VARCHAR, @OLEResult2)

    END

    -- Open the text file for reading

    EXEC @OLEResult2 = sp_OAMethod @FS2, 'OpenTextFile', @FileID2 OUT, @FileName2, 1, 1

    IF @OLEResult2 <> 0

    BEGIN

    PRINT 'OpenTextFile'

    PRINT 'Error code: ' + CONVERT (VARCHAR, @OLEResult2)

    END

    -- Read the first line into the @Message variable

    EXECUTE @OLEResult2 = sp_OAMethod @FileID2, 'ReadLine', @Message2 OUT

    if ((@OLEResult2 = @OLEResult) and (@Message = @Message2))

    BEGIN

    set @out_msg =@Message

    set @out_msg2=@Message2

    --PRINT 'FileHeader Matched-- SUCCESS '+@out_msg2

    insert into validate_table values('FileHeader Matched-- SUCCESS ',@FileName1,@FileName2)

    END

    if ((@Message <> @Message2) OR (@OLEResult2 <> @OLEResult) )

    BEGIN

    set @out_msg =@Message

    set @out_msg2=@Message2

    --PRINT 'FileHeader Not Matched --REJECT '+ @out_msg2

    insert into validate_table values('FileHeader Not Matched --REJECT ',@FileName1,@FileName2)

    END

    ---------------

    -- Keep looping through until the @OLEResult variable is < 0; this indicates that the end of the file has been reached.

    WHILE @OLEResult >= 0

    BEGIN

    --PRINT @Message

    EXECUTE @OLEResult = sp_OAMethod @FileID, 'ReadLine', @Message OUT

    END

    EXECUTE @OLEResult = sp_OADestroy @FileID

    EXECUTE @OLEResult = sp_OADestroy @FS

    ---------------

    -- Keep looping through until the @OLEResult2 variable is < 0; this indicates that the end of the file has been reached.

    WHILE @OLEResult2 >= 1

    BEGIN

    --PRINT @Message2

    EXECUTE @OLEResult2 = sp_OAMethod @FileID2, 'ReadLine', @Message2 OUT

    END

    EXECUTE @OLEResult2 = sp_OADestroy @FileID2

    EXECUTE @OLEResult2 = sp_OADestroy @FS2

    ---------------

    -- End of Procedure.

    and created a cursor to call the procedure

    declare @files table (ID int IDENTITY, FileName varchar(100))

    insert into @files execute xp_cmdshell 'dir C:\Digitdb\Prospects\ /b'

    select * from @files

    -- then read the each file name from the table using a cursor and pass it to the ReadFromTextFile proc

    declare @STR_fname VARCHAR(100)

    declare @templ_fname varchar(100)

    DECLARE fn_cursor CURSOR FOR

    SELECT FileName

    FROM @files

    set @templ_fname='C:\Digitdb\prospects1\Template.txt'

    OPEN fn_cursor

    set @STR_fname=''

    FETCH NEXT FROM fn_cursor INTO @STR_fname

    WHILE @@FETCH_STATUS = 0

    BEGIN

    begin

    EXEC ValidateHeader @templ_fname,@STR_fname

    end

    FETCH NEXT FROM fn_cursor INTO @STR_fname

    print 'temp file : next file to read'+@templ_fname+' tally filename :'+@STR_fname

    END

    CLOSE fn_cursor

    DEALLOCATE fn_cursor

    but my validate_table is returning with 'fileheader not matched reject file' for the file in the prospects.

    Any suggestions please?

  • steve-893342 (8/23/2010)


    I did mock up a BULK INSERT variation but, whilst eliminating the need for a format file, at the same time the specter of RBAR 🙁 was introduced. Or was it RBR since it wasn't too agonizing? I'm not sure :unsure:

    How'z that? you're reading the LastRow = 1 for each file no matter which method you're using.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, at least with the BULK INSERT method I came up with, I first created 2 staging tables. Then for each each file, I needed to BULK INSERT into the first staging table, then INSERT into the second staging table and finally TRUNCATE the first staging table. This is done on a row by row basis which leads to multiple statements. Conversely the OPENROWSET BULK method resolves into a single UNION ALL SELECT statement. Maybe I'm missing something and there's a better way of doing the BULK INSERT :ermm:

    DECLARE @SQL VARCHAR(MAX)

    --BULK INSERT Method

    IF NOT OBJECT_ID('tempdb.dbo.#t', 'U') IS NULL DROP TABLE #t

    CREATE TABLE #t (Header VARCHAR(8000))

    IF NOT OBJECT_ID('tempdb.dbo.#t2', 'U') IS NULL DROP TABLE #t2

    CREATE TABLE #t2 (FileName VARCHAR(100), Header VARCHAR(8000))

    ;WITH cte AS

    (

    SELECT subdirectory FROM OPENROWSET('SQLNCLI','SERVER=.;TRUSTED_CONNECTION=YES','SET FMTONLY OFF EXEC xp_dirtree ''C:\YourFolder'', 1, 1') WHERE [File] = 1

    )

    SELECT @SQL = (SELECT 'BULK INSERT #t FROM ''C:\YourFolder\' + subdirectory + ''' WITH (LASTROW = 1) INSERT #t2 SELECT ''' + subdirectory + ''', Header FROM #t TRUNCATE TABLE #t' + CHAR(10) FROM cte FOR XML PATH(''), TYPE).value('.[1]', 'varchar(MAX)')

    PRINT(@SQL)

    --eg

    BULK INSERT #t FROM 'C:\YourFolder\File1.txt' WITH (LASTROW = 1) INSERT #t2 SELECT 'File1.txt', Header FROM #t TRUNCATE TABLE #t

    BULK INSERT #t FROM 'C:\YourFolder\File2.txt' WITH (LASTROW = 1) INSERT #t2 SELECT 'File2.txt', Header FROM #t TRUNCATE TABLE #t

    BULK INSERT #t FROM 'C:\YourFolder\File3.txt' WITH (LASTROW = 1) INSERT #t2 SELECT 'File3.txt', Header FROM #t TRUNCATE TABLE #t

    BULK INSERT #t FROM 'C:\YourFolder\File4.txt' WITH (LASTROW = 1) INSERT #t2 SELECT 'File4.txt', Header FROM #t TRUNCATE TABLE #t

    --OPENROWSET BULK Method

    ;WITH cte AS

    (

    SELECT subdirectory FROM OPENROWSET('SQLNCLI','SERVER=.;TRUSTED_CONNECTION=YES','SET FMTONLY OFF EXEC xp_dirtree ''C:\YourFolder'', 1, 1') WHERE [File] = 1

    )

    SELECT @SQL = STUFF((SELECT 'UNION ALL SELECT ''' + subdirectory + ''' AS FileName, BulkColumn AS Header FROM OPENROWSET (BULK ''C:\YourFolder\' + subdirectory + ''', FORMATFILE = ''C:\BulkFile.fmt'', LASTROW = 1) AS Z' + CHAR(10) FROM cte FOR XML PATH(''), TYPE).value('.[1]', 'varchar(MAX)'), 1, 10, '')

    PRINT(@SQL)

    --eg

    SELECT 'File1.txt' AS FileName, BulkColumn AS Header FROM OPENROWSET (BULK 'C:\YourFolder\File1.txt', FORMATFILE = 'C:\BulkFile.fmt', LASTROW = 1) AS Z

    UNION ALL SELECT 'File2.txt' AS FileName, BulkColumn AS Header FROM OPENROWSET (BULK 'C:\YourFolder\File2.txt', FORMATFILE = 'C:\BulkFile.fmt', LASTROW = 1) AS Z

    UNION ALL SELECT 'File3.txt' AS FileName, BulkColumn AS Header FROM OPENROWSET (BULK 'C:\YourFolder\File3.txt', FORMATFILE = 'C:\BulkFile.fmt', LASTROW = 1) AS Z

    UNION ALL SELECT 'File4.txt' AS FileName, BulkColumn AS Header FROM OPENROWSET (BULK 'C:\YourFolder\File4.txt', FORMATFILE = 'C:\BulkFile.fmt', LASTROW = 1) AS Z

  • steve-893342 (8/23/2010)


    Well, at least with the BULK INSERT method I came up with, I first created 2 staging tables. Then for each each file, I needed to BULK INSERT into the first staging table, then INSERT into the second staging table and finally TRUNCATE the first staging table. This is done on a row by row basis which leads to multiple statements. Conversely the OPENROWSET BULK method resolves into a single UNION ALL SELECT statement. Maybe I'm missing something and there's a better way of doing the BULK INSERT :ermm:

    The only thing missing is more coffee for me. You're spot on here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply