parse files in folder

  • I added output for the file names and found that the delimiter switched from * to a bar (|). Is there a way to figure out the delimiter dynamically?

    Thanks for any suggestions.

  • The code already figures out the data element separator based on the 4th position in the file - or for segment element separator the 106th position of the ISA record.

    # Get the data element separator and segment element separator
    $dataElement = $fileData.Substring(4,1);
    $segmentElement = $fileData.Substring(106,1);

    If that is the code you are using - then something else is causing a problem with determining the separators.  I tested with changing the data element separator to a pipe and changing the segment element separator to a pipe - and did not have any issues.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I outputted these values:

    # Validate this is an EDI file

    $fileData.Substring(0,3)

    ISA

    # Get the data element separator and segment element separator

    $dataElement = $fileData.Substring(3,1);

    *

    $segmentElement = $fileData.Substring(105,1);

    {

    It seems the record starts at position zero (0), based on the output I'm seeing, so this is the only thing I changed in the script.

    Thanks. Does that help with debugging?

  • Would this make the script any easier for lookups, and faster?

    It seems there are 2 different formats the records could use for Inbound (which I would already know because of the folder designation).

    I could add another parameter that asks for Inbound or Outbound, switch to the matching folder in the script, and use the rules below for the first pass.

    X12 standards:

    First record is always the ISA envelope, and this record has fixed length fields:

    a) position 36 contains the sending partner ID, total length = 15.

    b) position 71 contains the received date, format YYMMDD.

    The next record is the group envelope – this record has variable length fields and is designated by the first “GS” after position 104

    a) Position 2 after the “GS” contains the document type: (i.e. “PO” for purchase order, “PC” for PO change).
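
    The fixed positions listed above could be read with Substring along these lines. This is only a sketch under some assumptions: the function name Get-IsaEnvelopeInfo is made up for illustration, the text is assumed to be already decoded with no BOM character in front of ISA, and the 1-based positions from the rules are converted to the 0-based offsets that Substring expects.

```powershell
# Hypothetical helper, not part of the script discussed in this thread.
function Get-IsaEnvelopeInfo {
    param([Parameter(Mandatory = $true)][string]$FileData)

    # The ISA record is fixed length: 105 characters plus the segment terminator
    if ($FileData.Length -lt 106 -or $FileData.Substring(0, 3) -ne 'ISA') {
        return $null
    }

    $dataElement    = $FileData.Substring(3, 1)      # position 4 (1-based)
    $segmentElement = $FileData.Substring(105, 1)    # position 106 (1-based)

    # The GS record follows the ISA terminator and has variable-length fields,
    # so split on the data element separator instead of using fixed positions
    $afterIsa = $FileData.Substring(106).TrimStart()
    $documentType = $null
    if ($afterIsa.StartsWith('GS' + $dataElement)) {
        $documentType = $afterIsa.Split($dataElement)[1]
    }

    [pscustomobject]@{
        DataElement    = $dataElement
        SegmentElement = $segmentElement
        Sender         = $FileData.Substring(35, 15).Trim()   # position 36, length 15
        FileDate       = $FileData.Substring(70, 6)           # position 71, YYMMDD
        DocumentType   = $documentType
    }
}
```

    The Sender, FileDate and DocumentType properties could then be compared against the script parameters. Because the separators are read from the record itself, a file that uses | instead of * should be handled the same way.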

  • Bruin wrote:

    I outputted these values:

    # Validate this is an EDI file
    $fileData.Substring(0,3)
    ISA

    # Get the data element separator and segment element separator
    $dataElement = $fileData.Substring(3,1);
    *

    $segmentElement = $fileData.Substring(105,1);
    {

    It seems the record starts at position zero (0), based on the output I'm seeing, so this is the only thing I changed in the script.

    Thanks. Does that help with debugging?

    It seems the files may have a different encoding than the sample data I used - which would lead to this kind of problem, as well as a problem with any code you try to write.  If the encoding is different then you need to deal with those differences.

    Jeffrey Williams

  • What would the code look like if it followed the rules outlined above and read specific positions within the data record? I checked the records, and the rules apply to the ISA segments of the Inbound records.

    Thanks for checking up!!!

  • I have provided code and samples - it is up to you to use that as a basis for extending and modifying the code to meet your requirements.  I have already stated that part of the problem is most likely due to the encoding of the files - but you seem to ignore that advice and go right back to parsing the files based on position.

    The first record of an EDI file is 106 characters - it is a fixed size.  After that - the other segments can be any length depending on the definition of the fields in those records.  You can parse the first 106 characters using substring - but you cannot rely on substring for anything past that point.

    If your files are encoded as ASCII files - then we need to change to this:

    $fileData = [System.Text.Encoding]::Ascii.GetString($bytes);

    If your files are encoded as unicode - then you need to change the following:

    if ($fileData.Substring(1,3) -eq "ISA") {

    # Get the data element separator and segment element separator
    $dataElement = $fileData.Substring(4,1);
    $segmentElement = $fileData.Substring(106,1);

    If you have a mix of Ascii and Unicode - then you would have to either:

    1. Write code to determine the encoding - and based on the encoding run different code blocks
    2. Convert the file to a specific encoding using: Get-Content ... | Set-Content ... -Encoding {your preferred encoding}

    One way to handle option 1:

    Read the data as Ascii - check the first 3 characters.  If the first 3 characters are ??I - convert the string to Unicode.  Set an offset variable and add that value to the substrings - for example:

    $fileList | ForEach-Object {
        $unicodeOffset = 0;
        $fileName = $_.FullName;

        # Get the first 500 bytes from the file
        # (-Encoding byte is Windows PowerShell syntax; PowerShell 6+ uses -AsByteStream instead)
        $bytes = Get-Content $fileName -Encoding byte -TotalCount 500 -ReadCount 500;
        $fileData = [System.Text.Encoding]::Ascii.GetString($bytes);

        # A Unicode (UTF-16 LE) BOM read as Ascii shows up as ?? in front of the first character
        if ($fileData.Substring(0,3) -eq "??I") {
            $unicodeOffset = 1
            $fileData = [System.Text.Encoding]::Unicode.GetString($bytes);
        }

        # Validate this is an EDI file
        if ($fileData.Substring($unicodeOffset,3) -eq "ISA") {

            # Get the data element separator and segment element separator
            $dataElement = $fileData.Substring(3+$unicodeOffset,1);
            $segmentElement = $fileData.Substring(105+$unicodeOffset,1);

            # (the rest of the processing continues as before)
        }
    }

    Jeffrey Williams

  • Bruin wrote:

    Thanks for the update to the script.

    I started it and got the message "You cannot call a method on a null-valued expression". The error came from:

    if (($firstRow[6].Trim() -eq $Sender -or $Sender -eq "") -and ($firstRow[9] -eq $FileDate -or $FileDate -eq "")) {

    I ran it with FileDate and RecordType. It's still running, but it throws this error every so often.

    Many thanks for the follow-ups.

    I was able to reproduce this error - by creating a file with UTF7 encoding and attempting to parse that file.  It looks like you might have some files that are using that encoding - and if that is the case you need to have the sender change that.
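
    Until the sender changes the encoding, one way to keep the script running past such a file is to check that the ISA row actually split into enough elements before calling Trim() on it. A rough sketch only: the function name Test-IsaRow and the minimum count of 10 (enough to reach index 9) are assumptions for illustration, not part of the script.

```powershell
# Hypothetical guard: returns $true only when the split ISA row has the
# elements the script later indexes ([6] sender, [9] date).
function Test-IsaRow {
    param([AllowNull()][string[]]$Row)

    if ($null -eq $Row -or $Row.Count -lt 10) { return $false }
    return ($Row[0] -eq 'ISA')
}

# Made-up rows: one that did not split into enough elements, one that did
$badRow  = 'ISA+ACo-00+ACo-'.Split('*')
$goodRow = 'ISA*00*          *00*          *ZZ*SENDERID       *ZZ*RECEIVERID     *230115*1200'.Split('*')

Test-IsaRow -Row $badRow     # False: skip the file (and perhaps log its name)
Test-IsaRow -Row $goodRow    # True: safe to read $goodRow[6] and $goodRow[9]
```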

    Jeffrey Williams

  • I was using:

    $fileData = [System.Text.Encoding]::UTF8.GetString($bytes);

     

  • Not sure what else to say - you seem to have files with different encodings, and worse - some files in UTF-7 which has been deprecated for quite a while.

    Since UTF-7 files do not have a BOM there is no way to accurately determine a UTF-7 file vs a UTF-8 vs Unicode or other encoding.

    You cannot just go by position either, since UTF-7 files encode special characters.  For example - the asterisk '*' is encoded as +ACo- which is 5 bytes (characters).  If you use the correct encoding when converting the bytes to a string - those values are correctly converted to the appropriate special character and then you can parse the string as needed.  But you cannot do that until you have identified the encoding and converted the bytes to a string.
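
    To illustrate the point about positions, here is a small sketch that decodes the same bytes two ways. The sample string is made up for illustration, and note that [System.Text.Encoding]::UTF7 is marked obsolete in newer .NET versions even though it can still be called.

```powershell
# Made-up sample: the start of an ISA record as it would appear in a UTF-7 file,
# where each * separator is stored as +ACo-
$bytes = [System.Text.Encoding]::ASCII.GetBytes('ISA+ACo-00+ACo-')

# Decoded with the wrong encoding, position 3 holds '+' instead of the separator
$asAscii = [System.Text.Encoding]::ASCII.GetString($bytes)
$asAscii.Substring(3, 1)    # +

# Decoded as UTF-7, the separator is back in its expected position
$asUtf7 = [System.Text.Encoding]::UTF7.GetString($bytes)
$asUtf7                     # ISA*00*
$asUtf7.Substring(3, 1)     # *
```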

    If you don't have any UTF-7 files - then you have a mixture of ASCII, Unicode, UTF-8 and/or other encodings.  For Unicode or UTF-8 (which will be identified by the BOM) - the string will start in position 1.  For ASCII (no BOM) the string starts in position 0 - which includes UTF-7 files because they don't have a BOM either.

    Jeffrey Williams

  • Is there anything you can run to determine the types of files you're processing?

  • I really appreciate all of your help, suggestions, and code. I'll take your snippets and code samples and see if I can get to a solution that works.

  • If the file was created with a BOM (byte order mark) - you can use that to determine the encoding.  Unicode\Unicode (Big-Endian)\Unicode (UTF-32)\Unicode (UTF-32 Big-Endian)\UTF-8 all have a definition for the BOM (byte order mark).

    UTF-7 does not have a BOM (at least not a definition in Windows that can be used - and it is a deprecated encoding).

    A file created using UTF-8 encoding may or may not have a BOM - and in many cases will not have a BOM.  This presents a challenge if we default to Ascii - but we can default to UTF-8 instead of Ascii which should work with normal ascii files just fine.

    With that said - an EDI file encoded as UTF-7 should always start with 'ISA', followed by the encoded data element separator: a + (char 43) in the 4th position and a - (char 45) in the 8th position (zero-based indexes 3 and 7).  You could write code that checks for the BOM as well as indexes 3 and 7 - to determine the encoding.  It may not be 100% accurate for UTF-7 files but it should be good enough for your purposes.

    Jeffrey Williams

  • Here are some hints:

    # Encoding check arrays
    [byte[]]$utf7 = 43,45;
    [byte[]]$unicode = 255,254;
    [byte[]]$utf8 = 239,187,191;
    if (-not (Compare-Object $bytes[0..1] $unicode)) {
        $offset = 1
        Write-Host 'Unicode encoded file identified';
        $fileData = [System.Text.Encoding]::Unicode.GetString($bytes);
    }
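
    Putting the hints together, the checks could be extended along these lines. This is a sketch rather than a tested solution: the function name Get-EdiEncodingName and the returned labels are invented for illustration, it only looks for the UTF-8 and Unicode (UTF-16 LE) BOMs plus the UTF-7 pattern at zero-based indexes 3 and 7, and it ignores big-endian and UTF-32 files (a UTF-32 LE BOM also begins with 255,254 and would be reported as Unicode here).

```powershell
# Hypothetical helper that extends the hint above to the other checks
# described in this thread. Returns a label rather than decoding anything.
function Get-EdiEncodingName {
    param([Parameter(Mandatory = $true)][byte[]]$Bytes)

    # Small local helper: do the first bytes match the given signature?
    function Test-Prefix([byte[]]$Data, [byte[]]$Signature) {
        if ($Data.Length -lt $Signature.Length) { return $false }
        for ($i = 0; $i -lt $Signature.Length; $i++) {
            if ($Data[$i] -ne $Signature[$i]) { return $false }
        }
        return $true
    }

    if (Test-Prefix $Bytes ([byte[]](239, 187, 191))) { return 'UTF8-BOM' }
    if (Test-Prefix $Bytes ([byte[]](255, 254)))      { return 'Unicode' }

    # No BOM: look for the UTF-7 pattern ISA+xxx- (zero-based indexes 3 and 7)
    if ($Bytes.Length -ge 8 -and $Bytes[3] -eq 43 -and $Bytes[7] -eq 45) {
        return 'UTF7'
    }

    # Default to UTF-8, which also reads plain ASCII files correctly
    return 'UTF8'
}
```

    The returned label could drive a switch that picks the matching [System.Text.Encoding] member and sets the offset to 1 for the two BOM cases.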

     

    Jeffrey Williams

  • param (
        [parameter(mandatory = $true, ValueFromPipelineByPropertyName = $true)][AllowEmptyString()][string]$Sender,
        [parameter(mandatory = $true, ValueFromPipelineByPropertyName = $true)][AllowEmptyString()][string]$FileDate,
        [parameter(mandatory = $true, ValueFromPipelineByPropertyName = $true)][AllowEmptyString()][string]$RecordType
    )

    # Check for at least one parameter selected
    if ($Sender -eq "" -and $FileDate -eq "" -and $RecordType -eq "") {
        Write-Host -ForegroundColor Yellow "At least one parameter must be selected. Please try again.";
        Exit;
    }

    # Get list of files
    $fileList = Get-ChildItem -Path C:\temp\EDI\Temp -Recurse | Where-Object {$_ -like "*.int"};

    $fileList | ForEach-Object {
        $unicodeOffset = 0;
        $fileName = $_.FullName;

        # Get the first 500 bytes from the file
        $bytes = Get-Content $fileName -Encoding byte -TotalCount 500 -ReadCount 500;
        $fileData = [System.Text.Encoding]::Ascii.GetString($bytes);

        if ($fileData.Substring(0,3) -eq "??I") {
            $unicodeOffset = 1
            $fileData = [System.Text.Encoding]::Unicode.GetString($bytes);
        }

        # Validate this is an EDI file
        if ($fileData.Substring($unicodeOffset,3) -eq "ISA") {

            # Get the data element separator and segment element separator
            $dataElement = $fileData.Substring(3+$unicodeOffset,1);
            $segmentElement = $fileData.Substring(105+$unicodeOffset,1);

            # Split first row based on segment and data element separators - Index = 0
            $firstRow = $fileData.Split($segmentElement)[0].Split($dataElement);

            # If we match the sender and the date - get the second row and check the record type
            if (($firstRow[6].Trim() -eq $Sender -or $Sender -eq "") -and ($firstRow[9] -eq $FileDate -or $FileDate -eq "")) {

                # Get the second row based on the segment and data element separators - Index = 1
                $secondRow = $fileData.Split($segmentElement)[1].Split($dataElement);

                if ($secondRow[1] -eq $RecordType -or $RecordType -eq "") {

                    # Copy the file to the new location
                    # ($fileName holds the full path, so the destination uses the file name only)
                    Copy-Item -Path $fileName -Destination "C:\Temp\Archive\$($_.Name)" -WhatIf;
                }
            }
        }
    }

    This worked!!!  It went through all of the Inbound folders and located files based on the criteria supplied in the parameters, and it didn't throw any errors. Now I have to have it go through the Outbound folders. I would like to display a message while it's executing, maybe showing which folder it's searching and a count of files searched, to give the user confidence that it's running and not looping or stuck. Can that easily be inserted into the script?

    Again, many thanks for your patience and support. Much appreciated.
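
    For what it's worth, a status display like the one described could probably be sketched with Write-Progress. The function name Search-EdiFolder and the wording of the messages are assumptions for illustration; the *.int filter mirrors the script above, and the existing per-file checks would go where the comment indicates.

```powershell
# Hypothetical sketch: report the folder being searched and a running file count.
function Search-EdiFolder {
    param([Parameter(Mandatory = $true)][string]$Root)

    $count = 0
    Get-ChildItem -Path $Root -Recurse -File -Filter *.int | ForEach-Object {
        $count++
        $progress = @{
            Activity         = 'Searching EDI files'
            Status           = "Folder: $($_.DirectoryName)"
            CurrentOperation = "Files checked: $count"
        }
        Write-Progress @progress

        # The existing per-file checks (encoding, sender, date, record type) would go here
    }

    # Clear the progress display and return how many files were examined
    Write-Progress -Activity 'Searching EDI files' -Completed
    return $count
}
```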

     

     
