Parsing Unstructured Text file in SSIS and read the each line to get the required data

  • I am working on SSIS and I am having the complex unstructured TEXT File, I have to parse the text file through creating SSIS Packages and get the required column's data in the DataBase.What is the best way to parse the Textfile and how can I write the Script to read each line in that Text file.I need help to how to write script to read each line in the file .

    Required columns from the Text file data is LOGID, DEVICEID, DATAVALUE and DATAUNITS :

    Help is greatly appreciated.

  • While I can find Device Id in the data file, I cannot find any of the other data items you mention. Please clarify.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin - Monday, June 12, 2017 4:21 PM

    While I can find Device Id in the data file, I cannot find any of the other data items you mention. Please clarify.

    Slot0 Var Units=251
    Slot0 Var Value=4
    These are the units and values.

  • I cannot see a LogId anywhere in the file.
    There is no easy way of getting this to work.
    But there is a complex way, and that is by using an Asynchronous Script Component to read the source file line by line to pick up and output only those parts of the file in which you are interested.
    It will require significant development time to get all of the nuances ironed out and assumes that you are fairly confident writing C# code.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • this was kind of interesting, so I imported the file into a varchar max column, and used two CrLf as the delimiter, and the pipe character as the column delimiter(since it does notexist int he data)
    That gave me 99763 rows(with data containing CrLf), instead of 587,475 rows.
    there are only 5 Device Id's in the whole file, and 27,704 'Units=' and 'Value=' pairs.

    i was assuming that Units/Values would be related to the previous appearing DeviceId, but all Units/Value Pairs appear after the last DeviceId in the file
    I'm not sure that data in in an order related to the previous row(s) of data.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Yeah right Lowell, It is a really interesting one in the data after five devicesID's again the deviceID's are repeating with new DATAUNITS and DATAVALUES, so the required Output Schema was to get all DATAUNITS and DATAVALUES for the five Devices.

  • Phil Parkin - Tuesday, June 13, 2017 7:28 AM

    I cannot see a LogId anywhere in the file.
    There is no easy way of getting this to work.
    But there is a complex way, and that is by using an Asynchronous Script Component to read the source file line by line to pick up and output only those parts of the file in which you are interested.
    It will require significant development time to get all of the nuances ironed out and assumes that you are fairly confident writing C# code.

    Hi Phil, forget about LogID can you send any reference link how to workout for this type of file to get the required Output schema.

  • stakhilmadivada - Tuesday, June 13, 2017 9:18 AM

    Yeah right Lowell, It is a really interesting one in the data after five devicesID's again the deviceID's are repeating with new DATAUNITS and DATAVALUES, so the required Output Schema was to get all DATAUNITS and DATAVALUES for the five Devices.

    Well that's the problem.
    I was assuming that  DeviceId showed up, and then all subsequent lines containing Units or Value were related to the previously found DeviceID
    Since only one has data with that assumption, it doesn't seem correct.

    unless only a single device has Units/Values(there are no dataunits/datavalue strings), I'm not sure my assumptions were correct at all.

    Except for using the SSIS wizard to import, I just did everything in TSQL and used a collection of Cross Apply statements to extract some substrings. nothing magical.

    CREATE TABLE [dbo].[TestData] (
    [ImportRowID]  INT              IDENTITY(1,1)          NOT NULL,
    [Column 0]     VARCHAR(max)                                NULL)

    --UPDATE MyTarget SET MyTarget.DeviceId = FinalSubString.FinalString
    select * from [TestData] MyTarget
    --the start of the value
    CROSS APPLY(SELECT SUBSTRING(MyTarget.[Column 0],CHARINDEX('Device ID=',MyTarget.[Column 0]) + LEN('Device ID='),8000) As strval) PartialString
    CROSS APPLY(SELECT SUBSTRING(PartialString.strval,1,CHARINDEX(CHAR(13),strval) -1) AS FinalString) FinalSubString--the CrLf after the start of the value
    WHERE MyTarget.[Column 0] LIKE '%Device ID=%'

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell - Tuesday, June 13, 2017 9:59 AM

    stakhilmadivada - Tuesday, June 13, 2017 9:18 AM

    Yeah right Lowell, It is a really interesting one in the data after five devicesID's again the deviceID's are repeating with new DATAUNITS and DATAVALUES, so the required Output Schema was to get all DATAUNITS and DATAVALUES for the five Devices.

    Well that's the problem.
    I was assuming that  DeviceId showed up, and then all subsequent lines containing Units or Value were related to the previously found DeviceID
    Since only one has data with that assumption, it doesn't seem correct.

    unless only a single device has Units/Values(there are no dataunits/datavalue strings), I'm not sure my assumptions were correct at all.

    Except for using the SSIS wizard to import, I just did everything in TSQL and used a collection of Cross Apply statements to extract some substrings. nothing magical.

    CREATE TABLE [dbo].[TestData] (
    [ImportRowID]  INT              IDENTITY(1,1)          NOT NULL,
    [Column 0]     VARCHAR(max)                                NULL)

    --UPDATE MyTarget SET MyTarget.DeviceId = FinalSubString.FinalString
    select * from [TestData] MyTarget
    --the start of the value
    CROSS APPLY(SELECT SUBSTRING(MyTarget.[Column 0],CHARINDEX('Device ID=',MyTarget.[Column 0]) + LEN('Device ID='),8000) As strval) PartialString
    CROSS APPLY(SELECT SUBSTRING(PartialString.strval,1,CHARINDEX(CHAR(13),strval) -1) AS FinalString) FinalSubString--the CrLf after the start of the value
    WHERE MyTarget.[Column 0] LIKE '%Device ID=%'

    Thankyou Lowell for your quick response, How can I create package in ssis to read the each line in ssis and get the data?
    Help is greatly appreciated,
    Thanks.

  • Can you answer Lowell's question about how to associate a given DeviceId with the Unit/Value pairs please?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • For every Device ID it returns the both data and unit fields.

  • stakhilmadivada - Tuesday, June 13, 2017 11:25 AM

    For every Device ID it returns the both data and unit fields.

    OK, here are some more direct questions:

    • For DeviceId 748132, what are the associated data and unit values?
    • How do you know that these specific values are associated with this DeviceId and not with other DeviceIds?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • stakhilmadivada - Tuesday, June 13, 2017 11:25 AM

    For every Device ID it returns the both data and unit fields.

    but the Data you provided clearly shows that is NOT the case.
    In the raw file, search for the Five 'Device ID=' locations.

    Line  30 Device ID=74813
    Line  81 Device ID=8001643
    Line  99 Device ID=1053375
    Line 117 Device ID=7010724
    Line 135 Device ID=8002090

    There are no Unit= or Value= prior to line 135.
    The ALL appear at Line 153 or later. so how are they related?  I looked at things like Sub-Device Index= and others, there's no obvious relationship I see between any block of code like this, and a previously Device Reference

    Rx Cmd=9, Rsp code=0x00, Device Status=0x50
    Extended Device Status=0
    Slot0 Var Code=246
    Slot0 Var Classification=0
    Slot0 Var Units=251
    Slot0 Var Value=4
    Slot0 Var Status=C0
    Slot1 Var Code=116
    Slot1 Var Classification=209
    Slot1 Var Units=70
    Slot1 Var Value=0

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Phil Parkin - Tuesday, June 13, 2017 11:40 AM

    stakhilmadivada - Tuesday, June 13, 2017 11:25 AM

    For every Device ID it returns the both data and unit fields.

    • For DeviceId 748132, what are the associated data and unit values?
    • How do you know that these specific values are associated with this DeviceId and not with other DeviceIds?

    Here is my required output schema forget about ID and LogID because one is Identity function and other is default, now how can I get all data and units for all devices?

  • Lowell - Tuesday, June 13, 2017 11:45 AM

    stakhilmadivada - Tuesday, June 13, 2017 11:25 AM

    For every Device ID it returns the both data and unit fields.

    but the Data you provided clearly shows that is NOT the case.
    In the raw file, search for the Five 'Device ID=' locations.

    Line  30 Device ID=74813
    Line  81 Device ID=8001643
    Line  99 Device ID=1053375
    Line 117 Device ID=7010724
    Line 135 Device ID=8002090

    There are no Unit= or Value= prior to line 135.
    The ALL appear at Line 153 or later. so how are they related?  I looked at things like Sub-Device Index= and others, there's no obvious relationship I see between any block of code like this, and a previously Device Reference

    Rx Cmd=9, Rsp code=0x00, Device Status=0x50
    Extended Device Status=0
    Slot0 Var Code=246
    Slot0 Var Classification=0
    Slot0 Var Units=251
    Slot0 Var Value=4
    Slot0 Var Status=C0
    Slot1 Var Code=116
    Slot1 Var Classification=209
    Slot1 Var Units=70
    Slot1 Var Value=0

    Here is my required output schema forget about ID and LogID because one is Identity function and other is default, now how can I get all data and units for all devices.

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply