June 12, 2017 at 3:46 pm
I am working on SSIS and I am having the complex unstructured TEXT File, I have to parse the text file through creating SSIS Packages and get the required column's data in the DataBase.What is the best way to parse the Textfile and how can I write the Script to read each line in that Text file.I need help to how to write script to read each line in the file .
Required columns from the Text file data is LOGID, DEVICEID, DATAVALUE and DATAUNITS :
Help is greatly appreciated.
June 12, 2017 at 4:21 pm
While I can find Device Id in the data file, I cannot find any of the other data items you mention. Please clarify.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 12, 2017 at 4:24 pm
Phil Parkin - Monday, June 12, 2017 4:21 PMWhile I can find Device Id in the data file, I cannot find any of the other data items you mention. Please clarify.
Slot0 Var Units=251
Slot0 Var Value=4
These are the units and values.
June 13, 2017 at 7:28 am
I cannot see a LogId anywhere in the file.
There is no easy way of getting this to work.
But there is a complex way, and that is by using an Asynchronous Script Component to read the source file line by line to pick up and output only those parts of the file in which you are interested.
It will require significant development time to get all of the nuances ironed out and assumes that you are fairly confident writing C# code.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 13, 2017 at 8:37 am
this was kind of interesting, so I imported the file into a varchar max column, and used two CrLf as the delimiter, and the pipe character as the column delimiter(since it does notexist int he data)
That gave me 99763 rows(with data containing CrLf), instead of 587,475 rows.
there are only 5 Device Id's in the whole file, and 27,704 'Units=' and 'Value=' pairs.
i was assuming that Units/Values would be related to the previous appearing DeviceId, but all Units/Value Pairs appear after the last DeviceId in the file
I'm not sure that data in in an order related to the previous row(s) of data.
Lowell
June 13, 2017 at 9:18 am
Yeah right Lowell, It is a really interesting one in the data after five devicesID's again the deviceID's are repeating with new DATAUNITS and DATAVALUES, so the required Output Schema was to get all DATAUNITS and DATAVALUES for the five Devices.
June 13, 2017 at 9:21 am
Phil Parkin - Tuesday, June 13, 2017 7:28 AMI cannot see a LogId anywhere in the file.
There is no easy way of getting this to work.
But there is a complex way, and that is by using an Asynchronous Script Component to read the source file line by line to pick up and output only those parts of the file in which you are interested.
It will require significant development time to get all of the nuances ironed out and assumes that you are fairly confident writing C# code.
Hi Phil, forget about LogID can you send any reference link how to workout for this type of file to get the required Output schema.
June 13, 2017 at 9:59 am
stakhilmadivada - Tuesday, June 13, 2017 9:18 AM
Yeah right Lowell, It is a really interesting one in the data after five devicesID's again the deviceID's are repeating with new DATAUNITS and DATAVALUES, so the required Output Schema was to get all DATAUNITS and DATAVALUES for the five Devices.
Well that's the problem.
I was assuming that DeviceId showed up, and then all subsequent lines containing Units or Value were related to the previously found DeviceID
Since only one has data with that assumption, it doesn't seem correct.
unless only a single device has Units/Values(there are no dataunits/datavalue strings), I'm not sure my assumptions were correct at all.
Except for using the SSIS wizard to import, I just did everything in TSQL and used a collection of Cross Apply statements to extract some substrings. nothing magical.
CREATE TABLE [dbo].[TestData] (
[ImportRowID] INT IDENTITY(1,1) NOT NULL,
[Column 0] VARCHAR(max) NULL)
--UPDATE MyTarget SET MyTarget.DeviceId = FinalSubString.FinalString
select * from [TestData] MyTarget
--the start of the value
CROSS APPLY(SELECT SUBSTRING(MyTarget.[Column 0],CHARINDEX('Device ID=',MyTarget.[Column 0]) + LEN('Device ID='),8000) As strval) PartialString
CROSS APPLY(SELECT SUBSTRING(PartialString.strval,1,CHARINDEX(CHAR(13),strval) -1) AS FinalString) FinalSubString--the CrLf after the start of the value
WHERE MyTarget.[Column 0] LIKE '%Device ID=%'
Lowell
June 13, 2017 at 10:18 am
Lowell - Tuesday, June 13, 2017 9:59 AMstakhilmadivada - Tuesday, June 13, 2017 9:18 AM
Yeah right Lowell, It is a really interesting one in the data after five devicesID's again the deviceID's are repeating with new DATAUNITS and DATAVALUES, so the required Output Schema was to get all DATAUNITS and DATAVALUES for the five Devices.Well that's the problem.
I was assuming that DeviceId showed up, and then all subsequent lines containing Units or Value were related to the previously found DeviceID
Since only one has data with that assumption, it doesn't seem correct.unless only a single device has Units/Values(there are no dataunits/datavalue strings), I'm not sure my assumptions were correct at all.
Except for using the SSIS wizard to import, I just did everything in TSQL and used a collection of Cross Apply statements to extract some substrings. nothing magical.
CREATE TABLE [dbo].[TestData] (
[ImportRowID] INT IDENTITY(1,1) NOT NULL,
[Column 0] VARCHAR(max) NULL)--UPDATE MyTarget SET MyTarget.DeviceId = FinalSubString.FinalString
select * from [TestData] MyTarget
--the start of the value
CROSS APPLY(SELECT SUBSTRING(MyTarget.[Column 0],CHARINDEX('Device ID=',MyTarget.[Column 0]) + LEN('Device ID='),8000) As strval) PartialString
CROSS APPLY(SELECT SUBSTRING(PartialString.strval,1,CHARINDEX(CHAR(13),strval) -1) AS FinalString) FinalSubString--the CrLf after the start of the value
WHERE MyTarget.[Column 0] LIKE '%Device ID=%'
Thankyou Lowell for your quick response, How can I create package in ssis to read the each line in ssis and get the data?
Help is greatly appreciated,
Thanks.
June 13, 2017 at 11:18 am
Can you answer Lowell's question about how to associate a given DeviceId with the Unit/Value pairs please?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 13, 2017 at 11:25 am
For every Device ID it returns the both data and unit fields.
June 13, 2017 at 11:40 am
stakhilmadivada - Tuesday, June 13, 2017 11:25 AMFor every Device ID it returns the both data and unit fields.
OK, here are some more direct questions:
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 13, 2017 at 11:45 am
stakhilmadivada - Tuesday, June 13, 2017 11:25 AMFor every Device ID it returns the both data and unit fields.
but the Data you provided clearly shows that is NOT the case.
In the raw file, search for the Five 'Device ID=' locations.
Line 30 Device ID=74813
Line 81 Device ID=8001643
Line 99 Device ID=1053375
Line 117 Device ID=7010724
Line 135 Device ID=8002090
There are no Unit= or Value= prior to line 135.
The ALL appear at Line 153 or later. so how are they related? I looked at things like Sub-Device Index= and others, there's no obvious relationship I see between any block of code like this, and a previously Device Reference
Rx Cmd=9, Rsp code=0x00, Device Status=0x50
Extended Device Status=0
Slot0 Var Code=246
Slot0 Var Classification=0
Slot0 Var Units=251
Slot0 Var Value=4
Slot0 Var Status=C0
Slot1 Var Code=116
Slot1 Var Classification=209
Slot1 Var Units=70
Slot1 Var Value=0
Lowell
June 13, 2017 at 11:53 am
Phil Parkin - Tuesday, June 13, 2017 11:40 AMstakhilmadivada - Tuesday, June 13, 2017 11:25 AMFor every Device ID it returns the both data and unit fields.
- For DeviceId 748132, what are the associated data and unit values?
- How do you know that these specific values are associated with this DeviceId and not with other DeviceIds?
Here is my required output schema forget about ID and LogID because one is Identity function and other is default, now how can I get all data and units for all devices?
June 13, 2017 at 11:54 am
Lowell - Tuesday, June 13, 2017 11:45 AMstakhilmadivada - Tuesday, June 13, 2017 11:25 AMFor every Device ID it returns the both data and unit fields.but the Data you provided clearly shows that is NOT the case.
In the raw file, search for the Five 'Device ID=' locations.Line 30 Device ID=74813
Line 81 Device ID=8001643
Line 99 Device ID=1053375
Line 117 Device ID=7010724
Line 135 Device ID=8002090There are no Unit= or Value= prior to line 135.
The ALL appear at Line 153 or later. so how are they related? I looked at things like Sub-Device Index= and others, there's no obvious relationship I see between any block of code like this, and a previously Device ReferenceRx Cmd=9, Rsp code=0x00, Device Status=0x50
Extended Device Status=0
Slot0 Var Code=246
Slot0 Var Classification=0
Slot0 Var Units=251
Slot0 Var Value=4
Slot0 Var Status=C0
Slot1 Var Code=116
Slot1 Var Classification=209
Slot1 Var Units=70
Slot1 Var Value=0
Here is my required output schema forget about ID and LogID because one is Identity function and other is default, now how can I get all data and units for all devices.
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply