Parsing Text Files

  • Has anyone parsed a text file that has no clear delimiter, so that you have to search for a string value that occurs on each line? I need to know whether SSIS can do this quickly for over 2 million lines of text.

    Example of the data:

    employee name: joe schmoe employee id:123455

    reportName: some report name

    sun 1/1/2005 12:00pm 12:30 pm 1:00pm 2:00pm

    tue 1/2/2005 12:00pm 12:30 pm 1:00pm 2:00pm

    ...

    etc., for over 2 million lines.

    For each week, the employee name, ID, report name, etc. are repeated, so each week is one page of the text file.

    We have successfully parsed the file, but it takes about 3 weeks to do so, and I can't figure out why. I'm not the one running it, so I don't have the procedure; I just want to find a faster way of doing it.

    Thanks!

  • There's no way of doing that through a quick bit of configuration alone. It looks to me like it will need a script component that reads each line and does the breakdown in code, with multiple outputs (it's not clear from your example what happens to the data once it has been unscrambled). That's not going to be blindingly fast with those data volumes, though I would expect a considerable improvement over three weeks!

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel
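
    As a rough illustration of the line-by-line breakdown described above, here is a minimal Python sketch. An SSIS script component itself would be written in a .NET language, so treat this only as an outline of the logic. It assumes the layout shown in the sample lines; the regular expressions, the `parse_lines` function and the `DayRow` fields are hypothetical, and the real file may need different patterns. The employee name, ID and report name are remembered from each page header and attached to every day line that follows, so the file is read once, top to bottom.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# HYPOTHETICAL patterns, written only from the sample lines in the question.
HEADER_RE = re.compile(
    r"employee name:\s*(?P<name>.+?)\s+employee id:\s*(?P<id>\d+)", re.IGNORECASE
)
REPORT_RE = re.compile(r"reportName:\s*(?P<report>.+)", re.IGNORECASE)
DAY_RE = re.compile(
    r"^(?P<day>sun|mon|tue|wed|thu|fri|sat)\s+(?P<date>\d{1,2}/\d{1,2}/\d{4})\s*(?P<punches>.*)$",
    re.IGNORECASE,
)
PUNCH_RE = re.compile(r"\d{1,2}:\d{2}\s*[ap]m", re.IGNORECASE)


class DayRow(NamedTuple):
    employee_id: str
    employee_name: str
    report_name: Optional[str]
    day: str
    date: str
    punches: list[str]


def parse_lines(lines: Iterable[str]) -> Iterator[DayRow]:
    """Yield one DayRow per day line in a single top-to-bottom pass.

    Header values are carried forward until the next page (week) header
    replaces them; any line that matches no pattern is skipped.
    """
    name: Optional[str] = None
    emp_id: Optional[str] = None
    report: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if m := HEADER_RE.search(line):
            name, emp_id = m["name"], m["id"]
        elif m := REPORT_RE.search(line):
            report = m["report"].strip()
        elif (m := DAY_RE.match(line)) and emp_id is not None:
            # Normalise '12:30 pm' and '12:30pm' to the same form.
            punches = [p.replace(" ", "").lower() for p in PUNCH_RE.findall(m["punches"])]
            yield DayRow(emp_id, name, report, m["day"].lower(), m["date"], punches)


if __name__ == "__main__":
    sample = [
        "employee name: joe schmoe employee id:123455",
        "reportName: some report name",
        "sun 1/1/2005 12:00pm 12:30 pm 1:00pm 2:00pm",
        "tue 1/2/2005 12:00pm 12:30 pm 1:00pm 2:00pm",
    ]
    for row in parse_lines(sample):
        print(row)
```

    To run it over the real file, pass an open file object (for example `with open(path) as f:` and then iterate over `parse_lines(f)`); that streams one line at a time instead of loading all 2 million lines into memory.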

    Sorry to be vague. The data goes into a table so that we can do time-punch calculations to find violations. We don't need everything at the beginning or end of every page, only the portion that states the day worked plus the punches.

    I'm determined to find a better way. I'll look into the script task to see if I can figure it out. Thanks.
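
    Once the day lines are isolated, the punches have to become real date/time values before any violation rules can be checked. The Python sketch below (the function names are hypothetical) combines the date with each punch and, assuming the punches alternate in/out, returns the length of each worked interval. It accepts both `12:00pm` and `12:30 pm`, since the sample data contains both forms; what counts as a violation is left out because the thread doesn't say.

```python
from datetime import datetime, timedelta


def to_datetimes(date_str: str, punches: list[str]) -> list[datetime]:
    """Combine a date such as '1/1/2005' with punches such as '12:30 pm'."""
    stamps = []
    for punch in punches:
        clean = punch.replace(" ", "").upper()  # '12:30 pm' -> '12:30PM'
        stamps.append(datetime.strptime(f"{date_str} {clean}", "%m/%d/%Y %I:%M%p"))
    return stamps


def worked_intervals(stamps: list[datetime]) -> list[timedelta]:
    """ASSUMPTION: punches alternate in/out, so pair them (0,1), (2,3), ...

    An odd trailing punch (e.g. a missing clock-out) is ignored here; a real
    process would probably want to flag it instead.
    """
    return [stamps[i + 1] - stamps[i] for i in range(0, len(stamps) - 1, 2)]


if __name__ == "__main__":
    stamps = to_datetimes("1/1/2005", ["12:00pm", "12:30 pm", "1:00pm", "2:00pm"])
    for span in worked_intervals(stamps):
        print(span)
```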

  • If we could get the process down to a couple of days, I would be excited. 🙂

  • There are script tasks (which run in the Control Flow) and script components (which run within a data flow). It is the latter which I think you would need.

    IMO it should run in hours rather than days...
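
    For perspective, here is the arithmetic implied by the figures in this thread: roughly 2 million lines (a lower bound, since the file has "over" 2 million) in about 3 weeks is only around one line per second, and even "a couple of days" works out to about 12 lines per second. A small Python calculation, with a hypothetical function name:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def lines_per_second(total_lines: int, days: float) -> float:
    """Average throughput needed to get through total_lines in the given days."""
    return total_lines / (days * SECONDS_PER_DAY)


current = lines_per_second(2_000_000, 21)  # "like 3 weeks": about 1.1 lines/s
target = lines_per_second(2_000_000, 2)    # "a couple of days": about 11.6 lines/s

if __name__ == "__main__":
    print(f"3 weeks: {current:.1f} lines/s, 2 days: {target:.1f} lines/s")
```

    Both rates are low for a single sequential pass over a text file, which is consistent with the "hours rather than days" estimate; where the existing process actually loses its time is unknown, since its procedure isn't available.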

  • I'll take a shot at it and see what I can do. Thanks so much for the suggestions.

  • Good luck. Now the fun bit starts. I suggest that you start with a source file containing just 10 lines or so to speed up your development.
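
    One way to cut such a development sample out of the full file is to copy its first few lines. A minimal Python sketch; the function names are hypothetical and the UTF-8 encoding is an assumption about the export:

```python
from itertools import islice
from typing import Iterable


def head(lines: Iterable[str], n: int) -> list[str]:
    """Return at most the first n items without reading the rest."""
    return list(islice(lines, n))


def write_sample(src_path: str, dst_path: str, n_lines: int = 10) -> int:
    """Copy the first n_lines of src_path into dst_path; return the count copied.

    ASSUMPTION: the export is UTF-8 text; undecodable bytes are replaced.
    """
    with open(src_path, encoding="utf-8", errors="replace") as src:
        sample = head(src, n_lines)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(sample)
    return len(sample)
```

    Called as, say, `write_sample("timecards_full.txt", "timecards_sample.txt")` (hypothetical file names). Because each week is one page, a sample a little longer than one page would also exercise the change of header from one week to the next.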

  • ummm...yah, that's a good idea. Thanks. 🙂
