Need to split the row values

  • p.shabbir (6/3/2016)


    These records are loaded while processing the text file using ssis. If the file is missing delimiter, the next row will append it to the current row.Hence, records are rejected and loaded into rejects table.

    ....

    {snip}

    ....

    File is Tab delimited. Hope this helps.

    Thnx.

    I suppose it's possible but, except for notorious "ragged right" files, I find it really unlikely that such files are missing delimiters, especially EOR (end of record) delimiters. I'd ask you to post the file as a zipped attachment but, seeing that there's a wad of names and email addresses in the data you've already posted, I'd say that there might be some PII in the file.

    With that in mind, ask yourself... if your information was in the file, would you consider it to be private information? Would your boss or your "Compliance Officer"? If not, then attach the zipped version of the file so that we can get to the bottom of this. If so, then I recommend that you go back through your posts and delete the data from the posts.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Most example candidates begin:

    [font="Courier New"]<space><int><space><int><space><YYYYMMDD><space><MM/DD/YYYY><space><YYYYMMDD><space><GLOB><space>[/font]

    If this were ALWAYS the case, you could look for 'GLOB', count back six <space>, and you have found the split-point.

    You might even check that the three candidate-dates are valid-as-date to give yourself some added comfort.

    Even if <GLOB> can be <EUR> or <USA> or some other restricted range of values you could still handle it.

    The two exceptions in your examples are:

    200SC-offer-detail11362376 1224314 20160523 05/22/2016 20160526 GLOB

    200SC-offer-de11386712 1224314 20160523 05/22/2016 20160526 GLOB

    ...in each case the URL at the end of the previous candidate has been (randomly?) truncated and there is no <space> before the new candidate begins.

    Count back six <space> will FAIL.

    Plan "B": count back five <space> and step back while numeric, BUT

    If this end-of-candidate truncation is as random as it appears to be, the second exception above might have been:

    200SC-offer-details-May111386712 1224314 20160523 05/22/2016 20160526 GLOB

    or

    200SC-offer-details-May1611386712 1224314 20160523 05/22/2016 20160526 GLOB

    ...Plan "B" FAILS when the truncated URL ends numeric.

    Plan "C":

    If you can GUARANTEE that the first <int> is ALWAYS eight digits then maybe you have a route.

    Plan "C" FAILS when the first <int> is not eight digits.

    You are being provided with un-parsable corrupt data.

    Only sensible option is to have the source send you correctly formed data.

    It's their rubbish - they should clean it up.

    Here are the tails of the URLs in your examples with character count:

    [font="Courier New"]creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16

    marriottpremier/Biz100k5ksh-offer-details-May16 '

    creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16

    creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16 '

    creditcards.chase.com/a1/marriottpremier/200SC-offer-detail

    creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16 '

    creditcards.chase.com/a1/marriottpremier/200SC-offer-de

    creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16 '

    creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16

    creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16 '

    1 2 2 2 2 2 2

    9 0 1 2 3 4 5

    0 0 0 0 0 0 0[/font]

    Most URLs in your examples are suspiciously close to 255 chars. Google will tell you that Excel sometimes plays rough with CSV fields over 255 chars (though the specification is 32k chars).

    Oddly enough, the two exception cases mentioned above truncate before 255 characters!!

    Most of your URLs is repeated junk: maybe the first 50 chars, the campaign_code (if any) , and everything after the last '/' is all you need to address business requirements and would sidestep any 255 chars issues. This extraction would need to handled by your data source.

    Chris

  • ChrisM@Work (6/3/2016)


    p.shabbir (6/3/2016)


    ChrisM@Work (6/3/2016)


    Do you have a means of detecting the split point?

    After

    "be.wylei.com/3dadf94363998b3bb263e8944efa95b4.smartclick?wid=${Profile.CustomerKey}&wcid=${CAMPAIGN_KEY}&wseg=${VSEG} applynow.chase.com/FlexAppWeb/pricing.do?card=FK6S&page_type=appterms creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16" value

    in the above example the record must be splitted.

    Thanks.

    You've done this by eye. Can you describe a rule or set of rules to find the split point?

    For instance, the example you posted has an 8-digit number as the first space-delimited element. It looks like an identifier, a row identifier. At the point where the row is split, the first element of the new row also has an 8-digit number as the first space-delimited element, and it's numerically similar: 11157243 compared with 11157246.

  • Chris_M above has pointed out above that some of your rows appear to be truncated...

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (6/6/2016)


    Chris_M above has pointed out above that some of your rows appear to be truncated...

    Yes.

    Can this be done or if not, I will have to to do it manually for 396 rows. Copying each row and press enter after that string.It is difficult to do for 396 rows but, seems no option for me.

    Thanks.

  • Ok. I have managed to work it manually (It is urgent since, these records we need to send it back to the client). I thought it would take long time but, it took just half an hour.

    Once again Thanks to all (especially Chris :-))

    But, I was curious to achieve through the chris query but, was not successful.

    Thanks to all.

  • Chris_M (6/4/2016)


    Most example candidates begin:

    [font="Courier New"]<space><int><space><int><space><YYYYMMDD><space><MM/DD/YYYY><space><YYYYMMDD><space><GLOB><space>[/font]

    If this were ALWAYS the case, you could look for 'GLOB', count back six <space>, and you have found the split-point.

    You might even check that the three candidate-dates are valid-as-date to give yourself some added comfort.

    Even if <GLOB> can be <EUR> or <USA> or some other restricted range of values you could still handle it.

    The two exceptions in your examples are:

    200SC-offer-detail11362376 1224314 20160523 05/22/2016 20160526 GLOB

    200SC-offer-de11386712 1224314 20160523 05/22/2016 20160526 GLOB

    ...in each case the URL at the end of the previous candidate has been (randomly?) truncated and there is no <space> before the new candidate begins.

    Count back six <space> will FAIL.

    Plan "B": count back five <space> and step back while numeric, BUT

    If this end-of-candidate truncation is as random as it appears to be, the second exception above might have been:

    200SC-offer-details-May111386712 1224314 20160523 05/22/2016 20160526 GLOB

    or

    200SC-offer-details-May1611386712 1224314 20160523 05/22/2016 20160526 GLOB

    ...Plan "B" FAILS when the truncated URL ends numeric.

    Plan "C":

    If you can GUARANTEE that the first <int> is ALWAYS eight digits then maybe you have a route.

    Plan "C" FAILS when the first <int> is not eight digits.

    You are being provided with un-parsable corrupt data.

    Only sensible option is to have the source send you correctly formed data.

    It's their rubbish - they should clean it up.

    Here are the tails of the URLs in your examples with character count:

    1 2 2 2 2 2 2

    9 0 1 2 3 4 5

    0 0 0 0 0 0 0[/font]

    [/code]

    Most URLs in your examples are suspiciously close to 255 chars. Google will tell you that Excel sometimes plays rough with CSV fields over 255 chars (though the specification is 32k chars).

    Oddly enough, the two exception cases mentioned above truncate before 255 characters!!

    Most of your URLs is repeated junk: maybe the first 50 chars, the campaign_code (if any) , and everything after the last '/' is all you need to address business requirements and would sidestep any 255 chars issues. This extraction would need to handled by your data source.

    Chris

    Deleted the Data.

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply