Need to split the row values

Question

Need to split the row values

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004402 More actions · Answer 1

p.shabbir (6/3/2016)
These records are loaded while processing the text file using ssis. If the file is missing delimiter, the next row will append it to the current row.Hence, records are rejected and loaded into rejects table.
....
{snip}
....
File is Tab delimited. Hope this helps.
Thnx.

I suppose it's possible but, except for notorious "ragged right" files, I find it really unlikely that such files are missing delimiters, especially EOR (end of record) delimiters. I'd ask you to post the file as a zipped attachment but, seeing that there's a wad of names and email addresses in the data you've already posted, I'd say that there might be some PII in the file.

With that in mind, ask yourself... if your information was in the file, would you consider it to be private information? Would your boss or your "Compliance Officer"? If not, then attach the zipped version of the file so that we can get to the bottom of this. If so, then I recommend that you go back through your posts and delete the data from the posts.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Chris_M Old Hand Points: 326 More actions · Answer 2

Most example candidates begin:

[font="Courier New"]<space><int><space><int><space><YYYYMMDD><space><MM/DD/YYYY><space><YYYYMMDD><space><GLOB><space>[/font]

If this were ALWAYS the case, you could look for 'GLOB', count back six <space>, and you have found the split-point.

You might even check that the three candidate-dates are valid-as-date to give yourself some added comfort.

Even if <GLOB> can be <EUR> or <USA> or some other restricted range of values you could still handle it.

The two exceptions in your examples are:

200SC-offer-detail11362376 1224314 20160523 05/22/2016 20160526 GLOB

200SC-offer-de11386712 1224314 20160523 05/22/2016 20160526 GLOB

...in each case the URL at the end of the previous candidate has been (randomly?) truncated and there is no <space> before the new candidate begins.

Count back six <space> will FAIL.

Plan "B": count back five <space> and step back while numeric, BUT

If this end-of-candidate truncation is as random as it appears to be, the second exception above might have been:

200SC-offer-details-May111386712 1224314 20160523 05/22/2016 20160526 GLOB

or

200SC-offer-details-May1611386712 1224314 20160523 05/22/2016 20160526 GLOB

...Plan "B" FAILS when the truncated URL ends numeric.

Plan "C":

If you can GUARANTEE that the first <int> is ALWAYS eight digits then maybe you have a route.

Plan "C" FAILS when the first <int> is not eight digits.

You are being provided with un-parsable corrupt data.

Only sensible option is to have the source send you correctly formed data.

It's their rubbish - they should clean it up.

Here are the tails of the URLs in your examples with character count:

[font="Courier New"]creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16

marriottpremier/Biz100k5ksh-offer-details-May16 '

creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16

creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16 '

creditcards.chase.com/a1/marriottpremier/200SC-offer-detail

creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16 '

creditcards.chase.com/a1/marriottpremier/200SC-offer-de

creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16 '

creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16

creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16 '

1 2 2 2 2 2 2

9 0 1 2 3 4 5

0 0 0 0 0 0 0[/font]

Most URLs in your examples are suspiciously close to 255 chars. Google will tell you that Excel sometimes plays rough with CSV fields over 255 chars (though the specification is 32k chars).

Oddly enough, the two exception cases mentioned above truncate before 255 characters!!

Most of your URLs is repeated junk: maybe the first 50 chars, the campaign_code (if any) , and everything after the last '/' is all you need to address business requirements and would sidestep any 255 chars issues. This extraction would need to handled by your data source.

Chris

p.shabbir SSCrazy Points: 2404 More actions · Answer 3

ChrisM@Work (6/3/2016)
p.shabbir (6/3/2016)
ChrisM@Work (6/3/2016)
Do you have a means of detecting the split point?
After
"be.wylei.com/3dadf94363998b3bb263e8944efa95b4.smartclick?wid=${Profile.CustomerKey}&wcid=${CAMPAIGN_KEY}&wseg=${VSEG} applynow.chase.com/FlexAppWeb/pricing.do?card=FK6S&page_type=appterms creditcards.chase.com/a1/marriottpremier/200SC-offer-details-May16" value
in the above example the record must be splitted.
Thanks.
You've done this by eye. Can you describe a rule or set of rules to find the split point?
For instance, the example you posted has an 8-digit number as the first space-delimited element. It looks like an identifier, a row identifier. At the point where the row is split, the first element of the new row also has an 8-digit number as the first space-delimited element, and it's numerically similar: 11157243 compared with 11157246.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

Chris_M above has pointed out above that some of your rows appear to be truncated...

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

p.shabbir SSCrazy Points: 2404 More actions · Answer 5

ChrisM@Work (6/6/2016)
Chris_M above has pointed out above that some of your rows appear to be truncated...

Yes.

Can this be done or if not, I will have to to do it manually for 396 rows. Copying each row and press enter after that string.It is difficult to do for 396 rows but, seems no option for me.

Thanks.

p.shabbir SSCrazy Points: 2404 More actions · Answer 6

Ok. I have managed to work it manually (It is urgent since, these records we need to send it back to the client). I thought it would take long time but, it took just half an hour.

Once again Thanks to all (especially Chris :-))

But, I was curious to achieve through the chris query but, was not successful.

Thanks to all.

p.shabbir SSCrazy Points: 2404 More actions · Answer 7

Chris_M (6/4/2016)
Most example candidates begin:
[font="Courier New"]<space><int><space><int><space><YYYYMMDD><space><MM/DD/YYYY><space><YYYYMMDD><space><GLOB><space>[/font]
If this were ALWAYS the case, you could look for 'GLOB', count back six <space>, and you have found the split-point.
You might even check that the three candidate-dates are valid-as-date to give yourself some added comfort.
Even if <GLOB> can be <EUR> or <USA> or some other restricted range of values you could still handle it.
The two exceptions in your examples are:
200SC-offer-detail11362376 1224314 20160523 05/22/2016 20160526 GLOB
200SC-offer-de11386712 1224314 20160523 05/22/2016 20160526 GLOB
...in each case the URL at the end of the previous candidate has been (randomly?) truncated and there is no <space> before the new candidate begins.
Count back six <space> will FAIL.
Plan "B": count back five <space> and step back while numeric, BUT
If this end-of-candidate truncation is as random as it appears to be, the second exception above might have been:
200SC-offer-details-May111386712 1224314 20160523 05/22/2016 20160526 GLOB
or
200SC-offer-details-May1611386712 1224314 20160523 05/22/2016 20160526 GLOB
...Plan "B" FAILS when the truncated URL ends numeric.
Plan "C":
If you can GUARANTEE that the first <int> is ALWAYS eight digits then maybe you have a route.
Plan "C" FAILS when the first <int> is not eight digits.
You are being provided with un-parsable corrupt data.
Only sensible option is to have the source send you correctly formed data.
It's their rubbish - they should clean it up.
Here are the tails of the URLs in your examples with character count:
1 2 2 2 2 2 2
9 0 1 2 3 4 5
0 0 0 0 0 0 0[/font]
[/code]
Most URLs in your examples are suspiciously close to 255 chars. Google will tell you that Excel sometimes plays rough with CSV fields over 255 chars (though the specification is 32k chars).
Oddly enough, the two exception cases mentioned above truncate before 255 characters!!
Most of your URLs is repeated junk: maybe the first 50 chars, the campaign_code (if any) , and everything after the last '/' is all you need to address business requirements and would sidestep any 255 chars issues. This extraction would need to handled by your data source.
Chris

Deleted the Data.