SUBSTRING and CHARINDEX query.... please help.

  • Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    some more thoughts here....
    https://www.brentozar.com/archive/2016/03/splitting-strings-sql-server-2016-rescue/

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    Understood but MS has not made a guarantee as to order even when using the likes of ROW_NUMBER.  I understand how the CLR behind the scenes may have been written and if was written as such, will reliably return the correct order with ROW_NUMBER if that's the only sort.  I don't have an example right now but I have seen multiple sorts resulting from multiple ROW_NUMBERs throw things out of whack when using an operand such as (SELECT NULL) for the ORDER BY part of ROW_NUMBER. 

    Unlike a thing like the Quirky Update, it is difficult to build in a check to see if the sort order is the same as the original string without adding a substantial amount of overhead in the form of rebuilding the string from the returned elements to do a compare to the original.

    Use with caution.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, June 23, 2017 2:52 PM

    Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    Understood but MS has not made a guarantee as to order even when using the likes of ROW_NUMBER.  I understand how the CLR behind the scenes may have been written and if was written as such, will reliably return the correct order with ROW_NUMBER if that's the only sort.  I don't have an example right now but I have seen multiple sorts resulting from multiple ROW_NUMBERs throw things out of whack when using an operand such as (SELECT NULL) for the ORDER BY part of ROW_NUMBER. 

    Unlike a thing like the Quirky Update, it is difficult to build in a check to see if the sort order is the same as the original string without adding a substantial amount of overhead in the form of rebuilding the string from the returned elements to do a compare to the original.

    Use with caution.

    Until MS finishes the function by adding ordinal and guarantees order, STRING_SPLIT has limited uses for me.  There are some cases where order isn't important, but when dealing with things like IP addresses, paths and a lot of other types of data, it's important.  I'll continue to use DS8K (original and improved) until the function is finished.

    I read about the rumor in the comments on Brent's site and I've seen the MS Connect item at https://connect.microsoft.com/SQLServer/Feedback/Details/2433171.  It's only 15 months old and the vote is 62 Yes to 0 No.  We'll see if MS decides to act on it.  Personally, I really hope they do.

  • Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    Eirikur, Wayne Sheffield has done some testing and has a good article at http://www.sqlservercentral.com/articles/STRING_SPLIT/139338/.  He covers quite a few different scenarios in the data.  It didn't include DelimitedSplit8K_LEAD and I know you're very familiar with it - it's yours. 😉

  • Ed Wagner - Friday, June 23, 2017 5:57 PM

    Jeff Moden - Friday, June 23, 2017 2:52 PM

    Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    Understood but MS has not made a guarantee as to order even when using the likes of ROW_NUMBER.  I understand how the CLR behind the scenes may have been written and if was written as such, will reliably return the correct order with ROW_NUMBER if that's the only sort.  I don't have an example right now but I have seen multiple sorts resulting from multiple ROW_NUMBERs throw things out of whack when using an operand such as (SELECT NULL) for the ORDER BY part of ROW_NUMBER. 

    Unlike a thing like the Quirky Update, it is difficult to build in a check to see if the sort order is the same as the original string without adding a substantial amount of overhead in the form of rebuilding the string from the returned elements to do a compare to the original.

    Use with caution.

    Until MS finishes the function by adding ordinal and guarantees order, STRING_SPLIT has limited uses for me.  There are some cases where order isn't important, but when dealing with things like IP addresses, paths and a lot of other types of data, it's important.  I'll continue to use DS8K (original and improved) until the function is finished.

    I read about the rumor in the comments on Brent's site and I've seen the MS Connect item at https://connect.microsoft.com/SQLServer/Feedback/Details/2433171.  It's only 15 months old and the vote is 62 Yes to 0 No.  We'll see if MS decides to act on it.  Personally, I really hope they do.

    Between it missing the element ordinal and the fact that it doesn't return a NULL when passed a NULL, I have little use for it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, June 23, 2017 11:39 PM

    Ed Wagner - Friday, June 23, 2017 5:57 PM

    Jeff Moden - Friday, June 23, 2017 2:52 PM

    Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    Understood but MS has not made a guarantee as to order even when using the likes of ROW_NUMBER.  I understand how the CLR behind the scenes may have been written and if was written as such, will reliably return the correct order with ROW_NUMBER if that's the only sort.  I don't have an example right now but I have seen multiple sorts resulting from multiple ROW_NUMBERs throw things out of whack when using an operand such as (SELECT NULL) for the ORDER BY part of ROW_NUMBER. 

    Unlike a thing like the Quirky Update, it is difficult to build in a check to see if the sort order is the same as the original string without adding a substantial amount of overhead in the form of rebuilding the string from the returned elements to do a compare to the original.

    Use with caution.

    Until MS finishes the function by adding ordinal and guarantees order, STRING_SPLIT has limited uses for me.  There are some cases where order isn't important, but when dealing with things like IP addresses, paths and a lot of other types of data, it's important.  I'll continue to use DS8K (original and improved) until the function is finished.

    I read about the rumor in the comments on Brent's site and I've seen the MS Connect item at https://connect.microsoft.com/SQLServer/Feedback/Details/2433171.  It's only 15 months old and the vote is 62 Yes to 0 No.  We'll see if MS decides to act on it.  Personally, I really hope they do.

    Between it missing the element ordinal and the fact that it doesn't return a NULL when passed a NULL, I have little use for it.

    Good points as always Jeff, it would though take a fairly twisted programming not to return the items in the correct order, slightly sinister if that's the case.
    😎
    The JSON method is almost as fast or in some cases equal but then again, haven't seen anything on the order of items returned.

  • Eirikur Eiriksson - Saturday, June 24, 2017 7:36 AM

    Jeff Moden - Friday, June 23, 2017 11:39 PM

    Ed Wagner - Friday, June 23, 2017 5:57 PM

    Jeff Moden - Friday, June 23, 2017 2:52 PM

    Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    Understood but MS has not made a guarantee as to order even when using the likes of ROW_NUMBER.  I understand how the CLR behind the scenes may have been written and if was written as such, will reliably return the correct order with ROW_NUMBER if that's the only sort.  I don't have an example right now but I have seen multiple sorts resulting from multiple ROW_NUMBERs throw things out of whack when using an operand such as (SELECT NULL) for the ORDER BY part of ROW_NUMBER. 

    Unlike a thing like the Quirky Update, it is difficult to build in a check to see if the sort order is the same as the original string without adding a substantial amount of overhead in the form of rebuilding the string from the returned elements to do a compare to the original.

    Use with caution.

    Until MS finishes the function by adding ordinal and guarantees order, STRING_SPLIT has limited uses for me.  There are some cases where order isn't important, but when dealing with things like IP addresses, paths and a lot of other types of data, it's important.  I'll continue to use DS8K (original and improved) until the function is finished.

    I read about the rumor in the comments on Brent's site and I've seen the MS Connect item at https://connect.microsoft.com/SQLServer/Feedback/Details/2433171.  It's only 15 months old and the vote is 62 Yes to 0 No.  We'll see if MS decides to act on it.  Personally, I really hope they do.

    Between it missing the element ordinal and the fact that it doesn't return a NULL when passed a NULL, I have little use for it.

    Good points as always Jeff, it would though take a fairly twisted programming not to return the items in the correct order, slightly sinister if that's the case.
    😎
    The JSON method is almost as fast or in some cases equal but then again, haven't seen anything on the order of items returned.

    Absolutely agreed that it would actually be difficult to not return things in the correct order.  I'm just amazed that MS didn't at least make it an option because it's just not difficult to do so.  I'm also a bit miffed that they violated the basic rule that NULL begets NULL in functions.  I'm also beating the ol' "Best Practice" drum (which isn't always right) of there not being a guarantee of sort order without an ORDER BY on something other than (for example) a constant inferred by (SELECT NULL).

    And, my apologies for the rant on all of this... I'm just really ticked off that they didn't put more thought into what could have easily been a killer function.

    Here's my wish list for what they should have done with the function.  I consider items 1 thru 5 to be essential.  The rest would be really nice to have but are not essential (IMHO).
    1.  Permanently or optionally return the ordinal position of each split-out element.
    2.  Always return something.  If you pass it a NULL, return a NULL.  If you pass it a single element empty string, return an empty string.  If you pass it a one or more spaces, return the spaces (unless spaces are the delimiter, of course, and should be carefully considered because THAT'S a very common use case).
    3. If you don't pass a delimiter or any other option other than the ordinal position option (if it is optional), split at the character level.
    4. Have a "True CSV/TSV" option.
    5. Allow for multi-character delimiters.
    6. Have an option to return either NULLs or empty strings for adjacent delimiters.  Delimiters with even a single space between them would not be considered as adjacent.
    7. Have an option for a "word" split where adjacent non alpha-numeric characters are treated as one and auto-magically used as the delimiter possibly including an exclusion list for characters like dashes, underscores, etc.
    8. Have an "array" option where you can identify the delimiters for up to 3 levels (think flatted array or cube being passed) that would include the proper ordinals for each level.

    {EDIT}  And I just added that wish list to Peter Larsson's CONNECT item.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • below86 - Thursday, June 22, 2017 11:58 AM

  • Jeff Moden - Saturday, June 24, 2017 9:51 AM

    Eirikur Eiriksson - Saturday, June 24, 2017 7:36 AM

    Jeff Moden - Friday, June 23, 2017 11:39 PM

    Ed Wagner - Friday, June 23, 2017 5:57 PM

    Jeff Moden - Friday, June 23, 2017 2:52 PM

    Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    Understood but MS has not made a guarantee as to order even when using the likes of ROW_NUMBER.  I understand how the CLR behind the scenes may have been written and if was written as such, will reliably return the correct order with ROW_NUMBER if that's the only sort.  I don't have an example right now but I have seen multiple sorts resulting from multiple ROW_NUMBERs throw things out of whack when using an operand such as (SELECT NULL) for the ORDER BY part of ROW_NUMBER. 

    Unlike a thing like the Quirky Update, it is difficult to build in a check to see if the sort order is the same as the original string without adding a substantial amount of overhead in the form of rebuilding the string from the returned elements to do a compare to the original.

    Use with caution.

    Until MS finishes the function by adding ordinal and guarantees order, STRING_SPLIT has limited uses for me.  There are some cases where order isn't important, but when dealing with things like IP addresses, paths and a lot of other types of data, it's important.  I'll continue to use DS8K (original and improved) until the function is finished.

    I read about the rumor in the comments on Brent's site and I've seen the MS Connect item at https://connect.microsoft.com/SQLServer/Feedback/Details/2433171.  It's only 15 months old and the vote is 62 Yes to 0 No.  We'll see if MS decides to act on it.  Personally, I really hope they do.

    Between it missing the element ordinal and the fact that it doesn't return a NULL when passed a NULL, I have little use for it.

    Good points as always Jeff, it would though take a fairly twisted programming not to return the items in the correct order, slightly sinister if that's the case.
    😎
    The JSON method is almost as fast or in some cases equal but then again, haven't seen anything on the order of items returned.

    Absolutely agreed that it would actually be difficult to not return things in the correct order.  I'm just amazed that MS didn't at least make it an option because it's just not difficult to do so.  I'm also a bit miffed that they violated the basic rule that NULL begets NULL in functions.  I'm also beating the ol' "Best Practice" drum (which isn't always right) of there not being a guarantee of sort order without an ORDER BY on something other than (for example) a constant inferred by (SELECT NULL).

    And, my apologies for the rant on all of this... I'm just really ticked off that they didn't put more thought into what could have easily been a killer function.

    Here's my wish list for what they should have done with the function.  I consider items 1 thru 5 to be essential.  The rest would be really nice to have but are not essential (IMHO).
    1.  Permanently or optionally return the ordinal position of each split-out element.
    2.  Always return something.  If you pass it a NULL, return a NULL.  If you pass it a single element empty string, return an empty string.  If you pass it a one or more spaces, return the spaces (unless spaces are the delimiter, of course, and should be carefully considered because THAT'S a very common use case).
    3. If you don't pass a delimiter or any other option other than the ordinal position option (if it is optional), split at the character level.
    4. Have a "True CSV/TSV" option.
    5. Allow for multi-character delimiters.
    6. Have an option to return either NULLs or empty strings for adjacent delimiters.  Delimiters with even a single space between them would not be considered as adjacent.
    7. Have an option for a "word" split where adjacent non alpha-numeric characters are treated as one and auto-magically used as the delimiter possibly including an exclusion list for characters like dashes, underscores, etc.
    8. Have an "array" option where you can identify the delimiters for up to 3 levels (think flatted array or cube being passed) that would include the proper ordinals for each level.

    {EDIT}  And I just added that wish list to Peter Larsson's CONNECT item.

    That's quite a list, but it's all good stuff.  You know, it's a shame that it takes people who don't work at MS to put together such a list.  If they had done it right the first time, the list wouldn't be nearly as long.

  • Jeff Moden - Saturday, June 24, 2017 9:51 AM

    Eirikur Eiriksson - Saturday, June 24, 2017 7:36 AM

    Jeff Moden - Friday, June 23, 2017 11:39 PM

    Ed Wagner - Friday, June 23, 2017 5:57 PM

    Jeff Moden - Friday, June 23, 2017 2:52 PM

    Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    Understood but MS has not made a guarantee as to order even when using the likes of ROW_NUMBER.  I understand how the CLR behind the scenes may have been written and if was written as such, will reliably return the correct order with ROW_NUMBER if that's the only sort.  I don't have an example right now but I have seen multiple sorts resulting from multiple ROW_NUMBERs throw things out of whack when using an operand such as (SELECT NULL) for the ORDER BY part of ROW_NUMBER. 

    Unlike a thing like the Quirky Update, it is difficult to build in a check to see if the sort order is the same as the original string without adding a substantial amount of overhead in the form of rebuilding the string from the returned elements to do a compare to the original.

    Use with caution.

    Until MS finishes the function by adding ordinal and guarantees order, STRING_SPLIT has limited uses for me.  There are some cases where order isn't important, but when dealing with things like IP addresses, paths and a lot of other types of data, it's important.  I'll continue to use DS8K (original and improved) until the function is finished.

    I read about the rumor in the comments on Brent's site and I've seen the MS Connect item at https://connect.microsoft.com/SQLServer/Feedback/Details/2433171.  It's only 15 months old and the vote is 62 Yes to 0 No.  We'll see if MS decides to act on it.  Personally, I really hope they do.

    Between it missing the element ordinal and the fact that it doesn't return a NULL when passed a NULL, I have little use for it.

    Good points as always Jeff, it would though take a fairly twisted programming not to return the items in the correct order, slightly sinister if that's the case.
    😎
    The JSON method is almost as fast or in some cases equal but then again, haven't seen anything on the order of items returned.

    Absolutely agreed that it would actually be difficult to not return things in the correct order.  I'm just amazed that MS didn't at least make it an option because it's just not difficult to do so.  I'm also a bit miffed that they violated the basic rule that NULL begets NULL in functions.  I'm also beating the ol' "Best Practice" drum (which isn't always right) of there not being a guarantee of sort order without an ORDER BY on something other than (for example) a constant inferred by (SELECT NULL).

    And, my apologies for the rant on all of this... I'm just really ticked off that they didn't put more thought into what could have easily been a killer function.

    Here's my wish list for what they should have done with the function.  I consider items 1 thru 5 to be essential.  The rest would be really nice to have but are not essential (IMHO).
    1.  Permanently or optionally return the ordinal position of each split-out element.
    2.  Always return something.  If you pass it a NULL, return a NULL.  If you pass it a single element empty string, return an empty string.  If you pass it a one or more spaces, return the spaces (unless spaces are the delimiter, of course, and should be carefully considered because THAT'S a very common use case).
    3. If you don't pass a delimiter or any other option other than the ordinal position option (if it is optional), split at the character level.
    4. Have a "True CSV/TSV" option.
    5. Allow for multi-character delimiters.
    6. Have an option to return either NULLs or empty strings for adjacent delimiters.  Delimiters with even a single space between them would not be considered as adjacent.
    7. Have an option for a "word" split where adjacent non alpha-numeric characters are treated as one and auto-magically used as the delimiter possibly including an exclusion list for characters like dashes, underscores, etc.
    8. Have an "array" option where you can identify the delimiters for up to 3 levels (think flatted array or cube being passed) that would include the proper ordinals for each level.

    {EDIT}  And I just added that wish list to Peter Larsson's CONNECT item.

    Such a list requires 3 consecutive versions of SQL Server to be implemented.
    And probably 2 more for improvements (I mean - fixing bugs).

    I can't help noticing that the"quoted delimited" functionality (to be able to read files exported from Excel) is not even mentioned by anyone.
    It seems like everyone quietly agrees that it would be a "too hard" task for such a software house as MS.

    _____________
    Code for TallyGenerator

  • Sergiy - Tuesday, June 27, 2017 4:49 PM

    Jeff Moden - Saturday, June 24, 2017 9:51 AM

    Eirikur Eiriksson - Saturday, June 24, 2017 7:36 AM

    Jeff Moden - Friday, June 23, 2017 11:39 PM

    Ed Wagner - Friday, June 23, 2017 5:57 PM

    Jeff Moden - Friday, June 23, 2017 2:52 PM

    Eirikur Eiriksson - Friday, June 23, 2017 12:36 PM

    I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

    Understood but MS has not made a guarantee as to order even when using the likes of ROW_NUMBER.  I understand how the CLR behind the scenes may have been written and if was written as such, will reliably return the correct order with ROW_NUMBER if that's the only sort.  I don't have an example right now but I have seen multiple sorts resulting from multiple ROW_NUMBERs throw things out of whack when using an operand such as (SELECT NULL) for the ORDER BY part of ROW_NUMBER. 

    Unlike a thing like the Quirky Update, it is difficult to build in a check to see if the sort order is the same as the original string without adding a substantial amount of overhead in the form of rebuilding the string from the returned elements to do a compare to the original.

    Use with caution.

    Until MS finishes the function by adding ordinal and guarantees order, STRING_SPLIT has limited uses for me.  There are some cases where order isn't important, but when dealing with things like IP addresses, paths and a lot of other types of data, it's important.  I'll continue to use DS8K (original and improved) until the function is finished.

    I read about the rumor in the comments on Brent's site and I've seen the MS Connect item at https://connect.microsoft.com/SQLServer/Feedback/Details/2433171.  It's only 15 months old and the vote is 62 Yes to 0 No.  We'll see if MS decides to act on it.  Personally, I really hope they do.

    Between it missing the element ordinal and the fact that it doesn't return a NULL when passed a NULL, I have little use for it.

    Good points as always Jeff, it would though take a fairly twisted programming not to return the items in the correct order, slightly sinister if that's the case.
    😎
    The JSON method is almost as fast or in some cases equal but then again, haven't seen anything on the order of items returned.

    Absolutely agreed that it would actually be difficult to not return things in the correct order.  I'm just amazed that MS didn't at least make it an option because it's just not difficult to do so.  I'm also a bit miffed that they violated the basic rule that NULL begets NULL in functions.  I'm also beating the ol' "Best Practice" drum (which isn't always right) of there not being a guarantee of sort order without an ORDER BY on something other than (for example) a constant inferred by (SELECT NULL).

    And, my apologies for the rant on all of this... I'm just really ticked off that they didn't put more thought into what could have easily been a killer function.

    Here's my wish list for what they should have done with the function.  I consider items 1 thru 5 to be essential.  The rest would be really nice to have but are not essential (IMHO).
    1.  Permanently or optionally return the ordinal position of each split-out element.
    2.  Always return something.  If you pass it a NULL, return a NULL.  If you pass it a single element empty string, return an empty string.  If you pass it a one or more spaces, return the spaces (unless spaces are the delimiter, of course, and should be carefully considered because THAT'S a very common use case).
    3. If you don't pass a delimiter or any other option other than the ordinal position option (if it is optional), split at the character level.
    4. Have a "True CSV/TSV" option.
    5. Allow for multi-character delimiters.
    6. Have an option to return either NULLs or empty strings for adjacent delimiters.  Delimiters with even a single space between them would not be considered as adjacent.
    7. Have an option for a "word" split where adjacent non alpha-numeric characters are treated as one and auto-magically used as the delimiter possibly including an exclusion list for characters like dashes, underscores, etc.
    8. Have an "array" option where you can identify the delimiters for up to 3 levels (think flatted array or cube being passed) that would include the proper ordinals for each level.

    {EDIT}  And I just added that wish list to Peter Larsson's CONNECT item.

    Such a list requires 3 consecutive versions of SQL Server to be implemented.
    And probably 2 more for improvements (I mean - fixing bugs).

    I can't help noticing that the"quoted delimited" functionality (to be able to read files exported from Excel) is not even mentioned by anyone.
    It seems like everyone quietly agrees that it would be a "too hard" task for such a software house as MS.

    It actually is in my list.  Even though it's a bit of a stretch, I included "True CSV" in my wish list which should also take care of the abominations that Excel produces.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, June 27, 2017 5:54 PM

    It actually is in my list.  Even though it's a bit of a stretch, I included "True CSV" in my wish list which should also take care of the abominations that Excel produces.

    OK, I was not sure about the definition of "true CSV".
    Does it include line breaks within quoted values?

    _____________
    Code for TallyGenerator

  • Sergiy - Tuesday, June 27, 2017 10:42 PM

    Jeff Moden - Tuesday, June 27, 2017 5:54 PM

    It actually is in my list.  Even though it's a bit of a stretch, I included "True CSV" in my wish list which should also take care of the abominations that Excel produces.

    OK, I was not sure about the definition of "true CSV".
    Does it include line breaks within quoted values?

    Yes.  And tabs, commas, double pipes, doubled quotes to escape embedded quotes, etc, etc.  But, to your point, since MS has been getting it wrong for so long, they might not know what the definition of "True CSV" is and probably should have explicitly mentioned the "Comedy Separated Values" that their products seem to exude.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, June 28, 2017 7:21 AM

    Sergiy - Tuesday, June 27, 2017 10:42 PM

    Jeff Moden - Tuesday, June 27, 2017 5:54 PM

    It actually is in my list.  Even though it's a bit of a stretch, I included "True CSV" in my wish list which should also take care of the abominations that Excel produces.

    OK, I was not sure about the definition of "true CSV".
    Does it include line breaks within quoted values?

    Yes.  And tabs, commas, double pipes, doubled quotes to escape embedded quotes, etc, etc.  But, to your point, since MS has been getting it wrong for so long, they might not know what the definition of "True CSV" is and probably should have explicitly mentioned the "Comedy Separated Values" that their products seem to exude.

    Well, MS Excel reads "true CSV" files correctly.
    At least older versions of it. Latest versions are so bad in so many ways that I avoid them by all means.

    So, somebody in MS must know how to read CSV files correctly.
    Or  - knew at some point in the past.
    But the source code of Excel97 must be available somewhere for the new generatiuon of geniuses to copy-paste.

    _____________
    Code for TallyGenerator

  • Just tried to load CSV file with freaky quoted values into LibreOffice Calc - works perfectly.
    MS could contract those programmers...

    _____________
    Code for TallyGenerator

Viewing 15 posts - 16 through 30 (of 34 total)

You must be logged in to reply to this topic. Login to reply