Reaping the benefits of the Window functions in T-SQL

  • Eirikur Eiriksson - Friday, November 2, 2018 8:08 AM

    Lynn Pettis - Friday, November 2, 2018 7:52 AM

    Maybe I missed it in the article, I have read through it twice, on the CSV code why are you using the Latin1_General_BIN collation?

    The reason is that it using a binary collations is more efficient than using a language specific collations.
    😎

    Out of curiosity have you tested both BIN and BIN2?

  • Lynn Pettis - Friday, November 2, 2018 10:32 AM

    Eirikur Eiriksson - Friday, November 2, 2018 8:08 AM

    Lynn Pettis - Friday, November 2, 2018 7:52 AM

    Maybe I missed it in the article, I have read through it twice, on the CSV code why are you using the Latin1_General_BIN collation?

    The reason is that it using a binary collations is more efficient than using a language specific collations.
    😎

    Out of curiosity have you tested both BIN and BIN2?

    Quite certain I did, cannot remember the details, will look into my notes and check.
    😎

  • Lynn Pettis - Friday, November 2, 2018 10:32 AM

    Eirikur Eiriksson - Friday, November 2, 2018 8:08 AM

    Lynn Pettis - Friday, November 2, 2018 7:52 AM

    Maybe I missed it in the article, I have read through it twice, on the CSV code why are you using the Latin1_General_BIN collation?

    The reason is that it using a binary collations is more efficient than using a language specific collations.
    😎

    Out of curiosity have you tested both BIN and BIN2?

    Haven't looked.  What's the difference between the two?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, November 5, 2018 5:34 AM

    Lynn Pettis - Friday, November 2, 2018 10:32 AM

    Eirikur Eiriksson - Friday, November 2, 2018 8:08 AM

    Lynn Pettis - Friday, November 2, 2018 7:52 AM

    Maybe I missed it in the article, I have read through it twice, on the CSV code why are you using the Latin1_General_BIN collation?

    The reason is that it using a binary collations is more efficient than using a language specific collations.
    😎

    Out of curiosity have you tested both BIN and BIN2?

    Haven't looked.  What's the difference between the two?

    Don't have time to search the Microsoft documentation at the moment as I am getting ready to take my dad to an outpatient procedure but one starts with code-point then binary (BIN I believe) the other is pure binary (BIN2).  You will need to check that out if I don't get the chance when I finally get to work.

  • I thought you knew since you brought it up.

    Did some research and found out the BIN2 stuff uses "Code Point Comparison Semantics" and the plain BIN stuff is backwards compatible brute force binary.  The former appears to be necessary for some characters in Unicode.  It also took me a bit to figure out that a "Code Point" is the byte(s) behind the scenes to represent a character.

    Hopefully anyone that needs the BIN2 stuff will do their own performance testing because it's not likely that I'll ever need that for splitting delimited strings.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The only noticeable difference between BIN and BIN2 is when you sort them.

  • Jeff Moden - Monday, November 5, 2018 8:54 AM

    I thought you knew since you brought it up.

    Did some research and found out the BIN2 stuff uses "Code Point Comparison Semantics" and the plain BIN stuff is backwards compatible brute force binary.  The former appears to be necessary for some characters in Unicode.  It also took me a bit to figure out that a "Code Point" is the byte(s) behind the scenes to represent a character.

    Hopefully anyone that needs the BIN2 stuff will do their own performance testing because it's not likely that I'll ever need that for splitting delimited strings.

    I used to know but when you aren't using the information you tend to forget.  That is why I find the Microsoft documentation to be my friend when I have to refresh my memory.  I just didn't have time this morning to dig in to answer your question.

  • Lynn Pettis - Monday, November 5, 2018 10:29 AM

    Jeff Moden - Monday, November 5, 2018 8:54 AM

    I thought you knew since you brought it up.

    Did some research and found out the BIN2 stuff uses "Code Point Comparison Semantics" and the plain BIN stuff is backwards compatible brute force binary.  The former appears to be necessary for some characters in Unicode.  It also took me a bit to figure out that a "Code Point" is the byte(s) behind the scenes to represent a character.

    Hopefully anyone that needs the BIN2 stuff will do their own performance testing because it's not likely that I'll ever need that for splitting delimited strings.

    I used to know but when you aren't using the information you tend to forget.  That is why I find the Microsoft documentation to be my friend when I have to refresh my memory.  I just didn't have time this morning to dig in to answer your question.

    Agreed on documentation being a friend.  MS didn't have much to say about what a "Code Point" was, though.  I got "lucky" with a WIKI-Pedia article on the subject.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There is a little problem with "quoted split" function (the one I spotted so far) - it does not handle "spaces only" strings well. 
    It returns NULLs instead. Even even empty strings are in quotes

    _____________
    Code for TallyGenerator

  • Sergiy - Wednesday, December 26, 2018 6:13 PM

    There is a little problem with "quoted split" function (the one I spotted so far) - it does not handle "spaces only" strings well. 
    It returns NULLs instead. Even even empty strings are in quotes

    Thanks for the heads-up, hadn't noticed this before.
    😎

  • Another case where the function does not do what I'd expect it to do:

    DECLARE @Text NVARCHAR(max), @Delimiter NVARCHAR (10), @Quote NCHAR (1)
    DECLARE @rc INT

    SET @Delimiter = ','
    SET @Quote = '"'
    set @Text = '"Row
    Num","Splitter
    Name","Number
    Of
    Rows","Number
    Of
    Elements","Min
    Element
    Length","Max
    Element
    Length","Duration","Min
    Length","Avg
    Length","Max
    Length"'
    SELECT *
    FROM dbo.SplitQuoted_EEriksson (@Text, @Delimiter, @Quote) C2

    Well, it was may be not intended for that, but still - line feeds within quoted fields are quite possible in Excel exports.

    _____________
    Code for TallyGenerator

  • Sergiy - Friday, December 28, 2018 4:23 PM

    Another case where the function does not do what I'd expect it to do:

    DECLARE @Text NVARCHAR(max), @Delimiter NVARCHAR (10), @Quote NCHAR (1)
    DECLARE @rc INT

    SET @Delimiter = ','
    SET @Quote = '"'
    set @Text = '"Row
    Num","Splitter
    Name","Number
    Of
    Rows","Number
    Of
    Elements","Min
    Element
    Length","Max
    Element
    Length","Duration","Min
    Length","Avg
    Length","Max
    Length"'
    SELECT *
    FROM dbo.SplitQuoted_EEriksson (@Text, @Delimiter, @Quote) C2

    Well, it was may be not intended for that, but still - line feeds within quoted fields are quite possible in Excel exports.

    Thanks Sergiy, will look into this when I have the time.
    😎
    Can you please post the code for the function dbo.SplitQuoted_EEriksson you are using?

  • The function does work with embedded line feeds but the problem is with the first character being a quote (") and also tossed in the empty quoted string for good measure.

    If you just change the @pString to start with a quoted field, the first field contains the quote

    SELECT @pString = '"100123","Blues","West Point, Mississippi","Burnett,'

    + '""Howlin'' Wolf"" Arthur Chester",1910-06-10,empty quoted spaces next," ",previous is null,but with a value," , ","works

    even with crLf"';

    Results

    1 "100123

    2 Blues

    3 West Point, Mississippi

    4 Burnett,"Howlin' Wolf" Arthur Chester

    5 1910-06-10

    6 empty quoted spaces next

    7 NULL

    8 previous is null

    9 but with a value

    10 ,

    11 works

    even with crLf

  • Just noticed this when I was putting this on a new instance, but in the .sql file the WITH starts with a leading ;, which actually breaks the DDL statement. Suppose that's another reason to educate people that ; isn't a "beginningator". 🙂

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A wrote:

    Just noticed this when I was putting this on a new instance, but in the .sql file the WITH starts with a leading ;, which actually breaks the DDL statement. Suppose that's another reason to educate people that ; isn't a "beginningator". 🙂

    Here we begin(ninator) again 😉

    😎

Viewing 15 posts - 31 through 45 (of 57 total)

You must be logged in to reply to this topic. Login to reply