REPLACE Multiple Spaces with One

  • C# Screw (11/18/2009)


    Oh look ! I have become a ValuedMember :blush:

    Screw

    And pretty much the entire value from just one thread too, eh?? Glad to have you onboard!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks SQLGuru

    Managed to reduce Pauls code by just a couple lines, and added support for TABS

    (hope u don't mind Paul)

    Night all

    [SqlFunction

    (

    DataAccess = DataAccessKind.None,

    SystemDataAccess = SystemDataAccessKind.None,

    IsDeterministic = true,

    IsPrecise = true

    )

    ]

    [return: SqlFacet(MaxSize = 4000)]

    public static SqlString fn_TidySpace_and_TABS_PW_CLR

    (

    [SqlFacet(IsNullable = false, MaxSize = 4000)] SqlChars Input

    )

    {

    const char SPACE = ' ';

    const char TAB = '\t';

    char[] input = Input.Buffer; // Reference to the internal char[] buffer of the SqlChars input

    int length = input.Length; // Remember the length of the original

    int writePos = 0; // Write position

    for (int readPos = 0; readPos < length; readPos++)

    {

    input[writePos] = input[readPos];

    // Skip further Spaces and TABS

    if (input[readPos] == SPACE||input[readPos] == TAB)

    {

    while (readPos < length && (input[readPos] == SPACE || input[readPos] == TAB)) { readPos++;}

    // Hold your horses

    readPos--;

    }

    writePos++;

    }

    // Return a string constructed from the part of

    // the buffer we actually wrote to

    return new SqlString(new string(input, 0, writePos));

    }

    SQL function: Replace Technique : but using replacement chars ||*9*9||

    2667

    SQL function :Brigzy (C#Screw) Looping

    2425

    SQL function: Jeff Original : single bell char

    1867

    SQL function: Michael Meierruth : with hard coded spaces

    343

    CLR: Brigzy (C#Screw) Looping

    328

    SQL function: Michael Meierruth III - extra replace

    308

    SQL function: Michael Meierruth Original

    306

    CLR: Not looping using ||*9*9|| technique in C#

    285

    CLR: using Jeff's single char technique in C#

    262

    CLR: C#Screw II - Spaces and TABS

    184

    CLR: Flo - Spaces and TABS - v.compact

    177

    CLR :using JCBnew space replacer C#

    174

    CLR: Michael Meierruth - C#Screw conversion: with extra Replace

    153

    CLR: Michael Meierruth - C#Screw conversion: hard coded spaces

    146

    CLR: using Flo s space replacer C#

    119

    CLR: using Paul White space replacer C#

    115

    CLR: Paul White's function - reduced by a few lines and Added Support TABS

    110

  • Had a thought to reduce a bit more:

    [SqlFunction

    (

    DataAccess = DataAccessKind.None,

    SystemDataAccess = SystemDataAccessKind.None,

    IsDeterministic = true,

    IsPrecise = true

    )

    ]

    [return: SqlFacet(MaxSize = 4000)]

    public static SqlString fn_TidySpace_and_TABS_SharpScrew_CLR

    (

    [SqlFacet(IsNullable = false, MaxSize = 4000)] SqlChars Input

    )

    {

    const char SPACE = ' ';

    const char TAB = '\t';

    char[] input = Input.Buffer; // Reference to the internal char[] buffer of the SqlChars input

    int length = input.Length; // Remember the length of the original

    int writePos = 0;

    char curChar;

    for (int readPos = 0; readPos < length; readPos++)

    {

    // Assign current char so we don't have to reference buffer as much

    curChar = input[readPos];

    input[writePos] = curChar;

    // Skip Spaces and TABS

    if (curChar == SPACE || curChar == TAB)

    {

    do { readPos++; if (readPos < length)curChar = input[readPos]; }

    while (readPos < length && (curChar == SPACE || curChar == TAB));

    // Holdyour horses - back step one

    readPos--;

    }

    writePos++;

    }

    // Return a string constructed from the part of

    // the buffer we actually wrote to

    return new SqlString(new string(input, 0, writePos));

    }

  • C# Screw (11/18/2009)


    Managed to reduce Pauls code by just a couple lines, and added support for TABS (hope you don't mind Paul)

    Not at all 🙂

    Looks all good - I'll load it up at home later.

    Paul

  • Paul White (11/18/2009)


    Mark-101232 (11/18/2009)


    Just out of interest I converted your code to use pointers which improved performance by 5-10%. Not that I would recommend using UNSAFE code though...

    Hey Mark,

    Wouldn't mind seeing that code if you have time to post it - I considered using pointers but ran out of talent! Same goes for a C++ implementation. The pointer thing should be faster, probably more than 5-10%, but IIRC unmanaged code means the thread has to go pre-emptive rather than co-operative, so that probably slows it down a bit overall...? Anyway, would be nice to see the code 🙂

    Paul

    Hi Paul, this was a pretty dumb translation, just changing [readPos] into *readPos

    ***BTW, using UNSAFE code is NOT RECOMMENDED***

    You run the risk of corrupting your SQL Server instance, use at your peril!

    [SqlFunction

    (

    DataAccess = DataAccessKind.None,

    SystemDataAccess = SystemDataAccessKind.None,

    IsDeterministic = true,

    IsPrecise = true

    )

    ]

    [return: SqlFacet(MaxSize = 4000)]

    unsafe public static SqlString SpaceReplacerUnsafe

    (

    [SqlFacet(IsNullable = false, MaxSize = 4000)] SqlChars Input

    )

    {

    const char SPACE = ' ';

    bool skipping = false; // In space-skipping mode

    char[] input = Input.Buffer; // Reference to the internal char[] buffer of the SqlChars input

    int length = input.Length; // Remember the length of the original

    fixed (char* pInput = input)

    {

    char* writePos = pInput; // Initialise the writing position

    char* readPosEnd = pInput + length;

    // Loop through each character

    // scan = current scan position

    // write = current write position

    for (char* readPos = pInput; readPos < readPosEnd; readPos++)

    {

    // Skip spaces if we are in 'skip mode' (see later)

    if (skipping)

    {

    // Space skipping

    while (readPos < readPosEnd && *readPos == SPACE) { readPos++; }

    // Bail if we ran out of string

    if (readPos >= readPosEnd) { break; }

    }

    // If the write position is before the scan position,

    // the output string is no longer in sync with the input

    // (we must have skipped spaces at some point).

    // So, we need to copy the current character

    // from the read position to the write position

    if (writePos < readPos) { *writePos = *readPos; }

    // Advance the write position

    // (for all non-spaces, and for the first space in a sequence of spaces)

    writePos++;

    // Set or reset skip mode

    skipping = (*readPos == SPACE);

    }

    length = (int)(writePos - pInput);

    }

    // Return a string constructed from the part of

    // the buffer we actually wrote to

    return new SqlString(new string(input, 0, length));

    }

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Not sure how to get this into asembly :unsure:

    Error1CREATE ASSEMBLY for assembly 'RemoveSpace_CLR' failed because assembly 'RemoveSpace_CLR' failed verification. Check if the referenced assemblies are up-to-date and trusted (for external_access or unsafe) to execute in the database. CLR Verifier error messages if any will follow this message

    [ : UserDefinedFunctions::fn_TidySpace_PWSpaceReplacerUnsafe_CLR][mdToken=0x6000006][offset 0x0000002A][found address of Short] Expected numeric type on the stack.

    [ : UserDefinedFunctions::fn_TidySpace_PWSpaceReplacerUnsafe_CLR][mdToken=0x6000006][offset 0x0000001C][found Native Int][expected address of Short] Unexpected type on the stack.SpaceRemove_CLR

  • C# Screw (11/19/2009)


    Not sure how to get this into asembly :unsure:

    Error1CREATE ASSEMBLY for assembly 'RemoveSpace_CLR' failed because assembly 'RemoveSpace_CLR' failed verification. Check if the referenced assemblies are up-to-date and trusted (for external_access or unsafe) to execute in the database. CLR Verifier error messages if any will follow this message

    [ : UserDefinedFunctions::fn_TidySpace_PWSpaceReplacerUnsafe_CLR][mdToken=0x6000006][offset 0x0000002A][found address of Short] Expected numeric type on the stack.

    [ : UserDefinedFunctions::fn_TidySpace_PWSpaceReplacerUnsafe_CLR][mdToken=0x6000006][offset 0x0000001C][found Native Int][expected address of Short] Unexpected type on the stack.SpaceRemove_CLR

    You'll need the mark the assembly as UNSAFE when you build it.

    In Visual Studio, Properties->Database->Permission Level

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • We have a new leader !! Absolutely fab !! 😀

    SQL function: Replace Technique : but using replacement chars ||*9*9||

    2634

    SQL function :Brigzy (C#Screw) Looping

    2514

    SQL function: Jeff Original : single bell char

    1856

    CLR: Brigzy (C#Screw) Looping

    605

    SQL function: Michael Meierruth III - extra replace

    331

    SQL function: Michael Meierruth Original

    326

    SQL function: Michael Meierruth : with hard coded spaces

    309

    CLR: Not looping using ||*9*9|| technique in C#

    283

    CLR: using Jeff's single char technique in C#

    245

    CLR: C#Screw II - Spaces and TABS

    182

    CLR :using JCBnew space replacer C#

    171

    CLR: Flo - Spaces and TABS - v.compact

    165

    CLR: Michael Meierruth - C#Screw conversion: hard coded spaces

    151

    CLR: Michael Meierruth - C#Screw conversion: with extra Replace

    149

    CLR: using Paul White space replacer C#

    110

    CLR: using Flo s space replacer C#

    108

    CLR: C#Screw : Slightly modified/reduced Paul White's function and Added Support TABS

    106

    CLR: Paul White's function with Pointers by Mark-101232

    100:w00t:

  • C# Screw (11/19/2009)


    We have a new leader !! Absolutely fab !! 😀

    SQL function: Replace Technique : but using replacement chars ||*9*9||

    2634

    SQL function :Brigzy (C#Screw) Looping

    2514

    SQL function: Jeff Original : single bell char

    1856

    CLR: Brigzy (C#Screw) Looping

    605

    SQL function: Michael Meierruth III - extra replace

    331

    SQL function: Michael Meierruth Original

    326

    SQL function: Michael Meierruth : with hard coded spaces

    309

    CLR: Not looping using ||*9*9|| technique in C#

    283

    CLR: using Jeff's single char technique in C#

    245

    CLR: C#Screw II - Spaces and TABS

    182

    CLR :using JCBnew space replacer C#

    171

    CLR: Flo - Spaces and TABS - v.compact

    165

    CLR: Michael Meierruth - C#Screw conversion: hard coded spaces

    151

    CLR: Michael Meierruth - C#Screw conversion: with extra Replace

    149

    CLR: using Paul White space replacer C#

    110

    CLR: using Flo s space replacer C#

    108

    CLR: C#Screw : Slightly modified/(maybe simplified?) Paul White's function and Added Support TABS

    106

    CLR: Paul White's function with Pointers by Mark-101232

    100:w00t:

    You should also see an improvement to fn_TidySpace_and_TABS_SharpScrew_CLR using pointers. I don't have time to do this at the moment, but it's not difficult.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thank you Mark - a simple translation was exactly what I had hoped for. I've been meaning to look at this stuff for ever. Now's my chance!

    Just to be clear about this (to the wider audience) there is *no way* I would ever use the unsafe pointer stuff with SQL Server. This is just a technical learning exercise for me, on my personal, disposable, SQL Server.

    The (safe) managed code equivalent is pretty darn near as fast, especially compared to the non-NET solutions. Using pointers and unsafe permission set assemblies is a pretty good way to put your SQL Server at serious risk.

    NEVER USE POINTERS IN .NET CODE WITH SQL SERVER. That's my advice 🙂

    I'm not sure we should even include it in the performance results. Not because it wins!!! ...because the casual observer may be tempted to copy & paste the 'best' solution...

  • Hi Paul

    Does this particular function with pointers put SQL server at risk?

    Does the use of pointers create a memory overwrite issue?

    Just curious, I have learnt so much in this thread I might as well learn a bit more 🙂

  • C# Screw (11/19/2009)


    Hi Paul

    Does this particular function with pointers put SQL server at risk?

    Does the use of pointers create a memory overwrite issue?

    As long as there are unchecked memory accesses, then yes to both.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • C# Screw (11/19/2009)


    Hi Paul

    Does this particular function with pointers put SQL server at risk?

    Does the use of pointers create a memory overwrite issue?

    Just curious, I have learnt so much in this thread I might as well learn a bit more 🙂

    There's a whole load of protection built into SQLOS for the .NET hosting, but UNSAFE assemblies and pointer manipulation kinda skirt past that good stuff, so I guess the answer has to be 'yes' - at least in principle. See http://msdn.microsoft.com/en-us/library/ms345101.aspx and linked pages for the scary details.

    None of that is aimed at Mark in any way of course - he just obliged me when I asked to see the code. Ok so perhaps I should have asked for it by mail or PM instead 🙂

    Oops!

  • Just to draw breath and summarize a bit:

    The fastest CLR solutions appear to be almost 20x at least 3x faster than the fastest T-SQL implementation so far. I think that's pretty amazing, considering that the .NET routines have to convert every row to Unicode first, then pass the (fairly large) data to and from the hosted CLR.

    My guess is that those overheads are the dominant factor: the time spent actually executing CLR code is probably pretty small in comparison. So, to an extent, any reasonable CLR implementation will do (even the nested string replace!) in all but the most extreme cases.

    I'm not taking anything away from the variously clever and ingenious CLR approaches, I just want to make the point that, as far as outright performance goes, a sense of perspective is useful. It's always good to know when to stop optimizing!

    Paul

  • Paul White (11/19/2009)


    good to know when to stop optimizing!

    Ok ..... might be nice to have a pure T-SQL solution to remove multiple spaces AND multiple tabs - just to complete the vista :ermm:

Viewing 15 posts - 151 through 165 (of 425 total)

You must be logged in to reply to this topic. Login to reply