REPLACE Multiple Spaces with One

  • Paul White (11/19/2009)


    I must admit I am less than clear how sort order affects nested REPLACEs, but at least REPLACE is documented as being collation-aware so there is some sense to it.

    Of course - collation is important when comparing the string to replace. D'oh.
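
    (To make that collation-awareness concrete, a quick illustration with made-up strings - the case-insensitive collation matches both occurrences of the search string, the binary one only the exact-case occurrence:)

    SELECT REPLACE('ABC abc' COLLATE Latin1_General_CI_AS, 'abc', '*');  -- returns '* *'
    SELECT REPLACE('ABC abc' COLLATE Latin1_General_BIN,   'abc', '*');  -- returns 'ABC *'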

    Isn't it nice of me that I let you come to that on your own instead of pointing out what a dodo you were?? 😀

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (11/19/2009)


    Isn't it nice of me that I let you come to that on your own instead of pointing out what a dodo you were?? 😀

    Oh yes, thanks so much for that 😄

    I owe you one 😛

  • C# Screw (11/19/2009)


    Hi folks,

    Well this is the 'last post' from me but this is what things look like if there are no double spaces in the data at all:

    First, I have to profusely thank you for all the great testing you've done... it's like this thread had its own personal QA team. Between you and Paul and a couple of others, I don't believe that anything has been left uncovered in the realm of replacing multiple spaces.

    I do have two questions that I may have simply missed the answers to... what did you use to measure the times, and are they CPU times or durations?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (11/19/2009)


    ...my timing results have always been CPU-only. I will admit to being lazy and using SET STATISTICS TIME instead of total_worker_time / execution_count from sys.dm_exec_query_stats or Profiler which I normally prefer.

    Just in case you missed this, Jeff (I know your question was directed at C# Screw; just being sure).
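
    To make the two timing approaches mentioned above concrete, here is a minimal sketch (the test query and table name are made up for illustration):

    -- Quick and dirty: per-statement CPU and elapsed times are printed to the Messages tab.
    SET STATISTICS TIME ON;
    SELECT COUNT(*) FROM dbo.SomeTestTable;   -- hypothetical test query
    SET STATISTICS TIME OFF;

    -- More repeatable: average worker (CPU) time per execution, in microseconds, from the plan cache.
    SELECT  QS.execution_count,
            QS.total_worker_time / QS.execution_count AS avg_cpu_time_us
    FROM    sys.dm_exec_query_stats AS QS
    CROSS APPLY sys.dm_exec_sql_text(QS.[sql_handle]) AS QT
    WHERE   QT.[text] LIKE '%SomeTestTable%';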

  • Connect item created for the unexpectedly slow performance with Windows collations.

    https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=512459

    Please vote for it!

    Thanks

    Paul

  • Paul White (11/18/2009)


    Hey Jeff,

    The performance of the code in the article (unless I have really messed up) seems very dependent on collation. For a long time yesterday, I couldn't understand why your routine was so slow on my machine.

    Holy Moly... I just got a little time to play with some of the things on this thread. This fact is one of the most amazing. Thanks, Paul.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (11/19/2009)


    ...my timing results have always been CPU-only. I will admit to being lazy and using SET STATISTICS TIME instead of total_worker_time / execution_count from sys.dm_exec_query_stats or Profiler which I normally prefer.

    Just in case you missed this, Jeff (I know your question was directed at C# Screw; just being sure).

    Thanks, Paul. Your statement was what made me ask C# Screw the question... I sure hope he comes back to answer because he ran all of the tests on a common machine... makes it all very important.

    Hmmmmm.... I also wonder which collation he was using...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... last but not least... if you want to see all of the T-SQL code examples really start dragging their feet, just change the datatype of the column being cleaned up to VARCHAR(MAX). Add the wrong collation on top of that and you can take a smoke break while it runs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/19/2009)


    Heh... last but not least... if you want to see all of the T-SQL code examples really start dragging their feet, just change the datatype of the column being cleaned up to VARCHAR(MAX). Add the wrong collation on top of that and you can take a smoke break while it runs.

    In fairness, the CLR routines start to make sucking noises when we have to start passing MAX data types around too (limit is 4000 Unicode characters remember). Passing MAX parameters to CLR functions also has the inconvenient side-effect of waving bye-bye to parallelism...

    From the test results, I would guess C# Screw was using a SQL collation. It would be good to get confirmation, though.
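
    For anyone who wants to try the collation angle locally, here is a minimal sketch of the marker-character ("single bell char") REPLACE being benchmarked in this thread, with an explicit COLLATE clause so the comparison is not left to the database default. The sample string is made up, and which collation is actually fastest on a given build is exactly the open question:

    DECLARE @s VARCHAR(8000);
    SET @s = 'collapse   these    multiple     spaces';

    -- Nested REPLACE using CHAR(7) as the marker character,
    -- with the comparison collation pinned explicitly.
    SELECT REPLACE(REPLACE(REPLACE(
               @s COLLATE Latin1_General_BIN,
               '  ', ' ' + CHAR(7)),   -- each pair of spaces -> space + marker
               CHAR(7) + ' ', ''),     -- marker followed by a space -> removed
               CHAR(7), '');           -- any leftover marker -> removed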

  • Regarding this collation issue, I have also run into a situation where, depending on the database I'm in, Jeff's sample code from the article simply doesn't work, i.e. the extra spaces are NOT removed.

    There is definitely a problem with this on SQL Server 2000.

    SQL Server 2005 seems much better behaved.

    Will look into this.

  • Morning:-)

    I expect you are all asleep on your side of the Atlantic, but anyway I have a question ...

    Currently I am capturing elapsed time like this :

    DECLARE @StartTime DATETIME;
    SELECT @StartTime = GETDATE();

    UPDATE #TEMP2
    SET COL3 = dbo.fn_CleanUp_Recursion_Brigzy(COL2)
    FROM #TEMP2;

    INSERT INTO #tResults (Tag, Result)
    VALUES ('SQL function: C#Screw : Recursive', DATEDIFF(ms, @StartTime, GETDATE()));
    GO 10

    I am doing this for every sample submitted on the thread.

    At the end I then query the #tResults table, giving us our leader board.

    But I am wondering how to capture the CPU time you are looking for.

    I can see that SET STATISTICS TIME will write its output to the messages pane, but is there any way I can capture it in T-SQL and insert it into our leader board ...

    🙂

  • C#Screw,

    I use your method. But knowing that this is not always reliable, I generally run a test several times until it 'settles down' to a pretty constant value. IMO it's the most important measurement, i.e. how long an end user has to wait for something. The rest is just statistics.

  • Hey C# Screw!

    So not your last post after all then 😉

    Other side of the Pacific, if you don't mind!

    For an example of how to capture CPU time (worker time) forgive me for referencing one of my own old posts: http://www.sqlservercentral.com/Forums/FindPost764706.aspx - I'm just too lazy to type it out. It essentially just uses the sys.dm_exec_query_stats DMV.

    One can also use Profiler of course.

    Paul

  • Ok

    Paul's stats query is awesome - I am sorry about the Pacific thing, I just assumed I was the only one here in Blighty and you stars are all in Silicon Valley somewhere 🙂

    I need a bit of help with just a couple of questions -

    1/ I am not sure of the best way to post a results table to the forum.

    2/ The problem I have at the moment is that, for some reason, I get two entries for each run:

    run_count,total_cpu_time_µs,total_logical_reads,total_elapsed_time_µs,avg_cpu_time_µs,avg_logical_reads,avg_elapsed_time_µs

    SQL function: Jeff Original : single bell char',1029604694422875296086942960469422872960869

    SQL function: Jeff Original : single bell char',1030011030013001300

    SELECT  --query = QT.[text],
            SUBSTRING(QT.[text], CHARINDEX('VALUES', text) + 8,
                      (CHARINDEX('DATEDIFF(', text) - CHARINDEX('VALUES(', text)) - 8),
            --execution_plan = QP.query_plan,
            run_count = QS.execution_count,
            total_cpu_time_µs = QS.total_worker_time,
            total_logical_reads = QS.total_logical_reads,
            total_elapsed_time_µs = QS.total_elapsed_time,
            avg_cpu_time_µs = QS.total_worker_time / QS.execution_count,
            avg_logical_reads = QS.total_logical_reads / QS.execution_count,
            avg_elapsed_time_µs = QS.total_elapsed_time / QS.execution_count
    FROM    sys.dm_exec_query_stats QS
    CROSS APPLY sys.dm_exec_sql_text (QS.[sql_handle]) QT
    CROSS APPLY sys.dm_exec_query_plan (QS.[plan_handle]) QP
    WHERE   QT.[text] LIKE '%INSERT INTO #tResults%'
    AND     QT.[text] NOT LIKE '%dm_exec_query_stats%'
    ORDER BY QS.last_execution_time ASC;
    --QS.total_elapsed_time / QS.execution_count DESC

  • On the second point:

    There are a couple of reasons for getting more than one row (allow me to gloss over most of them for a moment).

    Usually, though, it is because you get one row for each statement in the batch.

    The sys.dm_exec_query_stats DMV provides 'statement_start_offset' and 'statement_end_offset' to pick the statement text out from the batch text returned (see Books Online!) so you don't need that CHARINDEXing stuff.

    Please do look at it in BOL because it provides code to account for statement_end_offset returning -1 to represent the end of the string.
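
    As a rough sketch of what that offset arithmetic looks like (following the Books Online pattern; the column aliases are my own):

    SELECT  SUBSTRING(QT.[text],
                      QS.statement_start_offset / 2 + 1,
                      (CASE QS.statement_end_offset
                            WHEN -1 THEN DATALENGTH(QT.[text])
                            ELSE QS.statement_end_offset
                       END - QS.statement_start_offset) / 2 + 1) AS statement_text,
            QS.execution_count,
            QS.total_worker_time / QS.execution_count AS avg_cpu_time_us
    FROM    sys.dm_exec_query_stats AS QS
    CROSS APPLY sys.dm_exec_sql_text(QS.[sql_handle]) AS QT
    WHERE   QT.[text] LIKE '%INSERT INTO #tResults%'
      AND   QT.[text] NOT LIKE '%dm_exec_query_stats%';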

    There is also an example early in this blog entry: http://sqlblog.com/blogs/elisabeth_redei/archive/2009/03/01/how-to-get-high-quality-information-about-query-performance.aspx. That's a good one to read because it highlights some of the limitations of this method too!

    In the example I pointed you at earlier, I had arranged the test so that there was only one statement per batch. Putting GO before and after each statement to monitor ensured that.

    Finally, notice that DBCC FREEPROCCACHE is required to clear the server procedure cache before each test run.

    Paul
