Retrieve data between 2 '.'

  • so i went for the SQL CLR just to see if i could get it to work (been on a learning kick right now so im going with it) and here are my results.

    Duration for CHARINDEX = 00:00:21:963

    Duration for LEN = 00:00:16:217

    Duration for CHARINDEX with COLLATE = 00:00:10:260

    Duration for PARSENAME = 00:00:11:043

    Duration for LIKE = 00:00:11:293

    Duration for LIKE with Collate = 00:00:10:707

    Duration for SQLCLR = 00:00:14:427

    What i added to the run

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE dbo.RegexCLR(ISNULL(yourData,'')) = 1

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for SQLCLR = %s',0,1,@Duration) WITH NOWAIT;

    and the code for the SQLCLR

    public partial class UserDefinedFunctions

    {

    [Microsoft.SqlServer.Server.SqlFunction()]

    public static bool RegexCLR(string input)

    {

    return Regex.IsMatch(input,@"^[^\.]+\.[^\.]+$");

    }

    };

    of course when we change the NTEXT to NVARCHAR(MAX) i get the following:

    Duration for CHARINDEX = 00:00:06:030

    Duration for LEN = 00:00:03:253

    Duration for CHARINDEX with COLLATE = 00:00:00:943

    Duration for PARSENAME = 00:00:01:147

    Duration for LIKE = 00:00:07:253

    Duration for LIKE with Collate = 00:00:01:353

    Duration for SQLCLR = 00:00:04:040

    they are all really close over 1 million records.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Try the Parsename function. Not what is was made for, but works quite well for what you need and is much faster that CharIndex and Len.

    Duration for CHARINDEX = 00:00:00:933

    Duration for Len = 00:00:00:607

    Duration for PARSENAME = 00:00:00:127

    SET NOCOUNT ON;

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    --1,000,000 Random rows of data

    SELECT IDENTITY(INT,1,1) AS ID, CAST(LEFT(yourData, LEN(yourData) - 1) AS varchar(50)) AS yourData

    INTO #testEnvironment

    FROM (SELECT TOP 1000000

    REPLICATE(

    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(

    REPLACE(NEWID(),'-','')

    ,'0',''),'9',''),'8',''),'7',''),'6',''),'5',''),'4',''),'3',''),'2',''),'1','') + '.',

    (ABS(CHECKSUM(NEWID())) % 2) + 2)

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    )a(yourData);

    --select * from #testEnvironment where PARSENAME(yourData,3) is null --00:00:48:00

    DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE CHARINDEX('.', yourData, CHARINDEX('.', yourData) + 1) = 0;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for CHARINDEX = %s',0,1,@Duration) WITH NOWAIT;

    SELECT @StartTime = SYSDATETIME();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for Len = %s',0,1,@Duration) WITH NOWAIT;

    SELECT @StartTime = SYSDATETIME();

    select @HOLDER = ID from #testEnvironment where PARSENAME(yourData,3) is null

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for PARSENAME = %s',0,1,@Duration) WITH NOWAIT;

  • jdfletchr (10/26/2012)


    Try the Parsename function. Not what is was made for, but works quite well for what you need and is much faster that CharIndex and Len.

    Did you read the thread with the tests made and the observation on a solultion that was the same as yours?

    Without an extra condition, your PARSENAME solution will throw incorrect results.

    By the way, another problem is we're dealing with an ntext column.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • ...

    they are all really close over 1 million records

    ...

    If you want your CLR with Regex to perform well you need to declare your Regex object as static and use Compile option for the pattern. Try changing your CLR to this:

    public partial class UserDefinedFunctions

    {

    static readonly Regex _regex = new Regex(@"^[^\.]+\.[^\.]+$", RegexOptions.Compiled);

    [Microsoft.SqlServer.Server.SqlFunction()]

    public static bool RegexCLR(string input)

    {

    return Regex.IsMatch(input);

    }

    };

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (10/26/2012)


    ...

    they are all really close over 1 million records

    ...

    If you want your CLR with Regex to perform well you need to declare your Regex object as static and use Compile option for the pattern. Try changing your CLR to this:

    public partial class UserDefinedFunctions

    {

    static readonly Regex _regex = new Regex(@"^[^\.]+\.[^\.]+$", RegexOptions.Compiled);

    [Microsoft.SqlServer.Server.SqlFunction()]

    public static bool RegexCLR(string input)

    {

    return Regex.IsMatch(input);

    }

    };

    Thanks for the tip on the regex im new to C# but chose to learn that language specifically for CLR's. after the code change and rerunning the tests this is what i got with NTEXT:

    Duration for CHARINDEX = 00:00:14:957

    Duration for LEN = 00:00:15:723

    Duration for CHARINDEX with COLLATE = 00:00:09:790

    Duration for PARSENAME = 00:00:10:717

    Duration for LIKE = 00:00:11:087

    Duration for LIKE with COLLATE = 00:00:10:683

    Duration for SQLCLR = 00:00:11:427

    and now for NVARCHAR(MAX)

    Duration for CHARINDEX = 00:00:06:253

    Duration for LEN = 00:00:03:200

    Duration for CHARINDEX with COLLATE = 00:00:01:033

    Duration for PARSENAME = 00:00:01:327

    Duration for LIKE = 00:00:07:237

    Duration for LIKE with COLLATE = 00:00:01:457

    Duration for SQLCLR = 00:00:01:953

    Im actually supprised that by changing the datatype we can chop a factor of 10 off the execution times. never really saw the direct impact of data types like this before.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply