Are the posted questions getting worse?

  • Steve Jones - Editor (4/13/2009)


    OK, I'm lost again. I'll pick up the Flo story when someone posts a link.

    Here's the link:

    http://www.sqlservercentral.com/Forums/FindPost695508.aspx

    Edit: link added to url.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • RBarryYoung (4/13/2009)


    Jeff Moden (4/13/2009)


    Heh... crud... I just sent Tony an email saying that I may have to cancel the book. Flo just beat the hell out of a Tally table split using a While Loop, no less. I'm still doing some testing but things aren't looking real good for the Tally table on this one. With the possible exception of the "quirky update", not looking so hot for pseudo-cursors either.

    BWAA-HAAA... I may also have to learn "C". She also just wrote a CLR that does a dandy split... 6 times faster than even her old/new SQL method. 'Bout time someone actually took the time to do that.

    Hmmm... maybe it's finally time to retire. 🙂

    Do you have a pointer to that thread Jeff?

    http://www.sqlservercentral.com/Forums/FindPost695508.aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CLR does a faster running total than the quirky update too.

    As we've known for quite some time, SQL just plain isn't good at what it isn't designed for. It's really good at its standard stuff, but when you try to trick it into things like complex string manipulation, running totals, et al, other things are often better.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/13/2009)


    CLR does a faster running total than the quirky update too.

    As we've known for quite some time, SQL just plain isn't good at what it isn't designed for. It's really good at its standard stuff, but when you try to trick it into things like complex string manipulation, running totals, et al, other things are often better.

    I don't know how to program in "C" (yet... gotta learn it for "self defense" 😉 )... I sure do wish someone would post some code and some test results for the running total example.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/13/2009)


    GSquared (4/13/2009)


    CLR does a faster running total than the quirky update too.

    As we've known for quite some time, SQL just plain isn't good at what it isn't designed for. It's really good at its standard stuff, but when you try to trick it into things like complex string manipulation, running totals, et al, other things are often better.

    I don't know how to program in "C" (yet... gotta learn it for "self defense" 😉 )... I sure do wish someone would post some code and some test results for the running total example.

    Well, I'm not sure I'd want to tackle this in C either. I haven't coded in C in quite some time, and never enough to be good at it.

    I found a sample CLR version here. The guy's an idiot when it comes to T-SQL, and it shows, but the CLR proc should be an okay place to start.

    A version that I tested before was much simpler, and in VB.NET, not C#. I don't have that available, but I'll see if I can reconstruct it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Jeff Moden (4/13/2009)


    RBarryYoung (4/13/2009)


    Jeff Moden (4/13/2009)


    Heh... crud... I just sent Tony an email saying that I may have to cancel the book. Flo just beat the hell out of a Tally table split using a While Loop, no less. I'm still doing some testing but things aren't looking real good for the Tally table on this one. With the possible exception of the "quirky update", not looking so hot for pseudo-cursors either.

    BWAA-HAAA... I may also have to learn "C". She also just wrote a CLR that does a dandy split... 6 times faster than even her old/new SQL method. 'Bout time someone actually took the time to do that.

    Hmmm... maybe it's finally time to retire. 🙂

    Do you have a pointer to that thread Jeff?

    http://www.sqlservercentral.com/Forums/FindPost695508.aspx

    I think I found the reason why. If you check the code, the definition field is nvarchar(max). I made some changes to the code, and only selected those procedures where the definition was less than or equal to 4000 characters. then I ran two different versions of the code. Check it out yourself. Here is the code:

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE function [dbo].[ufn_Tally2](

    @pStartValue bigint = 1,

    @pEndValue bigint = 1000000,

    @pIncrement bigint = 1

    )

    returns table

    as

    return(

    with BaseNum (

    N

    ) as (

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1 union all

    select 1

    ),

    L1 (

    N

    ) as (

    select

    bn1.N

    from

    BaseNum bn1

    cross join BaseNum bn2

    cross join BaseNum bn3

    ),

    L2 (

    N

    ) as (

    select

    a1.N

    from

    L1 a1

    cross join L1 a2

    ),

    L3 (

    N

    ) as (

    select top ((abs(case when @pStartValue < @pEndValue

    then @pEndValue

    else @pStartValue

    end -

    case when @pStartValue < @pEndValue

    then @pStartValue

    else @pEndValue

    end))/abs(@pIncrement) + 1)

    a1.N

    from

    L2 a1

    cross join L2 a2

    ),

    Tally (

    N

    ) as (

    select

    row_number() over (order by a1.N)

    from

    L3 a1

    )

    select

    ((N - 1) * @pIncrement) + @pStartValue as N

    from

    Tally

    );

    create table #TestData(

    [name] sysname,

    [definition] nvarchar(4000),

    deflen int

    )

    ;

    go

    set statistics io on;

    set statistics time on;

    with TestData as (

    SELECT TOP(200)

    o.name,

    m.definition,

    len(m.definition) as deflen

    FROM

    master.sys.all_objects o

    INNER JOIN master.sys.all_sql_modules m

    ON (o.object_id = m.object_id)

    WHERE

    [type] = 'P'

    and len(m.definition) <= 4000

    )

    insert into #TestData

    select

    [name],

    cast(definition as nvarchar(4000)),

    deflen

    from

    TestData

    ;

    go

    DECLARE @result TABLE (line nvarchar(max));

    set statistics io on;

    set statistics time on;

    with Lines

    (

    Item,

    ProcName,

    Line

    )

    as

    (

    select

    row_number()

    over

    (partition by

    td.name

    order by

    td.name

    ),

    td.name,

    substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N+1,charindex(nchar(1),nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1), t.N+1) - t.N - 1)

    from

    -- dbo.ufn_Tally2(1,10000,1) t

    -- dbo.ufn_Tally2(1,(select max(td1.deflen) from TestData td1),1) t

    -- dbo.Tally t

    -- cross join TestData td

    #TestData td

    cross apply dbo.ufn_Tally2(1,td.deflen,1) t

    where

    t.N < len(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1))

    and substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N,1) = nchar(1)

    )

    insert into @result

    select

    Line

    from

    Lines

    ;

    set statistics time off;

    set statistics io off;

    select

    *

    from

    @result;

    go

    DECLARE @result TABLE (line nvarchar(max));

    set statistics io on;

    set statistics time on;

    with Lines

    (

    Item,

    ProcName,

    Line

    )

    as

    (

    select

    row_number()

    over

    (partition by

    td.name

    order by

    td.name

    ),

    td.name,

    substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N+1,charindex(nchar(1),nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1), t.N+1) - t.N - 1)

    from

    dbo.Tally t

    cross join #TestData td

    where

    t.N < len(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1))

    and substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N,1) = nchar(1)

    )

    insert into @result

    select

    Line

    from

    Lines

    ;

    set statistics time off;

    set statistics io off;

    select

    *

    from

    @result;

    go

    select * from #TestData;

    go

    drop table #TestData;

    go

  • Lynn Pettis (4/13/2009)


    I think I found the reason why. If you check the code, the definition field is nvarchar(max). I made some changes to the code, and only selected those procedures where the definition was less than or equal to 4000 characters. then I ran two different versions of the code. Check it out yourself. Here is the code:

    I'll give the code a try right now and sure do appreciate the help, but would you mind summarizing what you found? I'm thinking that limiting something to 4k probably isn't what most people had in mind for a split in 2k5. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... I give up... the Tally table part of the code took 22 seconds to do the split. I stopped the function table split after 8 minutes.

    I do see that the Tally table solution you have creates a working table with half a million rows in it. Is that what you were trying to show?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I guess I didn't do a good job of explaining what I found. I rewrite the code some, and it is posted below along with the statistics time and io from a run on my system here at home.

    If you notice the second run ran much longer than the first. The difference being the data type of the difference column. In the first, it is limited to data the is 4000 characters or shorter. The second using the nvarchar(max) data type.

    This is what I think is causing the difference in the Tally table version of the split. The blob column causes more work for SQL Server.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table '#2B3F6F97'. Scan count 0, logical reads 10559, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Tally'. Scan count 200, logical reads 1201, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TestData1__________________________________________________________________________________________________________00000000002A'. Scan count 1, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 11157 ms, elapsed time = 11973 ms.

    (10468 row(s) affected)

    (10468 row(s) affected)

    Table '#2C3393D0'. Scan count 0, logical reads 14647, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Tally'. Scan count 200, logical reads 1472, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TestData2__________________________________________________________________________________________________________00000000002B'. Scan count 1, logical reads 93, physical reads 0, read-ahead reads 0, lob logical reads 29709670, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 524860 ms, elapsed time = 533627 ms.

    [/qoute]

    create table #TestData1(

    [name] sysname,

    [definition] nvarchar(4000),

    deflen int

    )

    ;

    go

    create table #TestData2(

    [name] sysname,

    [definition] nvarchar(max),

    deflen int

    )

    ;

    go

    with TestData as (

    SELECT TOP(200)

    o.name,

    m.definition,

    len(m.definition) as deflen

    FROM

    master.sys.all_objects o

    INNER JOIN master.sys.all_sql_modules m

    ON (o.object_id = m.object_id)

    WHERE

    [type] = 'P'

    and len(m.definition) <= 4000

    )

    insert into #TestData1

    select

    [name],

    cast(definition as nvarchar(4000)),

    deflen

    from

    TestData

    ;

    go

    with TestData as (

    SELECT TOP(200)

    o.name,

    m.definition,

    len(m.definition) as deflen

    FROM

    master.sys.all_objects o

    INNER JOIN master.sys.all_sql_modules m

    ON (o.object_id = m.object_id)

    WHERE

    [type] = 'P'

    )

    insert into #TestData2

    select

    [name],

    cast(definition as nvarchar(4000)),

    deflen

    from

    TestData

    ;

    go

    DECLARE @result TABLE (line nvarchar(max));

    set statistics io on;

    set statistics time on;

    with Lines

    (

    Item,

    ProcName,

    Line

    )

    as

    (

    select

    row_number() over (partition by td.name order by td.name),

    td.name,

    substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N+1,charindex(nchar(1),nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1), t.N+1) - t.N - 1)

    from

    dbo.Tally t

    cross join #TestData1 td

    where

    t.N < len(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1))

    and substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N,1) = nchar(1)

    )

    insert into @result

    select

    Line

    from

    Lines

    ;

    set statistics time off;

    set statistics io off;

    select

    *

    from

    @result;

    go

    DECLARE @result TABLE (line nvarchar(max));

    set statistics io on;

    set statistics time on;

    with Lines

    (

    Item,

    ProcName,

    Line

    )

    as

    (

    select

    row_number() over (partition by td.name order by td.name),

    td.name,

    substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N+1,charindex(nchar(1),nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1), t.N+1) - t.N - 1)

    from

    dbo.Tally t

    cross join #TestData2 td

    where

    t.N < len(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1))

    and substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N,1) = nchar(1)

    )

    insert into @result

    select

    Line

    from

    Lines

    ;

    set statistics time off;

    set statistics io off;

    select

    *

    from

    @result;

    go

    select * from #TestData1;

    go

    drop table #TestData1;

    go

    select * from #TestData2;

    go

    drop table #TestData2;

    go

  • I made a slight change to the data loaded. I kept the data the same in both tables this time and here are the statistics io and time for both runs:

    Table '#2F10007B'. Scan count 0, logical reads 10559, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Tally'. Scan count 200, logical reads 1201, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TestData1__________________________________________________________________________________________________________00000000002C'. Scan count 1, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 11187 ms, elapsed time = 11275 ms.

    (10468 row(s) affected)

    (10468 row(s) affected)

    Table '#300424B4'. Scan count 0, logical reads 10559, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Tally'. Scan count 200, logical reads 1201, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TestData2__________________________________________________________________________________________________________00000000002D'. Scan count 1, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 204765 ms, elapsed time = 206609 ms.

    (10468 row(s) affected)

    (10468 row(s) affected)

  • Lynn Pettis (4/13/2009)


    I made a slight change to the data loaded. I kept the data the same in both tables this time and here are the statistics io and time for both runs:

    Not sure what you mean by you "kept the data the same in both tables this time". Why wouldn't you use the same data for each test?

    Anyway, it's been known for a while that many solutions (not just Tally table splits) have some performance issues with the MAX datatypes. With that in mind, what are your conclusions about your latest test?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/14/2009)


    Lynn Pettis (4/13/2009)


    I made a slight change to the data loaded. I kept the data the same in both tables this time and here are the statistics io and time for both runs:

    Not sure what you mean by you "kept the data the same in both tables this time". Why wouldn't you use the same data for each test?

    Anyway, it's been known for a while that many solutions (not just Tally table splits) have some performance issues with the MAX datatypes. With that in mind, what are your conclusions about your latest test?

    The first time, I allowed any length data to be allowed in the definition field in the table defined as nvarchar(max). Realized after the run that this is apples to oranges, but still worthwhile to post the stats for the run. If you compare the times between the two runs where the data could exceed 4000 characters (nchar's nvarchar(max)) with the run time where the data was held to <= 4000 characters, the second run was faster, but still significantly slower than when the data type of the column was declared nvarchar(4000).

    This tells me that when you move to varchar(max)/nvarchar(max) data types, you really need to look closer at how you are doing things. Is this point where a CLR procedure begins to make more sense? Is a while loop more efficient? The old adage, test, test, and test some more.

  • I have a lot of catching up to do! :w00t:

    Just want to say how great it is to see so much hard discussion and comparison going on in so many threads on here. It's like attending a black belt seminar just watching you all work.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • This tells me that when you move to varchar(max)/nvarchar(max) data types, you really need to look closer at how you are doing things

    And since CLR (.NET) strings are unicode, I wonder how this would affect any possible solution.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • <Sigh>

    http://www.sqlservercentral.com/Forums/Topic696113-92-1.aspx

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 3,361 through 3,375 (of 66,742 total)

You must be logged in to reply to this topic. Login to reply