Get month names ordered using recursion

  • ChrisM@home (3/24/2012)


    the sqlist (3/24/2012)


    ...I think we should not be worried about how many rows are actually generated...

    I couldn't disagree more. When you're looking at an overall time of 25ms per run of a splitting query, the difference between returning all of the rows from the row/number generator and returning exactly the number you need, is very significant indeed.

    Chris, you can disagree all you want. The numbers don't lie, the time spent is on rendering the data, that is build it and sending it to the client. That is why I ran a COUNT(*) on the 1M rows version and the numbers are, again:

    1M:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

  • the sqlist (3/24/2012)


    ChrisM@home (3/24/2012)


    the sqlist (3/24/2012)


    ...I think we should not be worried about how many rows are actually generated...

    I couldn't disagree more. When you're looking at an overall time of 25ms per run of a splitting query, the difference between returning all of the rows from the row/number generator and returning exactly the number you need, is very significant indeed.

    Chris, you can disagree all you want. The numbers don't lie, the time spent is on rendering the data, that is build it and sending it to the client. That is why I ran a COUNT(*) on the 1M rows version and the numbers are, again:

    1M:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    I guess I'll disagree as well. Using COUNT(*) on the rows isn't quite the same as using the rows for something practical. If you only need to use 10 rows, then generating the full 100,000 is a waste of clock cycles and time. Since the rows DO have to be "rendered" to be used, you have to "render" only what you need or performance can really suffer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/24/2012)


    the sqlist (3/24/2012)


    ChrisM@home (3/24/2012)


    the sqlist (3/24/2012)


    ...I think we should not be worried about how many rows are actually generated...

    I couldn't disagree more. When you're looking at an overall time of 25ms per run of a splitting query, the difference between returning all of the rows from the row/number generator and returning exactly the number you need, is very significant indeed.

    Chris, you can disagree all you want. The numbers don't lie, the time spent is on rendering the data, that is build it and sending it to the client. That is why I ran a COUNT(*) on the 1M rows version and the numbers are, again:

    1M:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    I guess I'll disagree as well. Using COUNT(*) on the rows isn't quite the same as using the rows for something practical. If you only need to use 10 rows, then generating the full 100,000 is a waste of clock cycles and time. Since the rows DO have to be "rendered" to be used, you have to "render" only what you need or performance can really suffer.

    Jeff, I know who you are and I respect you and your knowledge but I will post here what I said few posts above:

    "Sometimes it is better to keep the things simple in terms of solution and not to think about the effects at the deepest level. But again, sometimes."

    I think is wise to keep a border between doing something just to prove a point and a real life problem.

    The idea is that to over complicate the code in this case is not worthed and you know it. Give me some numbers and I will change my mind.

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

  • I'm not sure what you're speaking of. You already have the numbers that prove that if you calculate more than the number of rows you need, then you waste time and resources whether you use the code once or a million times.

    As for simple, how much more simple can you get than the code I offered?

    DECLARE @SomeNumber INT;

    SELECT @SomeNumber = ??? ; --Change the ??? to the number to test

    WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), --10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b) --100,000,000 rows

    SELECT TOP (@SomeNumber) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • the sqlist (3/24/2012)


    Jeff Moden (3/24/2012)


    the sqlist (3/24/2012)


    ChrisM@home (3/24/2012)


    the sqlist (3/24/2012)


    ...I think we should not be worried about how many rows are actually generated...

    I couldn't disagree more. When you're looking at an overall time of 25ms per run of a splitting query, the difference between returning all of the rows from the row/number generator and returning exactly the number you need, is very significant indeed.

    Chris, you can disagree all you want. The numbers don't lie, the time spent is on rendering the data, that is build it and sending it to the client. That is why I ran a COUNT(*) on the 1M rows version and the numbers are, again:

    1M:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 4 ms.

    I guess I'll disagree as well. Using COUNT(*) on the rows isn't quite the same as using the rows for something practical. If you only need to use 10 rows, then generating the full 100,000 is a waste of clock cycles and time. Since the rows DO have to be "rendered" to be used, you have to "render" only what you need or performance can really suffer.

    Jeff, I know who you are and I respect you and your knowledge but I will post here what I said few posts above:

    "Sometimes it is better to keep the things simple in terms of solution and not to think about the effects at the deepest level. But again, sometimes."

    I think is wise to keep a border between doing something just to prove a point and a real life problem.

    The idea is that to over complicate the code in this case is not worthed and you know it. Give me some numbers and I will change my mind.

    Ah... I think I've finally figured out what you're talking about and what you mean by "rendered". I absolutely agree that "rendering" to the screen is the "great equalizer" and that the performance of code should not be judged when returning results to the screen.

    HOWEVER, your COUNT(*) example actually does show that the number of rows created does have an impact on the performance. You just made the mistake of using a measurment tool that wasn't up to the task of measuring something that runs so quickly.

    Here's some test code. The first chunk of code generates 100,000 rows (pretending that only 10 were actually needed) and the second only generates 10 rows. All it does is generate the rows and return the count of each section of code.

    --===== 2005 Method, 100,000 rows ====================================================================================

    DECLARE @SomeNumber INT;

    SELECT @SomeNumber = 100000 ;

    WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), --10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000 rows

    cteTally(N) AS (SELECT TOP (@SomeNumber) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8)

    SELECT COUNT(*) FROM cteTally

    ;

    GO 5

    --===== 2005 Method, 10 rows =========================================================================================

    DECLARE @SomeNumber INT;

    SELECT @SomeNumber = 10 ;

    WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), --10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000 rows

    cteTally(N) AS (SELECT TOP (@SomeNumber) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8)

    SELECT COUNT(*) FROM cteTally

    ;

    GO 5

    Here are the results from that test that, because I have SQL Profiler running in the microsecond mode, clearly shows that the necessary "complication" of limiting rows in these types of number generators still matters. If you use the same code repetitively (perhaps hundreds of thousands of times in a single run like you might with a splitter function), the differences can really add up. On a 100,000 row run where the number generator would be used 100,000 times, the difference of only 4 milliseconds is a difference of 400 seconds or 6 minutes and 40 seconds. That kind of time savings is certainly worth the "complication".

    Of course, I have the word "complication" in quotes because it's just not that complicated. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/24/2012)


    ....

    Seems like a lot more work than necessary. Wouldn't a modification for 2008 of Itzik Ben-Gan's original code be a little more eye-ball friendly not to mention being a wee bit faster?

    WITH

    E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)), --10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b) --100,000,000 rows

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8

    ;

    You're right Jeff, although it's faster than generating all rows, it's a "wee bit slower" than this:

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) n = n1+n2+n3+n4

    FROM (((VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t2(n2)

    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t3(n3))

    CROSS JOIN (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n1))

    CROSS JOIN (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4(n4)

    - which should look familiar to you as the limiter for the row generator in the T1 splitter. I spent a while playing with this toy in the T1 test harness, comparing against the other splitters. On this rig and one more at work, the times were very similar between T1 and this method.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • the sqlist (3/24/2012)


    ...Which makes the issue more rhetorical than anything.

    Sometimes it is better to keep the things simple in terms of solution and not to think about the effects at the deepest level. But again, sometimes.

    It's not really a case of thinking of something at the deepest level - it's a case of finding the fastest tool to add to your toolkit. Developers are always looking to reuse code, we root around in whatever is already there and cherrypick code to do similar jobs. Choosing a slow row generator over a fast one for production code becomes a double whammy - the first job to use it, then every time a developer cherrypicks it for a new job.

    Now, imagine importing a bunch of csv files. There are 25 of them, each around 250k rows. The delimiters vary, pipe or comma depending on the file, and the line termination is any combination of CR, LF, or both - so Bulkcopy and Import won't work to resolve the columns out of the rows. You would face a similar problem if the column layout in the csv files was different.

    A good solution to this is to import the csv files into a staging table(s) as one column, then split the delimited string in that column using a TSQL splitter. Choosing the right tool for this is important, because you're no longer looking at 23 elements in a single string - you're looking at over 6 million executions of the splitter.

    The splitter based on the row generator I posted isn't overcomplicated. Here's a snapshot (a snapshot because I tinker with it from time to time):

    ALTER FUNCTION [dbo].[DelimitedSplit8K_CTE]

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    SELECT

    ItemNumber = 0,

    Item = LEFT(@pString, CHARINDEX(@pDelimiter,@pString+@pDelimiter,0)-1)

    UNION ALL

    SELECT

    ItemNumber = ROW_NUMBER() OVER(ORDER BY @@SPID),

    Item = SUBSTRING(@pString,1+n,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,1+n),0)-(1+n),8000))

    FROM (

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) n = n1+n2+n3+n4

    FROM (((VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t3 (n3)

    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t1 (n1))

    CROSS JOIN (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t2 (n2))

    CROSS JOIN (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4 (n4)

    ) d

    WHERE SUBSTRING(@pString,n,1) = @pDelimiter

    4ms per execution adds an extra hour per million rows.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • ChrisM@home (3/25/2012)


    You're right Jeff, although it's faster than generating all rows, it's a "wee bit slower" than this:

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) n = n1+n2+n3+n4

    FROM (((VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t2(n2)

    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t3(n3))

    CROSS JOIN (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n1))

    CROSS JOIN (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4(n4)

    - which should look familiar to you as the limiter for the row generator in the T1 splitter. I spent a while playing with this toy in the T1 test harness, comparing against the other splitters. On this rig and one more at work, the times were very similar between T1 and this method.

    Thanks, Chris. I'm always looking for different ways to make things run faster. Time for me to "play" with some code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/25/2012)


    ChrisM@home (3/25/2012)


    You're right Jeff, although it's faster than generating all rows, it's a "wee bit slower" than this:

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) n = n1+n2+n3+n4

    FROM (((VALUES (0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) t2(n2)

    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t3(n3))

    CROSS JOIN (VALUES (0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) t1(n1))

    CROSS JOIN (VALUES (0),(1000),(2000),(3000),(4000),(5000),(6000),(7000),(8000),(9000)) t4(n4)

    - which should look familiar to you as the limiter for the row generator in the T1 splitter. I spent a while playing with this toy in the T1 test harness, comparing against the other splitters. On this rig and one more at work, the times were very similar between T1 and this method.

    Thanks, Chris. I'm always looking for different ways to make things run faster. Time for me to "play" with some code.

    Anytime, Jeff.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • --===== 2005 Method, 100,000 rows ====================================================================================

    DECLARE @SomeNumber INT;

    SELECT @SomeNumber = 100000 ;

    WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), --10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000 rows

    cteTally(N) AS (SELECT TOP (@SomeNumber) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8)

    SELECT COUNT(*) FROM cteTally

    ;

    GO 5

    --===== 2005 Method, 10 rows =========================================================================================

    DECLARE @SomeNumber INT;

    SELECT @SomeNumber = 10 ;

    WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), --10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --100,000,000 rows

    cteTally(N) AS (SELECT TOP (@SomeNumber) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8)

    SELECT COUNT(*) FROM cteTally

    ;

    GO 5

    Here are the results from that test that, because I have SQL Profiler running in the microsecond mode, clearly shows that the necessary "complication" of limiting rows in these types of number generators still matters. If you use the same code repetitively (perhaps hundreds of thousands of times in a single run like you might with a splitter function), the differences can really add up. On a 100,000 row run where the number generator would be used 100,000 times, the difference of only 4 milliseconds is a difference of 400 seconds or 6 minutes and 40 seconds. That kind of time savings is certainly worth the "complication".

    Of course, I have the word "complication" in quotes because it's just not that complicated. 😉

    How did you run those queries. I got totally different numbers:

    Apparently the forum doesn't accept images from other domain. Here is the link:

    Don't just give the hungry man a fish, teach him how to catch it as well.

    the sqlist

Viewing 10 posts - 121 through 129 (of 129 total)

You must be logged in to reply to this topic. Login to reply