Tally Tables

  • John Mitchell-245523 (9/3/2013)


    The stuff about the execution plans was interesting, too. One thing I don't understand - if the second query has less work to do because of the single NEWID evaluation, why is it that query that has the extra Compute Scalar operator?

    It doesn't. At least not on my system (SQL 2012, SP1). And I based my explanation on the plans I saw.

    What version of SQL are you seeing this on? Also, can you right-click the execution plans, choose "Save Execution Plan As ..." to save it as a .sqlplan file and attach that to a reply? I'd like to see that version of the plan.

    EDIT: If you do the above, please save and attach the plans for BOTH queries, and name them in a way that makes it easy for me to see which is which.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo

    I'm on SQL Server 2008 R2 SP2. I ran both queries as a single batch, so there's a single execution plan, making it easy to compare them at a glance. You can see which query is which in the header at the top of each section.

    John

  • Interesting. Those are somewhat different from the SQL 2012 plans. And goes to show that when something isn't documented it may very well change between releases without any formal notice.


    Just because you're right doesn't mean everybody else is wrong.

  • John Mitchell-245523 (9/3/2013)


    Hugo

    I'm on SQL Server 2008 R2 SP2. I ran both queries as a single batch, so there's a single execution plan, making it easy to compare them at a glance. You can see which query is which in the header at the top of each section.

    John

    Thanks, John.

    I had a look at the plans. Apparently the optimization steps I described were recently introduced. Your plans are in fact identical (I'll get to the extra Compute Scalar in a bit).

    I presume that if you run the queries I posted, you would get different results as well. For me, the query with RANK instead of ROW_NUMBER produces 1 in each row. After seeing your plans for the ROW_NUMBER query, I would predict that on your system, RANK would still produce consecutive numbers.

    The extra Compute Scalar you see is a result of using the subquery. I have seen the same when I was experimenting with "SELECT NEWID() FROM ..." versus "SELECT (SELECT NEWID()) FROM ...". If you go into the properties of the operator and examine its defined values property, you'll see that it only performs one "computation": "[Expr1064]=[Expr1063]". (And in the other Compute Scalar operator, you will see that this is indeed the NEWID(): "[Expr1063]=newid()".

    <speculation>

    My best guess is that, even though it would be fairly trivial to remove these superfluous Compute Scalar nodes during query optimization, the development team have done the math and concluded that the extra time spent to decide if a node can be removed is already more than the time it would save during execution - and hence they have chosen not to implement this optimization.

    </speculation>


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Rune Bivrin (9/3/2013)


    L' Eomot Inversé (9/3/2013)


    Rune Bivrin (9/3/2013)


    Just look at the execution plan and tell me if there is no difference.

    Or even try this, where NEWID() is completely eliminated from the second query.

    -- 1

    SELECT ROW_NUMBER() OVER (ORDER BY NEWID())

    FROM sys.all_columns

    -- 2

    SELECT ROW_NUMBER() OVER (ORDER BY (select 1))

    FROM sys.all_columns

    I'd argue that they are different but yield the same result, as is the example given in the question.

    I'd be somewhat surprised if the first query generated the error message

    Windowed functions and NEXT VALUE FOR functions do not support integer indices as ORDER BY clause expressions.

    So I suspect these two don't both generate the same.

    Well, on SQL Server 2012 they do. ORDER BY (select 1) is no different from ORDER BY (SELECT NEWID())

    SELECT ROW_NUMBER() OVER (ORDER BY 1) FROM sys.all_columns

    yields an error, however, as would be expected.

    Woops, that's me reading absolutely carelessly. Sorry.

    Tom

  • Hugo Kornelis (9/3/2013)


    Thanks, John.

    I had a look at the plans. Apparently the optimization steps I described were recently introduced. Your plans are in fact identical (I'll get to the extra Compute Scalar in a bit).

    I presume that if you run the queries I posted, you would get different results as well. For me, the query with RANK instead of ROW_NUMBER produces 1 in each row. After seeing your plans for the ROW_NUMBER query, I would predict that on your system, RANK would still produce consecutive numbers.

    The extra Compute Scalar you see is a result of using the subquery. I have seen the same when I was experimenting with "SELECT NEWID() FROM ..." versus "SELECT (SELECT NEWID()) FROM ...". If you go into the properties of the operator and examine its defined values property, you'll see that it only performs one "computation": "[Expr1064]=[Expr1063]". (And in the other Compute Scalar operator, you will see that this is indeed the NEWID(): "[Expr1063]=newid()".

    <speculation>

    My best guess is that, even though it would be fairly trivial to remove these superfluous Compute Scalar nodes during query optimization, the development team have done the math and concluded that the extra time spent to decide if a node can be removed is already more than the time it would save during execution - and hence they have chosen not to implement this optimization.

    </speculation>

    Hugo, you're right - RANK does indeed produce consecutive numbers.

    John

  • Wow.

    Nobody's mentioned what I consider to be the biggest problem with this question yet.

    I'm talking about the following two answer choices:

    The first query returns a Tally table and the second returns the same row count and numbers but in a randomized order.

    The second query returns a Tally table and the first returns the same row count and numbers but in a randomized order.

    See the problem? They imply that a tally table has to be intrinsically ordered -- this is simply untrue. A tally table simply has to be orderable.

    So even if one of the two queries returned its values in a randomized order, it would still be a tally table.

    PS. To create the "randomized order", just copy the ORDER BY clauses to the end of the query, thus:

    -- 1

    SELECT ROW_NUMBER() OVER (ORDER BY NEWID())

    FROM sys.all_columns

    ORDER BY NEWID()

    -- 2

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NEWID()))

    FROM sys.all_columns

    ORDER BY (SELECT NEWID())

  • sknox (9/3/2013)


    Wow.

    Nobody's mentioned what I consider to be the biggest problem with this question yet.

    You must have overlooked the things already mentioned about ordering by Rune Irvin, Terreador, and me.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (9/3/2013)


    sknox (9/3/2013)


    Wow.

    Nobody's mentioned what I consider to be the biggest problem with this question yet.

    You must have overlooked the things already mentioned about ordering by Rune Irvin, Terreador, and me.

    Guess it's my part to never have my last name spelled correctly. :crying:

    Not even my fellow Swedes get it right.


    Just because you're right doesn't mean everybody else is wrong.

  • I wanted to say thank you to Hugo for the in-depth explanation.

  • sknox (9/3/2013)


    Wow.

    Nobody's mentioned what I consider to be the biggest problem with this question yet.

    I'm talking about the following two answer choices:

    The first query returns a Tally table and the second returns the same row count and numbers but in a randomized order.

    The second query returns a Tally table and the first returns the same row count and numbers but in a randomized order.

    See the problem? They imply that a tally table has to be intrinsically ordered -- this is simply untrue. A tally table simply has to be orderable.

    So even if one of the two queries returned its values in a randomized order, it would still be a tally table.

    I think that's arguable either way.

    It could be argued that to be a table in a relational database a table has to have a primary key; that means there is an intrinsic order determined by the key attributes; a tally table has only one attribute, so it is intrinsically ordered by that attribute.

    That's a theory argument. Of course it only applies to base tables and to manifest derived relations (called indexed views in SQL Server), not to other derived relations.

    The practical argument is that a tally table, for performance reasons, needs an index on it's single column - otherwise a sort will be required in many cases where if it were ordered the sort would not be needed, so the Tally table would be less useful. But derived relations (other than ordered views) are not ordered, so this argument can only apply when the tally table is a base table (or an ordered view).

    Derived tables provided as subqueries (or as CTEs) aren't ordered, just orderable; if few enough rows are required, it's possible that using such a relation as a tally table involves no IO, while using a base table would require IO; and by a happy coincidence (or sometimes by deliberate design) its generation usually happens in the desired order anyway, despite the derived relation not being constrained to have any particular order; so the practical performance argument doesn't work in favour of an intrinsic order in this case. So when a tally table is a derived relation rather than a base table it is reasonable for it not to be in order (and it certainly isn't intrinsically in order).

    So whether a tally table has to be ordered or not seems to depend on whether it's a base table (in which case it is ordered) or a subquery (possibly a CTE) delivering a derived relation.

    So whether your statement is correct or not depends on whether you call a non-manifest derived relation used in the same way as a (base) Tally table a Tally table or not.

    Tom

  • Rune Bivrin (9/3/2013)


    Hugo Kornelis (9/3/2013)


    sknox (9/3/2013)


    Wow.

    Nobody's mentioned what I consider to be the biggest problem with this question yet.

    You must have overlooked the things already mentioned about ordering by Rune Irvin, Terreador, and me.

    Guess it's my part to never have my last name spelled correctly. :crying:

    Not even my fellow Swedes get it right.

    Awww, I'm so sorry, Rune.

    And looking back, I think I also miscredited a post you made to John. I'm really on a roll today. :blush:


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • L' Eomot Inversé (9/3/2013)


    It could be argued that to be a table in a relational database a table has to have a primary key; that means there is an intrinsic order determined by the key attributes

    In the theoretic approach, a relational table does have to have a key. (Not sure if "primary key" is a requirement; "candidate key" is).

    But a key does not impose an order.

    A key does not even require an index - that's just an implementation choice. (And a very smart one, I should add).

    And even an index does not impose order. Yes, the B-tree indexes we know in SQL Server do. But hash indexes (available in Oracle, and also in the "Hekaton" engine that you can already play with in the CTP of SQL 2014) don't, and neither do columnstore indexes (available as nonclustered in SQL2012, and as both clustered or nonclustered in SQL Server PDW and in the CTP of SQL 2014).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Rune Bivrin (9/3/2013)


    Hugo Kornelis (9/3/2013)


    sknox (9/3/2013)


    Wow.

    Nobody's mentioned what I consider to be the biggest problem with this question yet.

    You must have overlooked the things already mentioned about ordering by Rune Irvin, Terreador, and me.

    Guess it's my part to never have my last name spelled correctly. :crying:

    Not even my fellow Swedes get it right.

    Don't worry, he can't spell my nickname properly either 😉

  • kapil_kk (9/2/2013)


    bitbucket-25253 (9/2/2013)


    Thanks for a nice question - good way to start a work week

    +1 🙂

    +1



    Everything is awesome!

Viewing 15 posts - 16 through 30 (of 48 total)

You must be logged in to reply to this topic. Login to reply