Tablesample

Question

Tablesample

Viewing 15 posts - 31 through 45 (of 47 total)

You must be logged in to reply to this topic. Login to reply

okbangas SSChampion Points: 11773 More actions · Answer 1

sknox (9/19/2012)
Another QotD which does not test what it purports to test.
Even if you know TABLESAMPLE inside and out, you cannot expect to get this question correct without knowing the default clustering setting for PRIMARY KEY and the undocumented behaviour of TOP without an ORDER BY.
Since in this artificial situation, TABLESAMPLE (10 PERCENT) itself will always return either zero or both rows, the determining factor for getting the first answer (always 0 or 1 rows) correct is understanding of TOP, not TABLESAMPLE, and the detemining factor for getting the second answer (value of 1) correct is understanding of PRIMARY KEY and the current undocumented behaviour of TOP.
So why title the question TABLESAMPLE, or bother to include it ? You're not testing for knowledge of it, that's for sure. Call it "Complex T-SQL Issue" or "Multiple non-deterministic filtering factors".
This question would be OK, if it weren't for that dishonesty.

This test does indeed test whether you know tablesample. To know the anwers here you HAVE TO know that I returns whole pages only, but that it does not always return any pages at all if the percentage (or number of rows) are too low, which I personally find important. Although the top without order is not documented, I added it because everyone that knows tablesample in and out, has seen that when there is a clustering key on the table, whole pages are returned and the rows are returned in order of the clustering key, even though I realize that this is not documented. So, I honestly mean that anyone knowing tablesample in and out, should be capable of answering this question correctly.

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]

Miles Neale SSChampion Points: 13147 More actions · Answer 2

Hugo Kornelis (9/19/2012)
Miles Neale (9/19/2012)
Kind of erodes the confidence in certain uses of the product.
Huh? Why?
SELECT TOP ... without ORDER BY is like saying "give me the first by any order of your choice" - and now you say that because that order might be different than you expected (but didn't specify), it is eroding your confidence in the product?

"The query will return zero or one row" Does it always do the same thing with the same request and components? Or do you have different results at different times with the same request and components?

Thus can you depend on it to be consistent? If in fact "The query will return zero or one row" how can it be consistent? And lack of consistency then will erode the confidence in a product.

Not all gray hairs are Dinosaurs!

Sean Lange SSC Guru Points: 286609 More actions · Answer 3

Miles Neale (9/19/2012)
Hugo Kornelis (9/19/2012)
Miles Neale (9/19/2012)
Kind of erodes the confidence in certain uses of the product.
Huh? Why?
SELECT TOP ... without ORDER BY is like saying "give me the first by any order of your choice" - and now you say that because that order might be different than you expected (but didn't specify), it is eroding your confidence in the product?
"The query will return zero or one row" Does it always do the same thing with the same request and components? Or do you have different results at different times with the same request and components?
Thus can you depend on it to be consistent? If in fact "The query will return zero or one row" how can it be consistent? And lack of consistency then will erode the confidence in a product.

I am guessing by your comments you didn't read about what TABLESAMPLE is. It is designed to return an approximate number of random rows equal to the sample size. That is the definition of a sample. It is not intended to give you exact outputs like a normal query. You can in fact make the sample repeatable by adding the keyword REPEATABLE and providing a seed number. The point is that there are only two rows so what exactly is the top 10% of 2??? How do you represent that as the number of rows to return? More often than not .2 is going to return 0 rows but occasionally it will return 1 row. In practice you would use TABLESAMPLE when you have a large table and you want to get a representation of what the data looks like. It is great for populating test/dev databases from a very large table or for other types of analysis.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 4

Miles Neale (9/19/2012)
Thus can you depend on it to be consistent? If in fact "The query will return zero or one row" how can it be consistent? And lack of consistency then will erode the confidence in a product.

The TABLESAMPLE clause is intended (and documented) to return a semi-random^* sampling of the rows in the data. If a query that is intended to return a semi-random result returns consistently the same, something is wrong. For this kind of functionality (which includes the RAND function as well), inconsistent results SHOULD be returned.

^* I use the term "semi-random" because there is some random factor involved, but it is not a true, statistically sound random sampling algorithm.

One of the catches you must be aware of when using TABLESAMPLE (and this catch is explicitly documented, so it does not in any way impact my confidence) is that TABLESAMPLE works on a page by page base. So if you for instance have a table with 40 rows, evenly divided over 2 pages, using TABLESAMPLE with nothing else to limit the results and with any percentage (except 0 and 100) will result in 0, 20, or 40 rows being returned, but never 15, 31, or whatever other number.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

EL Jerry SSCertifiable Points: 7065 More actions · Answer 5

Nice question, learned something new today.

"El" Jerry.

"A watt of Ottawa" - Gerardo Galvan

To better understand your help request, please follow these best practices.[/url]

TomThomson SSC Guru Points: 104768 More actions · Answer 6

Hugo Kornelis (9/19/2012)
Miles Neale (9/19/2012)
Kind of erodes the confidence in certain uses of the product.
Huh? Why?
SELECT TOP ... without ORDER BY is like saying "give me the first by any order of your choice" - and now you say that because that order might be different than you expected (but didn't specify), it is eroding your confidence in the product?

+1

I strongly agree with Hugo. I have absolute confidence that SQL Server will use some order it chooses if I don't specify any particlular order. The fact that it might choose an order that I didn't expect doesn't disturb my confidence that it will conform to specification, not at all.

Tom

Miles Neale SSChampion Points: 13147 More actions · Answer 7

Hugo Kornelis (9/19/2012)
Miles Neale (9/19/2012)
Thus can you depend on it to be consistent? If in fact "The query will return zero or one row" how can it be consistent? And lack of consistency then will erode the confidence in a product.
The TABLESAMPLE clause is intended (and documented) to return a semi-random^* sampling of the rows in the data. If a query that is intended to return a semi-random result returns consistently the same, something is wrong. For this kind of functionality (which includes the RAND function as well), inconsistent results SHOULD be returned.
^* I use the term "semi-random" because there is some random factor involved, but it is not a true, statistically sound random sampling algorithm.
One of the catches you must be aware of when using TABLESAMPLE (and this catch is explicitly documented, so it does not in any way impact my confidence) is that TABLESAMPLE works on a page by page base. So if you for instance have a table with 40 rows, evenly divided over 2 pages, using TABLESAMPLE with nothing else to limit the results and with any percentage (except 0 and 100) will result in 0, 20, or 40 rows being returned, but never 15, 31, or whatever other number.

HUGO, Thank you for taking the time and making the effort to explain this further. In understanding the use of the functionality and the intended design, it helped considerably. In the line of business I have supported over the years the need for accuracy is paramount. However in parts of the scientific arena where random samples could be very valuable this would be perfect. Thank you!

M.

Not all gray hairs are Dinosaurs!

ramkaay SSC Enthusiast Points: 156 More actions · Answer 8

The whole query will not fetch any rows.

Because the table is dropped at the end.

okbangas SSChampion Points: 11773 More actions · Answer 9

ramkaay (9/20/2012)
The whole query will not fetch any rows.
Because the table is dropped at the end.

The query will return rows anyway, otherwise temporary objects would not have worked very well, would they? The question is how many rows are returned, not how many rows are stored in the table at the end of execution.

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]

BP-503183 SSC Eights! Points: 842 More actions · Answer 10

Running the query on SQL 2008R2 10.50.2796, nothing is returned. Still trying to determine why.

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 11

BP-503183 (9/20/2012)
Running the query on SQL 2008R2 10.50.2796, nothing is returned. Still trying to determine why.

Run it lots of times. Then read the explanation, the documentation on TABLESAMPLE, or the discussion here to understand what happened.

EDIT: I realize the above sounds a bit harsh. It was not meant that way. I just wanted to point out that the explanation is there, and that there is no need in repeating it,

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

SQLRNNR SSC Guru Points: 281344 More actions · Answer 12

L' Eomot Inversé (9/19/2012)
I got it wrong. I looked at it, and realised that there are in theory 3 correct answers there: the table will always return no more than 1 row, the query may return value 1, and the query may return value 2 (top without an order clause isn't constrained to use the clustered index order). So I had to pick 2 out of the 3 poeeibilities, and picked the two values because I thought maybe that was the point of the question and "at most one row" had got in there by mistake. Of course I don't thing that picking value 1 and value 2 is right, but neither is picking at most one row and value 1 (the official "correct" answer that reinforces a potentially dangerous misconception about some imaginary inherent ordering) and neither of course is picking at most one row and value 2 (because that would depend on the same fallacious concpt of an inherent order).
So maybe a good question (it's on a topic not seen before in QotD, I think, and that at least is good) - but maybe not (reinforcing the inherent order myth is NOT good). I haven't got at SQL 2012 yet, maybe the optimiser there can pick a different order? Or the next release, or the one after that - the SQL community has been burnt before by assuming rules which don't exist (as pointed out by Hugo).
The row in one of our forums about using quirky update (where the problem is in fact the same inherent order assumption) makes it clear that people expect that there's an unacceptably high risk that a new release will blow that myth out of the water. Even though Jeff wrapped up his use of quirky update with conditions designed to ensure that there was a high probability of the mythical inherent order being used by the optimiser, and insisted that everything must be carefully tested and checked in the real environment and not just in testbeds, and Paul proposed some checks which were fairly certain to ensure that there was an extremely high probability that if the optimiser chose a different order an error would be signalled, and I refined those checks to bring that probability to 100% a lot of people still claim that it is catastrophically dangerous to assume that order when quirky update was used; surely it's just as dangerous to assume that order when TOP is used without an order by clause - or even more dangerous, since no one has suggsted any checks to detect the order not being used and signal it when it happens.
edit: English & spelling

+1

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Paul White SSC Guru Points: 150468 More actions · Answer 13

L' Eomot Inversé (9/19/2012)
I got it wrong. I looked at it, and realised that there are in theory 3 correct answers there: the table will always return no more than 1 row, the query may return value 1, and the query may return value 2 (top without an order clause isn't constrained to use the clustered index order). So I had to pick 2 out of the 3 poeeibilities, and picked the two values because I thought maybe that was the point of the question and "at most one row" had got in there by mistake. Of course I don't thing that picking value 1 and value 2 is right, but neither is picking at most one row and value 1 (the official "correct" answer that reinforces a potentially dangerous misconception about some imaginary inherent ordering) and neither of course is picking at most one row and value 2 (because that would depend on the same fallacious concpt of an inherent order).

This is exactly why I think this is such a poor question. Goodness knows myths around implied ordering are prevalent enough without questions like this reinforcing those misconceptions. The fact is the logical SQL query says that at most one row will be returned; it says nothing about which row. The author seeks to rely on what he thinks he knows about the internals when there was no good reason to do so.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

vk-kirov SSCertifiable Points: 7686 More actions · Answer 14

SQL Kiwi (9/21/2012)
Goodness knows myths around implied ordering are prevalent enough without questions like this reinforcing those misconceptions.

I've just run the 'sp_who2 active' command on one of our servers. This is how SPIDs were 'sorted' in the output: ... 35, 36, 1068, 1100, 1131, 135, 144, 148, 162, 1199, 227 ...

This 'sorting' always drives me crazy when trying to find a certain SPID (out of ~100 SPIDs). All because those MS guys wrote in 1995 (an excerpt from the 'sp_who2' source):

SELECT

SPID = convert(char(5),spid)

,Status = ...

...

from

#tb1_sysprocesses

where

...

-- (Seems always auto sorted.) order by spid_sort

They had had a sorting but commented it out :ermm:

iguillon SSC Rookie Points: 48 More actions · Answer 15

With the drop in the last line, it returns nothing, does it?