April 3, 2014 at 7:37 am
Interesting question
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
April 3, 2014 at 7:56 am
david.gugg (4/3/2014)
Great question, I'd never heard of TABLESAMPLE, so I looked it up - and then still ended up getting the answer wrong :O
+1 Thanks for the question.
April 3, 2014 at 10:51 am
twin.devil (4/3/2014)
Nice question Anoo. learn something new today. thanks
+1
April 3, 2014 at 1:11 pm
pmadhavapeddi22 (4/3/2014)
when we are specifying high range of rows nearing to the exact no of rows then it will return all the rows of the tablelike , in the QOD
we have total no of rows 5 and tablesample is given for 4 rows, then that particular statement will return 5 rows.
When I ran it for the 5th or 6th time it returned zero rows.
[font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
Connect to me on LinkedIn
April 3, 2014 at 1:29 pm
Wow, I didn't know that.
Thanks for the question.
webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
April 3, 2014 at 2:01 pm
Good question, and good explanation. Thanks, Anoo! 😉
April 3, 2014 at 3:13 pm
pmadhavapeddi22 (4/3/2014)
when we are specifying high range of rows nearing to the exact no of rows then it will return all the rows of the tablelike , in the QOD
we have total no of rows 5 and tablesample is given for 4 rows, then that particular statement will return 5 rows.
No, it will decide whether the first page is selected based on some statistical calculation, which is NOT deterministic. If teh first page is selected in our case the answer is 5, because the first page contaons the whole table. If that page is not selected, the answer is 0 because there are no ther pages to contain rows.
Tom
April 3, 2014 at 11:28 pm
thanks tom for the detailed explanation
April 3, 2014 at 11:35 pm
But i really dont understand one thing, tablesample is expecting 4 rows to return, but the result set is totally different, that too on different execution, the purpose of the tablesample is confusing, is there anything specific about this ?
Thanks in advance
April 4, 2014 at 1:32 pm
pmadhavapeddi22 (4/3/2014)
But i really dont understand one thing, tablesample is expecting 4 rows to return, but the result set is totally different, that too on different execution, the purpose of the tablesample is confusing, is there anything specific about this ?Thanks in advance
The TABLESAMPLE clause is supposed to be used to get a "somewhat random" sample of the table contents. There is no guarantee that the amount or percentage of rows returned will be exactly as speceified.
The internal implementation is that each page allocated to the table is either included or not included. For large tables, this works out okay (allthough the results will always consist of series of consecutive rows, with gapes in between those series - as I said: "somewhat random" at best). But for smaller tables, and especially as small as the one in this question, it just acts weird.
The table in the question fits completely on a single page. Because TABLESAMPLE always works at the page level, the effect is that the page either will or will not be included in the sample.
I hope this helps!
April 4, 2014 at 2:00 pm
Just tosupplement what Hugo wote: the reason for working at page level insteadof at row level is performance. Think of a moderate sized table, say a million rows spread over twenty thousand pages. Calculating for each row whether it is to be included or not would mean getting a million random numbers (one for each row) and testing whether each is over some threshold to determine whether teh correcponding row is to retrieved. It's cheaper to do that operation only twenty thousand times, rather than amillion times. If the required sample size is say 10% of the table, then the row-wise method will collect on average 5 rows from each page and just about all 20000 pages will be read because the chance of a page not being read is, on average, 0.9^50 which is a lot less that 1/20000; but the page-oriented version will on average read only 400 pages - a factor of 50 reduction in IO and buffer requirements.
As Hugo says, the paging means that the samples delivered by this feature are somewhat less randomly chosen than the would be with a row-oriented method, and there can be cases where that as bad effects on teh validity of what you get from using this for testing - it's worth looking at whether this will be significant or not (but difficult to work this out if the table is a heap ).
Tom
April 5, 2014 at 5:37 am
Ed Wagner (4/3/2014)
I'd never used this option either. Very interesting, so thanks.
Me neither, but very interesting QotD!
Thanks & Best Regards,
Hany Helmy
SQL Server Database Consultant
April 7, 2014 at 2:20 am
Thank you Hugo Kornelis and TomThomson for the explanation
April 7, 2014 at 6:15 am
Thanks for the interesting question. I ran query several times on SQL 2012; output was 0 or 5 rows, but once 4 rows were returned.
April 7, 2014 at 2:10 pm
nice question thanks Anoo.
Viewing 15 posts - 16 through 30 (of 31 total)
You must be logged in to reply to this topic. Login to reply