March 8, 2010 at 6:29 am
nick.mcdermaid (3/7/2010)
Someone appears to have drawn some inspiration from your article.
Yes it certainly looks that way - even the reference links are the same. A little credit would have been nice!
_________________________________
seth delconte
http://sqlkeys.com
March 9, 2010 at 1:45 pm
Seth Delconte (3/8/2010)
nick.mcdermaid (3/7/2010)
Someone appears to have drawn some inspiration from your article.Yes it certainly looks that way - even the reference links are the same. A little credit would have been nice!
He has been served.
EDIT: The comment I submitted is awaiting moderation. If he rejects it, this is what I wrote
At least properly credit the appropriate article if you’re going to copy from it (not the author of this original article but on the same forum) <link back to the article here>
Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
March 9, 2010 at 3:00 pm
GabyYYZ (3/9/2010)
Seth Delconte (3/8/2010)
nick.mcdermaid (3/7/2010)
Someone appears to have drawn some inspiration from your article.Yes it certainly looks that way - even the reference links are the same. A little credit would have been nice!
He has been served.
EDIT: The comment I submitted is awaiting moderation. If he rejects it, this is what I wrote
At least properly credit the appropriate article if you’re going to copy from it (not the author of this original article but on the same forum) <link back to the article here>
Ahh thank you Gaby!
_________________________________
seth delconte
http://sqlkeys.com
March 14, 2010 at 3:02 pm
You're missing one really important point.
Using GUID's is fine, but when the primary key is a clustered index, then you end up with half-or-less empty data pages which creates a lot of I/O and useless, partially empty data pages.
When you use a GUID as a primary key, make sure it isn't a clustered index or you'll get dog-poor performance.
Ron
March 14, 2010 at 9:41 pm
rd_in_sd (3/14/2010)
Using GUID's is fine, but when the primary key is a clustered index, then you end up with half-or-less empty data pages which creates a lot of I/O and useless, partially empty data pages. When you use a GUID as a primary key, make sure it isn't a clustered index or you'll get dog-poor performance.
This point has been raised numerous times already in this thread...and the answer is always the same: use the NEWSEQUENTIALID in a default constraint to avoid this issue completely. Full pages, minimal page-splits...globally-unique goodness.
March 14, 2010 at 10:11 pm
Ok, this isn't the most elegant solution, but is is a proper "Simple Random Sample" and is very fast
/*
FAST Simple Random Sample from a source table using pseudo random number generator RAND()
Usage:
Sample Size is set following the "select top"
Copy and paste more random value subqueries than the sample size
If the sample size is more than 20% of the source table size, you may have to include many time the sample size of random value subqueries
and if more than 50%, consider changing the query to exlude the random rows.
Modify the @SourceRowCount and the first part of the query for your source table name (dbo.phrases in this example)
A random seed is used, but a fixed one can be used if desired
Advantages:
Does not use NEWID for randomness, which is not technically a Random Number Genrator
Is FAST, even when the source table is large Select 10 from 10 million in 6 seconds, 1000 in 8 seconds
Does not rely on data page assumptions (see TABLESAMPLE)
Disadvantages:
Not TRUE random numbers
See Also:
http://msdn.microsoft.com/en-us/library/aa175776(SQL.80).aspx
Presents three methods for random samples
1) Cursors and RAND() Cursor Overhead
2) Rand( ROW ID) Bad randomness
3) NewID
http://msdn.microsoft.com/en-us/library/ms189108.aspx
Documentation for TABLESAMPLE
1) Can only be used on true tables, not linked, views, XML, etc
2) Samples pages, not rows, if pages contain correlated row its no longer a simple random sample
*/
/*
Find the number of rows in the source table
*/
declare @SourceRowCount integer
select @SourceRowCount=COUNT(*) from dbo.phrases
/*
Initialize the random number generator with a random seed (or a particular seed)
*/
declare @d integer
set @d=RAND() --set @d=RAND(an integer)
/*
Create Source table from the desired table and ROW_NUMBER()
Join Source to a table of random values between
Select top n from a slightly larger set
*/
select * from
(select *, ROW_NUMBER() over(order by GETDATE()) as Row from dbo.phrases)
as source,
(select top 10 n from
(select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n union all
select convert(integer, RAND()*@SourceRowCount) as n)
as tmp)
as R where source.Row=R.n
March 14, 2010 at 10:47 pm
DannyS (3/14/2010)
Ok, this isn't the most elegant solution, but is is a proper "Simple Random Sample" and is very fast
Yes it is neat. If you are interested in a more generic version, but using the same basic idea, check this active thread out:
March 23, 2010 at 2:52 pm
A nice alternative to RAND and one of which I wasn't aware. Thanks!
John Trudo
Viewing 8 posts - 31 through 37 (of 37 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy