December 8, 2006 at 8:08 am
I see your point... but I think there's a limit to bulletproofing the code.
use master
go
select top 100000 rand(checksum(newid()))
...
The only edit I would suggest is to remove the 3rd cross join, because it is just not required in this case!
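For anyone following along, here is a minimal, self-contained sketch of the technique under discussion. The FROM clause was elided above, so the syscomments cross join shown here is an assumption for illustration, not the original code.
-- Sketch only: RAND(CHECKSUM(NEWID())) reseeds RAND() per row, so every row
-- gets its own value instead of one value repeated 100,000 times.
-- The cross join is just a cheap way to manufacture enough source rows (an assumption here).
USE master
GO
SELECT TOP 100000 RAND(CHECKSUM(NEWID())) AS RandomValue
FROM master.dbo.syscomments AS c1
     CROSS JOIN master.dbo.syscomments AS c2
GO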
December 8, 2006 at 8:32 am
Both fair points.
Here's something interesting, though. Run it a couple of times with the single cross join. Now uncomment the other 8 joins to syscomments. See any meaningful change in the IO, reads, etc.?
Set Statistics io on
go
Set statistics time on
go
dbcc dropcleanbuffers -- clear the buffer cache so the reads are measured cold
go
if object_id('tempdb.dbo.#test') is not null drop table #test
-- Each cross join multiplies the row count, so syscomments joined to itself
-- is already enough to feed TOP 100000; the commented joins just add more sources.
Select top 100000 NewId() As id
into #test
From master.dbo.syscomments as c1
cross join master.dbo.syscomments as c2
/*
cross join master.dbo.syscomments as c3
cross join master.dbo.syscomments as c4
cross join master.dbo.syscomments as c5
cross join master.dbo.syscomments as c6
cross join master.dbo.syscomments as c7
cross join master.dbo.syscomments as c8
cross join master.dbo.syscomments as c9
cross join master.dbo.syscomments as c10
*/
December 8, 2006 at 8:38 am
The scans go from 2 to 10, which is expected, but the IO barely goes up. Very nice to see that.
December 8, 2006 at 1:07 pm
It sounds like you're running SQL 2005, Ninja. In 2005, I see pretty decent IO performance. Running the query in 2000 shows similar results, but scans go from 46 to 54 and reads go from 51884 to 51908.
Why not just avoid the IO cost altogether? Doing everything in memory is faster.
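As one illustration (a sketch only, not necessarily the exact code being referred to above), stacked CTEs on 2005 can generate the rows without reading a table at all:
-- A table-free row source (SQL 2005 and later); nothing is read from disk.
;WITH
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),               -- 2 rows
L1 AS (SELECT 1 AS c FROM L0 AS a CROSS JOIN L0 AS b),  -- 4 rows
L2 AS (SELECT 1 AS c FROM L1 AS a CROSS JOIN L1 AS b),  -- 16 rows
L3 AS (SELECT 1 AS c FROM L2 AS a CROSS JOIN L2 AS b),  -- 256 rows
L4 AS (SELECT 1 AS c FROM L3 AS a CROSS JOIN L3 AS b),  -- 65,536 rows
L5 AS (SELECT 1 AS c FROM L4 AS a CROSS JOIN L4 AS b)   -- ~4.29 billion rows
SELECT TOP 100000 RAND(CHECKSUM(NEWID())) AS RandomValue
FROM L5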
SQL guy and Houston Magician
December 8, 2006 at 1:32 pm
Because it's not a user application. It's a SERVER.
It's a single machine serving potentially thousands of users.
And if every one of them consumes 10% of the memory, the 11th user will kill the server.
I know of a company that decided to move their business logic code into stored procedures without changing their approach. Their server went down within seconds of deployment.
If you are stepping into the world of databases, you need to learn a couple of the rules of this world and obey them.
_____________
Code for TallyGenerator
December 8, 2006 at 2:23 pm
I agree that memory consumption is a very important issue, as are hundreds of other issues that must be considered when designing a database or writing T-SQL. Perhaps my statement was overly glib. Perhaps you're taking my statement out of context. Either way, I should probably clarify my earlier point.
I am merely talking about the code samples posted on this thread. We are discussing cross joining system tables to generate N rows. My question is: why add the additional overhead in the first place? The alternatives suggested are faster and consume FEWER resources (memory, CPU, and IO).
Why do 52,000 reads (416MB of data that will find its way into the buffer cache) when you can do fewer? If you're running SQL Server 2005, why do even 50 reads? If you can do fewer reads, consume less memory, and use less CPU time, doesn't that sound like the best solution?
SQL guy and Houston Magician
December 8, 2006 at 2:39 pm
Both approaches are wrong.
Both take resources DEDICATED to each running process. Both do the calculation on the fly, and both use hardcoded rules.
But if you have a single table storing all possible numbers and select a subset of those numbers (sketched below), you:
1) use a SHARED resource, common to all users;
2) can change the returned values without changing code;
3) don't repeat the same calculations every time you need the data;
4) and because of 3), use a minimum of resources, just enough to pass the data to the application.
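A minimal sketch of such a shared numbers table (the names here are illustrative and this is not the TallyGenerator code referenced in the signature below):
-- Build the shared table once (works on SQL 2000 and 2005); names are illustrative.
SELECT TOP 1000000 IDENTITY(INT, 1, 1) AS Number
INTO dbo.Numbers
FROM master.dbo.syscomments AS c1
     CROSS JOIN master.dbo.syscomments AS c2

ALTER TABLE dbo.Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)

-- Every caller then reads the same cached pages instead of recalculating:
SELECT Number FROM dbo.Numbers WHERE Number BETWEEN 1 AND 100000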
_____________
Code for TallyGenerator
December 11, 2006 at 11:38 am
I’ll say this first:
I’ve reread the thread and, out of the three solutions, your solution actually seems to solve the problem by creating whole random numbers that are equally distributed without a costly UDF. There is also the issue of hardcoded rules. If you want more than 100k rows with my function, a code change will be required.
I also agree that a shared resource is a good thing because, in many cases, it allows an application to scale better. I think numbers tables are great and, perhaps, I make too frequent use of them. With your solution, one rowset exists in the buffer cache rather than in multiple sessions, and that seems (to me, anyway) like it would scale better.
I will argue this:
All three solutions require calculation, be it cross joining, be it sorting N rows, whatever. All of these calculations have a cost. (These calculations are repeated in every single call.)
Furthermore, I disagree with your right/wrong, black/white view of SQL Server. I’m sure you know that you tune T-SQL based on how it will be used. How often will this be called? A dozen times a day, or a dozen times a second? Will the numbers table need to be loaded into cache every time it’s run? What are the constraints on the system? What else is running in the batch? Will the query need additional memory? How many rows are actually needed? 100,000? Depending on the circumstances, either solution could be correct. Perhaps we both should consider several factors before declaring one solution or another universally correct.
I must also point out that in all of my tests (running once, in loops, in multiple threads, on 2000 and 2005), my code outperformed yours (even after rewriting it to generate whole integers rather than float values between 0 and 1). This, along with SQL Profiler, the IO stats, and the CPU stats, has led me to believe that it consumes fewer resources. I also only ever see your query in sys.dm_exec_query_memory_grants (even when the data is in cache). I have never seen either of them exceed the allocated memory (of course, that might not always be the case, depending on several factors).
I know the stats can be subjective sometimes so I’ll take what I see with a pinch of salt. I’m sure you’re right, you usually are. Just keep in mind we’re all here to help and learn. I know you have a lot of knowledge and experience to share with everyone, and many here wish to learn it. It isn’t always necessary to insult people to get your point across.
SQL guy and Houston Magician
December 11, 2006 at 4:26 pm
Correct me if I'm wrong, but I think using Sergiy's solution of
SET ROWCOUNT @RowsToGet
SELECT Number
FROM Numbers
ORDER BY NEWID()
SET ROWCOUNT 0
would contradict the RANDOMability (if that's a word) of the solution. You see, this solution will ENSURE that no "random" number gets duplicated, since the table contains (presumably) only a single instance of every number (say, 1 to 1 million). Using "ORDER BY NEWID()" only randomizes the order of the rows.
December 11, 2006 at 4:38 pm
The original request was:
> Can anyone suggest a quick way to create 100,000 random numbers without using a cursor or loop ?
100,000 random numbers means 100,000 numbers, not a random number repeated 100,000 times.
I believe repeating values is the case they did not think about.
_____________
Code for TallyGenerator
December 11, 2006 at 4:57 pm
Picture it this way...
Suppose I only want 10 random numbers, each of which can be any number from 1 to 10. Your solution will give me the numbers 1 to 10 in a random order, NOT a random number for every row.
Further, if I follow your solution, by the tenth row I will have no other option, because the first nine rows have already consumed the other nine numbers. Now what's so random about that?
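To put the two behaviors side by side, here is a sketch (dbo.Numbers is an illustrative table holding the values 1 through 10):
-- A random permutation: every value 1..10 appears exactly once; only the order is random.
SELECT TOP 10 Number
FROM dbo.Numbers
ORDER BY NEWID()

-- Ten independent random draws from 1..10: duplicates can and will occur.
SELECT TOP 10 ABS(CHECKSUM(NEWID()) % 10) + 1 AS RandomValue
FROM dbo.Numbers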
December 11, 2006 at 5:07 pm
Actually, that’s a good point, Ronald. The original poster mentioned an acceptable range of numbers that is encapsulated in his UDF. It is possible, and therefore must be considered, that his acceptable range of numbers is smaller than the count of numbers required. If that were the case, duplicates would be required.
Obviously, I don’t know what the true requirements are, because the poster never divulged any details of his UDF (other than that the results need to conform to a Poisson distribution and that there is an acceptable range of numbers).
SQL guy and Houston Magician