December 8, 2006 at 8:08 am
I see your point... but I think there's a limit to bulletproofing the code.
use master
go
select top 100000 rand(checksum(newid()))
...
The only edit I would suggest is to remove the 3rd cross join, because it is just not required in this case!
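For anyone following along, here is a minimal, self-contained sketch of the technique under discussion. The FROM clause was elided above, so the syscomments cross join shown here is an assumption for illustration, not the original code.
-- Sketch only: RAND(CHECKSUM(NEWID())) reseeds RAND() per row, so every row
-- gets its own value instead of one value repeated 100,000 times.
-- The cross join is just a cheap way to manufacture enough source rows (an assumption here).
USE master
GO
SELECT TOP 100000 RAND(CHECKSUM(NEWID())) AS RandomValue
FROM master.dbo.syscomments AS c1
     CROSS JOIN master.dbo.syscomments AS c2
GO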
December 8, 2006 at 8:32 am
Both fair points.
Here's something interesting, though. Run it a couple of times with the single cross join. Now uncomment the other 8 joins to syscomments. See any meaningful change in the IO, reads, etc.?
Set Statistics io on
go
Set statistics time on
go
dbcc dropcleanbuffers -- clear the buffer cache so the reads are measured cold
go
if object_id('tempdb.dbo.#test') is not null drop table #test
-- Each cross join multiplies the row count, so syscomments joined to itself
-- is already enough to feed TOP 100000; the commented joins just add more sources.
Select top 100000 NewId() As id
into #test
From master.dbo.syscomments as c1
cross join master.dbo.syscomments as c2
/*
cross join master.dbo.syscomments as c3
cross join master.dbo.syscomments as c4
cross join master.dbo.syscomments as c5
cross join master.dbo.syscomments as c6
cross join master.dbo.syscomments as c7
cross join master.dbo.syscomments as c8
cross join master.dbo.syscomments as c9
cross join master.dbo.syscomments as c10
*/
December 8, 2006 at 8:38 am
The scans go from 2 to 10, which is expected, but the IO barely goes up. Very nice to see that.
December 8, 2006 at 1:07 pm
It sounds like you're running SQL 2005, Ninja. In 2005, I see pretty decent IO performance. Running the query in 2000 shows similar results, but scans go from 46 to 54 and reads go from 51884 to 51908.
Why not just avoid the IO cost altogether? Doing everything in memory is faster.
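As one illustration (a sketch only, not necessarily the exact code being referred to above), stacked CTEs on 2005 can generate the rows without reading a table at all:
-- A table-free row source (SQL 2005 and later); nothing is read from disk.
;WITH
L0 AS (SELECT 1 AS c UNION ALL SELECT 1),               -- 2 rows
L1 AS (SELECT 1 AS c FROM L0 AS a CROSS JOIN L0 AS b),  -- 4 rows
L2 AS (SELECT 1 AS c FROM L1 AS a CROSS JOIN L1 AS b),  -- 16 rows
L3 AS (SELECT 1 AS c FROM L2 AS a CROSS JOIN L2 AS b),  -- 256 rows
L4 AS (SELECT 1 AS c FROM L3 AS a CROSS JOIN L3 AS b),  -- 65,536 rows
L5 AS (SELECT 1 AS c FROM L4 AS a CROSS JOIN L4 AS b)   -- ~4.29 billion rows
SELECT TOP 100000 RAND(CHECKSUM(NEWID())) AS RandomValue
FROM L5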
SQL guy and Houston Magician
December 8, 2006 at 1:32 pm
Because it's not a user application. It's a SERVER.
It's a single machine serving potentially thousands of users.
And if every one of them consumes 10% of the memory, the 11th user will kill the server.
I know of a company that decided to move their business logic code into stored procedures without changing their approach. Their server went down within seconds of deployment.
If you are stepping into the world of databases, you need to learn a couple of the rules of this world and obey them.
_____________
Code for TallyGenerator
December 8, 2006 at 2:23 pm
I agree that memory consumption is a very important issue, as are hundreds of other issues that must be considered when designing a database or writing T-SQL. Perhaps my statement was overly glib. Perhaps you're taking my statement out of context. Either way, I should probably clarify my earlier point.
I am merely talking about the code samples posted on this thread. We are discussing cross joining system tables to generate N rows. My question is: why add the additional overhead in the first place? The alternatives suggested are faster and consume FEWER resources (memory, CPU, and IO).
Why do 52,000 reads (416MB of data that will find its way into the buffer cache) when you can do fewer? If you're running SQL Server 2005, why do even 50 reads? If you can do fewer reads, consume less memory, and use less CPU time, doesn't that sound like the best solution?
SQL guy and Houston Magician
December 8, 2006 at 2:39 pm
Both approaches are wrong.
Both take resources DEDICATED to each running process. Both do the calculation on the fly, and both use hardcoded rules.
But if you have a single table storing all possible numbers and select a subset of those numbers (sketched below), you:
1) use a SHARED resource, common to all users;
2) can change the returned values without changing code;
3) don't repeat the same calculations every time you need the data;
4) and because of 3), use a minimum of resources, just enough to pass the data to the application.
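A minimal sketch of such a shared numbers table (the names here are illustrative and this is not the TallyGenerator code referenced in the signature below):
-- Build the shared table once (works on SQL 2000 and 2005); names are illustrative.
SELECT TOP 1000000 IDENTITY(INT, 1, 1) AS Number
INTO dbo.Numbers
FROM master.dbo.syscomments AS c1
     CROSS JOIN master.dbo.syscomments AS c2

ALTER TABLE dbo.Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)

-- Every caller then reads the same cached pages instead of recalculating:
SELECT Number FROM dbo.Numbers WHERE Number BETWEEN 1 AND 100000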
_____________
Code for TallyGenerator
December 11, 2006 at 11:38 am
I’ll say this first:
I’ve reread the thread and, out of the three solutions, your solution actually seems to solve the problem by creating whole random numbers that are equally distributed without a costly UDF. There is also the issue of hardcoded rules. If you want more than 100k rows with my function, a code change will be required.
I also agree that a shared resource is a good thing because, in many cases, it allows an application to scale better. I think numbers tables are great and, perhaps, I make too frequent use of them. With your solution, one rowset exists in the buffer cache rather than in multiple sessions, and that seems (to me, anyway) like it would scale better.
I will argue this:
All three solutions require calculation, be it cross joining, be it sorting N rows, whatever. All of these calculations have a cost. (These calculations are repeated in every single call.)
Furthermore, I disagree with your right/wrong, black/white view of SQL Server. I’m sure you know that you tune T-SQL based on how it will be used. How often will this be called? A dozen times a day, or a dozen times a second? Will the numbers table need to be loaded into cache every time it’s run? What are the constraints on the system? What else is running in the batch? Will the query need additional memory? How many rows are actually needed? 100,000? Depending on the circumstances, either solution could be correct. Perhaps we both should consider several factors before declaring one solution or another universally correct.
I must also point out that in all of my tests (running once, in loops, in multiple threads, on 2000 and 2005), my code outperformed yours (even after rewriting it to generate whole integers rather than float values between 0 and 1). This, along with SQL Profiler, the IO stats, and the CPU stats, has led me to believe that it consumes fewer resources. I also only ever see your query in sys.dm_exec_query_memory_grants (even when the data is in cache). I have never seen either of them exceed the allocated memory (of course, that might not always be the case, depending on several factors).
I know the stats can be subjective sometimes so I’ll take what I see with a pinch of salt. I’m sure you’re right, you usually are. Just keep in mind we’re all here to help and learn. I know you have a lot of knowledge and experience to share with everyone, and many here wish to learn it. It isn’t always necessary to insult people to get your point across.
SQL guy and Houston Magician
December 11, 2006 at 4:26 pm
Correct me if I'm wrong, but I think using Sergiy's solution of
SET ROWCOUNT @RowsToGet
SELECT Number
FROM Numbers
ORDER BY NEWID()
SET ROWCOUNT 0
would contradict the RANDOMability (if that's a word) of the solution. You see, this solution will ENSURE that no "random" number gets duplicated, since the table contains (presumably) only a single instance of every number (say, 1 to 1 million). Using "ORDER BY NEWID()" only randomizes the order of the rows.
December 11, 2006 at 4:38 pm
The original request was:
> Can anyone suggest a quick way to create 100,000 random numbers without using a cursor or loop ?
100,000 random numbers means 100,000 numbers, not a random number repeated 100,000 times.
I believe repeating values is the case they did not think about.
_____________
Code for TallyGenerator
December 11, 2006 at 4:57 pm
Picture it this way...
Suppose I only want 10 random numbers, each of which can be any number from 1 to 10. Your solution will give me the numbers 1 to 10 in a random order, NOT a random number for every row.
Further, if I follow your solution, by the tenth row I will have no other option, because the first nine rows have already consumed the other nine numbers. Now what's so random about that?
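To put the two behaviors side by side, here is a sketch (dbo.Numbers is an illustrative table holding the values 1 through 10):
-- A random permutation: every value 1..10 appears exactly once; only the order is random.
SELECT TOP 10 Number
FROM dbo.Numbers
ORDER BY NEWID()

-- Ten independent random draws from 1..10: duplicates can and will occur.
SELECT TOP 10 ABS(CHECKSUM(NEWID()) % 10) + 1 AS RandomValue
FROM dbo.Numbers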
December 11, 2006 at 5:07 pm
Actually, that’s a good point, Ronald. The original poster mentioned an acceptable range of numbers that is encapsulated in his UDF. It is possible, and therefore must be considered, that his acceptable range of numbers is smaller than the count of numbers required. If that were the case, duplicates would be required.
Obviously, I don’t know what the true requirements are, because the poster never divulged any details of his UDF (other than that the results need to conform to a Poisson distribution and that there is an acceptable range of numbers).
SQL guy and Houston Magician