Random updates from another table

Question

Post reply

Random updates from another table

_simon_

SSCrazy

Points: 2329
More actions
January 26, 2011 at 8:26 am

#233406

My tables:
if object_id('tempdb.sys.#Table1') is not null drop table #Table1
if object_id('tempdb.sys.#Table2') is not null drop table #Table2
create table #Table1 (Id int identity(1,1), c1 int, c2 int, c3 int, someParam char(1))
create table #Table2 (c1 int, c2 int, c3 int)
insert into #Table1 (someParam) select 'A' union all select 'B' union all select 'A' union all select 'A' union all select 'B' union all select 'A'
insert into #Table2 select 1, 2, 3 union select 1, 3, 2
select * from #Table1
select * from #Table2
And they look like this:
Table1:
1NULLNULLNULLA
2NULLNULLNULLB
3NULLNULLNULLA
4NULLNULLNULLA
5NULLNULLNULLB
6NULLNULLNULLA
Table2:
123
132
What is the best method to update Table1 (where someParam = 'A') with the values from Table2 in random order? So for example I want to get this:
1132A
2NULLNULLNULLB
3123A
4132A
5NULLNULLNULLB
6123A
I could use cursor, but I don't like it.
I could also multiply values in Table2 so that there would be exactly the same number as values in Table1 where someParam = 'A'. Then I would add Identity column to Table2 and new column to Table1 with the same values as Identity in Table 2. From there on the update is just a simple join...
I don't like either options - any other ideas?

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Vedran Kesegic SSCrazy Eights Points: 9302 More actions · Answer 1

Why update? You could select exactly what you need and optionally store that via insert.

Update is a low performer.

Since you didn't specify what is random here, let's assume you have the table with a list of possible values:

create table #Values( Value int )

insert into #Values( Value ) select 1

union all select 2

union all select 5

union all select 10

union all select 500

You want to generate a table of three columns containing these values.

And let's assume you want that no value can appear twice in a single row.

Here is the solution:

with RandomlyPicked as

(

select p.* -- I can use star because columns are defined in inner query

from

(

select top 3

v.Value,

PutIntoColumn = ROW_NUMBER() over( order by NEWID() ) -- randomly ordered

from #values v

) vv

pivot

(max(vv.Value)

for vv.PutIntoColumn in ([1],[2],[3])

) p

)

select r.*

from RandomlyPicked r

cross join (select top 10 x=object_id from sys.all_objects) t

And here is the result:

123

10012

25100

10012

2510

512

10052

1105

2101

10021

10015

_____________________________________________________
Microsoft Certified Master: SQL Server 2008
XDetails Addin - for SQL Developers
blog.sqlxdetails.com - Transaction log myths

_simon_ SSCrazy Points: 2329 More actions · Answer 2

Thanks Vedran, but this isn't quite the solution I was looking for.

Some other facts:

- Table1 is created and populated from another application, only columns c1, c2, c3 (in reality up to 15) are null

- I create Table2 based on some business rules (all permutation from some numbers, all combinations from some numbers, including characters...)

- This is one-time job, which means, that the update will happen only once

Recap: I have a big Table1 (for example 10.000.000 rows) and a smaller Table2 (50 rows). I have to update a subset of the Table1 (for example 2.000.000 rows) with random values from Table2.

Vedran Kesegic SSCrazy Eights Points: 9302 More actions · Answer 3

From the smaller table make a subquery widening it with a random ordinal number as shown in example before (row_number() over(order by newid()) construct).

If you have unique integer (maybe PK?) on larger table, join with smaller table with join predicate modulo (x % y) of row count of smaller table+1 that equals to random ordinal number from widened smaller table subquery.

If performance is important to you, it will be faster to insert radnomly generated rows and other rows from larger table into new larger table, create indexes etc, than update 20% of the rows of the larger table. I'm not going to give you the finished script now, you will learn better if you do it yourself 😉

_____________________________________________________
Microsoft Certified Master: SQL Server 2008
XDetails Addin - for SQL Developers
blog.sqlxdetails.com - Transaction log myths

_simon_ SSCrazy Points: 2329 More actions · Answer 4

Thanks Vedran, the trick with row_number and cross join works great 🙂 I did it this way:

;with temp as (

select top 1000 row_number() over (order by checksum(newid())) Id, t1.c1, t1.c2, t1.c3

from #Table2 t1 cross join sys.all_objects t2

order by Id

)

update t1

set c1 = t2.c1, c2 = t2.c2, c3 = t2.c3

from #Table1 t1 join temp t2 on (t1.Id = t2.Id)

where t1.someParam = 'A'

The number in top clause in CTE has to be same as number of rows in the big table (#Table1).

I'm just not sure how/why you wanted to join both tables with mod function... Can you elaborate please?

Vedran Kesegic SSCrazy Eights Points: 9302 More actions · Answer 5

Modulo was because smaller table has less rows than big table.

Lets say N = number of rows in the smaller table.

Smaller table is widened in subquery with random unique ID ranging from 1 to N.

To associate each row of big table to exactly one row of a small table, (unique int of a big table % N) + 1

would give you the number in the range 1 to N, and you could join each row of big table to smaller table.

You overcome that problem with multiplying rows of small table by joining with sys.all_objects and cutting with TOP

at exactly the same number as rows in big table, but I'm not sure that join with all_objects is perfect here.

_____________________________________________________
Microsoft Certified Master: SQL Server 2008
XDetails Addin - for SQL Developers
blog.sqlxdetails.com - Transaction log myths

_simon_ SSCrazy Points: 2329 More actions · Answer 6

OK, you probably mean something like this:

;with temp as (

select row_number() over (order by checksum(newid())) Id, c1, c2, c3

from #Table2

)

select *

from #Table1 t1 join temp t2 on ((t1.Id % (select count(*) from #Table2)) + 1 = t2.Id )

where t1.someParam = 'A'

I like the logic, but the final result could be 'semi' random - for example if I have only two values in the small table, then all odd records in the big table would be same and all even ones also. Or did I misunderstood you?

The cross join with sys.all_objects works great, but gets increasingly slower when more records are needed - then you have to cross join again...

PS - I resolved this with the help of a table function, which returns numbers from 1 to N, like this:

declare @numberOfRecordsToUpdate int, @crossJoinTableSize int

set @numberOfRecordsToUpdate = 1000000

set @crossJoinTableSize = (@numberOfRecordsToUpdate / (select count(*) from #Table2))

;with temp as (

select top(@numberOfRecordsToUpdate) row_number() over (order by checksum(newid())) Id, t1.c1, t1.c2, t1.c3

from #Table2 t1 cross join f_table_numbers(1, @crossJoinTableSize)

order by Id

)

update t1

set c1 = t2.c1, c2 = t2.c2, c3 = t2.c3

from #Table1 t1 join temp t2 on (t1.Id = t2.Id)

where t1.someParam = 'A'