Random updates from another table

  • My tables:

    if object_id('tempdb.sys.#Table1') is not null drop table #Table1

    if object_id('tempdb.sys.#Table2') is not null drop table #Table2

    create table #Table1 (Id int identity(1,1), c1 int, c2 int, c3 int, someParam char(1))

    create table #Table2 (c1 int, c2 int, c3 int)

    insert into #Table1 (someParam) select 'A' union all select 'B' union all select 'A' union all select 'A' union all select 'B' union all select 'A'

    insert into #Table2 select 1, 2, 3 union select 1, 3, 2

    select * from #Table1

    select * from #Table2

    And they look like this:

    Table1:

    1NULLNULLNULLA

    2NULLNULLNULLB

    3NULLNULLNULLA

    4NULLNULLNULLA

    5NULLNULLNULLB

    6NULLNULLNULLA

    Table2:

    123

    132

    What is the best method to update Table1 (where someParam = 'A') with the values from Table2 in random order? So for example I want to get this:

    1132A

    2NULLNULLNULLB

    3123A

    4132A

    5NULLNULLNULLB

    6123A

    I could use cursor, but I don't like it.

    I could also multiply values in Table2 so that there would be exactly the same number as values in Table1 where someParam = 'A'. Then I would add Identity column to Table2 and new column to Table1 with the same values as Identity in Table 2. From there on the update is just a simple join...

    I don't like either options - any other ideas?

  • Why update? You could select exactly what you need and optionally store that via insert.

    Update is a low performer.

    Since you didn't specify what is random here, let's assume you have the table with a list of possible values:

    create table #Values( Value int )

    insert into #Values( Value ) select 1

    union all select 2

    union all select 5

    union all select 10

    union all select 500

    You want to generate a table of three columns containing these values.

    And let's assume you want that no value can appear twice in a single row.

    Here is the solution:

    with RandomlyPicked as

    (

    select p.* -- I can use star because columns are defined in inner query

    from

    (

    select top 3

    v.Value,

    PutIntoColumn = ROW_NUMBER() over( order by NEWID() ) -- randomly ordered

    from #values v

    ) vv

    pivot

    (max(vv.Value)

    for vv.PutIntoColumn in ([1],[2],[3])

    ) p

    )

    select r.*

    from RandomlyPicked r

    cross join (select top 10 x=object_id from sys.all_objects) t

    And here is the result:

    123

    10012

    25100

    10012

    2510

    512

    10052

    1105

    2101

    10021

    10015

    _____________________________________________________
    Microsoft Certified Master: SQL Server 2008
    XDetails Addin - for SQL Developers
    blog.sqlxdetails.com - Transaction log myths
  • Thanks Vedran, but this isn't quite the solution I was looking for.

    Some other facts:

    - Table1 is created and populated from another application, only columns c1, c2, c3 (in reality up to 15) are null

    - I create Table2 based on some business rules (all permutation from some numbers, all combinations from some numbers, including characters...)

    - This is one-time job, which means, that the update will happen only once

    Recap: I have a big Table1 (for example 10.000.000 rows) and a smaller Table2 (50 rows). I have to update a subset of the Table1 (for example 2.000.000 rows) with random values from Table2.

  • From the smaller table make a subquery widening it with a random ordinal number as shown in example before (row_number() over(order by newid()) construct).

    If you have unique integer (maybe PK?) on larger table, join with smaller table with join predicate modulo (x % y) of row count of smaller table+1 that equals to random ordinal number from widened smaller table subquery.

    If performance is important to you, it will be faster to insert radnomly generated rows and other rows from larger table into new larger table, create indexes etc, than update 20% of the rows of the larger table. I'm not going to give you the finished script now, you will learn better if you do it yourself 😉

    _____________________________________________________
    Microsoft Certified Master: SQL Server 2008
    XDetails Addin - for SQL Developers
    blog.sqlxdetails.com - Transaction log myths
  • Thanks Vedran, the trick with row_number and cross join works great 🙂 I did it this way:

    ;with temp as (

    select top 1000 row_number() over (order by checksum(newid())) Id, t1.c1, t1.c2, t1.c3

    from #Table2 t1 cross join sys.all_objects t2

    order by Id

    )

    update t1

    set c1 = t2.c1, c2 = t2.c2, c3 = t2.c3

    from #Table1 t1 join temp t2 on (t1.Id = t2.Id)

    where t1.someParam = 'A'

    The number in top clause in CTE has to be same as number of rows in the big table (#Table1).

    I'm just not sure how/why you wanted to join both tables with mod function... Can you elaborate please?

  • Modulo was because smaller table has less rows than big table.

    Lets say N = number of rows in the smaller table.

    Smaller table is widened in subquery with random unique ID ranging from 1 to N.

    To associate each row of big table to exactly one row of a small table, (unique int of a big table % N) + 1

    would give you the number in the range 1 to N, and you could join each row of big table to smaller table.

    You overcome that problem with multiplying rows of small table by joining with sys.all_objects and cutting with TOP

    at exactly the same number as rows in big table, but I'm not sure that join with all_objects is perfect here.

    _____________________________________________________
    Microsoft Certified Master: SQL Server 2008
    XDetails Addin - for SQL Developers
    blog.sqlxdetails.com - Transaction log myths
  • OK, you probably mean something like this:

    ;with temp as (

    select row_number() over (order by checksum(newid())) Id, c1, c2, c3

    from #Table2

    )

    select *

    from #Table1 t1 join temp t2 on ((t1.Id % (select count(*) from #Table2)) + 1 = t2.Id )

    where t1.someParam = 'A'

    I like the logic, but the final result could be 'semi' random - for example if I have only two values in the small table, then all odd records in the big table would be same and all even ones also. Or did I misunderstood you?

    The cross join with sys.all_objects works great, but gets increasingly slower when more records are needed - then you have to cross join again...

    PS - I resolved this with the help of a table function, which returns numbers from 1 to N, like this:

    declare @numberOfRecordsToUpdate int, @crossJoinTableSize int

    set @numberOfRecordsToUpdate = 1000000

    set @crossJoinTableSize = (@numberOfRecordsToUpdate / (select count(*) from #Table2))

    ;with temp as (

    select top(@numberOfRecordsToUpdate) row_number() over (order by checksum(newid())) Id, t1.c1, t1.c2, t1.c3

    from #Table2 t1 cross join f_table_numbers(1, @crossJoinTableSize)

    order by Id

    )

    update t1

    set c1 = t2.c1, c2 = t2.c2, c3 = t2.c3

    from #Table1 t1 join temp t2 on (t1.Id = t2.Id)

    where t1.someParam = 'A'

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply