Increment query based on two tables

  • Hello!

    I have two tables with this info:

    TABLE 1

    COL1 COL2 COL3
    AAA  BBB  CCC
    QQQ  WWW  EEE
    AAA  SSS  DDD
    WWW  EEE  RRR
    BBB  BBB  BBB

    TABLE 2

    COL1 COL2 COL3 COL4
    b    b    b    343
    a    a    a    344
    c    c    c    345
    d    d    d    346
    e    e    e    347

    I want to insert TABLE 1 into TABLE 2 with a query that auto-increments COL4, so the result looks like this:

    COL1 COL2 COL3 COL4
    b    b    b    343
    a    a    a    344
    c    c    c    345
    d    d    d    346
    e    e    e    347
    AAA  BBB  CCC  348
    QQQ  WWW  EEE  349
    AAA  SSS  DDD  350
    WWW  EEE  RRR  351
    BBB  BBB  BBB  352

    I know this could be done easily by altering the column to an auto-incrementing (IDENTITY) data type, but I can't do that at the moment. Any help would be awesome.
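    A quick way to sanity-check the MAX-plus-ROW_NUMBER pattern suggested in the replies is to run it against a throwaway database. This is a minimal sketch using SQLite via Python's sqlite3 module; the Table1/Table2 names and data mirror the post, everything else is illustrative, and SQL Server syntax differs slightly (e.g. it has no in-memory mode):

```python
import sqlite3

# In-memory database standing in for the real server (illustrative only).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (Col1 TEXT, Col2 TEXT, Col3 TEXT);
    CREATE TABLE Table2 (Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 INTEGER);
    INSERT INTO Table1 VALUES ('AAA','BBB','CCC'), ('QQQ','WWW','EEE'),
                              ('AAA','SSS','DDD'), ('WWW','EEE','RRR'),
                              ('BBB','BBB','BBB');
    INSERT INTO Table2 VALUES ('b','b','b',343), ('a','a','a',344),
                              ('c','c','c',345), ('d','d','d',346),
                              ('e','e','e',347);
""")

# The pattern: seed each new row's Col4 with MAX(Col4) + ROW_NUMBER().
# The scalar subquery has no reference to the outer query, so it is
# evaluated once against Table2 before the new rows are numbered.
con.execute("""
    INSERT INTO Table2 (Col1, Col2, Col3, Col4)
    SELECT Col1, Col2, Col3,
           (SELECT MAX(Col4) FROM Table2) + ROW_NUMBER() OVER (ORDER BY Col1)
    FROM Table1
""")

rows = con.execute("SELECT Col4 FROM Table2 ORDER BY Col4").fetchall()
print([r[0] for r in rows])  # contiguous values 343 through 352
```

    Note this only simulates a one-off backfill; it does not protect against two sessions running the insert concurrently, which is one reason an IDENTITY column is the better long-term fix.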

  • imba215 (2/17/2014)


    Hello! I have two tables with this info: [...] I want to insert TABLE 1 into TABLE 2 with a query that will auto increment to COL4 [...] I know this can be done easily by just altering the column to have an auto-increment datatype, but I currently cannot do that at this moment.

    Insert into [Table 2] (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3
    , (Select max(Col4) from [Table 2]) + ROW_NUMBER() over (Order by [Col1])
    from [Table 1]

    Russel Loski, MCSE Business Intelligence, Data Platform

  • Something like this?

    Insert TableB (col1, col2, col3, col4)
    select col1, col2, col3, ROW_NUMBER() over (order by newid()) + (select MAX(col4) from TableB)
    from TableA

    --edit--

    Looks like Russel was typing almost the exact same thing as I was...only faster. 😉

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thank you so much!

  • Interesting this. Intuitively, my head says that pre-calculating the max value should be more efficient when compared with using a correlated subquery:

    declare @MaxId int
    select @MaxId = max(col4) from #TableB

    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
    from #TableA

    But testing says otherwise: the variable-based approach is significantly worse. I wondered why, since I was (simplistically, perhaps) expecting the subquery to execute for each row of the main query. It clearly doesn't; in fact the subquery contains no reference to the outer query (it isn't really correlated), so the optimizer is free to evaluate it just once.

    Test script if you're interested:

    -- create some test tables and data
    create table #TableA (Col1 varchar(10), Col2 varchar(10), Col3 varchar(10))
    create table #TableB (Col1 varchar(10), Col2 varchar(10), Col3 varchar(10), Col4 int)

    declare @i int
    declare @MaxId int
    declare @s datetime, @e datetime

    set @i = 1

    -- there are better ways to do this, but this is easy to understand
    while @i <= 100000
    begin
        insert #TableB values ('a', 'b', 'c', @i)

        -- only add rows to A for every 10th row added to B
        if @i % 10 = 0
        begin
            insert #TableA values ('AAA', 'BBB', 'CCC')
        end

        set @i = @i + 1
    end

    select count(*) from #TableA -- 10000
    select count(*) from #TableB -- 100000

    set statistics io on

    -----------------------
    -- test without indexes
    -----------------------

    -- DON'T DO THIS ON A PRODUCTION SYSTEM
    dbcc dropcleanbuffers
    dbcc freeproccache

    -- correlated subquery version
    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 10349, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- clear the newly added data
    delete #TableB where Col4 > 100000

    dbcc dropcleanbuffers
    dbcc freeproccache

    -- pre-calculated id version
    select @MaxId = max(col4) from #TableB

    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 349, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableB'. Scan count 0, logical reads 10037, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- clear the newly added data
    delete #TableB where Col4 > 100000

    ------------------
    -- test with index
    ------------------
    create index ix1 on #TableB(col4)

    dbcc dropcleanbuffers
    dbcc freeproccache

    -- correlated subquery version
    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 30194, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- clear the newly added data
    delete #TableB where Col4 > 100000

    dbcc dropcleanbuffers
    dbcc freeproccache

    -- pre-calculated id version
    select @MaxId = max(col4) from #TableB

    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableB'. Scan count 0, logical reads 30191, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    ----------------------------
    -- test with clustered index
    ----------------------------
    delete #TableB where Col4 > 100000

    drop index #TableB.ix1
    create clustered index ix1 on #TableB(col4)

    dbcc dropcleanbuffers
    dbcc freeproccache

    -- correlated subquery version
    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 21551, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- clear the newly added data
    delete #TableB where Col4 > 100000

    -- pre-calculated id version
    select @MaxId = max(col4) from #TableB

    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableB'. Scan count 0, logical reads 21548, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    set statistics io off

    ------------
    -- test time
    ------------
    -- run inserts 100 times
    create table #results (i int, t char(1), s datetime, e datetime, diff int)

    set @i = 1

    while @i <= 100
    begin
        set @s = getdate()

        -- variable version
        select @MaxId = max(col4) from #TableB

        insert #TableB (Col1, Col2, Col3, Col4)
        select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
        from #TableA

        set @e = getdate()
        insert #results select @i, 'V', @s, @e, datediff(ms, @s, @e)

        delete #TableB where Col4 > 100000
        dbcc dropcleanbuffers
        dbcc freeproccache

        set @s = getdate()

        -- correlated subquery version
        insert #TableB (Col1, Col2, Col3, Col4)
        select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)
        from #TableA

        set @e = getdate()
        insert #results select @i, 'C', @s, @e, datediff(ms, @s, @e)

        delete #TableB where Col4 > 100000
        dbcc dropcleanbuffers
        dbcc freeproccache

        set @i = @i + 1
    end

    -- show the result
    select * from #results

  • irobertson (2/20/2014)


    Interesting this. Intuitively, my head says that pre-calculating the max value should be more efficient when compared with using a correlated subquery: [...] But, testing says otherwise. The variable based approach is significantly worse. I wonder why this is.

    I have seen cases where putting the aggregate into a local variable instead of doing it in the main query does indeed help. It depends on whether the optimizer is smart enough to realize that's something it only needs to do once.

    I would not have even considered a loop for this, mainly because of my mantra (below).
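    As an aside on the no-loops point, the WHILE loop that generates the test data in the script above can also be replaced with one set-based statement per table. This is a minimal sketch of the idea using a recursive CTE in SQLite via Python (SQL Server would typically use a tally table or a numbers CTE instead; table and column names mirror the test script, everything else is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (Col1 TEXT, Col2 TEXT, Col3 TEXT);
    CREATE TABLE TableB (Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 INTEGER);
""")

# Generate the numbers 1..100000 in a single statement instead of a
# WHILE loop: every number feeds TableB, every 10th number feeds TableA.
con.executescript("""
    WITH RECURSIVE n(i) AS (
        SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 100000
    )
    INSERT INTO TableB SELECT 'a', 'b', 'c', i FROM n;

    WITH RECURSIVE n(i) AS (
        SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 100000
    )
    INSERT INTO TableA SELECT 'AAA', 'BBB', 'CCC' FROM n WHERE i % 10 = 0;
""")

count_a = con.execute("SELECT COUNT(*) FROM TableA").fetchone()[0]
count_b = con.execute("SELECT COUNT(*) FROM TableB").fetchone()[0]
print(count_a, count_b)  # 10000 100000
```

    The row counts match the ones the looped version produces, with the whole load expressed declaratively rather than row by row.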


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
