Increment query based on two tables

  • Hello!

    I have two tables with this info:

    TABLE 1

    COL1 COL2 COL3
    AAA  BBB  CCC
    QQQ  WWW  EEE
    AAA  SSS  DDD
    WWW  EEE  RRR
    BBB  BBB  BBB

    TABLE 2

    COL1 COL2 COL3 COL4
    b    b    b    343
    a    a    a    344
    c    c    c    345
    d    d    d    346
    e    e    e    347

    I want to insert TABLE 1 into TABLE 2 with a query that auto-increments COL4, so the result looks like this:

    COL1 COL2 COL3 COL4
    b    b    b    343
    a    a    a    344
    c    c    c    345
    d    d    d    346
    e    e    e    347
    AAA  BBB  CCC  348
    QQQ  WWW  EEE  349
    AAA  SSS  DDD  350
    WWW  EEE  RRR  351
    BBB  BBB  BBB  352

    I know this could be done easily by altering the column to an auto-incrementing (IDENTITY) data type, but I can't do that at the moment. Any help would be awesome.
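    A quick way to sanity-check the MAX-plus-ROW_NUMBER pattern suggested in the replies is to run it against a throwaway database. This is a minimal sketch using SQLite via Python's sqlite3 module; the Table1/Table2 names and data mirror the post, everything else is illustrative, and SQL Server syntax differs slightly (e.g. it has no in-memory mode):

```python
import sqlite3

# In-memory database standing in for the real server (illustrative only).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (Col1 TEXT, Col2 TEXT, Col3 TEXT);
    CREATE TABLE Table2 (Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 INTEGER);
    INSERT INTO Table1 VALUES ('AAA','BBB','CCC'), ('QQQ','WWW','EEE'),
                              ('AAA','SSS','DDD'), ('WWW','EEE','RRR'),
                              ('BBB','BBB','BBB');
    INSERT INTO Table2 VALUES ('b','b','b',343), ('a','a','a',344),
                              ('c','c','c',345), ('d','d','d',346),
                              ('e','e','e',347);
""")

# The pattern: seed each new row's Col4 with MAX(Col4) + ROW_NUMBER().
# The scalar subquery has no reference to the outer query, so it is
# evaluated once against Table2 before the new rows are numbered.
con.execute("""
    INSERT INTO Table2 (Col1, Col2, Col3, Col4)
    SELECT Col1, Col2, Col3,
           (SELECT MAX(Col4) FROM Table2) + ROW_NUMBER() OVER (ORDER BY Col1)
    FROM Table1
""")

rows = con.execute("SELECT Col4 FROM Table2 ORDER BY Col4").fetchall()
print([r[0] for r in rows])  # contiguous values 343 through 352
```

    Note this only simulates a one-off backfill; it does not protect against two sessions running the insert concurrently, which is one reason an IDENTITY column is the better long-term fix.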

  • imba215 (2/17/2014)


    Hello! I have two tables with this info: [...] I want to insert TABLE 1 into TABLE 2 with a query that will auto increment to COL4 [...] I know this can be done easily by just altering the column to have an auto-increment datatype, but I currently cannot do that at this moment.

    Insert into [Table 2] (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3
    , (Select max(Col4) from [Table 2]) + ROW_NUMBER() over (Order by [Col1])
    from [Table 1]

    Russel Loski, MCSE Business Intelligence, Data Platform

  • Something like this?

    Insert TableB (col1, col2, col3, col4)
    select col1, col2, col3, ROW_NUMBER() over (order by newid()) + (select MAX(col4) from TableB)
    from TableA

    --edit--

    Looks like Russel was typing almost the exact same thing as I was...only faster. 😉

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thank you so much!

  • Interesting this. Intuitively, my head says that pre-calculating the max value should be more efficient when compared with using a correlated subquery:

    declare @MaxId int
    select @MaxId = max(col4) from #TableB

    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
    from #TableA

    But testing says otherwise: the variable-based approach is significantly worse. I wondered why, since I was (simplistically, perhaps) expecting the subquery to execute for each row of the main query. It clearly doesn't; in fact the subquery contains no reference to the outer query (it isn't really correlated), so the optimizer is free to evaluate it just once.

    Test script if you're interested:

    -- create some test tables and data
    create table #TableA (Col1 varchar(10), Col2 varchar(10), Col3 varchar(10))
    create table #TableB (Col1 varchar(10), Col2 varchar(10), Col3 varchar(10), Col4 int)

    declare @i int
    declare @MaxId int
    declare @s datetime, @e datetime

    set @i = 1

    -- there are better ways to do this, but this is easy to understand
    while @i <= 100000
    begin
        insert #TableB values ('a', 'b', 'c', @i)

        -- only add rows to A for every 10th row added to B
        if @i % 10 = 0
        begin
            insert #TableA values ('AAA', 'BBB', 'CCC')
        end

        set @i = @i + 1
    end

    select count(*) from #TableA -- 10000
    select count(*) from #TableB -- 100000

    set statistics io on

    -----------------------
    -- test without indexes
    -----------------------

    -- DON'T DO THIS ON A PRODUCTION SYSTEM
    dbcc dropcleanbuffers
    dbcc freeproccache

    -- correlated subquery version
    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 10349, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- clear the newly added data
    delete #TableB where Col4 > 100000

    dbcc dropcleanbuffers
    dbcc freeproccache

    -- pre-calculated id version
    select @MaxId = max(col4) from #TableB

    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 349, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableB'. Scan count 0, logical reads 10037, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- clear the newly added data
    delete #TableB where Col4 > 100000

    ------------------
    -- test with index
    ------------------
    create index ix1 on #TableB(col4)

    dbcc dropcleanbuffers
    dbcc freeproccache

    -- correlated subquery version
    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 30194, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- clear the newly added data
    delete #TableB where Col4 > 100000

    dbcc dropcleanbuffers
    dbcc freeproccache

    -- pre-calculated id version
    select @MaxId = max(col4) from #TableB

    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableB'. Scan count 0, logical reads 30191, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    ----------------------------
    -- test with clustered index
    ----------------------------
    delete #TableB where Col4 > 100000

    drop index #TableB.ix1
    create clustered index ix1 on #TableB(col4)

    dbcc dropcleanbuffers
    dbcc freeproccache

    -- correlated subquery version
    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 21551, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- clear the newly added data
    delete #TableB where Col4 > 100000

    -- pre-calculated id version
    select @MaxId = max(col4) from #TableB

    insert #TableB (Col1, Col2, Col3, Col4)
    select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
    from #TableA
    /*
    Table '#TableB'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableB'. Scan count 0, logical reads 21548, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TableA'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    set statistics io off

    ------------
    -- test time
    ------------
    -- run inserts 100 times
    create table #results (i int, t char(1), s datetime, e datetime, diff int)

    set @i = 1

    while @i <= 100
    begin
        set @s = getdate()

        -- variable version
        select @MaxId = max(col4) from #TableB

        insert #TableB (Col1, Col2, Col3, Col4)
        select Col1, Col2, Col3, row_number() over (order by newid()) + @MaxId
        from #TableA

        set @e = getdate()
        insert #results select @i, 'V', @s, @e, datediff(ms, @s, @e)

        delete #TableB where Col4 > 100000
        dbcc dropcleanbuffers
        dbcc freeproccache

        set @s = getdate()

        -- correlated subquery version
        insert #TableB (Col1, Col2, Col3, Col4)
        select Col1, Col2, Col3, row_number() over (order by newid()) + (select max(Col4) from #TableB)
        from #TableA

        set @e = getdate()
        insert #results select @i, 'C', @s, @e, datediff(ms, @s, @e)

        delete #TableB where Col4 > 100000
        dbcc dropcleanbuffers
        dbcc freeproccache

        set @i = @i + 1
    end

    -- show the result
    select * from #results

  • irobertson (2/20/2014)


    Interesting this. Intuitively, my head says that pre-calculating the max value should be more efficient when compared with using a correlated subquery: [...] But, testing says otherwise. The variable based approach is significantly worse. I wonder why this is.

    I have seen cases where putting the aggregate into a local variable instead of doing it in the main query does indeed help. It depends on whether the optimizer is smart enough to realize that's something it only needs to do once.

    I would not have even considered a loop for this, mainly because of my mantra (below).
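    As an aside on the no-loops point, the WHILE loop that generates the test data in the script above can also be replaced with one set-based statement per table. This is a minimal sketch of the idea using a recursive CTE in SQLite via Python (SQL Server would typically use a tally table or a numbers CTE instead; table and column names mirror the test script, everything else is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (Col1 TEXT, Col2 TEXT, Col3 TEXT);
    CREATE TABLE TableB (Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 INTEGER);
""")

# Generate the numbers 1..100000 in a single statement instead of a
# WHILE loop: every number feeds TableB, every 10th number feeds TableA.
con.executescript("""
    WITH RECURSIVE n(i) AS (
        SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 100000
    )
    INSERT INTO TableB SELECT 'a', 'b', 'c', i FROM n;

    WITH RECURSIVE n(i) AS (
        SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 100000
    )
    INSERT INTO TableA SELECT 'AAA', 'BBB', 'CCC' FROM n WHERE i % 10 = 0;
""")

count_a = con.execute("SELECT COUNT(*) FROM TableA").fetchone()[0]
count_b = con.execute("SELECT COUNT(*) FROM TableB").fetchone()[0]
print(count_a, count_b)  # 10000 100000
```

    The row counts match the ones the looped version produces, with the whole load expressed declaratively rather than row by row.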


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
