November 30, 2006 at 3:56 am
I have 2 tables (more than 10 million rows each).
table1 has 3 columns:
a - varchar, b - varchar, c - float
There is no primary key or index.
There is data in columns a and b; every row of column c is null.
table2 has the same 3 columns:
a - varchar, b - varchar, c - varchar
There is no primary key or index.
There is data in columns a, b and c.
I have a simple procedure to put the c value of table2 into the c value of table1.
The procedure looks like this:
------------------------------------------------------------
create procedure my_procedure as
begin
declare @a nvarchar(255), @b nvarchar(255)
declare @c nvarchar(255)
declare @err int

-- dynamic cursor over the source table
declare c1 cursor dynamic
for select a, b, c from table2

open c1
fetch next from c1 into @a, @b, @c
while (@@fetch_status = 0)
begin
	-- copy the value into the matching row of the target table
	update table1
	set c = @c
	where a = @a
	and b = @b

	-- capture @@error immediately; any later statement resets it
	set @err = @@error
	if @err <> 0
		print 'exception 1 ' + cast(@err as varchar(10))

	print @c
	print @a

	fetch next from c1 into @a, @b, @c
end
close c1
deallocate c1
end
------------------------------------------------------------
It takes around 17 hrs to update 2000 records. Is there a way to optimize this procedure code? (I don't have the rights to add keys, indexes etc. to the tables - only code changes.)
It's actually a transfer of the column c data from table2 into table1.
November 30, 2006 at 5:38 am
Hi Srinidhi,
You are already receiving responses in the original topic where you asked the same question.
http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=8&messageid=326295&p=2
And I agree with Gila that there is little use trying to rewrite the code unless your tables with 10M rows are properly indexed. Gila has already posted a set-based solution for your update (based on the info you supplied) which will run quickly if the tables are indexed.
You said you can't change indexes - but you can at least look at them, I suppose... If necessary indexes are missing, contact the person who can create indexes and have an index added where necessary.
November 30, 2006 at 8:04 am
Why do you need to use a cursor?
The big 'C' is a big No No in SQL Server.
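For illustration, a set-based equivalent of the cursor procedure might look like the sketch below (not necessarily the exact statement Gila posted), using the column names from the original post and assuming every table2.c value converts cleanly to float:

-- Set-based rewrite of the cursor logic (sketch only)
update t1
set t1.c = cast(t2.c as float)
from table1 t1
join table2 t2
on t1.a = t2.a
and t1.b = t2.b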
December 1, 2006 at 10:27 am
Here is a solution that takes 2 minutes and 23 seconds for 1 million records. Obviously the time taken will grow faster than linearly for 10 such sets, because without an index the scan size increases with every insert.
I can visualize situations where indexing is not feasible, for instance when you need to use the same set of tables simultaneously for updates and for sequential read access.
Other than that, I am not sure why your DBA is not indexing the tables. Please send him Steve Jones' editorial from yesterday's newsletter.
--Step 1
--======
if exists(select * from sysobjects where name = 't1')
drop table t1
create table t1(f1 varchar(10), f2 varchar(10), f3 float)
if exists(select * from sysobjects where name = 't2')
drop table t2
create table t2(f1 varchar(10), f2 varchar(10), f3 varchar(6))
--Step 2
--======
declare @count int
set @count = 1
while @count <= 1000000
begin
insert into t1 values(cast(@count as varchar(10)),cast(@count + 1 as varchar(10)),null)
set @count = @count + 1
end
-- new batch so that @count can be declared again in step 3
go
--Step 3
--======
declare @count int
set @count = 1
while @count <= 1000000
begin
insert into t2 values(cast(@count as varchar(10)),cast(@count + 1 as varchar(10)),'99.999')
set @count = @count + 1
end
--Step 4
--======
--time taken for 1 million records - 2 minutes and 23 seconds
update t1
set t1.f3 = cast(t2.f3 as float)
from t2
where t1.f1 = t2.f1
and t1.f2 = t2.f2
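If you rebuild the test data often, the row-by-row loops in Steps 2 and 3 can be replaced with one set-based insert each. A sketch, assuming the standard master..spt_values table is available purely as a row source for generating numbers:

--Alternative to Steps 2 and 3: set-based population (sketch only)
select top 1000000 identity(int, 1, 1) as n
into #numbers
from master..spt_values a cross join master..spt_values b

insert into t1 (f1, f2, f3)
select cast(n as varchar(10)), cast(n + 1 as varchar(10)), null
from #numbers

insert into t2 (f1, f2, f3)
select cast(n as varchar(10)), cast(n + 1 as varchar(10)), '99.999'
from #numbers

drop table #numbers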
December 1, 2006 at 11:31 am
You can also think of using a query to retrieve all records of interest - see the sample below:
select t1.a, t1.b, t2.c
from table1 t1 left join table2 t2 on
t1.a = t2.a and t1.b = t2.b
(Use a left join if there is no 1:1 match between the tables; otherwise a plain inner join will be effective.)
Use BCP to bulk copy the results of the query to a file.
Truncate table1 and then use BULK INSERT to load the contents of the file back into table1.
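Roughly, the two steps could look like this sketch (the file path, terminators, database and server names are placeholders, not values from the thread):

-- 1. Export the joined result with bcp from the command line, e.g.
--    bcp "select t1.a, t1.b, t2.c from MyDb.dbo.table1 t1 left join MyDb.dbo.table2 t2 on t1.a = t2.a and t1.b = t2.b" queryout C:\temp\table1_new.txt -c -t"," -T -S MyServer
-- 2. Reload the file in T-SQL
truncate table table1
bulk insert table1
from 'C:\temp\table1_new.txt'
with (fieldterminator = ',', rowterminator = '\n')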
All the best..
Ignas Chilewa
Dublin, OH
December 5, 2006 at 5:10 am
Now that I have created a clustered index, the performance of my procedure should increase. But I am still afraid to run the procedure, as it took 17 hours to update 2000 matching records. Let me explain the problem very clearly.
There are 2 tables:
Table1
--------------------------------------------------------
a(varchar) | b(varchar) | c(varchar) | d(float)
---------------------------------------------------------
has value | has value | null | has value
---------------------------------------------------------
There are around 13 million records in this table.
The combination of a and b is unique.
Table2
------------------------------------------------
a(varchar) | b(varchar) | d(float)
------------------------------------------------
has value | has value | has value
-------------------------------------------------
There are around 13 million records in this table.
The combination of a and b is unique.
I have created a clustered index on column a of both tables (note: column a has more repeating values).
I have to transfer the value in column d of Table2 to column c of Table1.
December 5, 2006 at 8:44 am
this should still work
--Step 4
--======
--time taken for 1 million records - 2 minutes and 23 seconds
update t1
set t1.f3 = cast(t2.f3 as float)
from t2
where t1.f1 = t2.f1
and t1.f2 = t2.f2
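Adapted to the table and column names from the latest post (Table1.c is varchar and Table2.d is float, so the conversion now runs the other way), a sketch:

update t1
set t1.c = cast(t2.d as varchar(30))
from Table1 t1
join Table2 t2
on t1.a = t2.a
and t1.b = t2.b
-- varchar(30) is just illustrative; use whatever length/format suits column c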
December 5, 2006 at 4:34 pm
"But still i am afraid to run the procedure as it took 17 hours to update 2000 matching records. "
You're saying that there are only about 2000 common rows out of the millions of records?
This should be faster...
First, create a composite index on a and b for both tables.
------------
-- Assuming b has fewer repeating values (reverse the columns otherwise)
Create index IDX_1 on Table1(b,a)
GO
Create index IDX_1 on Table2(b,a)
GO
-- Use a #Temp table to store the matching rows (about 2000 only)
Select T1.a, T1.b
Into #Temp
From Table1 T1
Join Table2 T2
ON T1.a = T2.a and T1.b = T2.b
-- use #Temp to limit the selection to be updated
Update T1
Set T1.c = T2.d
From Table1 T1
Join Table2 T2
ON T1.a = T2.a and T1.b = T2.b
Join #Temp T
ON T1.a = T.a and T1.b = T.b
GO
drop table #Temp
-- Note: I'm assuming that the execution plan for Jambu's statement will use a Hash Match algorithm since columns a and b are varchars. This will fill up your memory, thereby resulting in the slow response time.
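To check that assumption before running the full update, you can look at the estimated plan first; SHOWPLAN_TEXT prints the plan without executing the statement (sketch):

set showplan_text on
go
Update T1
Set T1.c = T2.d
From Table1 T1
Join Table2 T2
ON T1.a = T2.a and T1.b = T2.b
go
set showplan_text off
go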
December 6, 2006 at 9:36 am
actually
"But still i am afraid to run the procedure as it took 17 hours to update 2000 matching records. "
I stopped executing the procedure partway through!
Thanks !!!!!!!!!
Creating the index has definitely increased the speed.... now the update is happening in minutes!
Now that I have created the clustered index.... I will let you know the exact time taken ASAP.
FYI.... I created index INDX1 only on column a.
Will creating index INDX2 on column b improve the performance further?
December 6, 2006 at 9:46 am
Yes, indexing on the second column will increase speed.
In fact I concatenated both columns into a new column in the table and used that new column for the comparison, and it was way faster than the two-column indexing.
I also removed the explicit conversion, because I saw that an implicit conversion was happening for f3 anyway.
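One way to implement that concatenated-key idea is sketched below. The column and index names and the '|' delimiter are purely illustrative, and the original poster may have used a physical column populated once rather than a computed column; indexing a computed column also needs the usual SET options (ANSI_NULLS, QUOTED_IDENTIFIER etc.) in effect:

-- Add a concatenated key to each table and index it
-- (the delimiter avoids false matches such as '1'+'12' vs '11'+'2')
alter table t1 add f1f2 as f1 + '|' + f2
create index IDX_t1_f1f2 on t1(f1f2)

alter table t2 add f1f2 as f1 + '|' + f2
create index IDX_t2_f1f2 on t2(f1f2)

-- Single-column comparison; f3 is converted implicitly on assignment
update t1
set t1.f3 = t2.f3
from t2
where t1.f1f2 = t2.f1f2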
December 7, 2006 at 12:49 am
With a clustered index on column a, it took 14 hrs to update around 1.5 million rows.
This message appears when I create an index on (a,b) (a has more repeating values):
"Warning! The maximum key length is 900 bytes. The index 'IDX_1' has maximum length of 1020 bytes. For some combination of large values, the insert/update operation will fail."
I have now created an index on 'b' and a clustered index on 'a' on both the tables.
December 7, 2006 at 7:57 am
Srinidhi,
Sorry to get into sermon mode. But...
The job at hand is one of huge volume. To get to a viable solution, we need to understand the data inside and out. Until your last post I did not have any clue that field1 and field2 together could be 1020 bytes. Your first post declares 255-byte variables, and I thought that was what they were.
The SQL Server index key limit is 900 bytes. You cannot take a varchar(901) and declare it as a PK or try to index such a column. Even now it is not clear what the size of your varchar fields is - is it more than 900? In that case even your latest solution could fail, because there is no guarantee that such large values will not come in at some point in time. Because you did not get the warning when you indexed each field individually, I am assuming the field lengths are less than or equal to 900.
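A quick way to settle the question of field sizes is to read the defined lengths and the longest values actually stored; a sketch, using the table and column names from the thread:

-- Defined column sizes vs. the longest values actually present in Table1
select col_length('Table1', 'a') as defined_a,
col_length('Table1', 'b') as defined_b,
max(datalength(a)) as longest_a,
max(datalength(b)) as longest_b
from Table1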
Some of the possible alternatives I may look for are:
1. Can I use the files as they are rather than updating every time?
2. If all files are needed for queries, can we look at horizontal partitioning as an alternative? SQL 2000 views help/optimize in a nice way for horizontal partitioning (see the sketch after this list).
3. Can I use some kind of a distributed mechanism?
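A minimal sketch of the kind of UNION ALL partitioned view meant in point 2, assuming Table1 were split into two range tables on column a (the table names, column sizes and split point are purely illustrative):

-- Two member tables, each holding one range of column a
create table Table1_AtoL (a varchar(100) not null check (a < 'M'), b varchar(100), c varchar(100), d float)
create table Table1_MtoZ (a varchar(100) not null check (a >= 'M'), b varchar(100), c varchar(100), d float)
go
-- The view presents both ranges as one table; the CHECK constraints let
-- SQL 2000 touch only the relevant member table for a given value of a
create view Table1_All
as
select a, b, c, d from Table1_AtoL
union all
select a, b, c, d from Table1_MtoZ
go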
If it is a real-world issue you are facing in your company, please try to discuss it with some DB gurus who have experience handling such voluminous data. This does not seem to be a situation where we can apply some ad hoc solution. It could prove disastrous. These are the kinds of situations (where each iteration of the test could cost you a day) that Steve says could cost us our jobs as DBAs.
It is a nice problem you brought up. To me it looks like an application for mainframes. Are you on a mainframe by any chance? I am not even sure if SQL Server has a mainframe version.
I wish I could help you. But it is taking so much time to test each time, and I have my own job to do. Keep updating your experiences - it is an excellent problem.
December 7, 2006 at 2:38 pm
Your problem may not just be in SQL Server. It shouldn't take 14 hrs for only 1.5 million rows. Can you give us your server specifications please? I'm thinking you might be low on resources (i.e. memory).
My post on using a temp table should help out. It allows you to use tempdb to store the common values instead of keeping them in memory when SQL Server performs the join operation.
December 7, 2006 at 2:41 pm
I agree with Ronald San Juan.
There is something weird going on in your environment.
December 19, 2006 at 12:00 am