September 9, 2004 at 11:41 am
Hello.
I have a table with 14+ million rows.
I've added a column, BACKUP_DATE (smalldatetime), to the table.
All BACKUP_DATEs are NULL.
I wish to update the first 14 million rows to, say, 1/1/2004.
I wish to update all remaining records to, say, 2/2/2004.
The latter I can do with a plain UPDATE where BACKUP_DATE is null, as in the sketch below.
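For instance (MyTable is a placeholder for the real table name):

update MyTable
set BACKUP_DATE = '2/2/2004'
where BACKUP_DATE is null    -- only rows the first update didn't stamp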
How best can I go about the first update?
No way to use an index.
You pick the poison (VB.NET, C#, or T-SQL from Query Analyzer).
Thanks!
September 9, 2004 at 12:05 pm
OMG! This is probably NOT the best way of doing it, but I was desperate.
I added an IDENTITY column to the table. At last check, Enterprise Manager is still populating the column.
MUHHAHAHAHAHA!
Sure hope it doesn't crash the server!!!
Anyway, still looking for the optimum solution.
September 10, 2004 at 2:51 am
my last effort at typing wasn't saved for some reason, so here goes for a 2nd time...
you need to manage the size of the transaction log...a WHILE loop should suffice.
recordsupd = 1
while recordsupd > 0
    begin trans
    set rowcount 10000
    update tablex
    set cola = y
    where colb = condx
    recordsupd = @@recordcount
    commit trans
loop
something like the above should control the size of your transaction log....search here for other examples....adjust the 10000 to a number of your liking.
also....it's best to avoid using EM for database design/data changes....the GUI is uncontrollable....using DDL/DML statements in QA gives more control.
September 10, 2004 at 4:02 am
Shouldn't the statement
recordsupd = @@recordcount
be
recordsupd = @rowcount
Pete
September 10, 2004 at 4:02 am
oops mistype, I meant
recordsupd = @@rowcount
DOH!
Pete
September 10, 2004 at 4:05 am
You have a table WITHOUT an Identity column ? Is there a Primary key at all ?
Query Analyzer with a WHERE clause would be the way to go. If your data is sorted (because of your primary key), then pick the row you want to update up to, and use it in your WHERE... as in:
UPDATE MyTable SET BACKUP_DATE = '1/1/2004' WHERE PrimaryKeyColumns < The14MillionRowPrimaryKeyValues
Julian Kuiters
juliankuiters.id.au
September 10, 2004 at 4:21 am
peter..."recordsupd = @@recordcount"...written in haste!!!
wanted to introduce the principle....not concerned (this time) to solve 100% of the problem.
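For reference, a runnable T-SQL version of that loop, applied to the BACKUP_DATE scenario from the original post (tablex is a placeholder table name), might look like:

declare @recordsupd int
set @recordsupd = 1
set rowcount 10000                  -- at most 10,000 rows per batch
while @recordsupd > 0
begin
    begin tran
    update tablex
    set BACKUP_DATE = '1/1/2004'
    where BACKUP_DATE is null       -- must exclude finished rows, or the loop never ends
    set @recordsupd = @@rowcount    -- drops to 0 once nothing is left to update
    commit tran
end
set rowcount 0                      -- clear the limit for this connection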
September 10, 2004 at 8:38 am
OMG! I just lost my first reply as well!
I had a clustered index on a field that is hit often and produces a sizable recordset. In that sense, the index seemed appropriate. However, I believe it interleaved the physical inserts so that the sequential insert order was destroyed.
Not sure why you mention the trans log, as I'd just prefer to turn it off to speed performance.
Thanks for the QA tip... I depend on EM because I'm a programmer 'trying' to be a SQL guru, but alas, my bytes show through.
I do appreciate the responses from all parties concerned!
Does it look like I'm good to go?
How would I go about processing all 14mil?
Do I do an outer loop in addition to the one below?
recordsupd = 1
while recordsupd > 0
    begin trans
    set rowcount 10000
    update tablex
    set cola = y
    where colb = condx
    recordsupd = @@rowcount
    commit trans
loop
September 10, 2004 at 9:01 am
code as is should be enough...
basically the update keeps getting called until there are no more records to be updated....with at most 10000 records being updated in a batch.
try it on a smallish file!!!...just to see the technique working....with a smaller number instead of the 10000....(a number i picked off the top of my head)
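for example, a throwaway test along these lines (a scratch temp table with hypothetical names) shows the batching at work:

create table #t (id int identity(1,1), val char(1) not null default 'n')

declare @i int
set @i = 0
while @i < 500
begin
    insert into #t default values   -- 500 dummy rows
    set @i = @i + 1
end

set rowcount 100                    -- small batch size for the test
declare @rows int
set @rows = 1
while @rows > 0
begin
    update #t set val = 'y' where val = 'n'
    set @rows = @@rowcount          -- 100, 100, 100, 100, 100, then 0
end
set rowcount 0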
September 10, 2004 at 9:29 am
Yes, the plan is to process a small table first then to move on to bigger issues.
However, assuming I wanted to update a 20,000-row table and chose to break the update into 2 passes, would I require an outer loop to wrap the inner?
Forgive my VB coding below as I am not versed enough in T-SQL...
for recordsupd = 1 to 20000 step 10000
    rc = recordsupd + 10000
    while recordsupd > 0
        begin trans
        set rowcount rc
        update tablex
        set cola = y
        where colb = condx
        recordsupd = @@rowcount
        commit trans
    loop
next
September 10, 2004 at 9:37 am
no...just the inner loop is required.
try it...with a small number/small dataset!!!
don't worry about the vb style....that's my environment too.
September 10, 2004 at 9:40 am
i think you are misunderstanding the nature of "set rowcount".
it doesn't refer to specific records...it limits the number of records processed by all subsequent select/update/delete statements on the connection until its value is adjusted or cleared (set back to 0).
see BOL (Books Online) for details/examples.
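to make that concrete, a minimal sketch (table/column names are hypothetical):

set rowcount 10000    -- caps every subsequent select/update/delete on this connection
update tablex set cola = 'y' where colb = 'x'    -- affects at most 10,000 rows
select @@rowcount     -- how many rows the update actually touched
set rowcount 0        -- clears the cap; later statements behave normally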
September 10, 2004 at 9:47 am
As long as your where clause restricts your result set to the first 14 million records, that will work. Otherwise it will update all records in the table in whatever increment you have chosen.
Here's another solution if you're not worried about ordering; it updates the table in 100,000-record batches. If ordering is an issue and you can use a PK column, replace the where clause below with one like Julian's above, with a slight modification: 'where PKColumn <= 14Millionth_Record_Value and BACKUP_DATE is null'.
set rowcount 100000               -- 100,000 rows per batch
declare @i as int
set @i = 0
while @i < 140                    -- 140 x 100,000 = 14,000,000 rows
begin
    update TABLE_NAME
    set BACKUP_DATE = '1/1/2004'
    where BACKUP_DATE is null
    set @i = @i + 1
end
set rowcount 0                    -- reset the row limit when finished
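For instance, the ordered variant described above might be sketched like this (PKColumn and the cutoff are placeholders for the real key column and the 14-millionth key value):

set rowcount 100000
declare @rows int
set @rows = 1
while @rows > 0
begin
    update TABLE_NAME
    set BACKUP_DATE = '1/1/2004'
    where PKColumn <= 14000000    -- hypothetical 14-millionth key value
      and BACKUP_DATE is null
    set @rows = @@rowcount        -- 0 once the whole range is stamped
end
set rowcount 0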
If you aren't worried about the T-Log, you could use Julian's solution.
No matter which solution you choose, you should probably schedule it to run during off-peak hours to avoid any performance problems.
Hope this helps.
Jarret
September 10, 2004 at 11:16 am
Jarret:
Your solution is wonderful, but from the start my requirement WAS to consider the update sequentially.
I must update the FIRST or TOP 14 mil rows and then update what's left over using a general UPDATE statement.
Andrew:
As I recall, when using a transaction log, records are pushed there BEFORE they hit the table to facilitate COMMIT/ROLLBACK.
If the log is turned off, can I still use your solution?
Julian:
Your solution was what I had in mind as I attempted to add an IDENTITY column to the table. However, SQL Server 'choked' (understandably) and, after locking users out for perhaps a few hours, finally returned control WITHOUT having added my IDENTITY column.
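(For what it's worth, the DDL route through QA would be a one-liner like the following; the names are placeholders, and on 14 million rows it is still a heavy, fully logged operation:)

-- hypothetical names; EM's designer typically rebuilds the whole table to do the same job
alter table MyBigTable add RowId int identity(1,1) not null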
This has been a wonderful learning experience for me and I'm thankful for all of your input.
I believe that I must pursue an alternative course of action if I'm to beat my deadline.
By this, I mean reloading the table WITH the BACKUP_DATE already populated.
Of course, I left the old table in place in case a workable solution 'popped up'.
If I've missed it, my apologies.
It seems that all the solutions somehow depend on conditions being 'just right', and I'm not sure they are.
Please correct me if I'm wrong, however.
September 10, 2004 at 11:57 am
The solution I gave will update 14 million records since it is in the while loop (140 iterations of 100,000 records). However, the "first" 14 million records cannot be guaranteed since there is no primary key. It would all depend on how SQL returns the data to you.
If I may ask, how are you to determine the "top" 14 million records?
Good luck!!!
Jarret