October 30, 2012 at 3:15 pm
We have a table with hundreds of thousands of social security numbers that we want to blank out. The table has 60 million records. I'm thinking that changing the field from char(9) to char(1) and then back again might be quick and would just leave one digit in there. Would that be quicker than an update statement? I'm running the script below on a test copy, but I expect it to take a very long time.
use dba_test
go
SET DEADLOCK_PRIORITY LOW

WHILE EXISTS (SELECT 1 FROM [history] WHERE owner1ssn <> '')
BEGIN
    -- grab the keys for the next batch of 500 rows
    SELECT TOP (500) idcode
    INTO #tempdelete
    FROM [history]      -- was otts_history; the UPDATE and WHILE test both target [history]
    WHERE owner1ssn <> ''

    BEGIN TRANSACTION
    BEGIN TRY
        -- blank owner1ssn for this batch
        UPDATE [history]
        SET owner1ssn = ''
        WHERE idcode IN (SELECT idcode FROM #tempdelete)
        COMMIT
    END TRY
    BEGIN CATCH
        ROLLBACK
    END CATCH

    -- drop the temp table outside the TRY so a caught error can't leave it
    -- behind and break the next SELECT...INTO
    IF OBJECT_ID('tempdb..#tempdelete') IS NOT NULL
        DROP TABLE #tempdelete

    WAITFOR DELAY '00:00:10'
END
go
October 30, 2012 at 5:22 pm
What's the situation on indexes for the owner1ssn column? Do you have one, and is it unique? Also, what is the PK of the table? That could turn out to be important depending on which method you use.
Last but not least, does the column have any foreign keys on it?
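If it helps, you can answer both questions from the catalog views. A quick sketch, assuming the table is dbo.history as in your script:

-- indexes that contain owner1ssn
SELECT i.name, i.is_unique
FROM sys.indexes i
JOIN sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.history') AND c.name = 'owner1ssn'

-- foreign keys that reference the column
SELECT fk.name
FROM sys.foreign_key_columns fkc
JOIN sys.foreign_keys fk ON fk.object_id = fkc.constraint_object_id
JOIN sys.columns c ON c.object_id = fkc.referenced_object_id AND c.column_id = fkc.referenced_column_id
WHERE fkc.referenced_object_id = OBJECT_ID('dbo.history') AND c.name = 'owner1ssn'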
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 5:32 pm
Also, here's what's going to happen if you try to change the column to CHAR(1) from T-SQL...
Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.
If you do it in designer mode from SSMS, it will copy the entire table, which, for 60 million rows, will take a long time and possibly blow your log file sky high.
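You can see the error for yourself on a scratch table (this demo table is made up just for the repro):

-- hypothetical scratch table to demonstrate the truncation error
CREATE TABLE dbo.ssn_demo (idcode int NOT NULL, owner1ssn char(9) NOT NULL)
INSERT INTO dbo.ssn_demo VALUES (1, '123456789')

-- fails with Msg 8152 because the existing value won't fit in char(1)
ALTER TABLE dbo.ssn_demo ALTER COLUMN owner1ssn char(1) NOT NULL

DROP TABLE dbo.ssn_demo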
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 6:17 pm
No indexes on that field. I decided to try just dropping the column and recreating it. The only issue was that it had to be recreated with NULLs allowed, so I'd have to set all records to '' (blank) before an ALTER TABLE command could be used to make it NOT NULL.
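Roughly this sequence, sketched out (assuming the char(9) definition from my script above):

ALTER TABLE [history] DROP COLUMN owner1ssn

-- the recreated column allows NULLs at first...
ALTER TABLE [history] ADD owner1ssn char(9) NULL

-- ...so blank every row before tightening the constraint;
-- note this touches all 60 million rows in one transaction,
-- so in practice you'd batch it like the loop earlier in the thread
UPDATE [history] SET owner1ssn = ''

ALTER TABLE [history] ALTER COLUMN owner1ssn char(9) NOT NULL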
October 31, 2012 at 8:15 am
Indianrock (10/30/2012)
No indexes on that field. I decided to try just dropping the column and recreating it. The only issue was that it had to be recreated with NULLs allowed, so I'd have to set all records to '' (blank) before an ALTER TABLE command could be used to make it NOT NULL.
There's a more insidious problem with that. The reason dropping the column appears to run so quickly is that it doesn't actually drop the column or remove any of the data; it simply makes the column unavailable to queries and the like. The data is still there and could be extracted fairly easily (even from a backup file) by someone intent on doing so. To get rid of the actual data, you'd have to rebuild the clustered index. If there is no clustered index, you'd have to build one and then drop it (although building a narrow unique clustered index could help the other indexes a whole lot).
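For example (the PK name here is a guess; use whatever yours is actually called):

-- rewriting every page via a rebuild physically purges the dropped column's bytes
ALTER INDEX PK_history ON [history] REBUILD

-- if the table were a heap instead, build a clustered index and then drop it:
-- CREATE CLUSTERED INDEX CIX_history_purge ON [history] (idcode)
-- DROP INDEX CIX_history_purge ON [history]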
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2012 at 8:49 am
Thanks Jeff. There is a clustered primary key, so I'll rebuild that. Since this will take a while, I should probably drop all of the columns in question first (2 SSN fields, 2 date-of-birth fields, and 2 driver's license fields -- only the SSNs have significant population), recreate them, and then rebuild the clustered index. I have to do this on several DEV SQL Servers which have copies of our prod database, and then on production. Yes, having this data all over the dev environments is a big problem; Safenet encryption is on its way, although a large part of the problem will be wiped out by what we're discussing here.
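Something like this for the drop, so the rebuild only has to happen once (every column name other than owner1ssn is a made-up placeholder):

-- one ALTER TABLE can drop all six columns in a single statement
ALTER TABLE [history] DROP COLUMN
    owner1ssn, owner2ssn,        -- SSNs
    owner1dob, owner2dob,        -- dates of birth
    owner1dlnum, owner2dlnum     -- driver's license numbers

-- then purge the freed bytes with a single rebuild of the clustered PK
ALTER INDEX PK_history ON [history] REBUILD   -- PK name assumed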
I think I can safely ignore the fact that the newly created columns allow NULLs, since this table contains data converted from our legacy environment, is never written to, and is rarely read. Plus, our application does not display the fields in question. The only other possible issue is that when the fields are recreated they wind up "at the bottom" of the column order for the table. Theoretically that shouldn't be an issue, but I'll discuss it with our application developers.
October 31, 2012 at 2:38 pm
There may be another problem with doing that. If any of the apps have inserts without an explicit column list, the columns will literally have changed position as well as ordinal number, and those inserts will either put values into the wrong columns or cause the apps to fail, depending on how they are written.
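A tiny demo of how that goes wrong (the demo table is hypothetical):

CREATE TABLE dbo.reorder_demo (a int, ssn char(9), b int)
INSERT INTO dbo.reorder_demo VALUES (1, '123456789', 2)   -- positional: a, ssn, b

ALTER TABLE dbo.reorder_demo DROP COLUMN ssn
ALTER TABLE dbo.reorder_demo ADD ssn char(9) NULL          -- order is now a, b, ssn

-- the same positional insert still "succeeds" via implicit conversion,
-- but b gets 123456789 and ssn gets '2': silently wrong data
INSERT INTO dbo.reorder_demo VALUES (1, '123456789', 2)

DROP TABLE dbo.reorder_demo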
For production, and considering all of the unknowns, it may very well be easier, and certainly safer, to build a crawler that does the updates a bit faster than the one you currently have.
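Something along these lines, assuming idcode is the integer clustered PK (the batch size is just a starting point to tune):

DECLARE @last  int = 0,
        @maxid int,
        @batch int = 5000

SELECT @maxid = MAX(idcode) FROM [history]

WHILE @last < @maxid
BEGIN
    -- walk the clustered key in ranges so every batch is a cheap range seek,
    -- with no temp table and no repeated full scan for the loop test
    UPDATE [history]
    SET owner1ssn = ''
    WHERE idcode >  @last
      AND idcode <= @last + @batch
      AND owner1ssn <> ''

    SET @last = @last + @batch

    -- optional throttle, as in the original script
    -- WAITFOR DELAY '00:00:01'
END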
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2012 at 3:47 pm
Thanks again. I think I'll hold off on changing the production database until this last conversion from legacy finishes November 30th. After that there will never be another insert into the table in question.
Most of our application code generates SQL via a .NET ORM; stored procedures are rarely used.
After dropping and recreating the columns and rebuilding the clustered index on my test system, I logged into our browser-based application in the environment using that database and had no problems pulling up records from that table.
thanks,
Randy
October 31, 2012 at 7:10 pm
Good enough, Randy. Just wanted to make sure you knew all of the things that could go wrong so you could test them.
--Jeff Moden
Change is inevitable... Change for the better is not.