October 30, 2012 at 3:15 pm
We have a table with hundreds of thousands of social security numbers that we want to blank out. The table has 60 million records. I'm thinking that changing the field from char(9) to char(1) and then back again might be quick and would just leave one digit in there. Would that be quicker than an update statement? I'm running the script below on a test copy, but I expect it to take a very long time.
use dba_test
go
SET DEADLOCK_PRIORITY LOW

WHILE EXISTS (SELECT 1 FROM [history] WHERE owner1ssn <> '')
BEGIN
    -- grab the keys for the next batch of 500 rows
    SELECT TOP (500) idcode
    INTO #tempdelete
    FROM [history]      -- was otts_history; the UPDATE and WHILE test both target [history]
    WHERE owner1ssn <> ''

    BEGIN TRANSACTION
    BEGIN TRY
        -- blank owner1ssn for this batch
        UPDATE [history]
        SET owner1ssn = ''
        WHERE idcode IN (SELECT idcode FROM #tempdelete)
        COMMIT
    END TRY
    BEGIN CATCH
        ROLLBACK
    END CATCH

    -- drop the temp table outside the TRY so a caught error can't leave it
    -- behind and break the next SELECT...INTO
    IF OBJECT_ID('tempdb..#tempdelete') IS NOT NULL
        DROP TABLE #tempdelete

    WAITFOR DELAY '00:00:10'
END
go
October 30, 2012 at 5:22 pm
What's the situation on indexes for the owner1ssn column? Do you have one, and is it unique? Also, what is the PK of the table? That could turn out to be important depending on which method you use.
Last but not least, does the column have any foreign keys on it?
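If it helps, you can answer both questions from the catalog views. A quick sketch, assuming the table is dbo.history as in your script:

-- indexes that contain owner1ssn
SELECT i.name, i.is_unique
FROM sys.indexes i
JOIN sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.history') AND c.name = 'owner1ssn'

-- foreign keys that reference the column
SELECT fk.name
FROM sys.foreign_key_columns fkc
JOIN sys.foreign_keys fk ON fk.object_id = fkc.constraint_object_id
JOIN sys.columns c ON c.object_id = fkc.referenced_object_id AND c.column_id = fkc.referenced_column_id
WHERE fkc.referenced_object_id = OBJECT_ID('dbo.history') AND c.name = 'owner1ssn'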
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 5:32 pm
Also, here's what's going to happen if you try to change the column to CHAR(1) from T-SQL...
Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.
If you do it in designer mode from SSMS, it will copy the entire table, which, for 60 million rows, will take a long time and possibly blow your log file sky high.
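You can see the error for yourself on a scratch table (this demo table is made up just for the repro):

-- hypothetical scratch table to demonstrate the truncation error
CREATE TABLE dbo.ssn_demo (idcode int NOT NULL, owner1ssn char(9) NOT NULL)
INSERT INTO dbo.ssn_demo VALUES (1, '123456789')

-- fails with Msg 8152 because the existing value won't fit in char(1)
ALTER TABLE dbo.ssn_demo ALTER COLUMN owner1ssn char(1) NOT NULL

DROP TABLE dbo.ssn_demo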
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 6:17 pm
No indexes on that field. I decided to try just dropping the column and recreating it. The only issue was that it had to be recreated with NULLs allowed, so I'd have to set all records to '' (blank) before an ALTER TABLE command could be used to make it NOT NULL.
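Roughly this sequence, sketched out (assuming the char(9) definition from my script above):

ALTER TABLE [history] DROP COLUMN owner1ssn

-- the recreated column allows NULLs at first...
ALTER TABLE [history] ADD owner1ssn char(9) NULL

-- ...so blank every row before tightening the constraint;
-- note this touches all 60 million rows in one transaction,
-- so in practice you'd batch it like the loop earlier in the thread
UPDATE [history] SET owner1ssn = ''

ALTER TABLE [history] ALTER COLUMN owner1ssn char(9) NOT NULL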
October 31, 2012 at 8:15 am
Indianrock (10/30/2012)
No indexes on that field. I decided to try just dropping the column and recreating it. The only issue was that it had to be recreated with NULLs allowed, so I'd have to set all records to '' (blank) before an ALTER TABLE command could be used to make it NOT NULL.
There's a more insidious problem with that. The reason dropping the column appears to run so quickly is that it doesn't actually drop the column or remove any of the data; it simply makes the column unavailable to queries and the like. The data is still there and could be extracted fairly easily (even from a backup file) by someone intent on doing so. To get rid of the actual data, you'd have to rebuild the clustered index. If there is no clustered index, you'd have to build one and then drop it (although building a narrow unique clustered index could help the other indexes a whole lot).
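For example (the PK name here is a guess; use whatever yours is actually called):

-- rewriting every page via a rebuild physically purges the dropped column's bytes
ALTER INDEX PK_history ON [history] REBUILD

-- if the table were a heap instead, build a clustered index and then drop it:
-- CREATE CLUSTERED INDEX CIX_history_purge ON [history] (idcode)
-- DROP INDEX CIX_history_purge ON [history]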
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2012 at 8:49 am
Thanks Jeff. There is a clustered primary key, so I'll rebuild that. Since this will take a while, I should probably drop all of the columns in question first (2 SSN fields, 2 date-of-birth fields, and 2 driver's license fields -- only the SSNs have significant population), recreate them, and then rebuild the clustered index. I have to do this on several DEV SQL Servers which have copies of our prod database, and then on production. Yes, having this data all over the dev environments is a big problem; Safenet encryption is on its way, although a large part of the problem will be wiped out by what we're discussing here.
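Something like this for the drop, so the rebuild only has to happen once (every column name other than owner1ssn is a made-up placeholder):

-- one ALTER TABLE can drop all six columns in a single statement
ALTER TABLE [history] DROP COLUMN
    owner1ssn, owner2ssn,        -- SSNs
    owner1dob, owner2dob,        -- dates of birth
    owner1dlnum, owner2dlnum     -- driver's license numbers

-- then purge the freed bytes with a single rebuild of the clustered PK
ALTER INDEX PK_history ON [history] REBUILD   -- PK name assumed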
I think I can safely ignore the fact that the newly created columns allow NULLs, since this table contains data converted from our legacy environment, is never written to, and is rarely read. Plus, our application does not display the fields in question. The only other possible issue is that when the fields are recreated they wind up "at the bottom" of the column order for the table. Theoretically that shouldn't be an issue, but I'll discuss it with our application developers.
October 31, 2012 at 2:38 pm
There may be another problem with doing that. If any of the apps have inserts without an explicit column list, the columns will literally have changed position as well as ordinal number, and those inserts will either put values into the wrong columns or cause the apps to fail, depending on how they are written.
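A tiny demo of how that goes wrong (the demo table is hypothetical):

CREATE TABLE dbo.reorder_demo (a int, ssn char(9), b int)
INSERT INTO dbo.reorder_demo VALUES (1, '123456789', 2)   -- positional: a, ssn, b

ALTER TABLE dbo.reorder_demo DROP COLUMN ssn
ALTER TABLE dbo.reorder_demo ADD ssn char(9) NULL          -- order is now a, b, ssn

-- the same positional insert still "succeeds" via implicit conversion,
-- but b gets 123456789 and ssn gets '2': silently wrong data
INSERT INTO dbo.reorder_demo VALUES (1, '123456789', 2)

DROP TABLE dbo.reorder_demo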
For production, and considering all of the unknowns, it may very well be easier, and certainly safer, to build a crawler that does the updates a bit faster than the one you currently have.
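Something along these lines, assuming idcode is the integer clustered PK (the batch size is just a starting point to tune):

DECLARE @last  int = 0,
        @maxid int,
        @batch int = 5000

SELECT @maxid = MAX(idcode) FROM [history]

WHILE @last < @maxid
BEGIN
    -- walk the clustered key in ranges so every batch is a cheap range seek,
    -- with no temp table and no repeated full scan for the loop test
    UPDATE [history]
    SET owner1ssn = ''
    WHERE idcode >  @last
      AND idcode <= @last + @batch
      AND owner1ssn <> ''

    SET @last = @last + @batch

    -- optional throttle, as in the original script
    -- WAITFOR DELAY '00:00:01'
END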
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2012 at 3:47 pm
Thanks again. I think I'll hold off on changing the production database until this last conversion from legacy finishes November 30th. After that there will never be another insert into the table in question.
Most of our application code generates SQL via a .NET ORM; stored procedures are rarely used.
After dropping and recreating the columns and rebuilding the clustered index on my test system, I logged into our browser-based application in the environment using that database and had no problems pulling up records from that table.
thanks,
Randy
October 31, 2012 at 7:10 pm
Good enough, Randy. Just wanted to make sure you knew all of the things that could go wrong so you could test them.
--Jeff Moden
Change is inevitable... Change for the better is not.