Fastest way to blank all records in one char field

  • We have a table with hundreds of thousands of social security numbers, which we want to blank. Table has 60 million records. I'm thinking changing the field from char(9) to char(1) then back again might be quick and just leave one digit in there. Quicker than an update statement? I'm running the script below on a test copy but expect it to take a very long time.

    use dba_test

    go

    SET DEADLOCK_PRIORITY LOW

    while (SELECT count(*) FROM [history] WHERE owner1ssn<>'')>0

    begin

    select top 500 idcode,owner1ssn into #tempdelete from otts_history where owner1ssn<>''

    begin transaction

    begin try

    -- blank owner1ssn

    update [history] set owner1ssn='' where idcode in (select idcode from #tempdelete)

    commit

    IF OBJECT_ID('tempdb..#tempdelete') IS NOT NULL

    drop table #tempdelete

    WAITFOR DELAY '00:00:10'

    end try

    begin catch

    rollback

    continue

    end catch

    end

    go

  • What's the situation on idexes for the owner1ssn column? Do you have one and is it unique? Also, what is the PK of the table? That could turn out to be important depending on which method you use for deletion.

    Last but not least, does the column have any foreign keys on it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Also, here's what's going to happen if you try to change the column to CHAR(1) from T-SQL...

    Msg 8152, Level 16, State 14, Line 1

    String or binary data would be truncated.

    The statement has been terminated.

    If you do it in the designer mode from SSMS, it will copy the entire table which, for 60 million rows, will take a long time possbily blow your log file sky high.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No indexes on that field. I decided to try just dropping the column and recreating it. Only issue was that it had to be recreated with NULLs allowed so I'd have to set all records to "" (blank) before an alter table command could be used to set it to Not NULL.

  • Indianrock (10/30/2012)


    No indexes on that field. I decided to try just dropping the column and recreating it. Only issue was that it had to be recreated with NULLs allowed so I'd have to set all records to "" (blank) before an alter table command could be used to set it to Not NULL.

    There's a more insidious problem with that. The reason why dropping the column appears to run so quickly is that it doesn't actually drop the column or remove any of the data. It simply makes it unavailable to queries and the like. The data is still there and could be extracted fairly easily (even from a backup file) by someone intent on doing so. In order to get rid of the actual data, you'd have to rebuild the clustered index. If there is no clustered index, you'd have to build one and then drop it (although building a narrow unique clustered index could help the other indexes a whole lot)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff. There is a clustered primary key so I'll rebuild that. Since this will take a while I should probably drop all of the columns in question first ( 2 SSN fields, 2 date of birth fields and 2 driver's license fields -- only SSN has significant population ) , recreate them, then rebuild the clustered index. I have to do this on several DEV sql servers which have copies of our prod database, then on production. Yes having this data all over the dev environments is a big problem, and Safenet encryption is on its way, although a large part of the problem will be wiped out by what we're discussing here.

    I think I can safely ignore the fact that the newly created columns are allowing nulls since this table contains data converted from our legacy environment, is never written to and rarely read. Plus, our application does not display the fields in question. The only other possible issue is that when the fields are recreated they wind up "at the bottom" of the column order for the table. Theoretically that shouldn't be an issue but I'll discuss it with our application developers.

  • There may be another problem with doing that. If any of the apps have inserts without the "insert column list", the columns will literally have changed position as well as ordinal number and the inserts will either insert into incorrect columns or cause the apps to fail if they are written in such a manner.

    For production and considering all of the unknowns, it may very well be easier and, certainly, safer to build a crawler to do the updates a bit faster than the one you currently have.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks again. I think I'll hold off on changing the production database until this last conversion from legacy finishes November 30th. After that there will never be another insert into the table in question.

    Most of our application code generates sql via a .NET ORM and stored procedures are not used for the most part.

    After dropping, recreating the columns and rebuilding the clustered index on my test system, I logged into our browser-based application in the environment using that database and no problems pulling up records in that table.

    thanks,

    Randy

  • Good enough, Randy. Just wanted to make sure you knew all of the things that could go wrong so you could test them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply