Clean up junk data

  • Hi All,

    I have a server which has 30 dbs on it.  In each database, I have the same table structure called Customer.  The problem with this tables in all 30 db is that it contains a lot of junk data.  My task is to create a store procedure and schedule this sp to run every night to clean up junk data.  Right now, I have to go into each database, delete the junk data in the Customer table by running a delete query which costs me a lot of time.  My Customer table looks like this

    CustomerId    Email                   Zip         Website               FName      LastName

    1                 test1@mail.com     92201     http://www.test1.com    Andy        Smith

    2                 null                      null        null                     null          null

    3                 test2@mail2.com    null        null                     null          null               

    4                 null                      93305     null                     null          null

    My clean data will contains no null data for all of the fields.  So, my table will look good if there is no CustomerId#2.  My delete query like this:

    delete from customer where customerid not in (select customerid from customer where email is not null or zip is not null or website is not null or fname is not null or lastname is not null).

    Is that possible for me to achieve this task?  I'd like to have a sp that loop through each db, perform the delete query against the Customer table in each db so that I don't have to manually delete data from each table one by one.  Thanks million.

    Minh Vu

  • you want to delete items where all the items are null, so use that logic instead of NOT IN and OR statements:

    create procedure pr_cleanup

    as

    begin

      if exists(select name from sysobjects where name='customer' and xtype='U') 

        delete from customer where email is null and zip is null and website is null and fname is null and lastname is null

    end

    go

    --now to do that for all databases, you might use the proc sp_msforeachdb as a scheduled job to run every day:

    exec sp_msforeachdb pr_cleanup

     

    you could avoid this issue altogether by adding a constraint that does not permit email to be null for example, but you would want to run that by the developers of the app, else they might get a lot of run time errors because the constraint prevents a db insert when the value is left blank:

    ALTER TABLE CUSTOMERS ALTER COLUMN EMAIL VARCHAR(100) NOT NULL

    alternatively, you should make the developers add code that prevents garbage data from being entered in the first place, by testing if therequired values are of sufficient length or other validations.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • What is wrong with this?

    delete from customer where email is null and zip is null and website is null and fname is null and lname is null

    That avoids the subquery. Then you can put that in a procedure to loop through the databases and perform the deletes, as you say.

  • The developers should have put a validation check to begin with so this cleanup operation wouldn't have been needed...as Lowell said - the "not null" constraint should definitely be added as soon as the cleanup sweep is done so at least this one table wouldn't have to be touched again...







    **ASCII stupid question, get a stupid ANSI !!!**

  • I believe the original poster wanted to allow any column to be NULL, but not all of them. Looking at the sample data they only want to lose CustomerID 2, and keep the others. That could not be enforced with a NOT NULL constraint.

  • Paul i was assuming he would use a loop to go thru all databases...and assumed that some of the databases were not related to his app.

    rather than letting the cursor fail when it finds a database that does not contain a table called customer, i thought the subquery to check for the existance of the table was appropriate.

     if exists(select name from sysobjects where name='customer' and xtype='U') 

        delete from customer where email is null and zip is null and website is null and fname is null and lastname is null

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Point taken Paul...however validations could still have been put in place to ensure this!

    "XML is like violence...If it doesn't solve your problem, you are not using enough of it" - Lol







    **ASCII stupid question, get a stupid ANSI !!!**

  • "XML is like violence...If it doesn't solve your problem, you are not using enough of it"

    exactly .

  • Sorry, Lowell, I completely mis-read your post originally (too fast), plus my reply was directed at the original poster.

    Sushila - agree, just wanted to make sure we were all understanding correctly.

  • Thank you so much for all of the input. The only problem I had is sp_msforeachdb. 

    When I run sp_msforeachdb myprocedure, the result occurred with the current db that I was on only.  It did not create the result for all of the dbs as I wished.  Is there any tricks behind the scene?

    Minh

  • Here's a link to where you can find more info...

     

    http://www.mssqlcity.com/FAQ/Devel/sp_msforeachdb.htm







    **ASCII stupid question, get a stupid ANSI !!!**

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply