Data Cleansing

  • Hi,

    What is meant by data cleansing? Can you give me some approaches to handling it?

    Can you tell me the step-by-step process for doing data cleansing, or give me some tips about it?

    Regards

    Karthik


  • There is no universal answer to this, but basically data cleansing means getting the individual pieces of data to look the way you want, consistently. That said, it typically doesn't mean that the data is correct, just that it looks the same as the other data in the same column.
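    To illustrate "consistent but not necessarily correct", here is a minimal Python sketch. It assumes a column of city names and a hypothetical house style (trimmed, single-spaced, title case); the function name standardize_column is made up for this example. Every value gets the same rule, so the column ends up uniform, but a misspelling such as "Chicgo" passes through unchanged.

```python
def standardize_column(values):
    """Apply one formatting rule to every value in a column (hypothetical house style)."""
    cleaned = []
    for v in values:
        if v is None:
            # Leave missing values alone; deciding what to do with them is a separate rule.
            cleaned.append(None)
            continue
        # Collapse runs of whitespace and trim both ends.
        v = " ".join(v.split())
        # Assumed house style: title case (note it would turn "McDonald" into "Mcdonald").
        cleaned.append(v.title())
    return cleaned


cities = ["  new york", "NEW   YORK", "New York ", "Chicgo"]
print(standardize_column(cities))
```

    The three "New York" variants collapse to one spelling, while "Chicgo" stays wrong: fixing it needs a lookup against a list of known cities, which is a different kind of cleansing.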

    Cleansing data could mean anything from formatting phone numbers consistently to splitting a single address field into its component parts, with just about everything imaginable in between. It can also include dictionary lookups, custom table lookups, and so on. In some organizations it's perfectly acceptable to hire people to do this manually, while other situations need an automated process that runs daily, or even more often.
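    As a rough sketch of two of those tasks, the Python below formats North American phone numbers as (NNN) NNN-NNNN and standardizes state names through a small lookup table. The target format, the STATE_LOOKUP contents, and the function names are all assumptions for illustration; in a warehouse the lookup would more likely be a database table. Values that can't be cleaned automatically come back as None so they can be routed to a person for manual review.

```python
import re

# Hypothetical lookup table mapping known variants to a standard code.
STATE_LOOKUP = {
    "ca": "CA", "calif": "CA", "california": "CA",
    "tx": "TX", "tex": "TX", "texas": "TX",
}


def format_phone(raw):
    """Return a 10-digit North American number as (NNN) NNN-NNNN, or None."""
    digits = re.sub(r"\D", "", raw)  # strip everything that isn't a digit
    # Drop a leading country code of 1, e.g. "+1 555 ...".
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # can't be fixed automatically; flag for manual review
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def standardize_state(raw):
    """Map a free-text state name to its code via the lookup table, or None."""
    key = raw.strip().rstrip(".").lower()
    return STATE_LOOKUP.get(key)


print(format_phone("555.123.4567"))
print(format_phone("+1 (555) 123-4567"))
print(standardize_state("Calif."))
```

    Returning None rather than guessing keeps the automated pass honest: anything it doesn't recognize is left for a human instead of being silently "cleaned" into something wrong.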

    One approach is to find out what the business wants cleaned. At other times, you'll be able to spot problems in the data yourself and might proactively fix them, assuming that is acceptable in your environment.
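    One common way to spot such problems yourself is pattern profiling: reduce each value in a column to its "shape" and count how many values share each shape. The minimal Python sketch below assumes digits map to 9 and letters to A (a convention chosen here, not a standard), and the function names are hypothetical. A column that should hold one format but shows several shapes is a good candidate for cleansing.

```python
from collections import Counter


def value_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def profile_column(values):
    """Count how many values share each shape, most common first."""
    return Counter(value_pattern(v) for v in values).most_common()


phones = ["555-123-4567", "555-987-6543", "(555) 123-4567", "5551234567", "n/a"]
for pattern, count in profile_column(phones):
    print(f"{count:>3}  {pattern}")
```

    Here the profile would reveal four different shapes in what is meant to be a single phone-number format, including a placeholder value ("n/a") that probably should be NULL.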

    In other words, you seem to be looking for a solution to a problem that may or may not exist, and if it does, may or may not be well-defined. Determine if there are issues, and then determine what they are, and you'll be able to tackle them at that point.

  • Presumably you are doing this in the context of a data warehouse. If so, be aware that this is almost always the most complicated and time-intensive part of the process. As already noted, it covers everything from eliminating duplicate records to standardizing the meaning of data to correcting incorrectly entered values. If it is not done properly, you risk giving decision makers false information, so make sure you allocate enough time to do it properly.
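    Eliminating duplicates usually depends on standardizing first, because "Jane  Smith" and "jane smith" only match once both are normalized. The Python sketch below assumes rows are dicts with hypothetical "name" and "email" fields and treats two rows as duplicates when their normalized name and email are identical, keeping the first one seen. In a real warehouse this would more likely be done in SQL, and real-world matching often needs fuzzier rules than exact keys.

```python
def normalize_key(row):
    """Build a match key from whitespace-collapsed, lowercased name and email (assumed fields)."""
    name = " ".join(row["name"].split()).lower()
    email = row["email"].strip().lower()
    return (name, email)


def dedupe_rows(rows):
    """Keep the first row for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize_key(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


customers = [
    {"name": "Jane  Smith", "email": "JANE@example.com"},
    {"name": "jane smith", "email": "jane@example.com "},
    {"name": "John Doe", "email": "john@example.com"},
]
print(dedupe_rows(customers))
```

    Keeping the first row is the simplest survivorship rule; other choices, such as keeping the most recently updated row, are equally valid and depend on what the business wants.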

  • I'm thinking it's a test question...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.

  • I usually use a toothbrush and shoe polish.  If you work your way from heavy polish to the finer grades, you will have very shiny data when you are done.

  • I prefer Comet - it is chlorinated.

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • A relevant Google result:

    http://en.wikipedia.org/wiki/Data_cleansing