Dirty data

  • wat is dirty data. how many ways is there if i want remove the dirty data in sql server 2005

  • Best way is to make sure only clean data goes in. 😎

    The probability of survival is inversely proportional to the angle of arrival.

  • yeah, there's no built in way to determine what data is "clean" or "dirty", it's a decision based on analysis...and one man's definition of dirty is not the same as anothers.

    If you can give us a concrete example of "dirty data" in the form of CREATE TABLE #Example / INSERT INTO #EXAMPLE, we can offer suggestions if you tell us what is "wrong" witht eh data.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • can u explain any example with dirty data,

    thats very helpfull for me

  • that's the point...it's a business decision...some examples:

    1. Data was entered in all capital letters(SMITH) biz rule says it should be proper cased.

    2. duplicate data exists for the same email address(sam smith with ssmith@gmail and samual smith with same email ssmith@gmail )

    3. address information has city name but no zip code (have to look it up?)

    4. similarly, address information has zip code but no city name.

    5. email address that were collected might be fake, becasue they do not follow the email address rules: email 'asdf' does not follow the pattern of [something]@[something].[valid tld]

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Dirty data could include:

    text files that put alphanumeric data into a column that would import to a numeric destination.

    typos in the years (i.e. 2009 becomes 3009 by mistake). Really bad typos can cause the year to go outside the range allowed by the data type.

    duplicate address records because someone mistyped one of the fields.

  • Are you asking about dirty data or dirty reads?

  • "Dirty data" doesn't have a precise definition. One definition of it is data inside the system in an uncommitted state. It's in the middle of a transaction and therefore, dirty. But that's not one that you have to do anything to "clean up." When the transaction commits or rolls back, the "dirty" data goes away.

    You may be talking about incorrect data. Meaning the data entry was 'Tulsa, New York' instead of 'Tulsa, Oklahoma' or 'NY' instead of 'NYC.' In this case, you have to do two things, first, design the database such that only the correct information can be put into the database. Second, fix the data. This can be a manual and tedious process. But for some data, say addresses, there are services that can validate an address.

    Other than that, it really depends on what you mean by "dirty"

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • mupparaju78 (7/20/2010)


    wat is dirty data. how many ways is there if i want remove the dirty data in sql server 2005

    Firstly... would you mind in defining "dirty data"?

    Secondly... as a DBA you shouldn't be concerned about this, user community owns the data; DBA does not owns it. DBA may be called in to help if "the business" finds out something went wrong with the data.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • As the others have expressed, there are many ways to interpret dirty data, but only one way to find it - analysis. You need to understand the business requirements, technical requirements of the application, and the data in the database.

    Data that does not match business requirements or intended application use can be viewed as dirty data. Data that cannot be uniquely identified could be considered dirty data.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply