July 20, 2010 at 7:44 am
wat is dirty data. how many ways is there if i want remove the dirty data in sql server 2005
July 20, 2010 at 8:20 am
Best way is to make sure only clean data goes in. 😎
The probability of survival is inversely proportional to the angle of arrival.
July 20, 2010 at 8:26 am
yeah, there's no built in way to determine what data is "clean" or "dirty", it's a decision based on analysis...and one man's definition of dirty is not the same as anothers.
If you can give us a concrete example of "dirty data" in the form of CREATE TABLE #Example / INSERT INTO #EXAMPLE, we can offer suggestions if you tell us what is "wrong" witht eh data.
Lowell
July 20, 2010 at 8:42 am
can u explain any example with dirty data,
thats very helpfull for me
July 20, 2010 at 8:52 am
that's the point...it's a business decision...some examples:
1. Data was entered in all capital letters(SMITH) biz rule says it should be proper cased.
2. duplicate data exists for the same email address(sam smith with ssmith@gmail and samual smith with same email ssmith@gmail )
3. address information has city name but no zip code (have to look it up?)
4. similarly, address information has zip code but no city name.
5. email address that were collected might be fake, becasue they do not follow the email address rules: email 'asdf' does not follow the pattern of [something]@[something].[valid tld]
Lowell
July 20, 2010 at 8:52 am
Dirty data could include:
text files that put alphanumeric data into a column that would import to a numeric destination.
typos in the years (i.e. 2009 becomes 3009 by mistake). Really bad typos can cause the year to go outside the range allowed by the data type.
duplicate address records because someone mistyped one of the fields.
July 20, 2010 at 9:16 am
Are you asking about dirty data or dirty reads?
July 20, 2010 at 9:40 am
"Dirty data" doesn't have a precise definition. One definition of it is data inside the system in an uncommitted state. It's in the middle of a transaction and therefore, dirty. But that's not one that you have to do anything to "clean up." When the transaction commits or rolls back, the "dirty" data goes away.
You may be talking about incorrect data. Meaning the data entry was 'Tulsa, New York' instead of 'Tulsa, Oklahoma' or 'NY' instead of 'NYC.' In this case, you have to do two things, first, design the database such that only the correct information can be put into the database. Second, fix the data. This can be a manual and tedious process. But for some data, say addresses, there are services that can validate an address.
Other than that, it really depends on what you mean by "dirty"
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 20, 2010 at 9:47 am
mupparaju78 (7/20/2010)
wat is dirty data. how many ways is there if i want remove the dirty data in sql server 2005
Firstly... would you mind in defining "dirty data"?
Secondly... as a DBA you shouldn't be concerned about this, user community owns the data; DBA does not owns it. DBA may be called in to help if "the business" finds out something went wrong with the data.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.July 20, 2010 at 12:34 pm
As the others have expressed, there are many ways to interpret dirty data, but only one way to find it - analysis. You need to understand the business requirements, technical requirements of the application, and the data in the database.
Data that does not match business requirements or intended application use can be viewed as dirty data. Data that cannot be uniquely identified could be considered dirty data.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply