August 1, 2006 at 4:04 pm
Dirty Data
I saw an interesting article on Dirty Data and a calculator to determine its cost for a business. It opens with a look at the large IT budgets many companies have to support a myriad of applications, but most of the applications cannot "talk" to each other in a meaningful way because of one problem.
The data is not standardized.
It's a problem DBAs have faced for years, and one we must work against intimately as we transfer data between systems, writing complex scripts and mappings designed to get data from different sources to match up. It's one of the reasons DTS was so heavily used, and why Integration Services seems to be the most popular part of SQL Server 2005.
Developing standard data definitions is hard, especially when you have distributed teams and multiple developers and DBAs working on systems. Even if you could get everyone to agree on the various data points, getting them implemented consistently in all new systems would still be hard.
People want to develop things their own way. Being told you need to track certain pieces of information in a standard way often means more work than seems necessary for a small system. And that means people will take shortcuts and not implement things the same way as elsewhere.
Which becomes a problem when those little systems grow up and start getting used widely in the enterprise.
The first step seems to be a complete overhaul of your databases to ensure that the metadata, the "data about the data", is in sync. That means mapping firstname/lastname/CustomerName/fullname and similar fields together somehow. It means implementing standards and then empowering someone to follow up and ensure they are actually being implemented. And it will mean having vendors customize some software, or abandoning it when it doesn't fit your model.
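To make that concrete, here's a minimal T-SQL sketch of the kind of mapping involved; the Sales.Customer and CRM.Contact tables and their columns are hypothetical, just stand-ins for two systems that store names differently.

-- Hypothetical: two systems store customer names differently.
-- Map both into one standard shape (FirstName, LastName, FullName).
SELECT  c.CustomerID                        AS CustomerKey,
        c.FirstName,
        c.LastName,
        c.FirstName + ' ' + c.LastName      AS FullName
FROM    Sales.Customer AS c
UNION ALL
SELECT  ct.ContactID,
        LEFT(ct.FullName, CHARINDEX(' ', ct.FullName + ' ') - 1),
        LTRIM(SUBSTRING(ct.FullName, CHARINDEX(' ', ct.FullName + ' '), 200)),
        ct.FullName
FROM    CRM.Contact AS ct;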
Or it can be a metadata modeling scheme built in SQL Server 2005 and Integration Services, one that stores the meaning behind the various fields and the mappings between them, and that facilitates the extraction and transformation of that data between systems.
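One way to start down that road, and this is only a sketch of the idea with made-up names, is a mapping table that records which source column feeds which standard data element:

-- Hypothetical metadata mapping table: one row per source column,
-- tied to the standard enterprise definition it feeds.
CREATE TABLE dbo.DataElementMapping
(
    MappingID       int IDENTITY(1,1) PRIMARY KEY,
    StandardElement varchar(100)  NOT NULL,   -- e.g. 'Customer.FullName'
    SourceSystem    varchar(100)  NOT NULL,   -- e.g. 'CRM'
    SourceTable     varchar(256)  NOT NULL,
    SourceColumn    varchar(128)  NOT NULL,
    TransformRule   varchar(4000) NULL        -- expression or SSIS notes, if any
);

INSERT INTO dbo.DataElementMapping
        (StandardElement, SourceSystem, SourceTable, SourceColumn, TransformRule)
VALUES  ('Customer.FullName', 'CRM', 'CRM.Contact', 'FullName', NULL);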
It's a large battle, but a great time to be a SQL Server DBA and SSIS expert.
Steve Jones
August 2, 2006 at 1:54 am
Hi all,
In our department (around 60 people, outsourcing resources included), everyone does as they please. It's hard to standardize anything when the only things that are well-formed and filled in are the templates (.DOC files); when you look at the code of our applications, it's the never-ending story. I'm talking about T-SQL procedures, VB functions, ASP pages, and critical things like how the data is treated inside the development framework... and that treatment is sometimes not very careful.
Sometimes I feel like a policeman, chasing people down and explaining why there are dos and don'ts in the IT universe. In any case, neither the DBA nor the boss ever quite manages to get their resources to do things the way they would like.
As for SSIS: lots of developers still aren't completely familiar with DTS, and now you have to get them involved in SSIS. Jeez.
Yes, a large battle.
Moreover, as the people responsible for the Microsoft side, we face other "drawbacks": organizing all the information with the other DBAs, the Oracle guys. We've got DTS packages which pull data from Oracle, or push it the other way.
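Outside of DTS, one quick way to pull Oracle data across is a linked server; a minimal sketch, assuming a linked server named ORA_HR already exists (that name, the staging table, and HR.EMPLOYEES are just placeholders):

-- Hypothetical: pull rows from an Oracle linked server into a SQL Server staging table.
INSERT INTO dbo.Staging_Employee (EmployeeID, FirstName, LastName)
SELECT  EMPLOYEE_ID, FIRST_NAME, LAST_NAME
FROM    OPENQUERY(ORA_HR, 'SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME FROM HR.EMPLOYEES');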
August 2, 2006 at 6:28 am
One of the strangest things I find today is the number of clients who don't actually know their own data requirements, and can't impose any standards at all because they can't determine what those standards should be. I don't remember this a decade ago, and even five years back I recall it was uncommon to have a client say, "We're not sure if we need that data." Recently, I find dozens of them giving blank looks when you ask which data are of primary importance and/or must be in a standard form.
I had one long-standing client whose new DBA replied to a question about our concern for standard forms with a lecture about data types, after which I chuckled and had to explain that we were well aware of the data types in the design, but concerned that the proposed design wasn't going to integrate easily with the payroll system already in place (the alphanumeric key that identified the worker was too short). And that's not a shot at DBAs at all, just an observation that a real problem today is an inconsistency in the business processes themselves: companies manage such an enormous data stream that they are often not managing the key elements. Nobody seems to be communicating internally at many clients that, just ten years ago, had a strong understanding of their valuable data.
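That kind of mismatch is cheap to catch early if someone actually checks; a minimal sketch of such a check, where dbo.Worker, WorkerKey, and the 8-character payroll limit are all made up for illustration:

-- Hypothetical check: which worker keys are longer than the payroll
-- system's key column can hold? (Names and the limit are illustrative.)
SELECT  w.WorkerKey,
        LEN(w.WorkerKey) AS KeyLength
FROM    dbo.Worker AS w
WHERE   LEN(w.WorkerKey) > 8;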
I suspect that as time goes by, more of this data overload will mask the quality data, and the internal communication failures will lead to even more dependence on post-development integration services. It's sad, though, because it is so easily avoided.
August 2, 2006 at 6:47 am
That's why metadata, data quality measurement, and data profiling are part of building a quality data warehouse. However, management only wants to see the results (the actual data warehouse), so many teams just skip those steps. The March 2006 TDWI report talked about taking data quality to the enterprise through data governance.
Cleaning up the data is not just IT's business; it is partly the users' responsibility, because they are the ones who understand the data. The business just doesn't see it that way, though. Every time there is a problem with the data, it automatically blames the IT department!
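For what it's worth, the profiling step doesn't have to be elaborate to be useful; a minimal sketch against a hypothetical dbo.Customer table with an Email column:

-- Hypothetical profiling query: how complete and how varied is one column?
SELECT  COUNT(*)                                         AS TotalRows,
        SUM(CASE WHEN c.Email IS NULL THEN 1 ELSE 0 END) AS NullEmails,
        COUNT(DISTINCT c.Email)                          AS DistinctEmails
FROM    dbo.Customer AS c;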
August 2, 2006 at 8:09 am
I find the problems with data quality aren't technical, but human. The biggest obstacles you'll face with data quality amelioration are emotional.
August 2, 2006 at 9:14 am
It's interesting that you posit that Integration Services might be the most popular part of SQL Server 2005. It reminds me that I haven't blogged my favorite quote from TechReady3 yet. It was Donald Farmer in one of the SSIS sessions, when he was quizzed about the pricing model and competitive positioning. He said, "I don't think of Integration Services as a free addition to SQL Server. I explain it to customers this way. I tell them that they should buy this really cool ETL tool... and get a free database." Donald is a great guy and very funny. Having spent a lot of time with him and Grant on this most recent project, I can say that's not the funniest thing he's said lately – it's just the funniest one I can repeat in public.
But I digress. One of the biggest criticisms of SSIS I've heard of late is that so few data quality widgets ship in the box. We have some great partners in the data quality space, so there are ways to address those issues. Especially now that Microsoft and Oracle have invaded the ETL space, the pure plays seem to be trying to escalate/transform themselves into enterprise data management vendors instead of vanilla ETL products. My experience, though, is that Mr. Hirsch is right – data quality is an emotional/business issue that imposes some of the most bizarre requirements upon an otherwise straightforward BI solution.
August 2, 2006 at 11:52 am
Does anyone know how to get to the calculator referenced in the article? It says you have to be a member of Baseline, and when I tried to sign up it said you have to be a member of Ziff-Davis online to sign up for Baseline, and I can't figure out how to sign up for that one.
I really want to show this article (and the calculator) to management where I work, because I know they don't realize how much it's costing us to keep doing business the way we do, where there is little to no standardization and the like.
Thanks
Kindest Regards,
Just say No to Facebook!
August 2, 2006 at 3:33 pm
August 3, 2006 at 6:43 am