June 20, 2007 at 4:07 pm
Hello Guys
I have a problem and I am currently looking for a way out. At the moment I receive external data feed files from a client; the data comes through literally every 5 minutes in text files, and there is an ETL application that was written to load the text files into the database.
Now here is the problem: the text files coming from the client contain a large amount of data. Just to give you an idea, we are talking about over 8 million rows a day. The database hasn't been normalised, so data is being duplicated everywhere. I am looking to develop a reporting solution for clients based on the feed files that the ETL application loads. I have heard about the great features of SQL 2005, and below are my questions:
Considerations
The live feed system cannot be normalised for now
We cannot use SQL server 2005 replication as this is just not feasible in our case
Questions
What new features of SQL 2005 would really benefit me?
Can I use XML for data transfer between the new reporting system and the live system?
Are there any books out there that would be of great benefit?
Is it possible to write an application that will take a data set, normalise it, and then spit it out in XML format? Then I can have another ETL application on the client side that translates the XML into table format.
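For illustration, this is roughly what I have in mind on the sending side - SQL 2005 can apparently emit XML straight from a query with FOR XML (the table and column names here are made up):

-- Sketch only: export a normalised slice of the feed as XML.
-- All object names below are hypothetical.
SELECT  c.CustomerID,
        c.CustomerName,
        a.AddressLine1,
        a.City
FROM    dbo.Customer AS c
JOIN    dbo.Address  AS a ON a.AddressID = c.AddressID
FOR XML PATH('Customer'), ROOT('Feed');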
June 20, 2007 at 10:40 pm
Have you read about SSIS or Analysis Services in SQL 2005? They might be useful for your requirement.
Cheers,
Sugeshkumar Rajendran
SQL Server MVP
http://sugeshkr.blogspot.com
June 21, 2007 at 12:01 am
The live feed system cannot be normalised for now
Is "the feed system" a sql DBA task? Instead I would seriously think to implement unique keys on either surrogate key fields of the sql tables involved OR on a certain set of fields that are corresponding to the probably composite primary keys in the source. You may need to implement error handling within your ETL application for cases when your sql db unique keys reject the duplicates from the source.
What new features of SQL 2005 would really benefit me?
There are many. You can do complicated things such as sending newly loaded data into a different database using Service Broker, since SQL replication would not suit your situation. Or you can run reports over the same "populated every 5 minutes" database if you use an optimistic isolation level: read in BOL about snapshot isolation (two types, one of which requires re-coding for error handling - read about TRY/CATCH in procedures). Another new improvement is the database snapshot - BOL explains pretty well how it actually works. All of these have pros and cons, but the important thing is that a solution exists.
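For example, the relevant commands look roughly like this (the database name and file path are placeholders):

-- Sketch: enable snapshot isolation so reports are not blocked by the
-- 5-minute loads.
ALTER DATABASE LiveFeedDB SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE LiveFeedDB SET READ_COMMITTED_SNAPSHOT ON;

-- Sketch: a database snapshot gives reports a static, point-in-time
-- view of the data.
CREATE DATABASE LiveFeedDB_Reporting_Snap
ON ( NAME = LiveFeedDB_Data,
     FILENAME = 'D:\Snapshots\LiveFeedDB_Reporting_Snap.ss' )
AS SNAPSHOT OF LiveFeedDB;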
Then read about DMVs for monitoring your machine and instance resources. They are really GOLD.
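A couple of examples that ship with SQL 2005:

-- What is running right now, and what is it waiting on?
SELECT session_id, status, command, wait_type, wait_time, blocking_session_id
FROM   sys.dm_exec_requests;

-- Cumulative wait statistics for the instance since the last restart.
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM   sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;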
Good luck.
June 21, 2007 at 7:08 am
Hi Guys
Thanks for your replies. I have read about these technologies before; let me start with SSIS:
SSIS
I can use SSIS without any problems, but the issue here is that SSIS wouldn't solve my problem on its own; rather, it can be part of my ETL solution.
Service Broker
From my understanding, i.e. from what I have read, Service Broker is really good for communication between two endpoints, i.e. creating a contract etc. for the transfer of data, which is really cool, but I am not sure how this will benefit me. At the moment, the way replication is being done is with the good old flat files: live creates flat file feeds, and the reporting system loads them. Can someone please explain to me how Service Broker will make this more efficient, or rather give a justification for using it? From what I have read, the classic example is a database/web application where the delivery of messages is very important; delivery might be late due to a system failure, but final delivery is still guaranteed when the system is back up. In my case, it is just a database (live) ==> database (reporting).
Is "the feed system" a sql DBA task? Instead I would seriously think to implement unique keys on either surrogate key fields of the sql tables involved OR on a certain set of fields that are corresponding to the probably composite primary keys in the source. You may need to implement error handling within your ETL application for cases when your sql db unique keys reject the duplicates from the source.
The statement above sounds like the sort of advice that I need; the only problem is that I do not fully understand the explanation quoted above. Can you please shed some more light on this? Also, when creating a communication/replication model between the two systems, is it better to try to normalise the data at the source, so that it is light as it is being sent to reporting, or rather to write an ETL application that spits out denormalised data, with the ETL on the receiving side normalising and loading the data?
Lastly, I have also read about the capabilities of XML; will it be of any use in this case?
June 21, 2007 at 7:31 pm
In your first post you said that you have an ETL app that loads data from text files (the source) into a SQL database, and that the way this app is written causes you to get duplicates. My answer was referring to basic data warehouse design and modelling concepts: your SQL database must have surrogate keys defined on the tables involved in the ETL process. The fields in the original source must have primary keys defined, which are probably clustered indexes as well (so we are talking here about the physical order defined over your initial feed source, not the flat files, which are most probably an intermediate data format) and which most probably (again) define the uniqueness of the data values at the source level. Knowing this about your data source, you should then build unique keys over the corresponding fields in the destination tables of your SQL database.
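As a hypothetical example of the destination side (your real tables will obviously look different):

-- Surrogate key for the warehouse side, plus a unique key over the
-- fields that form the source's (composite) primary key.
-- All names are invented for illustration.
CREATE TABLE dbo.CustomerAddress
(
    CustomerAddressKey int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_CustomerAddress PRIMARY KEY CLUSTERED,
    SourceCustomerID   int          NOT NULL,
    SourceAddressID    int          NOT NULL,
    AddressLine1       varchar(100) NOT NULL,
    City               varchar(50)  NOT NULL,
    CONSTRAINT UQ_CustomerAddress_Source UNIQUE (SourceCustomerID, SourceAddressID)
);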
Then you said you cannot implement replication; Service Broker would be an alternative to this - a Service Broker application is mainly XML code.
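In outline, the plumbing looks something like this (every name below is a placeholder, not a working configuration):

-- The message payload is validated XML, which is why a broker
-- application ends up being mostly XML handling.
CREATE MESSAGE TYPE [//Feed/RowBatch]
    VALIDATION = WELL_FORMED_XML;

CREATE CONTRACT [//Feed/RowBatchContract]
    ([//Feed/RowBatch] SENT BY INITIATOR);

CREATE QUEUE dbo.FeedTargetQueue;

CREATE SERVICE [//Feed/TargetService]
    ON QUEUE dbo.FeedTargetQueue ([//Feed/RowBatchContract]);

The sending side then opens a dialog on the service and SENDs the XML batches, and a procedure reading the queue loads them on the reporting side.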
June 21, 2007 at 10:11 pm
Thanks Michaela for your post. I don't mean duplicates in that way (not duplicates within one table); what I actually mean is that the same data is being stored in several other tables, so these text files get loaded into other tables as well - like storing address details more than once.
So the idea is to normalise the database and have just one table, rather than the more than six tables that each text file currently gets loaded into.
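Using the address example, this is roughly what I am aiming for (the names are made up):

-- Store each address once and reference it by key, instead of
-- repeating the address columns in half a dozen tables.
CREATE TABLE dbo.Address
(
    AddressID    int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Address PRIMARY KEY,
    AddressLine1 varchar(100) NOT NULL,
    City         varchar(50)  NOT NULL,
    Postcode     varchar(10)  NOT NULL,
    CONSTRAINT UQ_Address UNIQUE (AddressLine1, City, Postcode)
);

-- The other tables then hold only the key:
ALTER TABLE dbo.CustomerFeed
ADD AddressID int NULL
    CONSTRAINT FK_CustomerFeed_Address REFERENCES dbo.Address (AddressID);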