Designing a ETL Application to load large amounts of Data

  • Hi All,

     

    I am thinking of designing a ETL (Extraction, Transformation and Loading) application, the idea is to design a loader to insert data into database tables from flat files, I am not quite sure as to was strategy to go with here, I have around 5 million records of data split over separate flat files, and I need to design the ETL application to load data for a period of 4 months (150m rows a month x 4 = 600m rows in 4 months), so the question is, would it be better to design daily feed tables or to have one massive table and design the ETL application to keep inserting into the table.

     

    The problem is that the data due to the large amount of data, its is proving difficult to decide whether to create daily/weekly tables or have the massive table, I am thinking that with a massive table, maintenance will be very difficult, please advice.


    Kindest Regards,

    John Burchel (Trainee Developer)

  • Personally I suggest in the short term you just pop it in the one table, if it becomes difficult later you can simply add constraints to the underlying table and slap a view over them.  Oh yes, and BCP is your friend...

     

     

  • I think one table is better, but you should be running Enterprise edition and perhaps looking at partitioned tables. Looks like one to your application, but the server handles the split among tables.

  • If you have this amount of data it is possible that there are older and newer files. I would ask 2 questions before designing:

    - Do we need all data to be active? Do we have obsolete data? Then we can load obsolete data on a separate filegroups on separate drives and create a distributed design when the older data are searched only if the record is not found in the newer data

    - Do all files have the same structure? Over time new fields may be added to the files.

    I would also talk to the users and the business owners about how the data will be used. Then the design may be more clear based on the needs. I would not create daily tables, but I may considering creating yearly tables if one department will work with this year's data and another department will work with historical data.

    Regards,Yelena Varsha

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply