I am building a DW, part of it involves normalizing some transactional type data. Keep in mind the two input files contain 130ish, and 90+ million rows.
From the 130 million rows, I need to build a common table selecting the distinct set of about 20 different fields. Then link these back to the remaining data from each row. The same process will be needed for the second input file, with the same 20ish fields.
First I have to load this data, then I will need to process updates on a monthly basis.
Any good ideas, I have the data in flat files by year ??