Practical Data Migration Strategies
As organizations mature their IT infrastructure must change to keep up with new applications and technologies. Many times this includes moving data housed in one technology (legacy) to the upgraded or new platform that is being implemented. Depending upon what you are trying to accomplish there are several common steps that can help you be successful when performing data migration.
Step 1 – Data Collection
You begin a data migration project by sourcing your data that you are interested in moving. Tasks within this step include selecting required data from all available data sources, overall data storage needs and connections to the current data.
Often when doing a data migration project the legacy system will contain many tables and columns that are not needed in the new system. To understand what data to migrate you must select the critical data from all available data. Often times the application support team or DBA can start the process by highlighting tables that they know are or are not used in the current system. Create a template that allows you to know what tables exists and of those tables which ones are candidates for migration. In addition, if you have time in your project, you should
document at the column level what columns contain data that are required for migration as well. This template is the basis for your metadata that will be referenced for the migration project. It is also the baseline information for business experts to assist in reviewing to help decide the best data to migrate.
After defining what tables and columns are potential candidates for data migration. You need to determine the overall size required to store this information. Begin by looking on the legacy system at how much storage is currently be utilized for the required data tables. This information will be important to help size the new system that will store the data. Finally you need to understand and document all connections to your legacy data. What accounts have permissions and what queries are they running? In the new platform do these accounts need the same access or can you eliminate some access and better understand your access and security controls? Are there some jobs or other systems that only connect to the data on a monthly or quarterly timeframe? After completing these three tasks you are ready to move on to step 2.
Step 2 – Data Analysis
Now that you have collected and organized your metadata to determine what data are candidates to be migrated. You need to do some analysis on the data leveraging the Subject Matter Experts (SMEs) from your business. Work with the SMEs to document what are critical data elements and tables to the business. Perhaps some of the information that was listed as required has data quality issues that are unknown to the business users. How much of the data has null values or a high number of distinct values? One columns that are numeric. What are the minimum and maximum values? Do those values make sense for the data that you are reviewing? What are the most common values that occur in a column? Are there code tables that have values that are not used in the application? These are common questions that should be asked when looking at the data to migrate. To help your focus, look closer at any columns that are critical to business functions and perform closer analysis on them. Fields that are required but not critical should have at least the same basic analysis performed so you have a common baseline. Upon completing the analysis on your data you can move to step 3.
Step 3 – Data Cleansing
After performing the data analysis again work with the (SMEs) to determine what data to clean and to what level. It can take a significant investment to hit 100% data quality. What percentage is “good enough” to clean the data to enable your business to make accurate decisions? For instance, address correction can be done using lookup information from the post office. Knowing that you only have 1 customer at an address and not 3 instances of the same customer at the same address is not as easy. How to determine what data to clean and to what extent will vary by the type of system you are migrating and what you are leveraging the data for. If you are
implementing a new CRM system then having customer data accurate and de-duplicated is a high priority. However, if you are implementing a new manufacturing system that only ships to an address then your data cleansing needs are different. This step can take a lot of effort and time. Leverage Data Quality tools to speed up the process of correcting and cleaning your data. Once you are comfortable with your data quality you can then move to step 4.
Step 4 – Data Movement
There are a variety of ways to move data from legacy sources to new platforms. During this phase you should layout where your data is being moved and in what methods. If you are doing an application migration, you may need to pull data from legacy and store it in new tables that the application can load via its API calls or data load techniques. If you are moving from one database technology to another you can leverage ETL tools or SQL scripts to pull the data from the source system and load into the new system. Regardless of the method you use to move your data. Include in the process, steps that create record metadata for each table that you are migrating. Have the system push this information to a central audit table that stores the table name, record count, beginning date for the migration, end date for the migration. Having these fields will enable you to know exactly how many records were pulled for the table and create an audit log that you can leverage in step 5.
Step 5 – Data Validation
In the data validation step you are ensuring the records that you wanted to migrate have been moved successfully. This would include record counts and data quality enhancements. Start by looking at the output from step 4’s central audit table to review record counts. To check data quality review the data in the new system to see that any corrections to the data during the data quality step was executed as expected. One method to make this process easier is to leverage the information and code you used in the Data Analysis step. Data validation will require many of the same types of queries such as distinct counts, minimum and maximum values that you performed on the legacy system. Also, if the migration was into an application. You need to have unit or functional testing performed to ensure the information was loaded correctly for the system to work correctly.
Summary
Within an organization the data and applications will change as the organization matures. In order to keep up with these changes IT needs to have a process to handle the migration of data from legacy systems to newer applications. Every data migration project is different in size, scope and complexity, however, all projects can leverage the five data migration process steps to ensure they are successful.
Slideshare Link: http://www.slideshare.net/derekwilson74/practical-data-migration-strategies