ETL (Extract, Transform, Load) is a technique for combining data gathered from various sources and accumulating it in a single location called a data warehouse. Now why is this necessary?
Well, the concept of a data warehouse derives from the fact that data of any form is scattered across a wide range of sources within an organisation. Here we are talking about data of a particular organisation that may be related in one form or another. ETL helps to draw a picture in our minds of the associations or relationships that exist between the various components of a business, such as the relationships between sales, marketing campaigns and production.
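As a rough sketch of the idea, the three ETL stages can be shown end to end. Everything here is illustrative: the table names, columns and in-memory SQLite databases are invented for the example, not taken from any real system.

```python
import sqlite3

def extract(conn):
    """Pull raw rows from a source table."""
    return conn.execute("SELECT region, amount FROM sales").fetchall()

def transform(rows):
    """Normalise region names and drop rows with missing amounts."""
    return [(region.strip().upper(), amount)
            for region, amount in rows
            if amount is not None]

def load(conn, rows):
    """Write the cleaned rows into the warehouse table."""
    conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?)", rows)

# In-memory databases stand in for the real source system and warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?)",
                   [(" north ", 120.0), ("south", None), ("East", 75.5)])

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE warehouse_sales (region TEXT, amount REAL)")

load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT * FROM warehouse_sales").fetchall())
# [('NORTH', 120.0), ('EAST', 75.5)]
```

The point of the sketch is the shape, not the specifics: extraction pulls scattered data out of a source, transformation makes it consistent, and loading accumulates it in the warehouse.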
Let us now talk a little about the essential components required for integrating information.
When one thinks about preparing a checklist, a few key points are worth considering: CPU, memory and I/O. These three components are essential from a performance perspective.
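Two of these resources can be checked with a few lines of standard-library Python; this is only a minimal sketch (the standard library covers CPU count and disk space, while memory would typically need a third-party tool such as psutil, so it is omitted here).

```python
import os
import shutil

# Quick resource check before planning a migration.
cpus = os.cpu_count() or 1                  # logical CPU count
free_gb = shutil.disk_usage("/").free / 1e9  # free disk space in GB (I/O capacity)

print(f"CPUs: {cpus}, free disk: {free_gb:.1f} GB")
```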
So the first and foremost task is to plan and organise the activities to be conducted in the process of data migration.
Determine the scope of the project to define the process of data gathering.
Assess the complexities involved in the process of data gathering, such as the nature of the data to be collected, its sources, its structure - whether it is stored in files or in a shared system - its format, and so on.
Make sure the data to be accumulated serves the ultimate purpose. Measure the quality of the data and apply the required rules to fetch only the desired range of data.
Implement the data-gathering procedure. Afterwards, testing should be done to ensure that the data gathered matches the data requirement specifications.
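The quality-rule and verification steps above can be sketched as follows. The records, rules and requirement specification are invented purely for illustration.

```python
# Raw records as they might arrive from a source system (illustrative).
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},               # fails: missing email
    {"id": 3, "email": "c@example.com", "age": -5},  # fails: implausible age
]

# Quality rules: fetch only the desired range of data.
rules = [
    lambda r: bool(r["email"]),      # email must be present
    lambda r: 0 <= r["age"] <= 120,  # age must be in a plausible range
]

gathered = [r for r in records if all(rule(r) for rule in rules)]

# Post-gathering check: the data must match the requirement specification.
spec = {"required_fields": {"id", "email", "age"}}
for r in gathered:
    assert spec["required_fields"] <= r.keys(), f"record {r['id']} incomplete"

print([r["id"] for r in gathered])  # [1]
```

In practice the rules would come from the data requirement specification itself, and the final check would run as part of the post-gathering test suite.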
Now, after executing the data migration plan, a big decision becomes important from a business perspective. If a business plans to shut down the data centres that previously stored the migrated data, then a wider view is needed of the other resources that have been part of them - manpower, hardware, infrastructure and so on.
After the data has been moved from the previous source to the destination, the ETL checklist requires a quick review by consultants to make sure things are moving in the right direction.
Verify the compatibility of the data with the current system to see if there is any variation in the usage and accessibility of the system.
Develop ways to access the source and target systems - for instance, using an SDK to move data from the source to the target system. This helps to filter metadata too.
Identify domain experts to enable development of the mapping mechanism.
Devise ways to interpret the source data.
Refine the data. This means the data gathered from the source should be filtered with the available mechanism to remove unnecessary or unwanted information.
Understand and deploy the systems used at the client site - that is, establish what kind of file system or storage mechanism is used to store, reference and access information on the target system.
Perform unit and integration testing. Here, testing is done by bringing together the different tools, system specifications, source data and target system, and verifying that they work in harmony.
Align the tested system with the current workflow by assigning roles and responsibilities to the team members.
Assess the performance for a while to check whether any bug or fault is encountered, and record it for further amendments.
Develop a strong support system in order to answer queries and gather feedback in the appropriate scenarios.
Cross-check access and connectivity aspects so as to take preventive measures against anything that might hinder work in progress.
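A couple of the checks in this list - row-count reconciliation between source and target, and a connectivity probe - might be sketched like this. The table names and the set of checks are illustrative assumptions, not a prescribed list.

```python
import sqlite3

def rows_match(source, target, table):
    """Verify the target received the same number of rows as the source."""
    count = lambda conn: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return count(source) == count(target)

def can_connect(conn):
    """Simple connectivity/accessibility probe against a system."""
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.Error:
        return False

# In-memory databases stand in for the migrated source and target systems.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE customers (id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])

checks = {
    "row counts match": rows_match(source, target, "customers"),
    "target reachable": can_connect(target),
}
print(all(checks.values()))  # True
```

A real post-migration review would add further checks (checksums, spot comparisons of sample rows, access-control tests), but the pattern of named, independently verifiable checks stays the same.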
After all the relevant steps are taken, criteria are to be set that define the sign-off stage - in other words, the successful completion of the ETL activity.