ETL is one of the most important terms in databases and data warehousing. ETL stands for Extract, Transform and Load. It describes how data is moved from source systems into a data warehouse. ETL can be thought of as the pipeline between two databases or systems that lets data flow between them reliably.
What is ETL?
ETL – Extract, Transform and Load. As the name suggests, ETL combines three database processes into a single tool. ETL enables integration and analysis of data stored in different databases and in heterogeneous formats. Data is collected from multiple sources (extraction), then reformatted and cleansed for operational needs (transformation). Finally, it is loaded into a target database, data warehouse or data mart to be analyzed.
A related approach is Extract, Load, Transform (ELT), a data integration process that first transfers raw data from a source system to a target database and then prepares the information for downstream uses inside the target.
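To make the difference in ordering concrete, here is a minimal ELT sketch. It assumes an in-memory SQLite database stands in for the target, and the `RAW_ORDERS` data, the table names and the `run_elt` function are all hypothetical names chosen for illustration. The raw rows are loaded unchanged first, and the cleanup runs afterwards as SQL inside the target.

```python
import sqlite3

# Hypothetical raw rows, loaded as-is (all strings, inconsistent casing and spacing).
RAW_ORDERS = [("1", " 19.99", "us"), ("2", "5.00 ", "DE"), ("3", "7.50", "de")]


def run_elt(conn):
    # Load: copy the raw data into a staging table without changing it.
    conn.execute("CREATE TABLE raw_orders (id TEXT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)

    # Transform: let the target database do the cleanup in SQL.
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(id AS INTEGER)        AS id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(TRIM(country))       AS country
        FROM raw_orders
        """
    )
    conn.commit()

    # Downstream use: a simple per-country order count.
    return conn.execute(
        "SELECT country, COUNT(*) FROM orders GROUP BY country ORDER BY country"
    ).fetchall()


if __name__ == "__main__":
    print(run_elt(sqlite3.connect(":memory:")))
```

In an ETL pipeline the same cleanup would happen before the data reaches the target, typically in a separate tool or program.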
Steps in the ETL Process:
- Extract – The aim of the Extract process is to pull data out of the source system and make it available for further processing. Data is usually collected from several source systems that may use different formats, so extraction must convert it into a format suitable for transformation. The extraction process should be designed carefully so that it does not affect the source system in terms of performance, response time or locking.
- Transform – The aim of the Transform process is to apply a series of rules or functions that convert the data from its source form to the target form. Typical transformations include conversion, joining data from several sources, removing duplicates, standardizing, filtering, sorting, translating, and looking up or verifying values when the data sources are inconsistent.
- Load – The aim of the Load process is to write the extracted and transformed data to a target database.
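The three steps above can be sketched as a small Python program. This is only an illustration under stated assumptions: the two source "systems" (`CRM_CSV` and `BILLING_ROWS`), their column names and the `customers` target table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: two systems exporting customers in different formats.
CRM_CSV = "id,name,country\n1, Alice ,us\n2,Bob,DE\n2,Bob,DE\n"
BILLING_ROWS = [{"customer_id": "3", "full_name": "carol", "country_code": "fr"}]


def extract():
    """Extract: read each source and convert it to one common row shape."""
    crm = [
        {"id": r["id"], "name": r["name"], "country": r["country"]}
        for r in csv.DictReader(io.StringIO(CRM_CSV))
    ]
    billing = [
        {"id": r["customer_id"], "name": r["full_name"], "country": r["country_code"]}
        for r in BILLING_ROWS
    ]
    return crm + billing


def transform(rows):
    """Transform: convert types, standardize values, drop duplicates, sort."""
    seen = set()
    cleaned = []
    for r in rows:
        record = (int(r["id"]), r["name"].strip().title(), r["country"].strip().upper())
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return sorted(cleaned)


def load(records, conn):
    """Load: write the transformed records to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    load(transform(extract()), conn)
    return conn.execute("SELECT id, name, country FROM customers ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

Keeping each step in its own function means a real pipeline can swap the in-memory sources for database queries or file reads without touching the transformation rules.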
ETL Issues:
Data and technology have a large influence on the ETL approach. The major data and technology issues that arise during ETL are listed below:
Data issues include –
- What is the quality of the data?
- How similar are the source and target data structures?
- What kinds of dependencies exist in the data?
- How is metadata used?
- How complex are the data relationships?
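Some of the data questions above can be answered by profiling a sample of the source before designing the pipeline. The sketch below is one possible way to do that, not a standard method: the `profile` function, its parameters and the sample rows are hypothetical. It counts missing values and duplicate keys (data quality) and compares source and target column names (structural similarity).

```python
def profile(rows, key, target_columns):
    """Return simple data-quality indicators for a list of dict rows."""
    # Collect every column name that appears in the source rows.
    source_columns = set()
    for r in rows:
        source_columns.update(r)

    # Count rows where a column is absent, None or an empty string.
    missing = {
        c: sum(1 for r in rows if r.get(c) in (None, ""))
        for c in sorted(source_columns)
    }

    keys = [r.get(key) for r in rows]
    return {
        "row_count": len(rows),
        "missing_values": missing,
        "duplicate_keys": len(keys) - len(set(keys)),
        "columns_not_in_target": sorted(source_columns - set(target_columns)),
        "columns_not_in_source": sorted(set(target_columns) - source_columns),
    }


if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.org", "age": 30},
        {"id": 2, "email": "", "age": None},
        {"id": 2, "email": "b@example.org", "age": 41},
    ]
    print(profile(sample, key="id", target_columns=["id", "email", "signup_date"]))
```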
Technology issues include –
- What is the volume and frequency of the load?
- How much disk space will be needed?
- How will the ETL processes be scheduled?
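The volume, frequency and disk space questions are linked, and a rough first estimate is simple arithmetic. The sketch below assumes that every load is kept for a fixed retention period and that indexes and staging copies add a constant overhead factor. Both are simplifying assumptions, and the function name, parameters and example numbers are hypothetical.

```python
def estimate_disk_bytes(rows_per_load, avg_row_bytes, loads_per_day,
                        retention_days, overhead=1.5):
    """Rough disk estimate: raw data volume scaled by an assumed overhead factor."""
    raw_bytes = rows_per_load * avg_row_bytes * loads_per_day * retention_days
    return int(raw_bytes * overhead)


if __name__ == "__main__":
    # Example: hourly loads of one million 200-byte rows, kept for 30 days.
    needed = estimate_disk_bytes(1_000_000, 200, loads_per_day=24, retention_days=30)
    print(f"{needed / 1e9:.0f} GB")
```

A real sizing exercise would also account for compression, growth over time and the storage engine in use, none of which this estimate models.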
ETL Tools:
The use of ETL has increased significantly, and as a result many ETL tools are now available. Some of the most commonly used are:
- Informatica – PowerCenter
- Ab Initio
- SAS – Data Integration Studio
- Pentaho – Data Integration
- Oracle – Data Integrator
This was a brief overview of ETL – Extract, Transform and Load. I hope it helps you understand ETL better.
Thanks for reading.