The transformation phase of ETL processes applies a series of business rules to the extracted data.
In ETL processes, after the data has been extracted from one or more sources, the second phase begins: transformation. This phase consists of applying a series of functions or business rules to the extracted data to convert it into data that can then be loaded into the target system.
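Applying "a series of functions or business rules" in sequence can be sketched in Python. This is a minimal illustration, assuming each extracted record is a plain dictionary and each rule is a function that returns a new record. The rules `trim_names` and `uppercase_country` are hypothetical examples, not part of any ETL tool.

```python
from typing import Any, Callable, Dict, Iterable, List

# Assumption: a record is a plain dict of column name -> value.
Record = Dict[str, Any]
Rule = Callable[[Record], Record]


def transform(records: Iterable[Record], rules: List[Rule]) -> List[Record]:
    """Apply every business rule, in order, to each extracted record."""
    output = []
    for record in records:
        for rule in rules:
            record = rule(record)
        output.append(record)
    return output


# Hypothetical business rules, for illustration only.
def trim_names(record: Record) -> Record:
    """Remove leading and trailing spaces from the name field."""
    return {**record, "name": str(record["name"]).strip()}


def uppercase_country(record: Record) -> Record:
    """Normalize country codes to upper case."""
    return {**record, "country": str(record["country"]).upper()}


extracted = [{"name": "  Ana ", "country": "es"}, {"name": "Tom", "country": "uk"}]
ready_to_load = transform(extracted, [trim_names, uppercase_country])
print(ready_to_load)
# [{'name': 'Ana', 'country': 'ES'}, {'name': 'Tom', 'country': 'UK'}]
```

Keeping each rule as a small, independent function makes it easy to add, remove, or reorder rules without touching the others, and because each rule returns a new dict, the extracted records are left unchanged.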
Why is a transformation process necessary?
To understand why a transformation process is needed, keep in mind that an ETL process handles many different sources, some of them from outside the organization: stock market information from an external website, all kinds of Internet downloads, Office files, and so on. This variety of sources, sometimes from several countries, with different languages and different units of measurement, makes comparisons difficult or impossible unless the data is converted beforehand. Hence the need for transformation processes.
Transformation actions
The most common actions or processes are:
Data reformatting.
Unit conversion. For example, converting miles per hour to kilometers per hour or vice versa. This is very common when extracting data from countries that use different units of measurement. Another example is converting different currencies (pounds, euros, etc.) into a single standard currency.
Selecting which columns to load. For example, excluding columns that contain only null values.
Adding columns. For example, adding a column with the origin of certain cars.
Splitting a column into several columns. This is very useful, for example, to separate a person's full name, previously stored in a single field, into three columns: one for the first name and two for the surnames.
Translating codes. For example, if the source stores an “H” for men and an “M” for women, the transformation must specify that the destination stores a “1” for men and a “2” for women.
Deriving new calculated values.
Joining data from multiple sources.
Lookups. Data is taken and compared against other data to cross-reference information. For example, taking a customer code from one database and cross-referencing it with another database of granted loans to find out whether that customer has a loan.
Pivoting. Restructuring data so that the distinct values of one column become separate columns (or the reverse). It is more complex than a lookup and often works on data crossed from several sources.
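Several of the row-level transformations listed above (splitting a column, translating codes, converting units and currencies, adding a column, and selecting which columns to load) can be sketched together in plain Python. This is a minimal sketch under stated assumptions: the column names, the input records, and the exchange rates are hypothetical, and every name is assumed to contain exactly one first name and two surnames.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

MPH_TO_KMH = 1.609344  # 1 international mile = 1.609344 km

# Hypothetical fixed exchange rates to euros; a real pipeline would
# look these up for the relevant date.
RATES_TO_EUR = {"EUR": 1.0, "GBP": 1.15, "USD": 0.92}

# Code translation from the example: "H" (men) -> 1, "M" (women) -> 2.
SEX_CODES = {"H": 1, "M": 2}


def split_full_name(full_name: str) -> Row:
    """Split 'First Surname1 Surname2' into three columns.

    Assumes exactly three space-separated words; real names need more care.
    """
    first, surname_1, surname_2 = full_name.split()
    return {"first_name": first, "surname_1": surname_1, "surname_2": surname_2}


def transform_row(row: Row, origin: str) -> Row:
    out = split_full_name(row["owner"])                     # split one column into three
    out["sex"] = SEX_CODES[row["sex"]]                      # translate codes
    out["top_speed_kmh"] = round(row["top_speed_mph"] * MPH_TO_KMH, 1)  # unit conversion
    out["price_eur"] = round(row["price"] * RATES_TO_EUR[row["currency"]], 2)  # currency conversion
    out["origin"] = origin                                  # add a new column
    out["notes"] = row["notes"]                             # carried through unchanged
    return out


def drop_null_columns(rows: List[Row]) -> List[Row]:
    """Select columns to load: keep only columns with at least one non-null value."""
    keep = {col for row in rows for col, value in row.items() if value is not None}
    return [{col: value for col, value in row.items() if col in keep} for row in rows]


extracted = [
    {"owner": "Ana Garcia Lopez", "sex": "M", "top_speed_mph": 100,
     "price": 20000, "currency": "GBP", "notes": None},
    {"owner": "Luis Perez Ruiz", "sex": "H", "top_speed_mph": 60,
     "price": 15000, "currency": "EUR", "notes": None},
]
transformed = drop_null_columns([transform_row(r, origin="Spain") for r in extracted])
for row in transformed:
    print(row)
```

Because the `notes` column is null in every row, `drop_null_columns` leaves it out of the load; a column with even one non-null value would be kept.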
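The lookup and join operations can be sketched with an in-memory index. The customer and loan records, and the column names `customer_code`, `loan_id`, and `amount`, are hypothetical. Building a dictionary keyed on the customer code is one common way to avoid rescanning the loans table for every customer; it is not presented as the method any particular ETL tool uses.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical extracted sources: a customer table and a loans table.
customers: List[Row] = [
    {"customer_code": "C001", "name": "Ana"},
    {"customer_code": "C002", "name": "Luis"},
    {"customer_code": "C003", "name": "Marta"},
]
loans: List[Row] = [
    {"loan_id": "L10", "customer_code": "C001", "amount": 5000},
    {"loan_id": "L11", "customer_code": "C003", "amount": 12000},
    {"loan_id": "L12", "customer_code": "C001", "amount": 2500},
]


def index_by(rows: List[Row], key: str) -> Dict[Any, List[Row]]:
    """Group rows by a key column so each lookup is a dictionary access."""
    index: Dict[Any, List[Row]] = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def lookup_has_loan(customers: List[Row], loans: List[Row]) -> List[Row]:
    """Lookup: flag each customer according to whether any loan references them."""
    loans_by_customer = index_by(loans, "customer_code")
    return [{**c, "has_loan": c["customer_code"] in loans_by_customer} for c in customers]


def join_customers_loans(customers: List[Row], loans: List[Row]) -> List[Row]:
    """Inner join: one output row per (customer, loan) pair sharing a customer_code."""
    loans_by_customer = index_by(loans, "customer_code")
    return [
        {**c, "loan_id": loan["loan_id"], "amount": loan["amount"]}
        for c in customers
        for loan in loans_by_customer.get(c["customer_code"], [])
    ]


for row in lookup_has_loan(customers, loans):
    print(row["name"], row["has_loan"])  # prints Ana True, then Luis False, then Marta True
```

The lookup only adds a flag to each customer, while the join produces one combined row per matching loan, so a customer with two loans (Ana) appears twice and a customer with none (Luis) is dropped.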
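In its common sense, pivoting turns the distinct values of one column into separate columns. The following minimal sketch shows that reshaping in plain Python. The function name `pivot`, its parameters, and the sales data are hypothetical, and summing duplicate cells is an assumption (other pivots might average, count, or reject them).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def pivot(rows: List[Row], index: str, columns: str, values: str) -> List[Row]:
    """Turn each distinct value of `columns` into its own column.

    Duplicate (index, column) pairs are summed; missing cells become None.
    """
    cells: Dict[Any, Dict[Any, Any]] = {}
    new_columns: List[Any] = []
    for row in rows:
        col = row[columns]
        if col not in new_columns:
            new_columns.append(col)
        cell = cells.setdefault(row[index], {})
        cell[col] = cell.get(col, 0) + row[values]
    return [
        {index: key, **{col: cell.get(col) for col in new_columns}}
        for key, cell in cells.items()
    ]


# Hypothetical sales rows: one row per country, year and sale.
sales = [
    {"country": "ES", "year": "2023", "amount": 100},
    {"country": "ES", "year": "2024", "amount": 150},
    {"country": "UK", "year": "2023", "amount": 80},
    {"country": "UK", "year": "2023", "amount": 20},
]
for row in pivot(sales, index="country", columns="year", values="amount"):
    print(row)
# {'country': 'ES', '2023': 100, '2024': 150}
# {'country': 'UK', '2023': 100, '2024': None}
```

Filling missing cells with `None` rather than `0` keeps "no sales recorded" distinguishable from "zero sales", which matters if the pivoted table is later loaded and compared.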
ETL Processes: Transformation. What Does It Consist Of?