This ETL system is designed using Pentaho to perform multiple tasks sequentially in one execution. The main task is to transfer data tables between databases (SQL Server-to-SQL Server or MySQL-to-SQL Server). In each task, the system can perform other subtasks besides data transfer.
Subtasks that the system can perform in each task:
- Data transfer (SQL Server-to-SQL Server or MySQL-to-SQL Server)
- Generating a file (in xls, xlsx, csv, or txt formats)
- Sending the generated file to an email
- Sending the generated file to SFTP
- Scheduling the process to be executed daily or monthly
- Showing the process status (and logging if errors occurred)
The ETL system consists of 3 jobs and 12 transformations. And three tables are used in this ETL system, namely konfig_etl, log_etl, and ms_param. They are located in the ‘Tables’ folder. The database used to store these three tables is the SQL Server database.
Used to store the configuration of data transfer like the source connection, the target connection, the script to retrieve the source data, and other commands to perform other subtasks mentioned above.
Table preview
Field descriptions
No | Field | Data Type | Description |
---|---|---|---|
1 | id_etl | int | Unique id number of konfig_etl table |
2 | ip_source | varchar | Source database IP |
3 | db_source | varchar | Source database name |
4 | user_source | varchar | Source database username |
5 | pass_source | varchar | Encrypted source database password (if exists) |
6 | schema_source | varchar | Schema name in the source database (if exists) |
7 | tablename_source | varchar | Table name in the source database |
8 | ip_target | varchar | Target database IP |
9 | db_target | varchar | Target database name |
10 | user_target | varchar | Target database username |
11 | pass_target | varchar | Encrypted target database password (if exists) |
12 | schema_target | varchar | Schema name in the target database (if exists) |
13 | tablename_target | varchar | Table name in the source database |
14 | script | text | Script to retrieve the source data (it can be queries or a stored procedure) |
15 | flag_delete | varchar | Flag for target data deletion |
16 | condition_delete | text | This field is filled with a deletion query if flag_delete is ‘Y’. For example: |
17 | start_date | datetime | The time when the execution of the current ETL id is started |
18 | end_date | datetime | The time when the execution of the current ETL id is completed |
19 | status | int | ETL execution status |
20 | keterangan | text | ETL execution status description |
21 | flag_aktif | int | Flag to activate or deactivate the current ETL id |
22 | tanggal | int | Flag for scheduling the execution |
23 | flag_generate_file | int | Flag for generating file |
24 | flag_email | int | Flag for sending the generated file to email |
25 | flag_sftp | int | Flag for sending the generated file to SFTP |
26 | flag_db_source | int | Flag for data transfer process |
Used to store logs from the process of konfig_etl table.
Table preview
Field descriptions
No | Field | Data Type | Description |
---|---|---|---|
1 | id | int | Unique id number of log_etl table |
2 | id_etl | int | ETL id in konfig_etl table |
3 | start_date | datetime | The time when the execution of current ETL id is started |
4 | end_date | datetime | The time when the execution of current ETL id is completed |
5 | status | int | ETL execution status |
6 | keterangan | text | ETL execution status description |
Used to store the configuration parameters for generating a file, sending the generated file to email, and sending the generated file to SFTP.
Table preview
- The filling of the ms_param table must be done as shown in the table preview.
Field descriptions
No | Field | Data Type | Description |
---|---|---|---|
1 | id | int | Unique id number of ms_param table |
2 | tgl_create | date | The date when the current id is created |
3 | user_create | varchar | The username when the current id is created |
4 | kode | varchar | The code name |
5 | nama | varchar | The name of the process |
6 | deskripsi | varchar | Description of the process (optional) |
7 | group1 | varchar | Consists of a maximum of two parameter names separated by a semicolon |
8 | group2 | varchar | Consists of a maximum of two parameter names separated by a semicolon |
9 | group3 | varchar | Consists of a maximum of two parameter names separated by a semicolon |
10 | nilai1 | varchar | Consists of the values in group1 field separated by semicolon |
11 | nilai2 | nvarchar | Consists of the values in group2 field separated by semicolon |
12 | nilai3 | varchar | Consists of the values in group3 field separated by semicolon |
Note:
The encrypted password is encrypted by the fnEncrypt function and then will be decrypted by the fnDecrypt function in the ETL system. These two functions aren't included in this repository so you have to create your own fnEncrypt and fnDecrypt functions before executing the ETL system.