
This is an ETL system designed using Pentaho to transfer multiple data tables between servers in one execution. In each data transfer process, the system can also perform other tasks, such as generating a file and sending the generated file to email and/or SFTP.


ratminurisnaini/Data-Migration-System


Data-Migration-System

This ETL system is designed using Pentaho to perform multiple tasks sequentially in one execution. The main task is to transfer data tables between databases (SQL Server-to-SQL Server or MySQL-to-SQL Server). In each task, the system can perform other subtasks besides data transfer.

Subtasks that the system can perform in each task:

  • Data transfer (SQL Server-to-SQL Server or MySQL-to-SQL Server)
  • Generating a file (in xls, xlsx, csv, or txt formats)
  • Sending the generated file to an email
  • Sending the generated file to SFTP
  • Scheduling the process to be executed daily or monthly
  • Showing the process status (and logging if errors occurred)
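One plausible ordering of the subtasks above within a single task can be sketched in Python. This is a minimal sketch, assuming that email and SFTP delivery are skipped when no file is generated (both send "the generated file") and that scheduling is a precondition checked before the task starts. `TaskOptions` and `plan_subtasks` are hypothetical names, not part of the Pentaho jobs.

```python
from dataclasses import dataclass


@dataclass
class TaskOptions:
    # Hypothetical switches; in the real system these come from konfig_etl.
    generate_file: bool = False
    send_email: bool = False
    send_sftp: bool = False


def plan_subtasks(opts: TaskOptions) -> list[str]:
    """Return the subtasks of one task in execution order."""
    steps = ["transfer"]  # the data transfer always runs first
    if opts.generate_file:
        steps.append("generate_file")
        # Email and SFTP delivery need a generated file to send.
        if opts.send_email:
            steps.append("send_email")
        if opts.send_sftp:
            steps.append("send_sftp")
    steps.append("record_status")  # show the status and log errors
    return steps


# Usage: a task that generates a file and uploads it to SFTP only.
plan = plan_subtasks(TaskOptions(generate_file=True, send_sftp=True))
```

Under this assumption, a task configured for email without file generation simply performs the transfer and records its status.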

ETL System Simple Flowchart

[Figure: ETL system flowchart]

The ETL system consists of 3 jobs and 12 transformations, and it uses three tables: konfig_etl, log_etl, and ms_param. The definitions of these tables are located in the ‘Tables’ folder, and all three tables are stored in a SQL Server database.

Tables

1. konfig_etl Table

Used to store the data transfer configuration, such as the source connection, the target connection, the script that retrieves the source data, and the settings for the other subtasks mentioned above.

Table preview

[Images: konfig_etl (1), konfig_etl (2), konfig_etl (3)]

Field descriptions

No Field Data Type Description
1 id_etl int Unique id number of konfig_etl table
2 ip_source varchar Source database IP
3 db_source varchar Source database name
4 user_source varchar Source database username
5 pass_source varchar Encrypted source database password (if any)
6 schema_source varchar Schema name in the source database (if any)
7 tablename_source varchar Table name in the source database
8 ip_target varchar Target database IP
9 db_target varchar Target database name
10 user_target varchar Target database username
11 pass_target varchar Encrypted target database password (if any)
12 schema_target varchar Schema name in the target database (if any)
13 tablename_target varchar Table name in the target database
14 script text Script that retrieves the source data (either a query or a stored procedure)
15 flag_delete varchar Flag for target data deletion. Values: Y | N (Y: perform deletion; N: do not perform deletion)

16 condition_delete text Deletion query to run on the target when flag_delete is 'Y', for example: delete from tableA where date < '2024-08-31'

17 start_date datetime Time when execution of the current ETL ID started
18 end_date datetime Time when execution of the current ETL ID completed
19 status int ETL execution status. Values: 0 | 1 | 2 (0: the current ETL ID has not been executed yet; 1: the current ETL ID was executed successfully; 2: execution of the current ETL ID failed)

20 keterangan text ETL execution status description
21 flag_aktif int Flag to activate or deactivate the current ETL ID. Values: 0 | 1 (0: the current ETL ID is deactivated; 1: the current ETL ID is activated)

22 tanggal int Flag for scheduling the execution. Values: null | 1-31 (null: the current ETL ID is executed daily; 1-31: the day of the month on which the current ETL ID is executed monthly)

23 flag_generate_file int Flag for generating a file. Values: null | ms_param id (null: do not generate a file; ms_param id: generate a file using the file configuration stored in the ms_param table under that id)

24 flag_email int Flag for sending the generated file to email. Values: null | ms_param id (null: do not send the generated file to email; ms_param id: send the generated file using the email configuration stored in the ms_param table under that id)

25 flag_sftp int Flag for sending the generated file to SFTP. Values: null | ms_param id (null: do not send the generated file to SFTP; ms_param id: send the generated file using the SFTP configuration stored in the ms_param table under that id)

26 flag_db_source int Flag for the data transfer process. Values: 0 | 1 (0: transfer from a SQL Server database to a SQL Server database; 1: transfer from a MySQL database to a SQL Server database)
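The scheduling, source, and deletion fields above can be read as simple rules. The following is a minimal Python sketch, assuming a literal reading of the field descriptions; `is_due`, `source_dbms`, and `deletion_query` are hypothetical helpers, not functions from the Pentaho jobs, and the point in the process at which the deletion query runs is not specified here.

```python
from datetime import date
from typing import Optional


def is_due(flag_aktif: int, tanggal: Optional[int], today: date) -> bool:
    """Return True if a konfig_etl row should be executed on `today`."""
    if flag_aktif != 1:  # 0: the ETL ID is deactivated
        return False
    if tanggal is None:  # null: run every day
        return True
    return today.day == tanggal  # 1-31: run monthly on that day of the month


def source_dbms(flag_db_source: int) -> str:
    """Map flag_db_source to the source DBMS; the target is always SQL Server."""
    mapping = {0: "SQL Server", 1: "MySQL"}
    if flag_db_source not in mapping:
        raise ValueError(f"unexpected flag_db_source: {flag_db_source}")
    return mapping[flag_db_source]


def deletion_query(flag_delete: str, condition_delete: Optional[str]) -> Optional[str]:
    """Return the deletion query for the target, or None if no deletion is needed."""
    if flag_delete == "Y" and condition_delete:
        return condition_delete
    return None


# Usage: decide what to do for one hypothetical row on 15 September 2024.
row = {"flag_aktif": 1, "tanggal": 15, "flag_db_source": 1,
       "flag_delete": "Y",
       "condition_delete": "delete from tableA where date < '2024-08-31'"}
if is_due(row["flag_aktif"], row["tanggal"], date(2024, 9, 15)):
    print("source:", source_dbms(row["flag_db_source"]))
    print("delete:", deletion_query(row["flag_delete"], row["condition_delete"]))
```

Under this literal reading, a row with tanggal = 31 never runs in months with fewer than 31 days; how the actual jobs handle that case is not documented.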

2. log_etl Table

Used to store the execution logs of the ETL processes defined in the konfig_etl table.

Table preview

[Image: log_etl]

Field descriptions

No Field Data Type Description
1 id int Unique ID of the log_etl table
2 id_etl int ETL ID in the konfig_etl table
3 start_date datetime Time when execution of the current ETL ID started
4 end_date datetime Time when execution of the current ETL ID completed
5 status int ETL execution status
6 keterangan text ETL execution status description
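A log_etl row can be modeled as a small record built when a run finishes. This minimal Python sketch assumes that log_etl.status uses the same 0/1/2 codes as konfig_etl.status; the class `LogEtl`, the helper `make_log`, and the "Success" description text are hypothetical. The source does not say whether a row is written for every run or only when an error occurs, and this sketch writes one for every finished run.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Status codes as documented for konfig_etl.status (assumed to apply to log_etl too).
NOT_EXECUTED, SUCCESS, FAILED = 0, 1, 2


@dataclass
class LogEtl:
    """One row of the log_etl table."""
    id: int
    id_etl: int
    start_date: datetime
    end_date: datetime
    status: int
    keterangan: str


def make_log(log_id: int, id_etl: int, start: datetime, end: datetime,
             error: Optional[str] = None) -> LogEtl:
    """Build a log_etl row for one finished run; `error` is None on success."""
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    if error is None:
        return LogEtl(log_id, id_etl, start, end, SUCCESS, "Success")
    return LogEtl(log_id, id_etl, start, end, FAILED, error)


# Usage: one successful and one failed run of hypothetical ETL ID 7.
ok = make_log(1, 7, datetime(2024, 9, 1, 1, 0), datetime(2024, 9, 1, 1, 5))
bad = make_log(2, 7, datetime(2024, 9, 2, 1, 0), datetime(2024, 9, 2, 1, 1),
               error="Login failed")
```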

3. ms_param Table

Used to store the configuration parameters for generating a file, sending the generated file to email, and sending the generated file to SFTP.

Table preview

[Images: ms_param (1), ms_param (2)]

  • The ms_param table must be filled in exactly as shown in the table preview.

Field descriptions

No Field Data Type Description
1 id int Unique id number of ms_param table
2 tgl_create date Date when the current ID was created
3 user_create varchar Username of the user who created the current ID
4 kode varchar The code name
5 nama varchar The name of the process
6 deskripsi varchar Description of the process (optional)
7 group1 varchar Up to two parameter names, separated by a semicolon
8 group2 varchar Up to two parameter names, separated by a semicolon
9 group3 varchar Up to two parameter names, separated by a semicolon
10 nilai1 varchar Values of the parameters named in the group1 field, separated by a semicolon
11 nilai2 nvarchar Values of the parameters named in the group2 field, separated by a semicolon
12 nilai3 varchar Values of the parameters named in the group3 field, separated by a semicolon
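Each groupN field pairs with the matching nilaiN field, so one ms_param row can be turned into a name-to-value mapping. This is a minimal Python sketch, assuming the values appear in the same order as the names; `parse_groups` is a hypothetical helper, and the SFTP parameter names in the usage example are illustrative only, since the real names appear only in the table preview images.

```python
from typing import Optional


def parse_groups(groups: list[Optional[str]],
                 values: list[Optional[str]]) -> dict[str, str]:
    """Pair each groupN parameter name with the matching entry in nilaiN."""
    params: dict[str, str] = {}
    for group, value in zip(groups, values):
        if not group:  # unused group field
            continue
        names = group.split(";")
        vals = (value or "").split(";")
        if len(names) > 2:
            raise ValueError(f"more than two parameter names in {group!r}")
        if len(names) != len(vals):
            raise ValueError(f"{group!r} and {value!r} have different lengths")
        for name, val in zip(names, vals):
            params[name.strip()] = val
    return params


# Hypothetical SFTP row (group1..group3, nilai1..nilai3); names are illustrative.
sftp = parse_groups(
    ["host;port", "username;password", "remote_dir"],
    ["sftp.example.com;22", "etl_user;<encrypted>", "/outbound"],
)
```

Under this sketch's splitting rule, a value that itself contains a semicolon cannot be stored in a nilai field.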

Note:

Passwords are encrypted with the fnEncrypt function and decrypted by the ETL system with the fnDecrypt function. These two functions are not included in this repository, so you must create your own fnEncrypt and fnDecrypt functions before running the ETL system.
