-
Notifications
You must be signed in to change notification settings - Fork 360
How to Implement Hot Cold Data Mixed Computing with esProc
Separating hot and cold data into different databases makes performing mixed queries quite troublesome. Many databases lack support for such queries; those with partial support offer limited capabilities and often require extensive data replication, resulting in low efficiency. Separately retrieving the cold and hot data and performing calculations within Java would be overly complex. esProc, a database-independent computing engine, supports various data sources, provides rich computation libraries, and can be embedded in Java, making it ideally suited for performing hot-cold database hybrid computation in applications.
First, download esProc (Standard Edition recommended): https://www.esproc.com/download-esproc
After installation, configure the database connection. Here, we use MySQL as an example.
Begin by placing the MySQL JDBC driver package in the directory [esProc installation directory]\common\jdbc (this is similar for other databases).
Then, start the esProc IDE. From the menu bar, select Tool > Connect to Data Source and configure a standard MySQL JDBC connection.
After confirming the settings, test the connection by clicking “Connect.” If the hotdb data source we just configured turns pink, the connection is successful.
Next, we’ll attempt to perform hot-cold data mixed computations in esProc.
In a business scenario, order data of the current year and the mutable data of last year, is classified as hot data and stored in the hotdb database. Older order data before the last year, which no longer changes, is classified as cold data and placed in the colddb database. The table structures of the two databases are identical. Data is separated into hot and cold based on time (mutability). Now, we want to query data that may span both databases, how to implement it using esProc?
We’ve already configured the hotdb connection. Now we configure a colddb connection in a similar way.
In the esProc IDE, write a script for mixed data computation to achieve the following: query order data within a specified date range and aggregate the order amounts by payment method.
First, add two date parameters, sdate, edate, to the script to filter order data by date.
Script:
The script queries data from the hot database (A3) and the cold database (A4) respectively based on the input parameters. To avoid the slowness of retrieving all details, SQL first performs an initial aggregation. The data is then merged and re-aggregated in A5. The entire process is straightforward.
Press Ctrl+F9 to execute the script. The execution result of each cell (step) can be viewed on the right side of the IDE. Here, it indicates the final calculation result of A5.
Considering that the queried data may reside only in one database, additional conditions can be added to reduce access to the database.
In this example, both databases are MySQL (homogeneous). If heterogeneous databases are involved, it only need to modify the database connection and the corresponding SQL statements.
Now let’s integrate esProc into a Java application.
Locate the esProc JDBC-related jars, esproc-bin-xxxx.jar and icu4j_60.3.jar, in the [esProc installation directory]\esProc\lib directory.
Place the two jars in the classpath of the Java development environment, and the MySQL driver package in the application.
Next, locate the esProc configuration file, raqsoftConfig.xml, in the [esProc installation directory]\esProc\config directory. Also, place it on the classpath of the Java development environment. Open raqsoftConfig.xml and you should see the two data sources configured earlier (add them yourself if they are not present).
Write Java code to call the SPL script hot_cold_query.splx:
public class HotColdQuery {
public static void main(String[] args) {
String driver = "com.esproc.jdbc.InternalDriver";
String url = "jdbc:esproc:local://";
try {
Class.forName(driver);
Connection conn = DriverManager.getConnection(url);
PreparedStatement st =conn.prepareCall("call hot_cold_query(?,?)");
st.setObject(1, "2022-01-01");
st.setObject(2, "2025-12-31");
st.execute();
ResultSet rs = st.getResultSet();
while(rs.next()) {
System.out.println(
rs.getString("payment_method") + "\t" +
rs.getDouble("total_amount"));
}
} catch (Exception e) {
throw new RuntimeException(e);
}
}
}
Using JDBC is completely consistent with database usage. Calling SPL scripts is also similar to accessing stored procedures. Here, the call is call hot_cold_query(?,?) with parameters passed accordingly.
The execution results are shown below:
This simple case of grouping and aggregation demonstrates how esProc performs hot-cold mixed computation within a Java application. However, esProc’s computational capabilities extend far beyond this basic illustration, particularly excelling in complex calculations. esProc treats all data sources equally; any accessible source can be integrated into mixed computations, regardless of whether the sources are homogeneous or heterogeneous. This means that cold data can be stored in any form (e.g., a file system), and esProc can still perform mixed computations involving both hot and cold data.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code