# AlpineGP

- _AlpineGP_ is a Python library that helps to build algorithms that can identify _symbolic_ models
- of _physical systems_ starting from data. It performs **symbolic regression** using a
- _strongly-typed genetic programming_ approach implemented in the [`DEAP`](https://github.com/alucantonio/DEAP)
- library. As a natural language for expressing physical models, it leverages the
- **discrete calculus** framework
- defined and implemented in the library [`dctkit`](https://github.com/alucantonio/dctkit).
+ _AlpineGP_ is a Python library that solves **symbolic regression** problems using
+ _Genetic Programming_. It provides a high-level interface to the
+ [`DEAP`](https://github.com/alucantonio/DEAP) library and leverages the high-performance
+ distributed computing functionalities provided by the [`ray`](https://www.ray.io) library.
+
+ Besides solving classical symbolic regression problems involving algebraic equations
+ (see, for example, the benchmark problems contained in the
+ [SRBench](https://github.com/cavalab/srbench)
+ repository), _AlpineGP_ is specifically designed to help identify _interpretable_,
+ _symbolic_ models of _physical systems_ starting from data. To this end, it uses a **discrete
+ calculus** framework as a natural and effective language for expressing physical models
+ (i.e., conservation laws). The framework includes tools from discrete differential geometry
+ and discrete exterior calculus, and is defined and implemented in the library
+ [`dctkit`](https://github.com/alucantonio/dctkit).

_AlpineGP_ has been introduced in the paper [_Discovering interpretable physical models
with symbolic regression and discrete exterior calculus_](https://iopscience.iop.org/article/10.1088/2632-2153/ad1af2),
@@ -51,40 +59,55 @@ $ tox -e docs

Setting up a symbolic regression problem in _AlpineGP_ involves several key steps:

- 1. Define the function that computes the prediction associated with an _individual_ (model expression tree).
- Its arguments are a _function_ obtained by parsing the individual tree and possibly other
- parameters (datasets to compare the individual with). It returns both an _error metric_ between
- the prediction and the data and the prediction itself.
+ 1. Define the function that computes the prediction associated with an _individual_
+ (model expression tree). Its arguments may be a _function_ obtained by parsing the
+ individual tree and possibly other parameters, such as the dataset needed to evaluate
+ the model. It returns both an _error metric_ between the prediction and the data and
+ the prediction itself.
```python
- def eval_MSE_sol(individual: Callable, D: Dataset):
+ def eval_MSE_sol(individual, dataset):

      # ...
      return MSE, prediction
```

- 2. Define the functions that return the **prediction** and the **fitness**
+ 1. Define the functions that return the **prediction** and the **fitness**
associated with an individual. These functions **must** have the same
- arguments. The first argument is **always** the `Callable` that represents the
- individual tree. The functions **must** be decorated with `ray.remote` to support
+ arguments. In particular:
+ - the first argument is **always** the batch of trees to be evaluated by the
+ current worker;
+ - the second argument **must** be the `toolbox` object used to compile the
+ individual trees into callable functions;
+ - the third argument **must** be the dataset needed for the evaluation of the
+ individuals.
+ Both functions **must** be decorated with `ray.remote` to support
distributed evaluation (multiprocessing).
```python
  @ray.remote
- def predict(individual: Callable, indlen: int, D: Dataset, penalty: float) -> float:
+ def predict(trees, toolbox, data):

-     _, pred = eval_MSE_sol(individual, D)
+     callables = compile_individuals(toolbox, trees)

-     return pred
+     preds = [None] * len(trees)
+
+     for i, ind in enumerate(callables):
+         _, preds[i] = eval_MSE_sol(ind, data)
+
+     return preds

  @ray.remote
- def fitness(individual: Callable, length: int, D: Dataset, penalty: float) -> Tuple[float, ]:
+ def fitness(trees, toolbox, true_data):
+     callables = compile_individuals(toolbox, trees)

-     MSE, _ = eval_MSE_sol(individual, D)
+     fitnesses = [None] * len(trees)

-     # add penalty on length of the tree to promote simpler solutions
-     fitness = MSE + penalty * length
+     for i, ind in enumerate(callables):
+         MSE, _ = eval_MSE_sol(ind, true_data)
+
+         # each fitness MUST be a tuple (required by DEAP)
+         fitnesses[i] = (MSE,)

-     # return value MUST be a tuple
-     return fitness,
+     return fitnesses
```

3. Set and solve the symbolic regression problem.
@@ -110,22 +133,19 @@ common_params = {'penalty': penalty}
  # create the Symbolic Regression Problem object
  gpsr = gps.GPSymbolicRegressor(pset=pset, fitness=fitness.remote,
                                 predict_func=predict.remote, common_data=common_params,
-                                feature_extractors=[len],
                                 print_log=True,
                                 config_file_data=config_file_data)

- # define training Dataset object (to be used for model fitting)
+ # wrap tensors corresponding to train and test data into Dataset objects (to be passed to
+ # fit and predict methods)
  train_data = Dataset("D", X_train, y_train)
+ test_data = Dataset("D", X_test, y_test)

  # solve the symbolic regression problem
  gpsr.fit(train_data)

- # recover the solution associated to the best individual among all the populations
- u_best = gpsr.predict(train_data)
-
- # plot the solution
- # ...
- # ...
+ # compute the prediction on the test dataset given by the best model found during the SR
+ pred_test = gpsr.predict(test_data)
```
A complete example notebook can be found in the `examples` directory.
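
The batch contract introduced by this change (each worker receives a list of individuals and returns one prediction, or one fitness tuple, per individual) can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `compile_individuals_stub` is a hypothetical stand-in for the compilation step, the individuals are ordinary callables rather than DEAP trees, the dataset is passed as plain `X` and `y` lists instead of a `Dataset` object, and the `ray.remote` decorator and `toolbox` argument are omitted so that the snippet runs with the standard library alone.

```python
from typing import Callable, List, Sequence, Tuple


def eval_mse_sol(individual: Callable[[float], float],
                 X: Sequence[float],
                 y: Sequence[float]) -> Tuple[float, List[float]]:
    """Return the mean squared error and the prediction of one callable."""
    prediction = [individual(x) for x in X]
    mse = sum((p - t) ** 2 for p, t in zip(prediction, y)) / len(y)
    return mse, prediction


def compile_individuals_stub(trees):
    # Hypothetical stand-in (assumption): the "trees" are already callables,
    # so compilation reduces to copying the batch into a list.
    return list(trees)


def predict(trees, X, y):
    # One prediction per individual, in the same order as the input batch.
    callables = compile_individuals_stub(trees)
    return [eval_mse_sol(ind, X, y)[1] for ind in callables]


def fitness(trees, X, y):
    # One 1-tuple per individual, since each fitness must be a tuple.
    callables = compile_individuals_stub(trees)
    return [(eval_mse_sol(ind, X, y)[0],) for ind in callables]


# Toy data generated by y = 2x, and a batch of two candidate models.
X = [0.0, 1.0, 2.0]
y = [0.0, 2.0, 4.0]
batch = [lambda x: 2.0 * x, lambda x: x]

fitnesses = fitness(batch, X, y)
preds = predict(batch, X, y)
print(fitnesses)
print(preds)
```

Returning a list aligned with the input batch keeps the mapping between individuals and results implicit in the ordering, which is why both functions iterate over the compiled callables in the same way.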