
Commit 2549faf

Updated README.
1 parent 8299bc1 commit 2549faf


Diff for: README.md

1 file changed: +51 −31 lines
@@ -2,12 +2,20 @@
# AlpineGP

_AlpineGP_ is a Python library that solves **symbolic regression** problems using
_Genetic Programming_. It provides a high-level interface to the
[`DEAP`](https://github.com/alucantonio/DEAP) library and leverages the high-performance
distributed computing functionalities provided by the [`ray`](https://www.ray.io) library.

Besides solving classical symbolic regression problems involving algebraic equations
(see, for example, the benchmark problems contained in the
[SRBench](https://github.com/cavalab/srbench) repository), _AlpineGP_ is specifically
designed to help identify _interpretable_, _symbolic_ models of _physical systems_
starting from data. To this aim, it adopts a **discrete calculus** framework as a natural
and effective language for expressing physical models (i.e., conservation laws). This
framework, which includes tools from discrete differential geometry and discrete
exterior calculus, is defined and implemented in the library
[`dctkit`](https://github.com/alucantonio/dctkit).

_AlpineGP_ has been introduced in the paper [_Discovering interpretable physical models
with symbolic regression and discrete exterior calculus_](https://iopscience.iop.org/article/10.1088/2632-2153/ad1af2),
@@ -51,40 +59,55 @@ $ tox -e docs

Setting up a symbolic regression problem in _AlpineGP_ involves several key steps:

1. Define the function that computes the prediction associated with an _individual_
(model expression tree). Its arguments are a _function_ obtained by parsing the
individual tree and, possibly, other parameters, such as the dataset needed to evaluate
the model. It returns both an _error metric_ between the prediction and the data and
the prediction itself.

```python
def eval_MSE_sol(individual, dataset):

    # ...
    return MSE, prediction
```
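
For concreteness, here is a minimal sketch of such a function. It assumes that
`individual` is already a callable acting on the input features and that the `Dataset`
object exposes inputs and targets as `X` and `y` attributes (these attribute names are
assumptions for illustration, not necessarily the actual API).

```python
import numpy as np

def eval_MSE_sol(individual, dataset):
    # evaluate the candidate model on the inputs
    # (the X/y attribute names are assumed for illustration)
    prediction = individual(dataset.X)

    # mean squared error between the prediction and the target data
    MSE = np.mean((prediction - dataset.y) ** 2)

    return MSE, prediction
```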

2. Define the functions that return the **prediction** and the **fitness**
associated with an individual. These functions **must** have the same
arguments. In particular:
- the first argument is **always** the batch of trees to be evaluated by the
current worker;
- the second argument **must** be the `toolbox` object used to compile the
individual trees into callable functions;
- the third argument **must** be the dataset needed for the evaluation of the
individuals.

Both functions **must** be decorated with `ray.remote` to support
distributed evaluation (multiprocessing).

```python
@ray.remote
def predict(trees, toolbox, data):

    callables = compile_individuals(toolbox, trees)

    preds = [None]*len(trees)

    for i, ind in enumerate(callables):
        _, preds[i] = eval_MSE_sol(ind, data)

    return preds

@ray.remote
def fitness(trees, toolbox, data):

    callables = compile_individuals(toolbox, trees)

    fitnesses = [None]*len(trees)

    for i, ind in enumerate(callables):
        MSE, _ = eval_MSE_sol(ind, data)

        # each fitness MUST be a tuple (required by DEAP)
        fitnesses[i] = (MSE,)

    return fitnesses
```
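
The `compile_individuals` helper used above is not shown in this excerpt; a minimal
sketch, assuming a standard DEAP setup in which a `compile` function has been registered
on the toolbox, could be:

```python
def compile_individuals(toolbox, trees):
    # turn each DEAP expression tree into a Python callable
    # (assumes toolbox.compile is registered, as in standard DEAP GP setups)
    return [toolbox.compile(expr=tree) for tree in trees]
```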

3. Set and solve the symbolic regression problem.
@@ -110,22 +133,19 @@

```python
common_params = {'penalty': penalty}

# create the Symbolic Regression Problem object
gpsr = gps.GPSymbolicRegressor(pset=pset, fitness=fitness.remote,
                               predict_func=predict.remote, common_data=common_params,
                               print_log=True,
                               config_file_data=config_file_data)

# wrap the tensors corresponding to the train and test data into Dataset objects
# (to be passed to the fit and predict methods)
train_data = Dataset("D", X_train, y_train)
test_data = Dataset("D", X_test, y_test)

# solve the symbolic regression problem
gpsr.fit(train_data)

# compute the prediction on the test dataset given by the best model found during the SR run
pred_test = gpsr.predict(test_data)
```
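
As a quick sanity check of the result, the test predictions can be compared against the
ground truth. The following sketch assumes that `pred_test` and `y_test` are NumPy arrays
of matching shape.

```python
import numpy as np

# mean squared error of the best model on the held-out test set
test_MSE = np.mean((np.asarray(pred_test) - np.asarray(y_test)) ** 2)
print(f"Test MSE: {test_MSE:.4e}")
```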

A complete example notebook can be found in the `examples` directory.
