```
Successfully initialize PIE grid solver with openmp backend
# of vars: 17227
Iter 5000, abs error [ 5.187172 6.701462 11.020264]
Successfully write image to result.jpg
```

### Parallelization Strategy
For [EquSolver](https://github.com/Trinkle23897/Fast-Poisson-Image-Editing/blob/main/fpie/core/openmp/equ.cc), it first splits the pixels into two groups by the parity of `(i+j)`, then parallelizes the per-pixel iteration within each group at every step. This strategy makes good use of thread-local memory access.
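As a rough illustration of this red-black scheme, one sweep might look like the following. This is a minimal single-channel sketch under an assumed row-major layout, not the actual fpie kernel:

```cpp
#include <omp.h>
#include <vector>

// One red-black sweep over the interior of an h x w image stored row-major.
// Pixels with (i + j) % 2 == parity have no neighbors inside the same group,
// so the whole group can be updated in place in parallel.
void red_black_step(std::vector<float> &x, const std::vector<float> &b,
                    int h, int w) {
  for (int parity = 0; parity < 2; ++parity) {
#pragma omp parallel for schedule(static)
    for (int i = 1; i < h - 1; ++i) {
      // start at the first column j >= 1 with (i + j) % 2 == parity
      for (int j = 2 - (i + parity) % 2; j < w - 1; j += 2) {
        int id = i * w + j;
        x[id] = (b[id] + x[id - w] + x[id + w] + x[id - 1] + x[id + 1]) / 4.0f;
      }
    }
  }
}
```

Because all updates inside one parity group are independent, each thread only reads values written in the previous half-step, which is what makes the in-place parallel update safe.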
For [GridSolver](https://github.com/Trinkle23897/Fast-Poisson-Image-Editing/blob/main/fpie/core/openmp/grid.cc), it parallelizes the per-grid iteration in each step, where the grid size is `(grid_x, grid_y)`. Each thread simply iterates over all pixels in its grid.
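A comparable sketch of the per-grid scheduling (again a simplified single-channel version with assumed names, not the repository's code):

```cpp
#include <omp.h>
#include <algorithm>
#include <vector>

// One sweep in which each OpenMP iteration handles a whole (grid_x, grid_y)
// tile and walks every interior pixel inside it.  Tile borders may read a mix
// of old and new neighbor values, as in an asynchronous Jacobi sweep.
void grid_step(std::vector<float> &x, const std::vector<float> &b,
               int h, int w, int grid_x, int grid_y) {
  int tiles_i = (h + grid_x - 1) / grid_x;
  int tiles_j = (w + grid_y - 1) / grid_y;
#pragma omp parallel for collapse(2) schedule(static)
  for (int ti = 0; ti < tiles_i; ++ti) {
    for (int tj = 0; tj < tiles_j; ++tj) {
      int i_end = std::min((ti + 1) * grid_x, h - 1);
      int j_end = std::min((tj + 1) * grid_y, w - 1);
      for (int i = std::max(ti * grid_x, 1); i < i_end; ++i) {
        for (int j = std::max(tj * grid_y, 1); j < j_end; ++j) {
          int id = i * w + j;
          x[id] = (b[id] + x[id - w] + x[id + w] + x[id - 1] + x[id + 1]) / 4.0f;
        }
      }
    }
  }
}
```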
## MPI
To run with the MPI backend, you need to install both `mpicc` and `mpi4py` (`pip install mpi4py`).
Different from the other backends, you need to use `mpiexec` or `mpirun` to launch the MPI service instead of calling the `fpie` program directly. The `-np` option specifies the number of processes to launch.
Apart from that, you need to specify the synchronization interval for the MPI backend with `--mpi-sync-interval`. If this number is too small, synchronization overhead dominates the runtime; if it is too large, the quality of the solution drops dramatically.
MPI EquSolver and GridSolver take no other arguments because of the parallelization strategy we use; see the next section.
```
Successfully initialize PIE grid solver with mpi backend
# of vars: 17227
Iter 5000, abs error [204.41124 215.00548 296.4441 ]
Successfully write image to result.jpg
```

### Parallelization Strategy

MPI cannot use the shared-memory programming model, so we need to reduce the amount of data exchanged between processes. Each process is responsible for only a part of the computation and synchronizes with the other processes every `mpi_sync_interval` steps.
For [EquSolver](https://github.com/Trinkle23897/Fast-Poisson-Image-Editing/blob/main/fpie/core/mpi/equ.cc), it is hard to say which part of the data should be exchanged with which process, since it relabels all pixels at the very beginning. We therefore use `MPI_Bcast` to force a synchronization of all data.
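A hedged sketch of what such a broadcast-style synchronization can look like; the per-rank slice layout and names here are assumptions, not the actual fpie code:

```cpp
#include <mpi.h>
#include <vector>

// At every sync step, each rank re-broadcasts the slice of unknowns it owns
// so that all ranks continue from the same solution vector.
void sync_all(std::vector<float> &x, int n_ranks, MPI_Comm comm) {
  int n = static_cast<int>(x.size());
  int chunk = n / n_ranks;
  for (int owner = 0; owner < n_ranks; ++owner) {
    int begin = owner * chunk;
    int count = (owner == n_ranks - 1) ? n - begin : chunk;
    MPI_Bcast(x.data() + begin, count, MPI_FLOAT, owner, comm);
  }
}
```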
For [GridSolver](https://github.com/Trinkle23897/Fast-Poisson-Image-Editing/blob/main/fpie/core/mpi/grid.cc), we use a line partition: process `i` exchanges its first and last lines with processes `i-1` and `i+1`, respectively. The exchanged rows are contiguous in memory, so this strategy has less overhead than a block partition.
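A minimal sketch of such a line-partition (halo) exchange, assuming each rank stores its rows plus one ghost row above and below; names and layout are illustrative, not the actual fpie code:

```cpp
#include <mpi.h>
#include <vector>

// Each rank holds `local_rows` rows of width `w` plus one ghost row above and
// below.  It swaps its first/last real row with rank-1 / rank+1 so that border
// pixels always see the neighboring rank's latest line.
void exchange_halo(std::vector<float> &x, int local_rows, int w,
                   int rank, int size, MPI_Comm comm) {
  float *first_row = x.data() + w;                      // first real row
  float *last_row  = x.data() + local_rows * w;         // last real row
  float *top_ghost = x.data();                          // ghost row above
  float *bot_ghost = x.data() + (local_rows + 1) * w;   // ghost row below
  int up   = (rank > 0) ? rank - 1 : MPI_PROC_NULL;
  int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
  // send first row up, receive the ghost row below from the next rank
  MPI_Sendrecv(first_row, w, MPI_FLOAT, up, 0,
               bot_ghost, w, MPI_FLOAT, down, 0, comm, MPI_STATUS_IGNORE);
  // send last row down, receive the ghost row above from the previous rank
  MPI_Sendrecv(last_row, w, MPI_FLOAT, down, 1,
               top_ghost, w, MPI_FLOAT, up, 1, comm, MPI_STATUS_IGNORE);
}
```

Because each exchanged row is a single contiguous run of `w` floats, the exchange is one send/receive per neighbor with no packing, which is where the advantage over a block partition comes from.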
However, even if we disable synchronization in MPI (by setting `mpi_sync_interval` greater than the number of iterations), it is still slower than the OpenMP and CUDA backends.
## CUDA

The CUDA backend needs to specify the number of threads in one block it will use.

### Parallelization Strategy

The strategy used in the CUDA backend is quite similar to OpenMP's.
For [EquSolver](https://github.com/Trinkle23897/Fast-Poisson-Image-Editing/blob/main/fpie/core/cuda/equ.cu), it performs equation-level parallelization.
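Equation-level parallelization roughly means one thread per unknown. A simplified kernel, with an assumed per-equation neighbor table rather than the actual fpie data layout, could look like this:

```cuda
// One thread updates one equation: x[i] depends on b[i] and the values of its
// four neighboring unknowns, whose indices are stored in `neighbor`.
__global__ void equ_step(float *x, const float *b, const int *neighbor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  const int *nb = neighbor + 4 * i;
  x[i] = (b[i] + x[nb[0]] + x[nb[1]] + x[nb[2]] + x[nb[3]]) / 4.0f;
}
```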
For [GridSolver](https://github.com/Trinkle23897/Fast-Poisson-Image-Editing/blob/main/fpie/core/cuda/grid.cu), each grid of size `(grid_x, grid_y)` is assigned to one block, and each thread in a block performs the iteration for a single pixel.
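A simplified kernel for this mapping (single channel, assumed names, not the actual fpie kernel); the launch would use a block shape of `dim3(grid_x, grid_y)`:

```cuda
// The CUDA block covers one (grid_x, grid_y) tile of the image and each
// thread handles exactly one pixel of that tile.
__global__ void grid_step(float *x, const float *b, int h, int w) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;  // blockDim.x == grid_x
  int j = blockIdx.y * blockDim.y + threadIdx.y;  // blockDim.y == grid_y
  if (i < 1 || i >= h - 1 || j < 1 || j >= w - 1) return;
  int id = i * w + j;
  x[id] = (b[id] + x[id - w] + x[id + w] + x[id - 1] + x[id + 1]) / 4.0f;
}
```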