# ParallelProcessingTools.jl

[Documentation for stable version](https://oschulz.github.io/ParallelProcessingTools.jl/stable)
[Documentation for development version](https://oschulz.github.io/ParallelProcessingTools.jl/dev)
[License](LICENSE.md)
[Travis CI build status](https://travis-ci.com/oschulz/ParallelProcessingTools.jl)
[AppVeyor build status](https://ci.appveyor.com/project/oschulz/ParallelProcessingTools-jl)
[Codecov coverage report](https://codecov.io/gh/oschulz/ParallelProcessingTools.jl)

This Julia package provides some tools to ease multithreaded and distributed programming, especially for more complex use cases and when using multiple processes with multiple threads on each process. It also provides functions and macros designed to ease the transition to the new multi-threading model introduced in Julia v1.3.

This package follows the SPMD (Single Program Multiple Data) paradigm (like, e.g., MPI, CUDA, OpenCL and `DistributedArrays.SPMD`): Run the same code on every execution unit (process or thread) and make the code responsible for figuring out which part of the data it should process. This differs from the approach of `Base.Threads.@threads` and `Distributed.@distributed`. SPMD is more appropriate for complex cases that the latter do not handle well (e.g. because some initial setup is required on each execution unit, the iteration scheme over the data is more complex, control over SIMD processing is required, etc.).

This package also implements thread-local variables and tooling to handle non-thread-safe code.

Note: Some features may not work on Windows, currently.

## Work partitions

`workpart` partitions an `AbstractArray` across a specified set of workers (i.e. processes or threads). E.g.

```julia
A = rand(100)
workpart(A, 4:7, 5) == view(A, 26:50)
```

returns a view into the array that worker `5` out of the set of workers `4:7` will be responsible for. The intended usage is

```julia
using Distributed, Base.Threads, ParallelProcessingTools
@everywhere using Base.Threads, ParallelProcessingTools
@everywhere data = rand(1000)
@everywhere procsel = workers()
@onprocs procsel begin
    sub_A = workpart(data, procsel, myid())
    threadsel = allthreads()
    @onthreads threadsel begin
        # ... some initialization, create local buffers, etc.
        idxs = workpart(eachindex(sub_A), threadsel, threadid())
        for i in idxs
            # ... sub_A[i] ...
        end
    end
end
```

see below for a full example.

If `data` is a `DistributedArrays.DArray`, then `DistributedArrays.localpart(data)` should be used instead of `workpart(data, workers(), myid())`.
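
For a `DArray`, the usage difference looks roughly like this sketch (assuming workers have already been added; the per-worker `sum` is just an illustrative workload):

```julia
using Distributed, ParallelProcessingTools
@everywhere using DistributedArrays, ParallelProcessingTools

data = drandn(10^6)             # a DArray, already distributed over the workers
@onprocs procs(data) begin
    local_A = localpart(data)   # preferred over workpart(data, workers(), myid())
    sum(local_A)                # illustrative per-worker work
end
```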
## Thread-safety

Use `@critical` to mark non-thread-safe code, e.g. for logging. For example,

```julia
@onthreads allthreads() begin
    @critical @info Base.Threads.threadid()
end
```

would crash Julia without `@critical`, because `@info` is not thread-safe.

Note: This doesn't always work for multithreaded code on other processes yet.
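
Beyond logging, `@critical` can serialize any short non-thread-safe operation. A minimal sketch (the shared `results` vector and the per-thread workload are purely illustrative):

```julia
using Base.Threads, ParallelProcessingTools

results = Float64[]              # a plain Vector is not safe for concurrent push!
@onthreads allthreads() begin
    x = sum(rand(10^4))          # some per-thread work
    @critical push!(results, x)  # only one thread enters this section at a time
end
length(results) == nthreads()
```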
## Thread-local variables

Thread-local variables can be created and initialized via

```julia
tl = ThreadLocal(0.0)
```

The API is similar to `Ref`: `tl[]` gets the value of `tl` for the current thread, and `tl[] = 4.2` sets the value for the current thread. `getallvalues(tl)` returns the values for all threads as a vector; it can only be called from single-threaded code.
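
Putting this together, a minimal sketch of per-thread state (the values are illustrative):

```julia
using Base.Threads, ParallelProcessingTools

tl = ThreadLocal(0.0)
@onthreads allthreads() begin
    tl[] = tl[] + threadid()   # read and write the current thread's value
end
getallvalues(tl)               # from single-threaded code: one entry per thread
```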
## Multithreaded code execution

The macro `@onthreads threadsel expr` will run the code in `expr` on the threads in `threadsel` (typically a range of thread IDs). For convenience, the package exports `allthreads() = 1:nthreads()`. Here's a simple example of how to use thread-local variables and `@onthreads` to sum up numbers in parallel:

```julia
tlsum = ThreadLocal(0.0)
data = rand(100)
@onthreads allthreads() begin
    tlsum[] = sum(workpart(data, allthreads(), Base.Threads.threadid()))
end
sum(getallvalues(tlsum)) ≈ sum(data)
```

`@onthreads` forwards exceptions thrown by the code in `expr` to the caller (in contrast to `Base.Threads.@threads`, which currently prints an exception but does not forward it, so with `@threads` program execution simply continues after a failure in multithreaded code).
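
Errors in the threaded body can therefore be handled at the call site; a minimal sketch (the error message is illustrative):

```julia
using Base.Threads, ParallelProcessingTools

try
    @onthreads allthreads() begin
        error("something went wrong on thread $(threadid())")
    end
catch err
    @warn "caught exception from threaded code" err
end
```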
Note: Julia can currently run only one function on multiple threads at the same time (this restriction is likely to disappear in the future). So even if `threadsel` does not include all threads, the remaining threads will be idle but blocked and cannot be used to run other code in parallel. However, the ability to run on a subset of the available threads is still useful for measuring the scaling behavior of multithreaded code (without restarting Julia with a different value of `$JULIA_NUM_THREADS`).
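
A rough scaling measurement might look like the following sketch (the workload is illustrative):

```julia
using Base.Threads, ParallelProcessingTools

data = rand(10^7)
for n in 1:nthreads()
    threadsel = 1:n
    t = @elapsed @onthreads threadsel begin
        sum(workpart(data, threadsel, threadid()))
    end
    @info "scaling measurement" n t
end
```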
## Multiprocess code execution

The macro `@onprocs procsel expr` will run the code in `expr` on the processes in `procsel` (typically an array of process IDs). `@onprocs` returns a vector with the result of `expr` on each process and will wait until all the results are available (but it may of course be wrapped in `@async`). A simple example to get the process ID on each worker process:

```julia
using Distributed, ParallelProcessingTools
addprocs(2)
workers() == @onprocs workers() myid()
```
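
Since `@onprocs` blocks until all results are in, it can be wrapped in `@async` when the main process should keep working in the meantime. A minimal sketch continuing the example above (the per-process workload is illustrative):

```julia
t = @async @onprocs workers() sum(rand(10^6))
# ... do other work on the main process here ...
results = fetch(t)   # one result per process in workers()
```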
Note: If the data can be expressed in terms of a `DistributedArrays.DArray`, it may be more appropriate and convenient to use the multiprocess execution tooling available in the package `DistributedArrays` (possibly combined with `ParallelProcessingTools.@onthreads`).

## Creating multithreaded workers

Julia currently doesn't provide an easy way to start multithreaded worker instances. `ParallelProcessingTools` provides a script `mtjulia.sh` (currently Linux-only) that will start Julia with `$JULIA_NUM_THREADS` set to a suitable value for each worker host (currently the number of physical processors on one NUMA node). `mtjulia_exe()` will return the absolute path to `mtjulia.sh`, so multithreaded workers can be spawned (via SSH) like this:

```julia
addprocs(["hostname1", ...], exename = mtjulia_exe())
```

## Example use case

As a simple real-world use case, let's histogram distributed data on multiple processes and threads.

Set up a cluster of multithreaded workers and load the required packages:

```julia
using Distributed, ParallelProcessingTools
addprocs(["hostname1", ...], exename = mtjulia_exe())
@everywhere using ParallelProcessingTools, Base.Threads,
    DistributedArrays, Statistics, StatsBase
```

Create some distributed data and check how the data is distributed:

```julia
data = drandn(10^8)
procsel = procs(data)
@onprocs procsel size(localpart(data))
```

Check the number of threads on each worker holding a part of the data:

```julia
@onprocs procsel nthreads()
```

Create histograms in parallel on all threads of all workers and merge:

```julia
proc_hists = @onprocs procsel begin
    local_data = localpart(data)
    tl_hist = ThreadLocal(Histogram((-6:0.1:6,), :left))
    @onthreads allthreads() begin
        data_for_this_thread = workpart(local_data, allthreads(), threadid())
        append!(tl_hist[], data_for_this_thread)
    end
    merged_hist = merge(getallvalues(tl_hist)...)
end
final_hist = merge(proc_hists...)
```

Check the result:

```julia
sum(final_hist.weights) ≈ length(data)

using Plots
plot(final_hist)
```

Note: This example is meant to show how to combine the features of this package. The multi-process part of this particular use case can be written in a simpler way using functionality from `DistributedArrays`.


## Documentation

* [Documentation for stable version](https://oschulz.github.io/ParallelProcessingTools.jl/stable)
* [Documentation for development version](https://oschulz.github.io/ParallelProcessingTools.jl/dev)