  Design
  ========

- When a Matplotlib :obj:`~matplotlib.artist.Artist` object in rendered via the `~matplotlib.artist.Artist.draw` method the following
- steps happen (in spirit but maybe not exactly in code):
+
+ When a Matplotlib :obj:`~matplotlib.artist.Artist` object is rendered via the
+ `~matplotlib.artist.Artist.draw` method the following steps happen (in spirit
+ but maybe not exactly in code):

  1. get the data
  2. convert from unit-full to unit-less data
@@ -29,22 +31,23 @@ steps happen (in spirit but maybe not exactly in code):
  target.

  However, this clear structure is frequently elided and obscured in the
- Matplotlib code base: Step 3 is only present for *x* and *y* like data (encoded
- in the `~matplotlib.transforms.TransformNode` objects) and color mapped data
- (implemented in the `.matplotlib.colors.ScalarMappable` family of classes); the
- application of Step 2 is inconsistent (both in actual application and when it
- is applied) between artists; each ``Artist`` stores it's data in its own way
- (typically as numpy arrays).
+ Matplotlib code base: Step 3 is only present for *x* and *y* like data
+ (encapsulated in the `~matplotlib.transforms.TransformNode` objects) and color
+ mapped data (encapsulated in the `.matplotlib.colors.ScalarMappable` family of
+ classes); the application of Step 2 is inconsistent (both in actual application
+ and when it is applied) between artists; each ``Artist`` stores its data in
+ its own way (typically as numpy arrays).

  With this view, we can understand the `~matplotlib.artist.Artist.draw` methods
- to be very extensively `curried
- <https://en.wikipedia.org/wiki/Currying>`__ version of
- these function chains where the objects allow us to modify the arguments to the
- functions.
+ to be very extensively `curried <https://en.wikipedia.org/wiki/Currying>`__
+ versions of these function chains where the objects allow us to modify the
+ arguments to the functions and then re-run them.

- The goal of this work is to bring this structure more the foreground in the internal of
- Matplotlib to make it easier to reason about, easier to extend, and easier to inject
- custom logic at each of the steps
+ The goal of this work is to bring this structure more to the foreground in the
+ internal structure of Matplotlib. By exposing this inherent structure and
+ uniformity in the architecture of Matplotlib, the library will be easier to
+ reason about and easier to extend by injecting custom logic at each of
+ the steps.

  A paper with the formal mathematical description of these ideas is in
  preparation.
@@ -55,55 +58,66 @@ Data pipeline
  Get the data (Step 1)
  ---------------------

- Currently, almost all ``Artist`` class store the data associated with them as
- attributes on the instances as `numpy.array` objectss. On one hand, this can
- be very useful as historically data was frequently already in `numpy.array`
- objects and, if you know the right methods for *this* ``Artist`` you can access
- that state to update or query it. From a certain point of view, this is
- consistent with the scheme laid out above as ``self.x[:]`` is really
- ``self.x.__getitem__(slice())`` which is (technically) a function call.
-
- However, this has several drawbacks. In most cases the data attributes on an
- ``Artist`` are closely linked -- the *x* and *y* on a
+ In this context, "data" is the result of any data-to-data transformations or
+ aggregations. There is already extensive tooling and literature around that
+ aspect. By completely decoupling the aggregation pipeline from the
+ visualization process we are able to both simplify and generalize the problem.
+
+ Currently, almost all ``Artist`` classes store the data they are representing
+ as attributes on the instances as realized `numpy.array` [#]_ objects. On one
+ hand, this can be very useful as historically data was frequently already in
+ `numpy.array` objects in the users' namespace. If you know the right methods
+ for *this* ``Artist``, you can query or update the data without recreating the
+ ``Artist``. This is technically consistent with the scheme laid out above if we
+ understand ``self.x[:]`` as ``self.x.__getitem__(slice(None))``, which is a
+ function call.
+
+ However, this method of storing the data has several drawbacks. In most cases
+ the data attributes on an ``Artist`` are closely linked -- the *x* and *y* on a
  `~matplotlib.lines.Line2D` must be the same length -- and by storing them
- separately it is possible that they will get out of sync in problematic ways.
- Further, because the data is stored as materialized ``numpy`` arrays, there we
- must decide before draw time what the correct sampling of the data is. While
- there are some projects like `grave <https://networkx.org/grave/>`__ that wrap
- richer objects or `mpl-modest-image
+ separately it is possible for them to become inconsistent in ways that are not
+ noticed until draw time [#]_. Further, because the data is stored as
+ materialized ``numpy`` arrays, we must decide before draw time what the correct
+ sampling of the data is. While there are some projects like `grave
+ <https://networkx.org/grave/>`__ that wrap richer objects or `mpl-modest-image
  <https://github.com/ChrisBeaumont/mpl-modest-image>`__, `datashader
  <https://datashader.org/getting_started/Interactivity.html#native-support-for-matplotlib>`__,
  and `mpl-scatter-density <https://github.com/astrofrog/mpl-scatter-density>`__
- that dynamically re-sample the data these are niche libraries.
+ that dynamically re-sample the data, these libraries have had only limited
+ adoption.

- The first goal of this project is to bring support for draw-time resampleing to
- every Matplotlib ``Artist`` out of the box. The current approach is to move
- all of the data storage off of the ``Artist`` directly and into a (so-called)
- `~data_prototype.containers.DataContainer` instance. The primary method on these objects
- is the `~data_prototype.containers.DataContainer.query` method which has the signature ::
+ The first goal of this project is to bring support for draw-time resampling to
+ every Matplotlib ``Artist``. The proposed approach is for the ``Artist`` to
+ hold its data indirectly, via a (so-called)
+ `~data_prototype.containers.DataContainer` instance, rather than directly. The
+ primary method on these objects is the
+ `~data_prototype.containers.DataContainer.query` method, which has the signature
+ ::

      def query(
          self,
-         transform: _Transform,
+         /,
+         coord_transform: _MatplotlibTransform,
          size: Tuple[int, int],
      ) -> Tuple[Dict[str, Any], Union[str, int]]:

  The query is passed in:

- - A transform from "Axes" to "data" (using Matplotlib's names for the `various
-   coordinate systems
-   <https://matplotlib.org/stable/tutorials/advanced/transforms_tutorial.html>`__
- - A notion of how big the axes is in "pixels" to provide guidance on what the correct number
-   of samples to return is.
+ - A *coord_transform* from "Axes fraction" to "data" (using Matplotlib's names
+   for the `coordinate systems
+   <https://matplotlib.org/stable/tutorials/advanced/transforms_tutorial.html>`__)
+ - A notion of how big the axes is in "pixels" to provide guidance on what the
+   correct number of samples to return is. For raster outputs this is literal
+   pixels but for vector backends it will have to be an effective resolution.

  It will return:

- - A mapping of strings to things that is coercible (with the help of the
+ - A mapping of strings to things that are coercible (with the help of the
    functions in steps 2 and 3) to a numpy array or types understandable by the
    backends.
  - A key that can be used for caching

- This function will be called at draw time by the ``Aritist`` to get the data to
+ This function will be called at draw time by the ``Artist`` to get the data to
  be drawn. In the simplest cases
  (e.g. `~data_prototype.containers.ArrayContainer` and
  `~data_prototype.containers.DataFrameContainer`) the ``query`` method ignores
@@ -124,15 +138,15 @@ visualization. This also opens up several interesting possibilities:

  By accessing all of the data that is needed in draw in a single function call
  the ``DataContainer`` instances can ensure that the data is coherent and
- consistent. This is important for applications like steaming where different
+ consistent. This is important for applications like streaming where different
  parts of the data may be arriving at different rates and it would thus be the
  ``DataContainer``'s responsibility to settle any race conditions and always
  return aligned data to the ``Artist``.


  There is still some ambiguity as to what should be put in the data. For
  example with `~matplotlib.lines.Line2D` it is clear that the *x* and *y* data
- should be pulled from the ``DataConatiner``, but things like *color* and
+ should be pulled from the ``DataContainer``, but things like *color* and
  *linewidth* are ambiguous. A later section will make the case that it should be
  possible, but maybe not required, that these values be accessible in the data
  context.
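
The single-call coherence guarantee described in this hunk can be illustrated with a minimal sketch. Everything here is hypothetical and is not the ``data_prototype`` implementation: the ``ListContainer`` name, its ``update`` method, and the use of an integer version counter as the cache key are assumptions made for illustration. Only the shape of ``query`` (a coordinate transform and a size in, a mapping plus a cache key out) follows the signature shown in the diff.

```python
from typing import Any, Dict, Sequence, Tuple, Union


class ListContainer:
    """Hypothetical, minimal stand-in for a ``DataContainer``."""

    def __init__(self, **fields: Sequence[float]) -> None:
        self._data: Dict[str, list] = {}
        self._version = 0  # bumped on every successful update
        self.update(**fields)

    def update(self, **fields: Sequence[float]) -> None:
        # Validate the merged state *before* committing it, so the fields
        # can never be observed with mismatched lengths.
        merged = {**self._data, **{k: list(v) for k, v in fields.items()}}
        if len({len(v) for v in merged.values()}) > 1:
            raise ValueError("all fields must have the same length")
        self._data = merged
        self._version += 1

    def query(
        self,
        /,
        coord_transform: Any,
        size: Tuple[int, int],
    ) -> Tuple[Dict[str, Any], Union[str, int]]:
        # This simple container ignores coord_transform and size and returns
        # everything it holds. Because the result depends only on internal
        # state, the version counter alone is a valid cache key.
        return {k: list(v) for k, v in self._data.items()}, self._version


container = ListContainer(x=[0, 1, 2], y=[3, 4, 5])
data, key = container.query(None, (640, 480))
```

A container that actually resamples would also need to fold the transform and size into the key, since its output would then depend on them.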
@@ -224,7 +238,7 @@ returns a cache key that it generates to the caller. The exact details of how
  to generate that key are left to the ``DataContainer`` implementation, but if
  the returned data changed, then the cache key must change. The cache key
  should be computed from a combination of the ``DataContainer``'s internal state,
- the transform and size passed in.
+ the coordinate transformation and size passed in.

  The choice to return the data and cache key in one step, rather than in a two
  step process, is driven by simplicity and because the cache key is computed
@@ -239,3 +253,9 @@ management at the ``Artist`` layer. We also need to determine how many cache
  layers to keep. Currently only the results of Step 3 are cached, but we may
  want to additionally cache intermediate results after Step 2. The caching from
  Step 1 is likely best left to the ``DataContainer`` instances.
+
+ .. [#] Not strictly true; in some cases we also store the values in the
+    container the data came in with, which may not be a `numpy.array`.
+ .. [#] For example `matplotlib.lines.Line2D.set_xdata` and
+    `matplotlib.lines.Line2D.set_ydata` do not check the lengths of the
+    input at call time.