
Commit c827911

Merge pull request #629 from shoyer/tolerance
ENH: add tolerance argument to .sel and .reindex
2 parents a2fff56 + d915588 commit c827911

9 files changed: +162 -55 lines changed

ci/requirements-py26.yml

Lines changed: 1 addition & 0 deletions
@@ -13,5 +13,6 @@ dependencies:
   - unittest2
   - pip:
     - coveralls
+    - cyordereddict
     - dask
     - h5netcdf

doc/indexing.rst

Lines changed: 8 additions & 2 deletions
@@ -214,7 +214,7 @@ Nearest neighbor lookups

 The label based selection methods :py:meth:`~xray.Dataset.sel`,
 :py:meth:`~xray.Dataset.reindex` and :py:meth:`~xray.Dataset.reindex_like` all
-support a ``method`` keyword argument. The method parameter allows for
+support ``method`` and ``tolerance`` keyword arguments. The method parameter allows for
 enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``,
 ``'backfill'`` or ``'nearest'``:

@@ -225,8 +225,14 @@ enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``,
     data.sel(x=0.1, method='backfill')
     data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')

+Tolerance limits the maximum distance for valid matches with an inexact lookup:
+
+.. ipython:: python
+
+    data.reindex(x=[1.1, 1.5], method='nearest', tolerance=0.2)
+
 Using ``method='nearest'`` or a scalar argument with ``.sel()`` requires pandas
-version 0.16 or newer.
+version 0.16 or newer. Using ``tolerance`` requires pandas version 0.17 or newer.

 The method parameter is not yet supported if any of the arguments
 to ``.sel()`` is a ``slice`` object:
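The docstrings in this commit state the tolerance rule as ``abs(index[indexer] - target) <= tolerance``. The minimal pure-Python sketch below shows how that rule applies to a ``'nearest'`` lookup. It is not the xray or pandas implementation. The helper name `nearest_indexer` is hypothetical, and so are its tie-breaking rule (the earlier label wins) and its use of -1 for "no acceptable match". The -1 marker follows the convention of pandas' `get_indexer`.

```python
def nearest_indexer(index, targets, tolerance=None):
    """Hypothetical helper: for each target label, return the position of the
    nearest label in ``index``, or -1 when that label lies farther away than
    ``tolerance``. Assumes ``index`` is a non-empty sequence of numbers."""
    positions = []
    for target in targets:
        # min() keeps the first position on ties, so the earlier label wins
        best = min(range(len(index)), key=lambda i: abs(index[i] - target))
        # enforce abs(index[best] - target) <= tolerance
        if tolerance is not None and abs(index[best] - target) > tolerance:
            best = -1
        positions.append(best)
    return positions


x = [0.0, 1.0, 2.0]
print(nearest_indexer(x, [1.1, 1.5]))                 # [1, 1]
print(nearest_indexer(x, [1.1, 1.5], tolerance=0.2))  # [1, -1]: 1.5 is 0.5 away
```

Without a tolerance, every target is matched to some label. With ``tolerance=0.2``, only targets within 0.2 of a label survive.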

doc/whats-new.rst

Lines changed: 29 additions & 8 deletions
@@ -12,7 +12,11 @@ What's New
 v0.6.1
 ------

-The minimum required version of dask for use with xray is now version 0.6.
+This release contains a number of bug and compatibility fixes, as well
+as enhancements to indexing, plotting and writing files to disk.
+
+Note that the minimum required version of dask for use with xray is now
+version 0.6.

 API Changes
 ~~~~~~~~~~~
@@ -26,24 +30,41 @@ API Changes
 Enhancements
 ~~~~~~~~~~~~

+- :py:meth:`~xray.Dataset.sel` and :py:meth:`~xray.Dataset.reindex` now support
+  the ``tolerance`` argument for controlling nearest-neighbor selection
+  (:issue:`629`):
+
+  .. ipython::
+    :verbatim:
+
+    In [5]: array = xray.DataArray([1, 2, 3], dims='x')
+
+    In [6]: array.reindex(x=[0.9, 1.5], method='nearest', tolerance=0.2)
+    Out[6]:
+    <xray.DataArray (x: 2)>
+    array([ 2., nan])
+    Coordinates:
+      * x (x) float64 0.9 1.5
+
+  This feature requires pandas v0.17 or newer.
+- Faceted plotting through :py:class:`~xray.plot.FacetGrid` and the
+  :py:meth:`~xray.plot.plot` method.
+- New ``encoding`` argument in :py:meth:`~xray.Dataset.to_netcdf` for writing
+  netCDF files with compression, as described in the new documentation
+  section on :ref:`io.netcdf.writing_encoded`.
 - Add :py:attr:`~xray.Dataset.real` and :py:attr:`~xray.Dataset.imag`
   attributes to Dataset and DataArray (:issue:`553`).
 - More informative error message with :py:meth:`~xray.Dataset.from_dataframe`
   if the frame has duplicate columns.
 - xray now uses deterministic names for dask arrays it creates or opens from
   disk. This allows xray users to take advantage of dask's nascent support for
   caching intermediate computation results. See :issue:`555` for an example.
-- Faceted plotting through :py:class:`~xray.plot.FacetGrid` and the
-  :py:meth:`~xray.plot.plot` method.
-- New ``encoding`` argument in :py:meth:`~xray.Dataset.to_netcdf` for writing
-  netCDF files with compression, as described in the new documentation
-  section on :ref:`io.netcdf.writing_encoded`.

 Bug fixes
 ~~~~~~~~~

-- Forwards compatibility with the next pandas release of changes (v0.17.0).
-  We were using some internal pandas routines for datetime conversion, which
+- Forwards compatibility with the latest pandas release (v0.17.0). We were
+  using some internal pandas routines for datetime conversion, which
   unfortunately have now changed upstream (:issue:`569`).
 - Aggregation functions now correctly skip ``NaN`` for data for ``complex128``
   dtype (:issue:`554`).

xray/core/alignment.py

Lines changed: 18 additions & 2 deletions
@@ -1,5 +1,6 @@
 import functools
 import operator
+import pandas as pd
 from collections import defaultdict

 import numpy as np
@@ -99,7 +100,8 @@ def partial_align(*objects, **kwargs):
     return tuple(obj.reindex(copy=copy, **joined_indexes) for obj in objects)


-def reindex_variables(variables, indexes, indexers, method=None, copy=True):
+def reindex_variables(variables, indexes, indexers, method=None,
+                      tolerance=None, copy=True):
     """Conform a dictionary of aligned variables onto a new set of variables,
     filling in missing values with NaN.

@@ -123,6 +125,10 @@ def reindex_variables(variables, indexes, indexers, method=None, copy=True):
         * pad / ffill: propgate last valid index value forward
         * backfill / bfill: propagate next valid index value backward
         * nearest: use nearest valid index value
+    tolerance : optional
+        Maximum distance between original and new labels for inexact matches.
+        The values of the index at the matching locations must satisfy the
+        equation ``abs(index[indexer] - target) <= tolerance``.
     copy : bool, optional
         If `copy=True`, the returned dataset contains only copied
         variables. If `copy=False` and no reindexing is required then
@@ -137,11 +143,21 @@ def reindex_variables(variables, indexes, indexers, method=None, copy=True):
     to_indexers = {}
     to_shape = {}
     from_indexers = {}
+
+    # for compat with older versions of pandas that don't support tolerance
+    get_indexer_kwargs = {}
+    if tolerance is not None:
+        if pd.__version__ < '0.17':
+            raise NotImplementedError(
+                'the tolerance argument requires pandas v0.17 or newer')
+        get_indexer_kwargs['tolerance'] = tolerance
+
     for name, index in iteritems(indexes):
         to_shape[name] = index.size
         if name in indexers:
             target = utils.safe_cast_to_index(indexers[name])
-            indexer = index.get_indexer(target, method=method)
+            indexer = index.get_indexer(target, method=method,
+                                        **get_indexer_kwargs)

             to_shape[name] = len(target)
             # Note pandas uses negative values from get_indexer to signify
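The compatibility guard in `reindex_variables` compares ``pd.__version__`` with ``'0.17'`` as plain strings. That comparison is lexicographic: ``'0.9.1' < '0.17'`` is ``False``, so a pandas 0.9.x install would pass the check. The sketch below compares versions numerically instead. It is written under stated assumptions: `parse_version` and `supports_tolerance` are hypothetical helpers, and pre-release suffixes such as ``rc1`` are treated as the release itself. A real project would more likely use `packaging.version.Version`. This sketch avoids that dependency.

```python
def parse_version(version):
    """Hypothetical helper: '0.16.2' -> (0, 16, 2).

    Stops at the first component with no leading digits and drops
    non-numeric suffixes, so '0.17.0rc1' -> (0, 17, 0)."""
    parts = []
    for piece in version.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def supports_tolerance(pandas_version, minimum=(0, 17)):
    """Compare versions as integer tuples rather than as strings."""
    return parse_version(pandas_version) >= minimum


print('0.9.1' < '0.17')              # False: the string check would not raise
print(supports_tolerance('0.9.1'))   # False
print(supports_tolerance('0.17.0'))  # True
```

Inside the guard, the call would read ``if not supports_tolerance(pd.__version__): raise NotImplementedError(...)``.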

xray/core/dataarray.py

Lines changed: 23 additions & 9 deletions
@@ -539,7 +539,7 @@ def isel(self, **indexers):
         ds = self._dataset.isel(**indexers)
         return self._with_replaced_dataset(ds)

-    def sel(self, method=None, **indexers):
+    def sel(self, method=None, tolerance=None, **indexers):
         """Return a new DataArray whose dataset is given by selecting
         index labels along the specified dimension(s).

@@ -548,8 +548,8 @@ def sel(self, method=None, **indexers):
         Dataset.sel
         DataArray.isel
         """
-        return self.isel(**indexing.remap_label_indexers(self, indexers,
-                                                         method=method))
+        return self.isel(**indexing.remap_label_indexers(
+            self, indexers, method=method, tolerance=tolerance))

     def isel_points(self, dim='points', **indexers):
         """Return a new DataArray whose dataset is given by pointwise integer
@@ -562,18 +562,20 @@ def isel_points(self, dim='points', **indexers):
         ds = self._dataset.isel_points(dim=dim, **indexers)
         return self._with_replaced_dataset(ds)

-    def sel_points(self, dim='points', method=None, **indexers):
+    def sel_points(self, dim='points', method=None, tolerance=None,
+                   **indexers):
         """Return a new DataArray whose dataset is given by pointwise selection
         of index labels along the specified dimension(s).

         See Also
         --------
         Dataset.sel_points
         """
-        ds = self._dataset.sel_points(dim=dim, method=method, **indexers)
+        ds = self._dataset.sel_points(dim=dim, method=method,
+                                      tolerance=tolerance, **indexers)
         return self._with_replaced_dataset(ds)

-    def reindex_like(self, other, method=None, copy=True):
+    def reindex_like(self, other, method=None, tolerance=None, copy=True):
         """Conform this object onto the indexes of another object, filling
         in missing values with NaN.

@@ -594,6 +596,11 @@ def reindex_like(self, other, method=None, copy=True):
             * pad / ffill: propgate last valid index value forward
             * backfill / bfill: propagate next valid index value backward
             * nearest: use nearest valid index value (requires pandas>=0.16)
+        tolerance : optional
+            Maximum distance between original and new labels for inexact
+            matches. The values of the index at the matching locations must
+            satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
+            Requires pandas>=0.17.
         copy : bool, optional
             If `copy=True`, the returned array's dataset contains only copied
             variables. If `copy=False` and no reindexing is required then
@@ -610,9 +617,10 @@ def reindex_like(self, other, method=None, copy=True):
         DataArray.reindex
         align
         """
-        return self.reindex(method=method, copy=copy, **other.indexes)
+        return self.reindex(method=method, tolerance=tolerance, copy=copy,
+                            **other.indexes)

-    def reindex(self, method=None, copy=True, **indexers):
+    def reindex(self, method=None, tolerance=None, copy=True, **indexers):
         """Conform this object onto a new set of indexes, filling in
         missing values with NaN.

@@ -630,6 +638,11 @@ def reindex(self, method=None, copy=True, **indexers):
             * pad / ffill: propgate last valid index value forward
             * backfill / bfill: propagate next valid index value backward
             * nearest: use nearest valid index value (requires pandas>=0.16)
+        tolerance : optional
+            Maximum distance between original and new labels for inexact
+            matches. The values of the index at the matching locations must
+            satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
+            Requires pandas>=0.17.
         **indexers : dict
             Dictionary with keys given by dimension names and values given by
             arrays of coordinates tick labels. Any mis-matched coordinate values
@@ -647,7 +660,8 @@ def reindex(self, method=None, copy=True, **indexers):
         DataArray.reindex_like
         align
         """
-        ds = self._dataset.reindex(method=method, copy=copy, **indexers)
+        ds = self._dataset.reindex(method=method, tolerance=tolerance,
+                                   copy=copy, **indexers)
         return self._with_replaced_dataset(ds)

     def rename(self, new_name_or_name_dict):
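The ``reindex`` docstrings say that missing values are filled with NaN. Once a lookup has produced an integer indexer, with negative entries marking labels that had no acceptable match (the pandas convention noted in `alignment.py`), filling reduces to a gather with a default value. The following is a simplified pure-Python sketch, not the xray code path. The name `take_with_fill` is hypothetical. Values are converted to ``float`` so that NaN can sit alongside them, which mirrors the float output ``array([ 2., nan])`` in the whats-new example.

```python
import math


def take_with_fill(values, indexer, fill_value=math.nan):
    """Hypothetical helper: gather values[i] for each position i in indexer.

    Negative positions mark labels with no (close enough) match and are
    replaced by fill_value. Values become floats so NaN fits alongside them.
    """
    return [fill_value if i < 0 else float(values[i]) for i in indexer]


# For labels 0, 1, 2, a nearest lookup of [0.9, 1.5] with tolerance 0.2
# gives the indexer [1, -1]: 0.9 matches position 1, while 1.5 is too far away.
result = take_with_fill([1, 2, 3], [1, -1])
print(result)  # [2.0, nan]
```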

xray/core/dataset.py

Lines changed: 32 additions & 10 deletions
@@ -1010,7 +1010,7 @@ def isel(self, **indexers):
             variables[name] = var.isel(**var_indexers)
         return self._replace_vars_and_dims(variables)

-    def sel(self, method=None, **indexers):
+    def sel(self, method=None, tolerance=None, **indexers):
         """Returns a new dataset with each array indexed by tick labels
         along the specified dimension(s).

@@ -1036,6 +1036,11 @@ def sel(self, method=None, **indexers):
             * pad / ffill: propgate last valid index value forward
             * backfill / bfill: propagate next valid index value backward
             * nearest: use nearest valid index value
+        tolerance : optional
+            Maximum distance between original and new labels for inexact
+            matches. The values of the index at the matching locations must
+            satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
+            Requires pandas>=0.17.
         **indexers : {dim: indexer, ...}
             Keyword arguments with names matching dimensions and values given
             by scalars, slices or arrays of tick labels.
@@ -1056,8 +1061,8 @@ def sel(self, method=None, **indexers):
         Dataset.isel_points
         DataArray.sel
         """
-        return self.isel(**indexing.remap_label_indexers(self, indexers,
-                                                         method=method))
+        return self.isel(**indexing.remap_label_indexers(
+            self, indexers, method=method, tolerance=tolerance))

     def isel_points(self, dim='points', **indexers):
         """Returns a new dataset with each array indexed pointwise along the
@@ -1146,7 +1151,8 @@ def relevant_keys(mapping):
                 zip(*[v for k, v in indexers])]],
                 dim=dim, coords=coords, data_vars=data_vars)

-    def sel_points(self, dim='points', method=None, **indexers):
+    def sel_points(self, dim='points', method=None, tolerance=None,
+                   **indexers):
         """Returns a new dataset with each array indexed pointwise by tick
         labels along the specified dimension(s).

@@ -1172,6 +1178,11 @@ def sel_points(self, dim='points', method=None, **indexers):
             * pad / ffill: propagate last valid index value forward
             * backfill / bfill: propagate next valid index value backward
             * nearest: use nearest valid index value
+        tolerance : optional
+            Maximum distance between original and new labels for inexact
+            matches. The values of the index at the matching locations must
+            satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
+            Requires pandas>=0.17.
         **indexers : {dim: indexer, ...}
             Keyword arguments with names matching dimensions and values given
             by array-like objects. All indexers must be the same length and
@@ -1192,11 +1203,11 @@ def sel_points(self, dim='points', method=None, **indexers):
         Dataset.isel_points
         DataArray.sel_points
         """
-        pos_indexers = indexing.remap_label_indexers(self, indexers,
-                                                     method=method)
+        pos_indexers = indexing.remap_label_indexers(
+            self, indexers, method=method, tolerance=tolerance)
         return self.isel_points(dim=dim, **pos_indexers)

-    def reindex_like(self, other, method=None, copy=True):
+    def reindex_like(self, other, method=None, tolerance=None, copy=True):
         """Conform this object onto the indexes of another object, filling
         in missing values with NaN.

@@ -1217,6 +1228,11 @@ def reindex_like(self, other, method=None, copy=True):
             * pad / ffill: propagate last valid index value forward
             * backfill / bfill: propagate next valid index value backward
             * nearest: use nearest valid index value (requires pandas>=0.16)
+        tolerance : optional
+            Maximum distance between original and new labels for inexact
+            matches. The values of the index at the matching locations must
+            satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
+            Requires pandas>=0.17.
         copy : bool, optional
             If `copy=True`, the returned dataset contains only copied
             variables. If `copy=False` and no reindexing is required then
@@ -1233,9 +1249,10 @@ def reindex_like(self, other, method=None, copy=True):
         Dataset.reindex
         align
         """
-        return self.reindex(method=method, copy=copy, **other.indexes)
+        return self.reindex(method=method, copy=copy, tolerance=tolerance,
+                            **other.indexes)

-    def reindex(self, indexers=None, method=None, copy=True, **kw_indexers):
+    def reindex(self, indexers=None, method=None, tolerance=None, copy=True, **kw_indexers):
         """Conform this object onto a new set of indexes, filling in
         missing values with NaN.

@@ -1254,6 +1271,11 @@ def reindex(self, indexers=None, method=None, copy=True, **kw_indexers):
             * pad / ffill: propagate last valid index value forward
             * backfill / bfill: propagate next valid index value backward
             * nearest: use nearest valid index value (requires pandas>=0.16)
+        tolerance : optional
+            Maximum distance between original and new labels for inexact
+            matches. The values of the index at the matching locations must
+            satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
+            Requires pandas>=0.17.
         copy : bool, optional
             If `copy=True`, the returned dataset contains only copied
             variables. If `copy=False` and no reindexing is required then
@@ -1279,7 +1301,7 @@ def reindex(self, indexers=None, method=None, copy=True, **kw_indexers):
             return self.copy(deep=True) if copy else self

         variables = alignment.reindex_variables(
-            self.variables, self.indexes, indexers, method, copy=copy)
+            self.variables, self.indexes, indexers, method, tolerance, copy=copy)
         return self._replace_vars_and_dims(variables)

     def rename(self, name_dict, inplace=False):
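The ``sel`` and ``reindex`` docstrings list ``pad``/``ffill`` and ``backfill``/``bfill`` next to ``nearest``. The sketch below shows how these directional methods could combine with ``tolerance`` on a sorted index, using the standard-library ``bisect`` module. It assumes a sorted, unique numeric index and is not pandas' implementation. The function names are hypothetical. As in pandas' `get_indexer`, -1 marks a target with no acceptable match.

```python
import bisect


def pad_indexer(index, targets, tolerance=None):
    """Hypothetical helper: position of the last label <= target (forward fill)."""
    result = []
    for target in targets:
        pos = bisect.bisect_right(index, target) - 1  # -1 if target precedes all labels
        if pos >= 0 and tolerance is not None and target - index[pos] > tolerance:
            pos = -1  # a previous label exists but is too far back
        result.append(pos)
    return result


def backfill_indexer(index, targets, tolerance=None):
    """Hypothetical helper: position of the first label >= target (backward fill)."""
    result = []
    for target in targets:
        pos = bisect.bisect_left(index, target)
        if pos == len(index):
            pos = -1  # target follows all labels
        elif tolerance is not None and index[pos] - target > tolerance:
            pos = -1  # a next label exists but is too far ahead
        result.append(pos)
    return result


x = [0.0, 1.0, 2.0]
print(pad_indexer(x, [0.1, 1.5, 2.5]))                      # [0, 1, 2]
print(pad_indexer(x, [0.1, 1.5, 2.5], tolerance=0.2))       # [0, -1, -1]
print(backfill_indexer(x, [0.1, 0.9, 2.5]))                 # [1, 1, -1]
print(backfill_indexer(x, [0.1, 0.9, 2.5], tolerance=0.2))  # [-1, 1, -1]
```

The tolerance only measures distance in the fill direction. A target just after a label can therefore match under ``pad`` and fail under ``backfill``, as 0.1 does here.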
