1+ .. _data structures :
2+
13Data Structures
24===============
35
46.. ipython :: python
57 :suppress:
68
7- import numpy as np
8- np.random.seed(123456 )
9- np.set_printoptions(threshold = 10 )
10-
11- To get started, we will import numpy, pandas and xray:
12-
13- .. ipython :: python
14-
159 import numpy as np
1610 import pandas as pd
1711 import xray
12+ np.random.seed(123456 )
13+ np.set_printoptions(threshold = 10 )
1814
1915 DataArray
2016---------
@@ -31,10 +27,9 @@ multi-dimensional array. It has several key properties:
3127
3228xray uses ``dims `` and ``coords `` to enable its core metadata aware operations.
3329Dimensions provide names that xray uses instead of the ``axis `` argument found
34- in many numpy functions. Coordinates (particularly "index coordinates") enable
35- fast label based indexing and alignment, building on the functionality of the
36- ``index `` found on a pandas :py:class: `~pandas.DataFrame ` or
37- :py:class: `~pandas.Series `.
30+ in many numpy functions. Coordinates enable fast label based indexing and
31+ alignment, building on the functionality of the ``index `` found on a pandas
32+ :py:class: `~pandas.DataFrame ` or :py:class: `~pandas.Series `.
3833
3934DataArray objects also can have a ``name `` and can hold arbitrary metadata in
4035the form of their ``attrs `` property (an ordered dictionary). Names and
@@ -66,9 +61,9 @@ in with default values:
6661
6762 xray.DataArray(data)
6863
69- As you can see, dimension names and index coordinates, which label tick marks
70- along each dimension, are always present. This behavior is similar to pandas,
71- which fills in index values in the same way.
64+ As you can see, dimensions and coordinate arrays corresponding to each
65+ dimension are always present. This behavior is similar to pandas, which fills
66+ in index values in the same way.
7267
7368The data array constructor also supports supplying ``coords `` as a list of
7469``(dim, ticks[, attrs]) `` pairs with length equal to the number of dimensions:
@@ -80,7 +75,7 @@ The data array constructor also supports supplying ``coords`` as a list of
8075 Yet another option is to supply ``coords `` in the form of a dictionary where
8176the values are scalar values, 1D arrays or tuples (in the same form as the
8277`dataarray constructor `_). This form lets you supply other coordinates than
83- those used for indexing (more on these later):
78+ those corresponding to dimensions (more on these later):
8479
8580.. ipython :: python
8681
@@ -214,16 +209,14 @@ variables. Dictionary like access on a dataset will supply arrays found in
214209either category. However, the distinction does have important implications for
215210indexing and computation.
216211
217- Here is an example how we might structure a dataset for a weather forecast:
212+ Here is an example of how we might structure a dataset for a weather forecast:
218213
219214.. image :: _static/dataset-diagram.png
220215
221216In this example, it would be natural to call ``temperature `` and
222217``precipitation `` "variables" and all the other arrays "coordinates" because
223- they label the points along the dimensions. ``x ``, ``y `` and ``time `` are
224- index coordinates (used for alignment purposes), and ``latitude ``,
225- ``longitude `` and ``reference_time `` are other coordinates, not used for
226- indexing (see [1 ]_ for more background on this example).
218+ they label the points along the dimensions (see [1 ]_ for more background on
219+ this example).
227220
228221.. _dataarray constructor :
229222
@@ -383,40 +376,46 @@ Another useful option is the ability to rename the variables in a dataset:
383376
384377 ds.rename({' temperature' : ' temp' , ' precipitation' : ' precip' })
385378
379+ .. _coordinates :
380+
386381Coordinates
387382-----------
388383
389- ``DataArray `` and ``Dataset `` objects store two types of arrays in their
390- ``coords `` attribute:
384+ Coordinates are ancillary arrays stored for ``DataArray `` and ``Dataset ``
385+ objects in the ``coords `` attribute:
386+
387+ .. ipython :: python
388+
389+ ds.coords
391390
392- * "Index" coordinates are used for label based indexing and alignment, like the
393- ``index `` found on a pandas :py:class: `~pandas.DataFrame ` or
394- :py:class: `~pandas.Series `. Index coordinates must be one-dimensional, and
395- are (automatically) identified by arrays with a name equal to their (single)
396- dimension.
397- * "Other" coordinates are also intended to be descriptive of points along
398- dimensions, but xray makes no any direct use of them, beyond persisting
399- through operations when it can be done unambiguously. These coordinates can
400- have any number of dimensions.
391+ Unlike attributes, xray *does * interpret and persist coordinates in
392+ operations that transform xray objects.
401393
402- .. note ::
394+ One-dimensional coordinates with a name equal to their sole dimension (marked
395+ by ``* `` when printing a dataset or data array) take on a special meaning in
396+ xray. They are used for label based indexing and alignment,
397+ like the ``index `` found on a pandas :py:class: `~pandas.DataFrame ` or
398+ :py:class: `~pandas.Series `. Indeed, these "dimension" coordinates use a
399+ :py:class: `pandas.Index ` internally to store their values.
403400
404- You cannot yet use a :py:class: `pandas.MultiIndex ` as a xray index
405- coordinate (:issue: `164 `).
401+ Other than for indexing, xray does not make any direct use of the values
402+ associated with coordinates. Coordinates with names not matching a dimension
403+ are not used for alignment or indexing, nor are they required to match when
404+ doing arithmetic (see :ref: `alignment and coordinates `).
406405
407406Converting to ``pandas.Index ``
408407~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
409408
410- To convert an index coordinate into an actual :py:class: ` pandas.Index `, use
411- the :py:meth: `~xray.DataArray.to_index ` method:
409+ To convert a coordinate (or any `` DataArray ``) into an actual
410+ :py:class: ` pandas.Index `, use the :py:meth: `~xray.DataArray.to_index ` method:
412411
413412.. ipython :: python
414413
415414 ds[' time' ].to_index()
416415
417416 A useful shortcut is the ``indexes `` property (on both ``DataArray `` and
418- ``Dataset ``), which lazily constructs a dictionary where the values are
419- ``Index `` objects:
417+ ``Dataset ``), which lazily constructs a dictionary whose keys are given by each
418+ dimension and whose values are ``Index `` objects:
420419
421420.. ipython :: python
422421
@@ -436,18 +435,17 @@ variables, use the :py:meth:`~xray.Dataset.set_coords` and
436435 ds.set_coords([' temperature' , ' precipitation' ])
437436 ds[' temperature' ].reset_coords(drop = True )
438437
439- Notice that these operations skip index coordinates.
440-
441- .. note ::
442-
443- We do not yet have a ``set_index `` method like pandas for manipulating
444- indexes. This is planned.
438+ Notice that these operations skip coordinates with names given by dimensions,
439+ as used for indexing. This is mostly because we are not entirely sure how to
440+ design the interface around the fact that xray cannot store a coordinate and
441+ variable with the same name but different values in the same dictionary. But we do
442+ recognize that supporting something like this would be useful.
445443
446444Converting into datasets
447445~~~~~~~~~~~~~~~~~~~~~~~~
448446
449- Coordinate objects also have a few useful methods, mostly for converting them
450- into dataset objects:
447+ `` Coordinates `` objects also have a few useful methods, mostly for converting
448+ them into dataset objects:
451449
452450.. ipython :: python
453451
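The Coordinates text in this change compares dimension coordinates to the ``index`` on a pandas ``Series`` or ``DataFrame``. A minimal sketch of the pandas side of that comparison follows: label-based alignment in arithmetic, and default index values being filled in when none are given. It uses only numpy and pandas, not xray, and the variable names (``a``, ``b``, ``total``, ``default_index``) are illustrative assumptions rather than anything taken from the changed docs.

```python
import numpy as np
import pandas as pd

# Two series whose labels overlap only partially (hypothetical data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["y", "z", "w"])

# Arithmetic aligns on index labels, not on position. Labels present in
# only one operand produce NaN in the result.
total = a + b

# When no index is supplied, pandas fills in default integer labels.
default_index = pd.Series(np.zeros(3)).index

print(total)
print(list(default_index))
```

Only the labels shared by both operands (``y`` and ``z``) carry numeric results. This is the pandas behaviour that the text says dimension coordinates build on.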