diff --git a/README.md b/README.md index 647c2ae1..5833f7dd 100644 --- a/README.md +++ b/README.md @@ -26,12 +26,10 @@ Read the documentation for more information: ## Requirements -Supports Python 3.10 - 3.14 and GDAL 3.6.x - 3.11.x. - -Reading to GeoDataFrames requires `geopandas>=0.12` with `shapely>=2`. - -Additionally, installing `pyarrow` in combination with GDAL 3.6+ enables -a further speed-up when specifying `use_arrow=True`. +- Python >= 3.10 +- GDAL >= 3.6 +- Reading to GeoDataFrames requires `geopandas>=0.12` and `shapely>=2`. Additionally, + installing `pyarrow` enables a further speed-up when specifying `use_arrow=True`. ## Installation diff --git a/docs/environment.yml b/docs/environment.yml index 0c3679c7..e1cd374f 100644 --- a/docs/environment.yml +++ b/docs/environment.yml @@ -2,12 +2,11 @@ name: pyogrio channels: - conda-forge dependencies: - - python==3.10.* - - gdal - - numpy==1.24.* - - numpydoc==1.1.* - - Cython==0.29.* - - docutils==0.16.* + - python=3.13 + - libgdal-core + - numpy + - numpydoc=1.9 + - docutils - myst-parser - pip - pip: diff --git a/docs/source/api.rst b/docs/source/api.rst index 105fbb3f..e0255581 100644 --- a/docs/source/api.rst +++ b/docs/source/api.rst @@ -1,3 +1,5 @@ +.. py:currentmodule:: pyogrio + API reference ============= @@ -5,7 +7,28 @@ Core ---- .. automodule:: pyogrio - :members: list_drivers, detect_write_driver, list_layers, read_bounds, read_info, set_gdal_config_options, get_gdal_config_option, vsi_listtree, vsi_rmtree, vsi_unlink, __gdal_version__, __gdal_version_string__ + :members: list_drivers, detect_write_driver, list_layers, read_bounds, read_info, set_gdal_config_options, get_gdal_config_option, vsi_listtree, vsi_rmtree, vsi_unlink + +.. + For the special attributes/dunder attributes, the inline docstrings weren't + picked up by autodoc, so they are documented explicitly here. + +.. py:attribute:: __version__ + + The pyogrio version (`str`). + +.. 
py:attribute:: __gdal_version__ + + The GDAL version used by pyogrio (`tuple` of `int`). + +.. py:attribute:: __gdal_version_string__ + + The GDAL version used by pyogrio (`str`). + +.. py:attribute:: __gdal_geos_version__ + + The version of GEOS used by GDAL (`tuple` of `int`). + GeoPandas integration --------------------- diff --git a/docs/source/install.md b/docs/source/install.md index af1ff813..fea515d5 100644 --- a/docs/source/install.md +++ b/docs/source/install.md @@ -2,12 +2,10 @@ ## Requirements -Supports Python 3.10 - 3.14 and GDAL 3.6.x - 3.11.x - -Reading to GeoDataFrames requires `geopandas>=0.12` with `shapely>=2`. - -Additionally, installing `pyarrow` in combination with GDAL 3.6+ enables -a further speed-up when specifying `use_arrow=True`. +- Python >= 3.10 +- GDAL >= 3.6 +- Reading to GeoDataFrames requires `geopandas>=0.12` and `shapely>=2`. Additionally, + installing `pyarrow` enables a further speed-up when specifying `use_arrow=True`. ## Installation @@ -48,10 +46,16 @@ most likely due to the installation process falling back to installing from the source distribution because the available wheels are not compatible with your platform. -The binary wheels available on PyPI include the core GDAL drivers (GeoJSON, -ESRI Shapefile, GPKG, FGB, OpenFileGDB, etc) but do not include more advanced -drivers such as LIBKML and Spatialite. If you need such drivers, we recommend -that you use conda-forge to install pyogrio as explained above. +Note that the GDAL version included in the binary wheels is not always the latest +version and is likely to be a different version than the system GDAL. Please use +{attr}`pyogrio.__gdal_version_string__` to get the GDAL version being used by +pyogrio. Also note that the wheels include the most common GDAL vector drivers +(GeoJSON, ESRI Shapefile, GPKG, FGB, OpenFileGDB, etc.), but not all drivers. Use +{func}`pyogrio.list_drivers()` to list the drivers available in pyogrio.
+ +If you need drivers that are not included in the wheels, or if you need pyogrio +to use a newer version of GDAL, consider using `conda-forge` to install pyogrio as +explained above. ### Troubleshooting installation errors diff --git a/docs/source/introduction.md b/docs/source/introduction.md index b24862d4..3e60cae7 100644 --- a/docs/source/introduction.md +++ b/docs/source/introduction.md @@ -10,7 +10,7 @@ You can display the GDAL version that Pyogrio was compiled against by ## List available drivers -Use `pyogrio.list_drivers()` to list all available drivers in your installation +Use {func}`~pyogrio.list_drivers()` to list all available drivers in your installation of GDAL. However, just because a driver is listed does not mean that it is currently compatible with Pyogrio. @@ -53,7 +53,7 @@ The following drivers are known to be well-supported and tested in Pyogrio: ## List available layers -To list layers available in a data source: +To list layers available in a data source, use {func}`~pyogrio.list_layers()`: ```python >>> from pyogrio import list_layers @@ -68,9 +68,9 @@ be nonspatial. In this case, the geometry type will be `None`. ## Read basic information about a data layer -To list information about a data layer in a data source, use the name of the layer -or its index (0-based) within the data source. By default, this reads from the -first layer. +To list information about a data layer in a data source, use +{func}`~pyogrio.read_info()`. You can specify the name of the layer or its index +(0-based) within the data source. By default, this reads from the first layer. ```python >>> from pyogrio import read_info @@ -102,8 +102,9 @@ To read from a layer using name or index (the following are equivalent): ## Read a data layer into a GeoPandas GeoDataFrame -To read all features from a spatial data layer. By default, this operates on -the first layer unless `layer` is specified using layer name or index. 
+To read all features from a spatial data layer, use {func}`~pyogrio.read_dataframe()`. +By default, this operates on the first layer unless `layer` is specified using layer +name or index. ```python >>> from pyogrio import read_dataframe @@ -212,7 +213,7 @@ Note: the `bbox` values must be in the same CRS as the dataset. Note: if GEOS is present and used by GDAL, only geometries that intersect `bbox` will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect this bbox will be returned. -`pyogrio.__gdal_geos_version__` will be `None` if GEOS is not detected. +{attr}`pyogrio.__gdal_geos_version__` will be `None` if GEOS is not detected. ## Filter records by a geometry @@ -238,7 +239,7 @@ need to convert it to a Shapely geometry before using `mask`. Note: if GEOS is present and used by GDAL, only geometries that intersect `mask` will be returned; if GEOS is not available or not used by GDAL, all geometries with bounding boxes that intersect the bounding box of `mask` will be returned. -`pyogrio.__gdal_geos_version__` will be `None` if GEOS is not detected. +{attr}`pyogrio.__gdal_geos_version__` will be `None` if GEOS is not detected. ## Execute a sql query @@ -345,7 +346,8 @@ or a DBF file, directly into a Pandas `DataFrame`. ## Read feature bounds You can read the bounds of all or a subset of features in the dataset in order -to create a spatial index of features without reading all underlying geometries. +to create a spatial index of features without reading all underlying geometries +with {func}`~pyogrio.read_bounds()`. This is typically 2-3x faster than reading full feature data, but the main benefit is to avoid reading all feature data into memory for very large datasets.
@@ -368,7 +370,7 @@ This function supports options to subset features from the dataset: ## Write a GeoPandas GeoDataFrame -You can write a `GeoDataFrame` `df` to a file as follows: +You can write a `GeoDataFrame` `df` to a file with {func}`~pyogrio.write_dataframe()`: ```python >>> from pyogrio import write_dataframe @@ -472,7 +474,7 @@ You can also read from a URL with this syntax: GDAL only supports datetimes at a millisecond resolution. Reading data will thus give at most millisecond resolution (`datetime64[ms]` data type). With pandas 2.0 -`pyogrio.read_dataframe()` will return datetime data as `datetime64[ms]` +{func}`~pyogrio.read_dataframe()` will return datetime data as `datetime64[ms]` correspondingly. For previous versions of pandas, `datetime64[ns]` is used as ms precision was not supported. When writing, only precision up to ms is retained. @@ -485,7 +487,7 @@ Timezone information is preserved where possible, however GDAL only represents time zones as UTC offsets, whilst pandas uses IANA time zones (via `pytz` or `zoneinfo`). This means that dataframes with columns containing multiple offsets (e.g. when switching from standard time to summer time) will be written correctly, -but when read via `pyogrio.read_dataframe()` will be returned as a UTC datetime +but when read via {func}`~pyogrio.read_dataframe()` will be returned as a UTC datetime column, as there is no way to reconstruct the original timezone from the individual offsets present. @@ -494,7 +496,7 @@ offsets present. It is possible to use dataset and layer creation options available for a given driver in GDAL (see the relevant [GDAL driver page](https://gdal.org/drivers/vector/index.html)). These -can be passed in as additional `kwargs` to `write_dataframe` or using +can be passed in as additional `kwargs` to {func}`~pyogrio.write_dataframe` or using dictionaries for dataset or layer-level options. 
Where possible, Pyogrio uses the metadata of the driver to determine if a diff --git a/docs/source/known_issues.md b/docs/source/known_issues.md index 1a26efac..ad57aacc 100644 --- a/docs/source/known_issues.md +++ b/docs/source/known_issues.md @@ -15,7 +15,8 @@ encountered, the following occurs: Note: detection of NULL or otherwise unset field values is limited to the subset of records that are read from the data layer, which means that reading different subsets of records may yield different data types for the same columns. You -can use `read_info()` to determine the original data types of each column. +can use {func}`~pyogrio.read_info()` to determine the original data types of each +column. ## No support for measured geometries @@ -105,9 +106,9 @@ We recommend the following to sidestep performance issues: ## Incorrect results when using a spatial filter and Arrow interface Due to [a bug in GDAL](https://github.com/OSGeo/gdal/issues/8347), when using -the Arrow interface (e.g., via `use_arrow` on `read_dataframe`) certain drivers -(e.g., GPKG, FlatGeobuf, Arrow, Parquet) returned features whose bounding boxes -intersected the bounding box specified by `bbox` or `mask` geometry instead of -those whose geometry intersected the `bbox` or `mask`. +the Arrow interface (e.g., via `use_arrow` on {func}`~pyogrio.read_dataframe`) +certain drivers (e.g., GPKG, FlatGeobuf, Arrow, Parquet) returned features whose +bounding boxes intersected the bounding box specified by `bbox` or `mask` geometry +instead of those whose geometry intersected the `bbox` or `mask`. A fix is expected in GDAL 3.8.0. diff --git a/pyogrio/__init__.py b/pyogrio/__init__.py index 17db8339..7ec02dda 100644 --- a/pyogrio/__init__.py +++ b/pyogrio/__init__.py @@ -1,4 +1,4 @@ -"""Vectorized vector I/O using OGR.""" +"""Bulk-oriented vector I/O using OGR.""" try: # we try importing shapely, to ensure it is imported (and it can load its