diff --git a/README.md b/README.md index 10b2875..1190eb6 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -f# Lisflood Utilities +# Lisflood Utilities This repository hosts source code of LISFLOOD utilities. Go to [Lisflood OS page](https://ec-jrc.github.io/lisflood/) for more information. @@ -15,64 +15,17 @@ Other useful resources | Lisflood Usecases | | https://github.com/ec-jrc/lisflood-usecases | -## Intro +## Intro and documentation Lisflood Utilities is a set of tools to help LISFLOOD users (or any users of PCRaster/NetCDF files) -to execute some mundane tasks that are necessary to operate lisflood. -Here's a list of utilities you can find in lisflood-utilities package. +to execute some mundane tasks that are necessary to operate LISFLOOD. -* __[pcr2nc](#pcr2nc)__ is a tool to convert PCRaster maps to NetCDF4 files. - - convert a single map into a NetCDF4 file - - convert a time series of maps into a NetCDF4 mapstack - - support for WGS84 and ETRS89 (LAEA) reference systems - - fine tuning of output files (compression, significant digits, etc.) - -* __[nc2pcr](#nc2pcr)__ is a tool to convert a NetCDF file into PCRaster maps. - - convert 2D variables in single PCRaster maps - - NetCDF4 mapstacks are not supported yet +The documentation about the available tools and how to use them can be found in the [Wiki](https://github.com/ec-jrc/lisflood-utilities/wiki) of this repository. -* __[cutmaps](#cutmaps)__ is a tool to cut NetCDF files in order to reduce size, using either - - a bounding box of coordinates - - a bounding box of matrix indices - - an existing boolean area mask - - a list of stations and a LDD ("local drain direction" in NetCDF or PCRaster format) - -> **Note**: PCRaster must be installed in the Conda environment. - -* __[compare](#compare)__ is a package containing a set of simple Python classes that helps to compare -NetCDF, PCRaster and TSS files. 
+## Installation -* __[thresholds](#thresholds)__ is a tool to compute the discharge return period thresholds from a NetCDF4 file containing a discharge time series. +### Requisites -* __[water-demand-historic](#water-demand-historic)__ is a package for generating sectoral (domestic, livestock, industry, and thermoelectric) water demand maps with monthly to yearly temporal steps for a range of past years, at a user-defined spatial resolution and geographical extent. These maps are required by the LISFLOOD OS [water use module](https://ec-jrc.github.io/lisflood-model/2_18_stdLISFLOOD_water-use). - -* __[waterregions](#waterregions)__ is a package containing two scripts that allow the user to create and verify a water regions map, respectively. - -* __[gridding](#gridding)__ is a tool to interpolate meteo variable observations stored in text files containing (lon, lat, value) into grids. - - uses inverse distance interpolation - - input file names must use the format: YYYYMMDDHHMI_YYYYMMDDHHMISS.txt - - option to store all interpolated grids in a single NetCDF4 file - - option to store each interpolated grid in a GeoTIFF file - - output files are compressed - - grids are set up in the configuration folder and are defined by a dem.nc file - - meteo variable parameters are defined in the same configuration folder - -* __[cddmap](#cddmap)__ is a tool to generate correlation decay distance (CDD) maps starting from station time series - -* __[ncextract](#ncextract)__ is a tool to extract values from NetCDF4 (or GRIB) file(s) at specific coordinates. - -* __[catchstats](#catchstats)__ calculates catchment statistics (mean, sum, std, min, max...) from NetCDF4 files given masks created with [`cutmaps`](#cutmaps:-a-NetCDF-files-cookie-cutter).
- -The package contains convenient classes for reading/writing: - -* PCRasterMap -* PCRasterReader -* NetCDFMap -* NetCDFWriter - -### Installation - -#### Requisites The easy way is to use a conda environment, as it encapsulates the C dependencies as well, so you wouldn't need to install libraries yourself. Otherwise, ensure you have properly installed the following software: @@ -81,7 +34,8 @@ Otherwise, ensure you have properly installed the following software: - GDAL C library and software - NetCDF4 C library -#### Install +### Install + If you use conda, create a new env (or use an existing one) and install gdal and lisflood-utilities: ```bash @@ -91,17 +45,16 @@ conda install -c conda-forge pcraster eccodes gdal pip install lisflood-utilities ``` -If you don't use conda but a straight python3 virtualenv: +If you don't use conda but a straight Python3 virtualenv: ```bash source /path/myenv/bin/activate pip install lisflood-utilities ``` -If GDAL library fails to install, ensure to install the same package version of the -C library you have on your system. You may also need to setup paths to gdal headers. +If the GDAL library fails to install, make sure you install the Python package version matching the C library on your system. You may also need to set up paths to the GDAL headers.
-To check which version of GDAL libraries you have installed on your computer, use gdal-config +To check which version of the GDAL libraries you have installed on your computer, use `gdal-config` ```bash sudo apt-get install libgdal-dev libgdal @@ -111,651 +64,16 @@ gdal-config --version # 3.0.1 pip install GDAL==3.0.1 ``` -Note: if you previously installed an older version of the lisflood-utilities, it is highly recommended to remove it before installing the newest version: +> **Note:** if you previously installed an older version of `lisflood-utilities`, it is highly recommended to remove it before installing the newest version: ```bash pip uninstall lisflood-utilities pip install -e ./ ``` - -## pcr2nc - -### Usage - -> __Note:__ This guide assumes you have installed the program with the pip tool. -> If you cloned the source code instead, just substitute the executable `pcr2nc` with `python pcr2nc_script.py`, which is in the root folder of the cloned project. - -The tool takes three command line input arguments: - -- -i, --input: It can be a path to a single file, a folder or a Unix-like wildcard expression like _/path/to/files/dis00*_ - -o, --output_file: Path to the output nc file - -m, --metadata: Path to a yaml or json file containing the configuration for the NetCDF4 output file. - -Unless the input is a single file, the resulting NetCDF4 file will be a mapstack along a time dimension. - -Input as a folder containing PCRaster maps. In this case, the folder must contain ONLY PCRaster files and the output will be a mapstack. - -```bash -pcr2nc -i /path/to/input/ -o /path/to/output/out.nc -m ./nc_metadata.yaml -``` - -Input as a path to a single map. In this case, the output won't be a mapstack. - -```bash -pcr2nc -i /path/to/input/pcr.map -o /path/to/output/out.nc -m ./nc_metadata.yaml -``` - -Input as a _Unix style pathname pattern expansion_. The output will be a mapstack.
__Note that in this case the input argument must be contained in double quotes!__ - -```bash -pcr2nc -i "/path/to/input/pcr00*" -o /path/to/output/out.nc -m ./nc_metadata.json -``` - -#### Writing metadata configuration file - -The format of the resulting NetCDF4 file is configured in a metadata configuration file. This file can be written in YAML or JSON format. - -An example of a metadata configuration file is the following: - -```yaml -variable: - shortname: dis - description: Discharge - longname: discharge - units: m3/s - compression: 9 - least_significant_digit: 2 -source: JRC Space, Security, Migration -reference: JRC Space, Security, Migration -geographical: - datum: WGS84 - variable_x_name: lon - variable_y_name: lat -time: - calendar: proleptic_gregorian - units: days since 1996-01-01 - -``` - -#### Variable section - -In the `variable` section you can configure metadata for the main variable: - -- `shortname`: A short name for the variable -- `longname`: The long name version -- `description`: A description for humans -- `units`: The units of the variable -- `compression`: Optional, integer number between 1 and 9, default 0 (no compression). If present, the output nc file will be compressed at this level. -- `least_significant_digit`: Optional, integer number, default 2. From the NetCDF4 documentation: - -> If your data only has a certain number of digits of precision -(say for example, it is temperature data that was measured with a precision -of 0.1 degrees), you can dramatically improve zlib compression by quantizing -(or truncating) the data using the least_significant_digit keyword argument -to createVariable. The least significant digit is the power of ten of the -smallest decimal place in the data that is a reliable value.
-For example if the data has a precision of 0.1, -then setting least_significant_digit=1 will cause the data to be -quantized using `numpy.around(scale*data)/scale`, where `scale = 2**bits`, -and bits is determined so that a precision of 0.1 is retained -(in this case bits=4). Effectively, this makes the compression 'lossy' -instead of 'lossless', that is some precision in the data is sacrificed for the sake of disk space. - -#### Source and reference - -`source` and `reference` add information for the institution that is providing the NetCDF4 file. - -#### Geographical section - -In the `geographical` section you can configure `datum` and the names of the x and y variables. As `variable_x_name` and `variable_y_name` you should use 'lon' and 'lat' for geographical coordinates (e.g. WGS84) and 'x' and 'y' for projected coordinates (e.g. ETRS89). - -Currently, pcr2nc supports the following list for `datum`: - - * `WGS84` - * `ETRS89` - * `GISCO` - -#### Time section - -This section is optional and is only required if the output file is a mapstack (a timeseries of georeferenced 2D arrays). -In this section you have to configure `units` and `calendar`. - -- `units`: Can be one of the following strings (replacing placeholders with the actual date): - - `hours since YYYY-MM-DD HH:MM:SS` - - `days since YYYY-MM-DD` -- `calendar`: A recognized calendar identifier, like `proleptic_gregorian`, `gregorian` etc. - -## nc2pcr - -This tool converts single-map NetCDF files (the time dimension is not supported yet) to PCRaster format. - -### Usage - -```bash -nc2pcr -i /path/to/input/file.nc -o /path/to/output/out.map [-c /path/to/clone.map optional] -``` - -If the input file is a LDD map, you must add the `-l` flag: - -```bash -nc2pcr -i /path/to/input/ldd.nc -o /path/to/output/ldd.map -l [-c /path/to/clone.map optional] -``` - -## cutmaps - -This tool cuts NetCDF files using either a mask, a bounding box, or a list of stations along with a LDD (local drain direction) map.
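Conceptually, a coordinate bounding-box cut keeps only the rows and columns whose coordinates fall inside the box. A minimal NumPy sketch of the idea (the function, grid, and variable names below are hypothetical illustrations, not the actual cutmaps implementation):

```python
import numpy as np

def cut_bbox(data, lats, lons, ymin, ymax, xmin, xmax):
    """Cut a 2D array to a coordinate bounding box (x = longitude, y = latitude)."""
    # Boolean masks selecting the rows/columns whose coordinates fall inside the box
    rows = (lats >= ymin) & (lats <= ymax)
    cols = (lons >= xmin) & (lons <= xmax)
    # np.ix_ turns the two 1D masks into an index grid for the 2D array
    return data[np.ix_(rows, cols)], lats[rows], lons[cols]

# Toy 0.5-degree grid: 10 latitudes x 20 longitudes
lats = np.linspace(44.75, 40.25, 10)   # descending, as in many NetCDF files
lons = np.linspace(0.25, 9.75, 20)
data = np.arange(200, dtype=float).reshape(10, 20)

cut, cut_lats, cut_lons = cut_bbox(data, lats, lons, 41.0, 43.0, 2.0, 5.0)
print(cut.shape)  # (4, 6)
```

An index-based cut (the `-i` option) skips the coordinate masks and slices the arrays directly by row/column index.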
- -### Usage - -The tool requires a series of arguments: - -* The area to be extracted can be defined in one of the following ways: - - `-m`, `--mask`: a mask map (either PCRaster or NetCDF format). - - `-i`, `--cuts_indices`: a bounding box defined by matrix indices in the form `-i imin imax jmin jmax` (the indices must be integers). - - `-c`, `--cuts`: a bounding box defined by coordinates in the form `-c xmin xmax ymin ymax` (the coordinates can be integer or floating point numbers; x = longitude, y = latitude). - - `-N`, `--stations`: a list of stations included in a tab-separated text file. This approach requires a LDD (local drain direction) map as an extra input, defined with the argument `-l` (`--ldd`). -* The files to be cut may be defined in one of the following ways: - - `-f`, `--folder`: a folder containing NetCDF files. - - `-F`, `--file`: a single NetCDF file to be cut. - - `-S`, `--subdir`: a directory containing a number of folders. -* The resulting files will be saved in the folder defined by the argument `-o` (`--outpath`). - -There are additional optional arguments: - -* `-W`, `--overwrite`: allows overwriting existing results. -* `-C`, `--clonemap`: can be used to define a clone map when the LDD input map (argument `-l`) is in NetCDF format. - -#### Examples - -**Using a mask** - -The following command will cut all NetCDF files inside a specific folder (argument `-f`) using a mask (argument `-m`). The mask is a boolean map (1 only in the area of interest) that `cutmaps` uses to create a bounding box. The resulting files will be written in the current folder (argument `-o`). - -```bash -cutmaps -m /workarea/Madeira/maps/MaskMap/Bacia_madeira.nc -f /workarea/Madeira/lai/ -o ./ -``` - -**Using indices** - -The following command cuts all the maps in an input folder (argument `-f`) using a bounding box defined by matrix indices (argument `-i`).
Knowing your area of interest from your NetCDF files, you can determine the indices of the array and pass them in the form `-i imin imax jmin jmax` (integer numbers). - -```bash -cutmaps -i "150 350 80 180" -f /workarea/Madeira/lai/ -o ./ -``` - -**Using coordinates** - -The following command cuts all the maps in an input directory containing several folders (argument `-S`) using a bounding box defined by coordinates (argument `-c`). The argument `-W` allows overwriting pre-existing files in the output directory (argument `-o`): - -```bash -cutmaps -S /home/projects/lisflood-eu -c "4078546.12 4463723.85 811206.57 1587655.50" -o /Work/Tunisia/cutmaps -W -``` - -**Using station coordinates and a local drain direction map** - -The TXT file with stations must have a specific format, as in the example below. Each row represents a station and contains three columns, separated by tabs, that indicate the X and Y coordinates (or lon and lat) and the ID of the station. - -```text -4297500 1572500 1 -4292500 1557500 2 -4237500 1537500 3 -4312500 1482500 4 -4187500 1492500 5 -``` - -The following command will cut all the maps in a specific folder (`-f` argument) given a LDD map (`-l` argument) and the previous text file (`-N` argument), and save the results in a folder defined by the argument `-o`. - -```bash -cutmaps -f /home/projects/lisflood-eu -l ldd.map -N stations.txt -o /Work/Tunisia/cutmaps -``` - -If the LDD is in NetCDF format, it will first be converted into PCRaster format. - -```bash -cutmaps -f /home/projects/lisflood-eu -l ldd.nc -N stations.txt -o /Work/Tunisia/cutmaps -``` - -If you experience problems, you can try to pass a path to a PCRaster clone map using the `-C` argument.
- -```bash -cutmaps -f /home/projects/lisflood-eu -l ldd.nc -C area.map -N stations.txt -o /Work/Tunisia/cutmaps -``` - -### Output - -Apart from the cut files in the output folder specified on the command line, `cutmaps` produces other outputs in the folder where the LDD map is stored: - -* _mask.map_ and _mask.nc_ for your area of interest, which may be needed in subsequent LISFLOOD/LISVAP executions. -* _outlets.map_ and _outlets.nc_ based on _stations.txt_, which will let you produce gauges TSS if configured in LISFLOOD. - -## compare - -This tool compares two NetCDF datasets. You can configure it with tolerances (absolute `--atol`, relative `--rtol`, and thresholds for the percentage of tolerated different values `--max-diff-percentage`). You can also set the option `--save-diffs` to write files with the differences, so that you can inspect maps and differences with tools like [Panoply](https://www.giss.nasa.gov/tools/panoply/). - -```text -usage: compare [-h] -a DATASET_A -b DATASET_B -m MASKAREA [-M SUBMASKAREA] - [-e] [-s] [-D] [-r RTOL] [-t ATOL] [-p MAX_DIFF_PERCENTAGE] - [-l MAX_LARGEDIFF_PERCENTAGE] - -Compare NetCDF outputs: 0.12.12 - -optional arguments: - -h, --help show this help message and exit - -a DATASET_A, --dataset_a DATASET_A - path to dataset version A - -b DATASET_B, --dataset_b DATASET_B - path to dataset version B - -m MASKAREA, --maskarea MASKAREA - path to mask - -e, --array-equal flag to compare files to be identical - -s, --skip-missing flag to skip missing files in comparison - -D, --save-diffs flag to save diffs in NetCDF files for visual - comparisons. Files are saved in the ./diffs folder of - the current directory. For each file presenting - differences, you will find diff files, original A and - B (only for timesteps where differences are found).
- -r RTOL, --rtol RTOL rtol - -t ATOL, --atol ATOL atol - -p MAX_DIFF_PERCENTAGE, --max-diff-percentage MAX_DIFF_PERCENTAGE - threshold for diffs percentage - -l MAX_LARGEDIFF_PERCENTAGE, --max-largediff-percentage MAX_LARGEDIFF_PERCENTAGE - threshold for large diffs percentage -``` - -## thresholds - -The thresholds tool computes the discharge return period thresholds using the method of L-moments. -It is used to post-process the discharge from the LISFLOOD long-term run. -The resulting thresholds can be used in a flood forecasting system to define the flood warning levels. - -### Usage -The tool takes as input a NetCDF file containing the annual maxima of the discharge signal. LISFLOOD computes time series of discharge values (average value over the selected computational time step). Users are therefore required to compute the annual maxima. As an example, this step can be achieved by using CDO (`cdo yearmax`); for all the details, please refer to [https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-190001.2.5](https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-190001.2.5) - -The output NetCDF file contains the following return period thresholds [1.5, 2, 5, 10, 20, 50, 100, 200, 500], together with the Gumbel parameters (sigma and mu). - -```text -usage: thresholds [-h] [-d DISCHARGE] [-o OUTPUT] - -Utility to compute the discharge return period thresholds using the method of L-moments. -Thresholds computed: [1.5, 2, 5, 10, 20, 50, 100, 200, 500] - -options: - -h, --help show this help message and exit - -d DISCHARGE, --discharge DISCHARGE - Input discharge files (annual maxima) - -o OUTPUT, --output OUTPUT - Output thresholds file -``` - -## water-demand-historic - -This utility allows the user to create water demand maps at the desired resolution and for the desired geographical areas. The maps indicate, for each pixel, the time-varying water demand for domestic, livestock, industrial, and thermoelectric water consumption.
The temporal discretization is monthly for domestic and energy demand, yearly for industrial and livestock demand. The maps of sectoral water demand are required by the LISFLOOD OS [water use module](https://ec-jrc.github.io/lisflood-model/2_18_stdLISFLOOD_water-use/). Clearly, the sectoral water demand maps and the scripts of this utility can also be used for other applications, as well as for stand-alone analysis of historical water demand for anthropogenic use. - -#### Input -The creation of the sectoral water demand maps requires a template map that defines the desired geographical area and spatial resolution. The generation of the maps relies on a number of external datasets (examples are the [Global Human Settlement - Datasets - European Commission (europa.eu)](https://ghsl.jrc.ec.europa.eu/datasets.php) and the [FAO AQUASTAT Dissemination System](https://data.apps.fao.org/aquastat/?lang=en)). The locations of the template map, of the input datasets and files, of the output folder, and other user choices (e.g. start year and end year) are specified in a configuration file. The syntax of the configuration file is pre-defined and an example is provided to the users. The complete list of external datasets and the instructions on how to prepare (i) the external datasets, (ii) the template map, (iii) the input folder, (iv) the output folder, and (v) the configuration file are explained in detail [here](src/lisfloodutilities/water-demand-historic/README.md) - -#### Output -Four sectoral water demand maps in netCDF-4 format. The geographical extent and the spatial resolution are defined by the template map (user-defined input file). Each netCDF-4 file contains 12 monthly values for each year included in the time span defined by the user. Sectoral water demand data with lower (yearly) temporal resolution are repeated 12 times. - -#### Usage -The methodology includes five main steps.
The instructions on how to retrieve the scripts, create the environment including all the required packages, and use the utility are provided [here](src/lisfloodutilities/water-demand-historic/README.md) - -#### Important notes on documentation and data availability -The complete list of external datasets, the instructions on how to retrieve the external datasets, the methodology, and the usage of the scripts are explained in detail [here](src/lisfloodutilities/water-demand-historic/README.md). The README file provides detailed technical information about the input datasets and the usage of this utility. The methodology is explained in the manuscript: Choulga, M., Moschini, F., Mazzetti, C., Grimaldi, S., Disperati, J., Beck, H., Salamon, P., and Prudhomme, C.: Technical note: Surface fields for global environmental modelling, EGUsphere, 2023 ([preprint](https://doi.org/10.5194/egusphere-2023-1306)). - -The global sectoral water demand maps at 3 arcmin (or 0.05 degrees) resolution, 1979-2019, produced using the scripts of this utility can be downloaded from the [Joint Research Centre Data Catalogue - LISFLOOD static and parameter maps for GloFAS - European Commission (europa.eu)](https://data.jrc.ec.europa.eu/dataset/68050d73-9c06-499c-a441-dc5053cb0c86) - - - -## waterregions - -The modelling of water abstraction for domestic, industrial, energetic, agricultural and livestock use can require a map of the water regions. The concept of water regions and the information for their definition are explained [here](https://ec-jrc.github.io/lisflood-model/2_18_stdLISFLOOD_water-use/). -Since groundwater and surface water resources demand and abstraction are spatially distributed inside each water region, each model set-up must include all the pixels of the water region. This requirement is crucial for the success of the calibration of the model. This utility allows the user to meet this requirement. -More specifically, this utility can be used to: -1.
create a water region map which is consistent with a set of calibration points: this purpose is achieved by using the script define_waterregions. -2. verify the consistency between an existing water region map and an existing map of calibration catchments: this purpose is achieved by using the script verify_waterregions -Note that when calibrating a catchment which is a subset of a larger computational domain, and the option wateruse is switched on, the option groundwatersmooth must be switched off. The explanation of this requirement is provided in the chapter [Water use](https://ec-jrc.github.io/lisflood-model/2_18_stdLISFLOOD_water-use/) of the LISFLOOD documentation. - -#### Requirements -python3, pcraster 4.3. The protocol was tested on Linux. - -### define_waterregions -This utility allows the user to create a water region map which is consistent with a set of calibration points. The protocol was created by Ad De Roo (Unit D2, Joint Research Centre). - -#### Input -- List of the coordinates of the calibration points. This list must be provided in a .txt file with three columns: LONGITUDE (or x), LATITUDE (or y), point ID. -- LDD map, in NetCDF format or pcraster format. When using pcraster format, the following condition must be satisfied: *PCRASTER_VALUESCALE=VS_LDD*. -- Countries map, in NetCDF format or pcraster format. When using pcraster format, the following condition must be satisfied: *PCRASTER_VALUESCALE=VS_NOMINAL*. This map shows the political boundaries of the Countries; each Country is identified by a unique ID. This map is used to ensure that the water regions are not split across different Countries. -- Map of the initial definition of the water regions, in NetCDF format or pcraster format. When using pcraster format, the following condition must be satisfied: *PCRASTER_VALUESCALE=VS_NOMINAL*. This map is used to attribute a water region to areas not included in the calibration catchments.
In order to create this map, the user can follow the guidelines provided [here](https://ec-jrc.github.io/lisflood-model/2_18_stdLISFLOOD_water-use/). -- file *.yaml* or *.json* to define the metadata of the output water regions map in NetCDF format. An example of the structure of these files is provided [here](tests/data/waterregions) - -##### Input data provided by this utility: -This utility provides three maps of [Countries IDs](tests/data/waterregions): a 1arcmin map of Europe (EFAS computational domain), and 0.1 degree and 3arcmin maps of the Globe. ACKNOWLEDGEMENTS: both rasters were obtained by upsampling the original World Borders Dataset provided by http://thematicmapping.org/ (the dataset is available under a Creative Commons Attribution-Share Alike License). - -#### Output -Map of the water regions which is consistent with the calibration catchments. In other words, each water region is entirely included in one calibration catchment. The test to check the consistency between the newly created water regions map and the calibration catchments is implemented internally by the code, and the outcome of the test is printed on the screen. -In the output map, each water region is identified by a unique ID. The format of the output map can be NetCDF or pcraster. - -#### Usage -The following command lines allow the user to produce a water region map which is consistent with the calibration points (only one command line is required: each of the command lines below shows a different combination of input file formats): - -*python define_waterregions.py -p calib_points_test.txt -l ldd_test.map -C countries_id_test.map -w waterregions_initial_test.map -o my_new_waterregions.map*
- -*python define_waterregions.py -p calib_points_test.txt -l ldd_test.nc -C countries_id_test.nc -w waterregions_initial_test.nc -o my_new_waterregions.nc -m metadata.test.json*
- -*python define_waterregions.py -p calib_points_test.txt -l ldd_test.map -C countries_id_test.nc -w waterregions_initial_test.map -o my_new_waterregions.nc -m metadata.test.yaml*
- - -The input maps can be in netcdf format or pcraster format (the same command line can accept a mix of pcraster and NetCDF formats). It is imperative to write the file name in full, that is, including the extension (which can be either ".nc" or ".map").
-The utility can return either a pcraster file or a NetCDF file. The users select their preferred format by specifying the extension of the file in the output option (i.e. either ".nc" or ".map").
-The metadata file in .yaml format must be provided only if the output file is in NetCDF format.
- -The code internally verifies that each one of the newly created water regions is entirely included within one calibration catchment. If this condition is satisfied, the following message is printed out: *“OK! Each water region is completely included inside one calibration catchment”*. If the condition is not satisfied, the error message is *“ERROR: The water regions WR are included in more than one calibration catchment”*. Moreover, the code provides the list of the water regions WR and the calibration catchments that do not meet the requirement. This error highlights a problem in the input data: the user is recommended to check (and correct) the list of calibration points and the input maps. - -The input and output arguments are listed below. - - -```text -usage: define_waterregions.py [-h] -p CALIB_POINTS -l LDD -C COUNTRIES_ID -w - WATERREGIONS_INITIAL -o OUTPUT_WR - -Define Water Regions consistent with calibration points: {} - -optional arguments: - -h, --help show this help message and exit - -p CALIB_POINTS, --calib_points CALIB_POINTS - list of calibration points: lon or x, lat or y, point id. File extension: .txt. - -l LDD, --ldd LDD LDD map, file extension: .nc or .map - -C COUNTRIES_ID, --countries_id COUNTRIES_ID - map of Countries ID, file extension: .nc or .map - -w WATERREGIONS_INITIAL, --waterregions_initial WATERREGIONS_INITIAL - initial map of water regions, file extension: .nc or .map - -o OUTPUT_WR, --output_wr OUTPUT_WR - output map of water regions, file extension: .nc or .map - -m METADATA, --metadata_file METADATA - Path to metadata file for NetCDF, .yaml or .json format -``` - - - -### verify_waterregions - -This function allows the user to verify the consistency between a water region map and a map of calibration catchments. This function must be used when the water region map and the map of calibration catchments have been defined in an independent manner (i.e. not using the utility **define_waterregions**).
The function verify_waterregions verifies that each water region is entirely included in one calibration catchment. If this condition is not satisfied, an error message is printed on the screen. - -#### Input -- Map of calibration catchments in NetCDF format. -- Water regions map in NetCDF format. - -#### Output -The output is a message on the screen. There are two options: -- 'OK! Each water region is completely included inside one calibration catchment.' -- 'ERROR: The water regions WR are included in more than one calibration catchment': this message is followed by the list of the water regions and of the catchments that raised the issue. -In case of an error message, the user can use the utility **define_waterregions**. - -#### Usage -The following command line allows the user to verify that a water regions map is consistent with a map of calibration catchments: - -*python verify_waterregions.py -cc calib_catchments_test.nc -wr waterregions_test.nc* - -The input and output arguments are listed below. All the inputs are required. - -```text -usage: verify_waterregions.py [-h] -cc CALIB_CATCHMENTS -wr WATERREGIONS - -Verify that the Water Regions map is consistent with the map of the -calibration catchments - -optional arguments: - -h, --help show this help message and exit - -cc CALIB_CATCHMENTS, --calib_catchments CALIB_CATCHMENTS - map of calibration catchments, NetCDF format - -wr WATERREGIONS, --waterregions WATERREGIONS - map of water regions, NetCDF format -``` - -NOTE: -The utility **pcr2nc** can be used to convert a map in pcraster format into NetCDF format. - - -## gridding - -This tool is used to interpolate meteo variable observations stored in text files containing (lon, lat, value) into grids. -It uses the inverse distance interpolation method from pyg2p. - -#### Requirements -python3, pyg2p - -### Usage - -> __Note:__ This guide assumes you have installed the program with the pip tool.
-> If you cloned the source code instead, just substitute the executable `gridding` with `python bin/gridding`, which is in the root folder of the cloned project. - -The tool requires four mandatory command line input arguments: - -- -i, --in: Set input folder path with kiwis/point files -- -o, --out: Set output folder base path for the tiff files or the NetCDF file path. -- -c, --conf: Set the grid configuration type to use. Right now only 5x5km, 1arcmin are available. -- -v, --var: Set the variable to be processed. Right now only the variables pr,pd,tn,tx,ws,rg,pr6,ta6 are available. - -The input folder must contain the meteo observations in text files with the file name format: YYYYMMDDHHMI_YYYYMMDDHHMISS.txt -The files must contain the columns longitude, latitude and observation_value, separated by TABs and without a header. -It is not mandatory, but it helps to store the files in a folder structure like: ./YYYY/MM/DD/YYYYMMDDHHMI_YYYYMMDDHHMISS.txt - -Example of a command that will generate a NetCDF file containing the precipitation (pr) grids for March 2023: - -```bash -gridding -i /meteo/pr/2023/ -o /meteo/pr/pr_MARCH_2023.nc -c 1arcmin -v pr -s 202303010600 -e 202304010600 -``` - -The input and output arguments are listed below and can be seen by using the help flag. - -```bash -gridding --help -``` - -```text -usage: gridding [-h] -i input_folder -o {output_folder, NetCDF_file} -c - {5x5km, 1arcmin,...} -v {pr,pd,tn,tx,ws,rg,...} - [-d files2process.txt] [-s YYYYMMDDHHMISS] [-e YYYYMMDDHHMISS] - [-q] [-t] [-f] - -version v0.1 ($Mar 28, 2023 16:01:00$) This script interpolates meteo input -variables data into either a single NETCDF4 file or one GEOTIFF file per -timestep. The resulting NetCDF is CF-1.6 compliant.
-
-optional arguments:
-  -h, --help            show this help message and exit
-  -i input_folder, --in input_folder
-                        Set input folder path with kiwis/point files
-  -o {output_folder, NetCDF_file}, --out {output_folder, NetCDF_file}
-                        Set output folder base path for the tiff files or the
-                        NetCDF file path.
-  -c {5x5km, 1arcmin,...}, --conf {5x5km, 1arcmin,...}
-                        Set the grid configuration type to use.
-  -v {pr,pd,tn,tx,ws,rg,...}, --var {pr,pd,tn,tx,ws,rg,...}
-                        Set the variable to be processed.
-  -d files2process.txt, --dates files2process.txt
-                        Set file containing a list of filenames to be
-                        processed in the form of
-                        YYYYMMDDHHMI_YYYYMMDDHHMISS.txt
-  -s YYYYMMDDHHMISS, --start YYYYMMDDHHMISS
-                        Set the start date and time from which data is
-                        imported [default: date defining the time units inside
-                        the config file]
-  -e YYYYMMDDHHMISS, --end YYYYMMDDHHMISS
-                        Set the end date and time until which data is imported
-                        [default: 20230421060000]
-  -q, --quiet           Set script output into quiet mode [default: False]
-  -t, --tiff            Outputs a tiff file per timestep instead of the
-                        default single NetCDF [default: False]
-  -f, --force           Force write to existing file. TIFF files will be
-                        overwritten and NetCDF file will be appended.
-                        [default: False]
-```
-
-
-## cddmap
-
-This tool is used to generate correlation decay distance (CDD) maps starting from station timeseries.
-
-#### Requirements
-python3, pyg2p
-
-### Usage
-
-cddmap [directory] [--analyze] [--merge-and-filter-jsons] [--generatemap] [--start first_station] [--end last_station] [--parallel] [--only-extract-timeseries timeseries_keys_file] [--maxdistance max_distance_in_km]
-
-The tool requires an input argument indicating the station timeseries main folder, and calculates the CDD for each station, as well as correlations and distances files. It outputs the results in a txt file containing station coordinates and CDD values. 
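A common definition of the CDD is the distance at which the correlation between two stations' timeseries decays to 1/e. The fitting procedure actually used by cddmap is not shown in this document; the sketch below only illustrates the idea on synthetic data, fitting r = exp(-d/CDD) to correlation-distance samples (all names and numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic correlation-distance samples that decay exponentially
true_cdd = 300.0                                   # km
distances = np.linspace(10, 800, 50)               # km
correlations = np.exp(-distances / true_cdd) + rng.normal(0, 0.005, distances.size)

# fit r = exp(-d / CDD) by linear regression on log(r)
positive = correlations > 0                        # log() needs positive values
slope = np.polyfit(distances[positive], np.log(correlations[positive]), 1)[0]
cdd_estimate = -1.0 / slope                        # distance at which r drops to 1/e
```

With clean data the estimate recovers the decay distance used to generate the samples; with real station records the scatter is much larger, which is why cddmap also writes the intermediate correlations and distances files.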
-After creating the CDD txt file, it can be used with one of the following commands:
-
-- --analyze: reads the CDD file previously created, for postprocessing
-- --merge-and-filter-jsons: merges all CDD files in a folder and filters out a list of stations.
-- --generatemap: generates a NetCDF CDD map file using the CDD txt file and angular distance weighted interpolation between station points
-- --start and --end: used to split the task into many subtasks, evaluating only the stations between "start" and "end", since the CDD evaluation can be very time-consuming.
-- --only-extract-timeseries: in combination with the path of the stations' main folder, extracts the timeseries specified in the timeseries_keys_file txt list of keys
-- --parallel: enables CDD evaluation in parallel on multiple cores. It will require more memory.
-- --maxdistance: evaluates only stations that are closer than maxdistance (in km)
-
-The input folder must contain the meteo observations in text files.
-
-Example of a command that will generate txt files for the CDD of precipitation (pr), in parallel mode, for stations that are closer than 500 km:
-
-```bash
-cddmap /meteo/pr --parallel --maxdistance 500
-```
-
-
-## ncextract
-
-The `ncextract` tool extracts time series from (multiple) NetCDF or GRIB file(s) at user-defined coordinates.
-
-### Usage
-
-The tool takes as input a CSV file containing point coordinates and a directory containing one or more NetCDF or GRIB files. The CSV file must contain only three columns: a point identifier and its two coordinates. The names of the coordinate fields must match those in the NetCDF or GRIB files. For example:
-
-```text
-ID,lat,lon
-0010,40.6083,-4.2250
-0025,37.5250,-6.2750
-0033,40.5257,-6.4753
-```
-
-The output is a file containing the time series at the pixels corresponding to the provided coordinates, in chronological order. The function supports two output formats: CSV or NetCDF. 
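Conceptually, the extraction amounts to picking the grid pixel whose centre is nearest to each requested point. ncextract's actual implementation is not reproduced here; the sketch below only illustrates that nearest-pixel lookup on a synthetic grid, with made-up coordinate values:

```python
import numpy as np

# synthetic grid (descending latitudes, as in many NetCDF maps) and one 2D map
lats = np.linspace(41.0, 40.25, 4)
lons = np.linspace(-4.5, -3.5, 5)
grid = np.arange(20, dtype=float).reshape(4, 5)

def extract_nearest(lat, lon):
    """Value of the grid pixel whose centre is nearest to (lat, lon)."""
    i = np.abs(lats - lat).argmin()
    j = np.abs(lons - lon).argmin()
    return grid[i, j]

# a point such as the first row of the CSV above snaps to pixel (40.5, -4.25)
value = extract_nearest(40.60, -4.22)
```

For a stack of maps the same index pair is applied to every time step, which yields the chronological series written to the output file.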
- -```text -usage: ncextract.py [-h] -i INPUT -d DIRECTORY -o OUTPUT [-nc] - -Utility to extract time series of values from (multiple) NetCDF files at specific coordinates. -Coordinates of points of interest must be included in a CSV file with at least 3 columns named id, -lat, lon. - -options: - -h, --help show this help message and exit - -i INPUT, --input INPUT - Input CSV file (id, lat, lon) - -d DIRECTORY, --directory DIRECTORY - Input directory with .nc files - -o OUTPUT, --output OUTPUT - Output file. Two extensions are supported: .csv or .nc -``` - -#### Use in the command prompt - -The following command extracts the discharge time series from EFAS simulations (NetCDF files in the directory _EFAS5/discharge/maps_) in a series of points where gauging stations are located (file _gauging_stations.csv_), and saves the extraction as a CSV file. - -```bash -ncextract -i ./gauging_stations.csv -d ./EFAS5/discharge/maps/ -o ./EFAS5/discharge/timeseries/results_ncextract.csv -``` - -#### Use programmatically - -The function can be imported in a Python script. It takes as inputs two `xarray.Dataset`: one defining the input maps and the other the points of interest. The result of the extraction can be another `xarray.Dataset`, or saved as a file either in CSV or NetCDF format. - -```Python -from lisfloodutilities.ncextract import extract_timeseries - -# load desired input maps and points -# maps: xarray.Dataset -# points: xarray.Dataset - -# extract time series and save in a xarray.Dataset -ds = extract_timeseries(maps, points, output=None) -``` - - -## catchstats - -The `catchstats` tool calculates catchment statistics given a set of input NetCDF files and a set of mask NetCDF files. 
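Each statistic is essentially an aggregation of the input map over the non-NaN pixels of a catchment mask. The sketch below illustrates this idea (including the optional pixel-area weighting) on a tiny synthetic map; it is not catchstats' actual implementation:

```python
import numpy as np

# synthetic 4x4 input map and a catchment mask (values inside, NaN outside)
values = np.arange(16, dtype=float).reshape(4, 4)
mask = np.full((4, 4), np.nan)
mask[1:3, 1:3] = 1.0                       # a 2x2 catchment

# keep only the pixels inside the catchment, then aggregate
inside = values[~np.isnan(mask)]           # pixels 5, 6, 9, 10
catch_mean = inside.mean()                 # 7.5
catch_sum = inside.sum()                   # 30.0

# area-weighted mean, as when a pixel-area map is passed with -a
area = np.full((4, 4), 25.0)               # km2 per pixel (uniform in this toy case)
weights = area[~np.isnan(mask)]
weighted_mean = (inside * weights).sum() / weights.sum()
```

With a real pixel-area map the weights vary with latitude, which is why the weighted and unweighted means differ on geographic grids.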
-
-### Usage
-
-#### In the command prompt
-
-The tool takes as input a directory containing the NetCDF files from which the statistics will be computed, and another directory containing the NetCDF files that define the catchment boundaries, which can be any of the outputs of `cutmaps` (not necessarily the file _my_mask.nc_). The input files can be the LISFLOOD static maps (no temporal dimension) or stacks of maps with a temporal dimension. The mask NetCDF files must be named after the catchment ID, as this name will be used to identify the catchment in the output NetCDF. For instance, _0142.nc_ would correspond to the mask of catchment 142. Optionally, an extra NetCDF file can be passed to the tool to account for different pixel areas; in this case, the statistics will be weighted by this pixel area map.
-
-Only some statistics are currently available: mean, sum, std (standard deviation), var (variance), min, max, median and count. The weighting based on pixel area does not affect the statistics min, max, median or count.
-
-The output is a set of NetCDF files (as many as there are catchments in the mask directory) containing the resulting statistics.
-
-```text
-usage: catchstats.py [-h] -i INPUT -m MASK -s STATISTIC -o OUTPUT -a AREA [-W]
-
-Utility to compute catchment statistics from (multiple) NetCDF files.
-The mask map is a NetCDF file with values in the area of interest and NaN elsewhere.
-The area map is optional and accounts for varying pixel area with latitude.
-
-options:
-  -h, --help
-                        show this help message and exit
-  -i INPUT, --input INPUT
-                        directory containing the input NetCDF files
-  -m MASK, --mask MASK
-                        directory containing the mask NetCDF files
-  -s STATISTIC, --statistic STATISTIC
-                        list of statistics to be computed. 
Possible values: mean, sum, std, var, min, max, median, count
-  -o OUTPUT, --output OUTPUT
-                        directory where the output NetCDF files will be saved
-  -a AREA, --area AREA
-                        NetCDF file of pixel area used to weigh the statistics
-  -W, --overwrite
-                        overwrite existing output files
-```
-
-**Example**
-
-The following command calculates the average and total precipitation for a set of catchments from the dataset EMO-1. The static map _pixarea.nc_ is used to account for varying pixel area.
-
-```bash
-catchstats -i ./EMO1/pr/ -m ./masks/ -s mean sum -o ./areal_precipitation/ -a ./EFAS5/static_maps/pixarea.nc
-```
-
-#### In a Python script
-
-The tool can be imported in a Python script so that the output can be kept in memory. The function takes an `xarray.Dataset` with the input maps from which statistics will be computed, a dictionary of `xarray.DataArray` with the catchment masks, and optionally the weighting map. By default, the result is a `xarray.Dataset`, but NetCDF files are written instead if a directory is provided in the `output` argument.
-
-```Python
-# import function
-from lisfloodutilities.catchstats import catchment_statistics
-
-# load desired input maps and catchment masks
-# maps: xarray.Dataset
-# masks: Dict[int, xarray.DataArray]
-
-# compute statistics and save in a xarray.Dataset
-ds = catchment_statistics(maps, masks, statistic=['mean', 'sum'], weight=None, output=None)
-```
-
-### Output
-
-The structure of the output depends on whether the input files include a temporal dimension or not:
-
-* If the input files DO NOT have a time dimension, the output has a single dimension: the catchment ID. It contains as many variables as the combinations of input variables and statistics. 
For instance, if the input variables are "elevation" and "gradient" and three statistics are required ("mean", "max", "min"), the output will contain 6 variables: "elevation_mean", "elevation_max", "elevation_min", "gradient_mean", "gradient_max" and "gradient_min". -* If the input files DO have a time dimension, the output has two dimensions: the catchment ID and time. The number of variables follows the same structure explained in the previous point. For instance, if the input files are daily maps of precipitation (variable name "pr") and we calculate the mean and total precipitation over the catchment, the output will contain two dimensions ("ID", "time") and two variables ("pr_mean", "pr_sum"). - ## Using `lisfloodutilities` programmatically -You can use lisflood utilities in your python programs. As an example, the script below creates the mask map for a set of stations (stations.txt). The mask map is a boolean map with 1 and 0. 1 is used for all (and only) the pixels hydrologically connected to one of the stations. The resulting mask map is in pcraster format. +You can use lisflood utilities in your Python programs. As an example, the script below creates the mask map for a set of stations (_stations.txt_). The mask map is a boolean map with 1 and 0. 1 is used for all (and only) the pixels hydrologically connected to one of the stations. The resulting mask map is in PCRaster format. 
```python
from lisfloodutilities.cutmaps.cutlib import mask_from_ldd
@@ -770,6 +88,4 @@ ldd_pcr = convert(ldd, clonemap, 'tests/data/cutmaps/ldd_eu_test.map', is_ldd=Tr
mask, outlets_nc, maskmap_nc = mask_from_ldd(ldd_pcr, stations)
mask_map = PCRasterMap(mask)
print(mask_map.data)
-```
-
-
+```
\ No newline at end of file
diff --git a/bin/lfcoords b/bin/lfcoords
new file mode 100644
index 0000000..126cf6b
--- /dev/null
+++ b/bin/lfcoords
@@ -0,0 +1,14 @@
+#!python
+
+import os
+import sys
+
+current_dir = os.path.dirname(os.path.abspath(__file__))
+src_dir = os.path.normpath(os.path.join(current_dir, '../src/'))
+if os.path.exists(src_dir):
+    sys.path.append(src_dir)
+
+from lisfloodutilities.lfcoords.lfcoords import main_script
+
+if __name__ == '__main__':
+    main_script()
diff --git a/requirements.txt b/requirements.txt
index 4365cba..a83fa53 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -7,7 +7,8 @@ coverage>=6.0
dask>=2022.2.0
eccodes-python>=0.9.7
exceptiongroup>=1.1.1
-fsspec>=2023.1.0
+fsspec>=2023.1.0
+geopandas>=0.12.2
geopy>=2.4.1
importlib-metadata<5.0.0
iniconfig>=2.0.0
@@ -22,6 +23,7 @@ pandas>=1.3.5
partd>=1.4.0
pluggy>=1.0.0
pycparser>=2.21
+pyflwdir>=0.5.8
pyg2p>=3.2.7
pytest>=7.3.1
pytest-cov>=4.0.0
@@ -31,11 +33,13 @@ python-dateutil>=2.8.2
pytz>=2023.3
PyYAML>=6.0
rasterio>=1.2.10
+rioxarray>=0.13.3
scipy>=1.7.3
six>=1.16.0
tomli>=2.0.1
toolz>=0.12.0
+tqdm>=4.65.0
typing_extensions>=4.5.0
ujson>=2.0.2
-xarray>=0.20.2
-zipp>=3.15.0
\ No newline at end of file
+xarray>=0.20.1
+zipp>=3.15.0
diff --git a/setup.py b/setup.py
index d669cb3..0eb2fdb 100644
--- a/setup.py
+++ b/setup.py
@@ -124,14 +124,15 @@ def run(self):
version=version,
packages=find_packages('src'),
description='A set of utilities for lisflood users. 
'
-                'pcr2nc: Convert PCRaster files to netCDF CF 1.6; '
-                'nc2pcr: Convert netCDF files ot PCRaster format; '
-                'cutmaps: cut netCDF files; '
+                'catchstats: calculates catchment statistics; '
                 'compare: compare two set of netCDF files; '
-                'thresholds: compute discharge return period thresholds; '
+                'cutmaps: cut netCDF files; '
                 'gridding: interpolate meteo variables observations; '
+                'lfcoords: finds coordinates in the LISFLOOD grid; '
+                'nc2pcr: Convert netCDF files to PCRaster format; '
                 'ncextract: extract values from netCDF files; '
-                'catchstats: calculates catchment statistics; ',
+                'pcr2nc: Convert PCRaster files to netCDF CF 1.6; '
+                'thresholds: compute discharge return period thresholds; ',
long_description=long_description,
long_description_content_type='text/markdown',
setup_requires=[
@@ -143,12 +144,12 @@ def run(self):
# 'GDAL=={}'.format(gdal_version),
'netCDF4>=1.5.3', 'toolz', 'xarray>=0.15.1', 'dask', 'pandas>=0.25.1',
'nine', 'pyg2p'],
-    author="Valerio Lorini, Stefania Grimaldi, Carlo Russo, Domenico Nappo, Lorenzo Alfieri",
-    author_email="valerio.lorini@ec.europa.eu,stefania.grimaldi@ec.europa.eu,carlo.russo@ext.ec.europa.eu,domenico.nappo@gmail.com,lorenzo.alfieri@ec.europa.eu",
+    author="Valerio Lorini, Stefania Grimaldi, Carlo Russo, Domenico Nappo, Lorenzo Alfieri, Jesús Casado Rodríguez",
+    author_email="valerio.lorini@ec.europa.eu,stefania.grimaldi@ec.europa.eu,carlo.russo@ext.ec.europa.eu,domenico.nappo@gmail.com,lorenzo.alfieri@ec.europa.eu,jesus.casado-rodriguez@ec.europa.eu",
keywords=['netCDF4', 'PCRaster', 'mapstack', 'lisflood', 'efas', 'glofas', 'ecmwf', 'copernicus'],
license='EUPL 1.2',
url='https://github.com/ec-jrc/lisflood-utilities',
-    scripts=['bin/pcr2nc', 'bin/cutmaps', 'bin/compare', 'bin/nc2pcr', 'bin/thresholds', 'bin/gridding', 'bin/cddmap', 'bin/ncextract','bin/catchstats',],
+    scripts=['bin/pcr2nc', 'bin/cutmaps', 'bin/compare', 'bin/nc2pcr', 'bin/thresholds', 'bin/gridding', 'bin/cddmap', 
'bin/ncextract','bin/catchstats','bin/lfcoords'], zip_safe=True, classifiers=[ # complete classifier list: http://pypi.python.org/pypi?%3Aaction=list_classifiers diff --git a/src/lisfloodutilities/lfcoords/__init__.py b/src/lisfloodutilities/lfcoords/__init__.py new file mode 100644 index 0000000..ed98b2d --- /dev/null +++ b/src/lisfloodutilities/lfcoords/__init__.py @@ -0,0 +1,204 @@ +import os +import logging +import yaml +from pathlib import Path +from typing import Dict +import numpy as np +import pandas as pd +import geopandas as gpd +import xarray as xr +import rioxarray as rxr + +# set logger +logger = logging.getLogger(__name__) + +# Use a module-level constant for the environment variable setting +os.environ['USE_PYGEOS'] = '0' + +class Config: + """ + Manages the application's configuration by reading a YAML file + and setting default values. + """ + + def __init__(self, config_file: Path): + """ + Reads the configuration from a YAML file and sets default values if not provided. + + Parameters: + ----------- + config_file: string or pathlib.Path + The path to the YAML configuration file. 
+        """
+
+        # read configuration file
+        with open(config_file, 'r', encoding='utf8') as ymlfile:
+            config = yaml.load(ymlfile, Loader=yaml.FullLoader)
+
+        # input file paths
+        self.points = Path(config['input']['points'])
+        self.ldd_fine = Path(config['input']['ldd_fine'])
+        self.upstream_fine = Path(config['input']['upstream_fine'])
+        self.ldd_coarse = Path(config['input']['ldd_coarse'])
+        self.upstream_coarse = Path(config['input']['upstream_coarse'])
+
+        # resolutions
+        self.fine_resolution = None
+        self.coarse_resolution = None
+
+        # output folder ('or' also applies the default when the key is left empty in the YAML)
+        self.output_folder = Path(config.get('output_folder') or './shapefiles')
+        self.output_folder.mkdir(parents=True, exist_ok=True)
+
+        # conditions
+        self.min_area = config['conditions'].get('min_area') or 10
+        self.abs_error = config['conditions'].get('abs_error') or 50
+        self.pct_error = config['conditions'].get('pct_error') or 1
+
+    def update_config(
+        self,
+        fine_grid: xr.DataArray,
+        coarse_grid: xr.DataArray
+    ):
+        """
+        Extracts the resolution from the finer and coarser grids and updates the configuration object.
+
+        Parameters:
+        -----------
+        fine_grid: xarray.DataArray
+            Any map in the fine grid
+        coarse_grid: xarray.DataArray
+            Any map in the coarse grid
+        """
+
+        # resolution of the finer grid
+        cellsize = np.mean(np.diff(fine_grid.x))  # degrees
+        cellsize_arcsec = int(np.round(cellsize * 3600, 0))  # arcsec
+        logger.info(f'The resolution of the finer grid is {cellsize_arcsec} arcseconds')
+        self.fine_resolution = f'{cellsize_arcsec}sec'
+
+        # resolution of the coarser grid
+        cellsize = np.round(np.mean(np.diff(coarse_grid.x)), 6)  # degrees
+        cellsize_arcmin = int(np.round(cellsize * 60, 0))  # arcmin
+        logger.info(f'The resolution of the coarser grid is {cellsize_arcmin} arcminutes')
+        self.coarse_resolution = f'{cellsize_arcmin}min'
+
+
+def read_input_files(cfg: Config) -> Dict:
+    """
+    Reads input files, updates the Config object, and returns a dictionary
+    of the loaded data. 
+
+    Parameters:
+    -----------
+    cfg: Config
+        Configuration object containing file paths and parameters.
+
+    Returns:
+    --------
+    Dict
+        A dictionary containing the loaded data:
+            * 'points': geopandas.GeoDataFrame of input points
+            * 'ldd_fine': xarray.DataArray of local drainage directions in the fine grid
+            * 'upstream_fine': xarray.DataArray of upstream area (km2) in the fine grid
+            * 'ldd_coarse': xarray.DataArray of local drainage directions in the coarse grid
+            * 'upstream_coarse': xarray.DataArray of upstream area (m2) in the coarse grid
+    """
+
+    # a helper function to reduce code repetition
+    def open_raster(path):
+        """Helper to open and squeeze a raster file."""
+        return rxr.open_rasterio(path).squeeze(dim='band')
+
+    # read upstream area map of the fine grid
+    upstream_fine = open_raster(cfg.upstream_fine)
+    logger.info(f'Map of upstream area in the finer grid correctly read: {cfg.upstream_fine}')
+
+    # read local drainage direction map of the fine grid
+    ldd_fine = open_raster(cfg.ldd_fine)
+    logger.info(f'Map of local drainage directions in the finer grid correctly read: {cfg.ldd_fine}')
+
+    # read upstream area map of the coarse grid
+    upstream_coarse = open_raster(cfg.upstream_coarse)
+    logger.info(f'Map of upstream area in the coarser grid correctly read: {cfg.upstream_coarse}')
+
+    # read local drainage direction map of the coarse grid
+    ldd_coarse = open_raster(cfg.ldd_coarse)
+    logger.info(f'Map of local drainage directions in the coarser grid correctly read: {cfg.ldd_coarse}')
+
+    # read points text file
+    points = pd.read_csv(cfg.points, index_col='ID')
+    points.columns = points.columns.str.lower()
+    logger.info(f'Table of points correctly read: {cfg.points}')
+    points = check_points(cfg, points, ldd_fine)
+
+    # convert to geopandas and export as shapefile
+    points = gpd.GeoDataFrame(
+        points,
+        geometry=gpd.points_from_xy(points['lon'], points['lat']),
+        crs=ldd_coarse.rio.crs
+    ) 
+ point_shp = cfg.output_folder / f'{cfg.points.stem}.shp' + points.to_file(point_shp) + logger.info(f'The original points table has been exported to: {point_shp}') + + inputs = { + 'points': points, + 'ldd_fine': ldd_fine, + 'upstream_fine': upstream_fine, + 'ldd_coarse': ldd_coarse, + 'upstream_coarse': upstream_coarse, + } + + # update Config + cfg.update_config(ldd_fine, ldd_coarse) + + return inputs + + +def check_points( + cfg: Config, + points: pd.DataFrame, + ldd: xr.DataArray +) -> pd.DataFrame: + """ + Removes input points that have missing values, a small catchment area, + or are outside the map extent. + + Parameters: + ----------- + cfg: Config + Configuration object. + points: pandas.DataFrame + Table of input points with fields 'lat', 'lon' and 'area' (km2) + ldd: xarray.DataArray + Map of local drainage directions + + Returns: + -------- + pandas.DataFrame + The input table with points with conflicts removed. + """ + + # remove points with missing values + mask_nan = points.isnull().any(axis=1) + if mask_nan.sum() > 0: + points = points[~mask_nan] + logger.warning(f'{mask_nan.sum()} points were removed because of missing values.') + + # remove points with small catchment area + mask_area = points['area'] < cfg.min_area + if mask_area.sum() > 0: + points = points[~mask_area] + logger.info(f'{mask_area.sum()} points were removed due to their small catchment area.') + + # remove points outside the input LDD map + lon_min, lat_min, lon_max, lat_max = np.round(ldd.rio.bounds(), 6) + mask_lon = (points.lon < lon_min) | (points.lon > lon_max) + mask_lat = (points.lat < lat_min) | (points.lat > lat_max) + mask_extent = mask_lon | mask_lat + if mask_extent.sum() > 0: + points = points[~mask_extent] + logger.info(f'{mask_extent.sum()} points were removed because they are outside the input LDD map.') + + return points \ No newline at end of file diff --git a/src/lisfloodutilities/lfcoords/coarser_grid.py b/src/lisfloodutilities/lfcoords/coarser_grid.py new 
file mode 100644
index 0000000..501fa4c
--- /dev/null
+++ b/src/lisfloodutilities/lfcoords/coarser_grid.py
@@ -0,0 +1,206 @@
+import os
+import logging
+from pathlib import Path
+from typing import Optional, Union, Tuple
+import warnings
+
+import numpy as np
+import pandas as pd
+import geopandas as gpd
+import xarray as xr
+import pyflwdir
+from tqdm import tqdm
+
+from lisfloodutilities.lfcoords import Config
+from lisfloodutilities.lfcoords.utils import catchment_polygon
+
+os.environ['USE_PYGEOS'] = '0'
+warnings.filterwarnings("ignore")
+
+# set logger
+logger = logging.getLogger(__name__)
+
+
+def coordinates_coarse(
+    cfg: Config,
+    points_fine: Union[pd.DataFrame, gpd.GeoDataFrame],
+    polygons_fine: gpd.GeoDataFrame,
+    ldd_coarse: xr.DataArray,
+    upstream_coarse: xr.DataArray,
+    save: bool = False
+) -> Optional[Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]]:
+    """
+    Transforms point coordinates from a high-resolution grid to a corresponding
+    location in a coarser grid, aiming to match the shape of the catchment
+    area. It updates station coordinates and exports catchments as shapefiles.
+
+    Parameters
+    ----------
+    cfg : Config
+        Configuration object with file paths and parameters.
+    points_fine : pd.DataFrame or gpd.GeoDataFrame
+        Table with updated station coordinates and upstream areas from a finer grid.
+    polygons_fine : gpd.GeoDataFrame
+        Table with the catchment polygons from a finer grid.
+    ldd_coarse : xr.DataArray
+        Map of local drainage directions in the coarse grid.
+    upstream_coarse : xr.DataArray
+        Map of upstream area (m2) in the coarse grid.
+    save : bool, optional
+        If True, the updated tables are exported as shapefiles.
+
+    Returns
+    -------
+    Optional[Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]]
+        A tuple containing:
+        - A table with updated station coordinates in the coarser grid.
+        - A table with the catchment polygons in the coarser grid. 
+ """ + + points_coarse = points_fine.copy() + n_points = points_coarse.shape[0] + + # create river network + fdir_coarse = pyflwdir.from_array( + ldd_coarse.data, + ftype='ldd', + transform=ldd_coarse.rio.transform(), + check_ftype=False, + latlon=True + ) + + # get resolution of the coarse grid + cellsize = np.round(np.mean(np.diff(ldd_coarse.x)), 6) # degrees + + # extract resolution of the finer grid + cols = ['area', 'lat', 'lon'] + cols_fine = [f'{col}_{cfg.fine_resolution}' for col in cols] + + # add new columns to 'points_coarse' + cols_coarse = [f'{col}_{cfg.coarse_resolution}' for col in cols] + points_coarse[cols_coarse] = np.nan + + # search range of 5x5 array + range_xy = np.linspace(-2, 2, 5) * cellsize # arcmin + polygons_coarse = [] + for point_id, attrs in tqdm(points_coarse.iterrows(), total=n_points, desc='points'): + try: + # real upstream area + area_ref, lat_ref, lon_ref = attrs[cols] + + # coordinates and upstream area in the fine grid + area_fine, lat_fine, lon_fine = attrs[cols_fine] + + # find ratio + logger.debug('Start search') + inter_vs_union, area_ratio, area_lisf = [], [], [] + for delta_lat in range_xy: + for delta_lon in range_xy: + lon = lon_fine + delta_lon + lat = lat_fine + delta_lat + basin = catchment_polygon( + fdir_coarse.basins(xy=(lon, lat)).astype(np.int32), + transform=ldd_coarse.rio.transform(), + crs=ldd_coarse.rio.crs + ) + + # calculate union and intersection of shapes + intersection = gpd.overlay(polygons_fine.loc[[point_id]], basin, how='intersection') + union = gpd.overlay(polygons_fine.loc[[point_id]], basin, how='union') + inter_vs_union.append(intersection.area.sum() / union.area.sum()) + + # get upstream area (km2) of coarse grid (LISFLOOD) + area = upstream_coarse.sel(x=lon, y=lat, method='nearest').item() * 1e-6 + area_lisf.append(area) + + # ratio between reference and coarse area + if area_ref == 0 or area == 0: + ratio = 0 + else: + ratio = area_ref / area if area_ref < area else area / area_ref + 
area_ratio.append(ratio)
+            logger.debug('End search')
+
+            # maximum of shape similarity and upstream area accordance
+            i_shape = np.argmax(inter_vs_union)
+            area_shape = area_lisf[i_shape]
+            i_centre = int(len(range_xy)**2 / 2)  # middle point
+            area_centre = area_lisf[i_centre]
+
+            # use middle point if errors are small
+            abs_error = abs(area_shape - area_centre)
+            pct_error = 100 * abs(1 - area_centre / area_shape)
+            if (abs_error <= cfg.abs_error) and (pct_error <= cfg.pct_error):
+                i_shape = i_centre
+                area_shape = area_centre
+
+            # offsets of the selected cell within the search grid
+            i = i_shape // len(range_xy)
+            j = i_shape % len(range_xy)
+            lat = lat_fine + range_xy[i]
+            lon = lon_fine + range_xy[j]
+
+            # coordinates and upstream area on coarse resolution
+            area = upstream_coarse.sel(x=lon, y=lat, method='nearest')
+            area_coarse = area.item() * 1e-6
+            lon_coarse = area.x.item()
+            lat_coarse = area.y.item()
+
+            # derive catchment polygon from the selected coordinates
+            basin_coarse = catchment_polygon(
+                fdir_coarse.basins(xy=(lon_coarse, lat_coarse)).astype(np.int32),
+                transform=ldd_coarse.rio.transform(),
+                crs=ldd_coarse.rio.crs,
+                name='ID'
+            )
+            basin_coarse['ID'] = point_id
+            basin_coarse.set_index('ID', inplace=True)
+            basin_coarse[cols] = area_ref, lat_ref, lon_ref
+            basin_coarse[cols_fine] = area_fine, lat_fine, lon_fine
+            basin_coarse[cols_coarse] = area_coarse, lat_coarse, lon_coarse
+
+            # save polygon
+            polygons_coarse.append(basin_coarse)
+
+            # update new columns in 'points_coarse'
+            points_coarse.loc[point_id, cols_coarse] = [int(area_coarse), round(lat_coarse, 6), round(lon_coarse, 6)]
+
+            # logger.info(f'Point {point_id} located in the coarser grid')
+        except Exception as e:
+            logger.error(f'Point {point_id} could not be located in the coarser grid: {e}')
+
+    # handle case where no polygons were generated
+    if not polygons_coarse:
+        logger.warning('No points could be located in the coarser grid. 
Returning empty dataframes.')
+        return gpd.GeoDataFrame(), gpd.GeoDataFrame()
+
+    # concatenate the polygons
+    polygons_coarse = pd.concat(polygons_coarse)
+
+    # convert points to geopandas
+    points_coarse = gpd.GeoDataFrame(
+        points_coarse,
+        geometry=gpd.points_from_xy(
+            points_coarse[f'lon_{cfg.coarse_resolution}'],
+            points_coarse[f'lat_{cfg.coarse_resolution}']
+        ),
+        crs=ldd_coarse.rio.crs
+    )
+    points_coarse.sort_index(axis=1, inplace=True)
+
+    # compute error
+    points_coarse['abs_error'] = abs(points_coarse[f'area_{cfg.coarse_resolution}'] - points_coarse['area'])
+    points_coarse['pct_error'] = points_coarse.abs_error / points_coarse['area'] * 100
+
+    if save:
+        # polygons
+        polygon_shp = cfg.output_folder / f'catchments_{cfg.coarse_resolution}.shp'
+        polygons_coarse.to_file(polygon_shp)
+        logger.info(f'Catchments in the coarser grid have been exported to: {polygon_shp}')
+
+        # points
+        point_shp = cfg.output_folder / f'{cfg.points.stem}_{cfg.coarse_resolution}.shp'
+        points_coarse.to_file(point_shp)
+        logger.info(f'The updated points table in the coarser grid has been exported to: {point_shp}')
+
+    return points_coarse, polygons_coarse
diff --git a/src/lisfloodutilities/lfcoords/config.yml b/src/lisfloodutilities/lfcoords/config.yml
new file mode 100644
index 0000000..9ceb17a
--- /dev/null
+++ b/src/lisfloodutilities/lfcoords/config.yml
@@ -0,0 +1,13 @@
+input:
+  points: # CSV file defining four point attributes: 'ID', 'lat', 'lon', 'area' in km2
+  ldd_fine: # TIFF or NetCDF file of the local drain direction in the high resolution grid
+  upstream_fine: # TIFF or NetCDF file of the upstream area (km2) in the high resolution grid
+  ldd_coarse: # TIFF or NetCDF file of the local drain direction in the low resolution grid
+  upstream_coarse: # TIFF or NetCDF file of the upstream area (m2) in the low resolution grid
+
+output_folder: # folder where catchment shapefiles will be saved. 
By default, './shapefiles/'
+
+conditions:
+  min_area: # minimum catchment area (km2) to consider a station. By default, 10 km2
+  abs_error: # maximum absolute error (km2) allowed between the fine and coarse resolution catchments. By default, 50 km2
+  pct_error: # maximum percentage error (%) allowed between the fine and coarse resolution catchments. By default, 1%
\ No newline at end of file
diff --git a/src/lisfloodutilities/lfcoords/finer_grid.py b/src/lisfloodutilities/lfcoords/finer_grid.py
new file mode 100644
index 0000000..1e10257
--- /dev/null
+++ b/src/lisfloodutilities/lfcoords/finer_grid.py
@@ -0,0 +1,146 @@
+import os
+import logging
+from pathlib import Path
+from typing import Optional, Union, Tuple
+import warnings
+
+import numpy as np
+import pandas as pd
+import geopandas as gpd
+import xarray as xr
+import pyflwdir
+from tqdm import tqdm
+
+from lisfloodutilities.lfcoords import Config
+from lisfloodutilities.lfcoords.utils import find_pixel, catchment_polygon
+
+os.environ['USE_PYGEOS'] = '0'
+warnings.filterwarnings("ignore")
+
+# set logger
+logger = logging.getLogger(__name__)
+
+
+def coordinates_fine(
+    cfg: Config,
+    points: pd.DataFrame,
+    ldd_fine: xr.DataArray,
+    upstream_fine: xr.DataArray,
+    save: bool = False
+) -> Optional[Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]]:
+    """
+    Processes point coordinates to find the most accurate pixel in a high-resolution
+    map, based on a reference value of catchment area. It updates the station
+    coordinates and exports the catchment areas as shapefiles.
+
+    Parameters
+    ----------
+    cfg : Config
+        Configuration object containing file paths and parameters.
+    points : pd.DataFrame
+        DataFrame containing reference point coordinates and upstream areas.
+    ldd_fine : xr.DataArray
+        Map of local drainage directions in the fine grid.
+    upstream_fine : xr.DataArray
+        Map of upstream area (km2) in the fine grid.
+    save : bool, optional
+        If True, the updated table of points and the catchments are exported. 
+ + Returns + ------- + Optional[Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]] + A tuple containing: + - A table with updated station coordinates and upstream areas. + - A table with the catchment polygons in the finer grid. + Returns None if no polygons are generated. + """ + + # add columns to the table of points + points_fine = points.copy() + n_points = points.shape[0] + cols = ['lat', 'lon', 'area'] + new_cols = sorted([f'{col}_{cfg.fine_resolution}' for col in cols]) + points_fine[new_cols] = np.nan + + # create river network + fdir_fine = pyflwdir.from_array( + ldd_fine.data, + ftype='d8', + transform=ldd_fine.rio.transform(), + check_ftype=False, + latlon=True + ) + + polygons_fine = [] + for point_id, attrs in tqdm(points.iterrows(), total=n_points, desc='points'): + try: + # reference coordinates and upstream area + lat_ref, lon_ref, area_ref = attrs[cols] + + # search new coordinates in an increasing range + ranges = [55, 101, 151] + penalties = [500, 500, 1000] + factors = [2, 0.5, 0.25] + acceptable_errors = [50, 80, np.nan] + for range_xy, penalty, factor, max_error in zip(ranges, penalties, factors, acceptable_errors): + logger.debug(f'Set range to {range_xy}') + lat, lon, error = find_pixel(upstream_fine, lat_ref, lon_ref, area_ref, range_xy=range_xy, penalty=penalty, factor=factor) + if error <= max_error: + break + + # update new columns in 'points_fine' + points_fine.loc[point_id, new_cols] = [int(upstream_fine.sel(y=lat, x=lon).item()), round(lat, 6), round(lon, 6)] + + # boolean map of the catchment + basin_arr = fdir_fine.basins(xy=(lon, lat)).astype(np.int32) + + # vectorize the boolean map into geopandas + basin_gdf = catchment_polygon( + basin_arr.astype(np.int32), + transform=ldd_fine.rio.transform(), + crs=ldd_fine.rio.crs, + name='ID' + ) + basin_gdf['ID'] = point_id + basin_gdf[cols] = attrs[cols] + basin_gdf.set_index('ID', inplace=True) + + # save polygon + polygons_fine.append(basin_gdf) + + # logger.info(f'Point {point_id} located in 
the finer grid') + except Exception as e: + logger.error(f'Point {point_id} could not be located in the finer grid: {e}') + + # handle case where no polygons were generated + if not polygons_fine: + logger.warning('No points could be located in the finer grid. Returning empty dataframes.') + return gpd.GeoDataFrame(), gpd.GeoDataFrame() + + # concatenate polygons shapefile + polygons_fine = pd.concat(polygons_fine) + + # convert points to geopandas + points_fine = gpd.GeoDataFrame( + points_fine, + geometry=gpd.points_from_xy(points_fine[f'lon_{cfg.fine_resolution}'], points_fine[f'lat_{cfg.fine_resolution}']), + crs=ldd_fine.rio.crs + ) + points_fine.sort_index(axis=1, inplace=True) + + # compute error + points_fine['abs_error'] = abs(points_fine[f'area_{cfg.fine_resolution}'] - points_fine['area']) + points_fine['pct_error'] = points_fine.abs_error / points_fine['area'] * 100 + + if save: + # polygons + polygon_shp = cfg.output_folder / f'catchments_{cfg.fine_resolution}.shp' + polygons_fine.to_file(polygon_shp) + logger.info(f'Catchments in the finer grid have been exported to: {polygon_shp}') + + # points + point_shp = cfg.output_folder / f'{cfg.points.stem}_{cfg.fine_resolution}.shp' + points_fine.to_file(point_shp) + logger.info(f'The updated points table in the finer grid has been exported to: {point_shp}') + + return points_fine, polygons_fine \ No newline at end of file diff --git a/src/lisfloodutilities/lfcoords/lfcoords.py b/src/lisfloodutilities/lfcoords/lfcoords.py new file mode 100644 index 0000000..680e665 --- /dev/null +++ b/src/lisfloodutilities/lfcoords/lfcoords.py @@ -0,0 +1,129 @@ +import sys +import argparse +import logging +from datetime import datetime + +import numpy as np +import pandas as pd +import geopandas as gpd + +from lisfloodpreprocessing import Config, read_input_files +from lisfloodpreprocessing.utils import find_conflicts +from lisfloodpreprocessing.finer_grid import coordinates_fine +from lisfloodpreprocessing.coarser_grid 
import coordinates_coarse + +logging.getLogger('pyogrio').propagate = False + +def main(): + """ + Main function to correct point coordinates to match the river network in + LISFLOOD static maps. + """ + parser = argparse.ArgumentParser( + description=""" + Correct the coordinates of a set of points to match the river network in the + LISFLOOD static map. + First, it uses a reference value of catchment area to find the most accurate + pixel in a high-resolution map. + Second, it finds the pixel in the low-resolution map that better matches the + catchment shape derived from the high-resolution map. + """ + ) + parser.add_argument( + '-c', '--config-file', type=str, required=True, + help='Path to the configuration file' + ) + args = parser.parse_args() + + # create the root logger + logger = logging.getLogger() + logger.setLevel(logging.INFO) + + # define a shared log format + log_format = logging.Formatter( + '%(asctime)s | %(levelname)s | %(name)s | %(message)s', + datefmt='%Y-%m-%d %H:%M:%S' + ) + + # console handler + c_handler = logging.StreamHandler() + c_handler.setFormatter(log_format) + c_handler.setLevel(logging.INFO) + logger.addHandler(c_handler) + + # File handler + f_handler = logging.FileHandler(f'lfcoords_{datetime.now():%Y%m%d%H%M}.log') + f_handler.setFormatter(log_format) + f_handler.setLevel(logging.INFO) + logger.addHandler(f_handler) + + # main script logic + success = False + + try: + logger.info('Starting coordinate correction process...') + + # read configuration + logger.info(f"Reading configuration from {args.config_file}") + cfg = Config(args.config_file) + + # read input files + logger.info('Reading input files...') + inputs = read_input_files(cfg) + + # find coordinates in high resolution + logger.info('Processing points in the high-resolution grid...') + points_HR, polygons_HR = coordinates_fine( + cfg, + points=inputs['points'], + ldd_fine=inputs['ldd_fine'], + upstream_fine=inputs['upstream_fine'], + save=True + ) + + # find 
conflicts in high resolution + logger.info('Finding conflicts in the high-resolution grid...') + conflicts_fine = find_conflicts( + points_HR, + resolution=cfg.fine_resolution, + pct_error=cfg.pct_error, + save=cfg.output_folder / f'conflicts_{cfg.fine_resolution}.shp' + ) + if conflicts_fine is not None: + points_HR.drop(conflicts_fine.index, axis=0, inplace=True) + + # find coordinates in LISFLOOD + logger.info('Processing points in the LISFLOOD grid...') + points_LR, polygons_LR = coordinates_coarse( + cfg, + points_fine=points_HR, + polygons_fine=polygons_HR, + ldd_coarse=inputs['ldd_coarse'], + upstream_coarse=inputs['upstream_coarse'], + save=True + ) + + # find conflicts in LISFLOOD + logger.info('Finding conflicts in the LISFLOOD grid...') + conflicts_coarse = find_conflicts( + points_LR, + resolution=cfg.coarse_resolution, + pct_error=cfg.pct_error, + save=cfg.output_folder / f'conflicts_{cfg.coarse_resolution}.shp' + ) + + logger.info('Process completed successfully') + success = True + + except FileNotFoundError as e: + logger.error(f"A required file was not found: {e}") + except OSError as e: + logger.error(f"An I/O error occurred: {e}") + except Exception as e: + logger.error(f"An unexpected error occurred: {e}") + + if not success: + sys.exit(1) + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/src/lisfloodutilities/lfcoords/utils.py b/src/lisfloodutilities/lfcoords/utils.py new file mode 100644 index 0000000..8674858 --- /dev/null +++ b/src/lisfloodutilities/lfcoords/utils.py @@ -0,0 +1,218 @@ +import os +import logging +from typing import Tuple, Optional, Union +from pathlib import Path + +import numpy as np +import pandas as pd +import geopandas as gpd +import xarray as xr +from affine import Affine +from rasterio import features +from pyproj.crs import CRS + +os.environ['USE_PYGEOS'] = '0' + +# set logger +logger = logging.getLogger(__name__) + + +def find_pixel( + upstream: xr.DataArray, + lat: int, + lon: int, + 
+    area: float,
+    range_xy: int = 55,
+    penalty: int = 500,
+    factor: float = 2,
+    distance_scaler: float = .92,
+    error_threshold: float = 50
+) -> Tuple[float, float, float]:
+    """
+    Finds the coordinates of the pixel in the upstream map whose upstream area
+    has the smallest error relative to a reference area.
+
+    Parameters:
+    -----------
+    upstream: xr.DataArray
+        The upstream data containing latitude and longitude coordinates.
+    lat: float
+        The original latitude value.
+    lon: float
+        The original longitude value.
+    area: float
+        The reference area to calculate percent error.
+    range_xy: int, optional
+        The range in both x and y directions to search for the new location.
+    penalty: int, optional
+        The penalty value to add to the distance when the percent error is too high.
+    factor: float, optional
+        The factor to multiply with the distance for the error calculation.
+    distance_scaler: float, optional
+        The scaling factor for the distance calculation in pixels.
+    error_threshold: float, optional
+        The threshold for the percent error to apply the penalty.
+
+    Returns:
+    --------
+    Tuple[float, float, float]
+        A tuple containing:
+        - lat_new : float
+            The latitude of the new location.
+        - lon_new : float
+            The longitude of the new location.
+        - min_error : float
+            The minimum error value at the new location.
+    """
+
+    # find coordinates of the nearest pixel in the map
+    nearest_pixel = upstream.sel(y=lat, x=lon, method='nearest')
+    lat_orig, lon_orig = (nearest_pixel[coord].item() for coord in ['y', 'x'])
+
+    # extract subset of the upstream map
+    cellsize = np.mean(np.diff(upstream.x.data))
+    delta = range_xy * cellsize + 1e-6
+    upstream_sel = upstream.sel(y=slice(lat_orig + delta, lat_orig - delta),
+                                x=slice(lon_orig - delta, lon_orig + delta))
+
+    # distance from the original pixel (in pixels)
+    i = np.arange(-range_xy, range_xy + 1)
+    ii, jj = np.meshgrid(i, i)
+    distance = xr.DataArray(
+        data=np.sqrt(ii**2 + jj**2) * distance_scaler,
+        coords=upstream_sel.coords,
+        dims=upstream_sel.dims
+    )
+
+    # percent error in catchment area
+    error = 100 * abs(area - upstream_sel) / area
+
+    # penalise if error is too big
+    distance = distance.where(error <= error_threshold, distance + penalty)
+
+    # update error based on distance
+    error += factor * distance
+
+    # the new location is that with the smallest error
+    min_error = error.where(error == error.min(), drop=True)
+    if min_error.size == 1:
+        lat_new, lon_new = [min_error[coord].item() for coord in ['y', 'x']]
+    else:
+        # several pixels tie for the minimum error; keep the first one found
+        idx = np.argwhere(~np.isnan(min_error.data.flatten()))[0][0]
+        y_idx, x_idx = np.unravel_index(idx, min_error.shape)
+        lat_new, lon_new = min_error.y.data[y_idx], min_error.x.data[x_idx]
+
+    return lat_new, lon_new, error.min().item()
+
+
+def catchment_polygon(
+    data: np.ndarray,
+    transform: Affine,
+    crs: CRS,
+    name: str = "catchment"
+) -> gpd.GeoDataFrame:
+    """
+    Converts a boolean 2D array of catchment extent to a GeoDataFrame.
+
+    Parameters:
+    -----------
+    data : numpy.ndarray
+        The input raster data to be vectorized.
+    transform : affine.Affine
+        The affine transform matrix for the raster data.
+    crs : dict or str
+        The Coordinate Reference System (CRS) of the input data.
+    name : str, optional
+        The name to be used for the value column in the resulting GeoDataFrame.
+
+    Returns:
+    --------
+    geopandas.GeoDataFrame
+        A GeoDataFrame containing the vectorized geometries.
+    """
+
+    # Generate shapes and associated values from the raster data
+    feats_gen = features.shapes(
+        data,
+        mask=data != 0,
+        transform=transform,
+        connectivity=8,
+    )
+
+    # Create a list of features with geometry and properties
+    feats = [
+        {"geometry": geom, "properties": {name: val}} for geom, val in feats_gen
+    ]
+
+    # Check if no features were found
+    if not feats:
+        raise ValueError("No features found in the raster data.")
+
+    # Create a GeoDataFrame from the features
+    gdf = gpd.GeoDataFrame.from_features(feats, crs=crs)
+
+    # Convert the value column to the same data type as the input raster data
+    gdf[name] = gdf[name].astype(data.dtype)
+
+    return gdf
+
+
+def find_conflicts(
+    points: gpd.GeoDataFrame,
+    resolution: str,
+    pct_error: float = 30,
+    save: Optional[Union[Path, str]] = None
+) -> gpd.GeoDataFrame:
+    """
+    Finds conflicts in the new point layer, either due to points that overlap
+    or to large catchment area errors.
+
+    Parameters:
+    -----------
+    points: geopandas.GeoDataFrame
+        Point layer resulting from `coordinates_fine` or `coordinates_coarse`
+    resolution: str
+        Spatial resolution of the fields in "points" to be checked. For instance,
+        '3min' will check the coordinates in the fields 'lat_3min' and 'lon_3min',
+        and the catchment area in the field 'area_3min'
+    pct_error: float
+        Maximum percentage error allowed in the derived catchment area compared
+        with the reference. It must be a value between 0 and 100
+    save: pathlib.Path or string (optional)
+        If provided, file name of the shapefile of conflicting points
+
+    Returns:
+    --------
+    geopandas.GeoDataFrame
+        Subset of "points" with conflicts; empty if no conflicts are found
+    """
+
+    # find overlapping points
+    columns = [f'{col}_{resolution}' for col in ['lat', 'lon']]
+    mask = points.duplicated(subset=columns, keep=False)
+    duplicates = points[mask].copy()
+    duplicates['conflict'] = 'points overlap'
+    if duplicates.shape[0] > 0:
+        n_duplicates = len(duplicates[columns[0]].unique())
+        logger.warning(f'There are {n_duplicates} conflicts in which points are located at the same pixel in the finer grid')
+
+    # errors in the delineated area
+    assert 0 <= pct_error <= 100, '"pct_error" must be a value between 0 and 100'
+    if 'pct_error' not in points.columns:
+        points['pct_error'] = abs(points['area'] - points[f'area_{resolution}']) / points['area'] * 100
+    mask = points.pct_error >= pct_error
+    deviations = points[mask].copy()
+    deviations['conflict'] = 'large area error'
+    n_deviations = deviations.shape[0]
+    if n_deviations > 0:
+        logger.warning(f'There are {n_deviations} conflicts in which the catchment area of the new point has a large error')
+
+    # combine
+    if (duplicates.shape[0] > 0) or (n_deviations > 0):
+        conflicts = pd.concat((duplicates, deviations), axis=0)
+        if save is not None:
+            conflicts.to_file(save)
+            logger.info(f'The conflicting points were saved in {save}')
+        return conflicts
+
+    return gpd.GeoDataFrame()  # return an empty GeoDataFrame if no conflicts found
\ No newline at end of file
diff --git a/src/lisfloodutilities/thresholds/thresholds.py b/src/lisfloodutilities/thresholds/thresholds.py
index 37b160e..85f72dc 100755
--- a/src/lisfloodutilities/thresholds/thresholds.py
+++ b/src/lisfloodutilities/thresholds/thresholds.py
@@ -16,6 +16,7 @@
 import numpy as np
 import xarray as xr
+from typing import Tuple, Union, List, Literal
 
 
 def lmoments(values: np.ndarray) -> np.ndarray:
@@ -78,21 +79,112 @@
     return lmoments
 
 
-def gumbel_parameters_moments(dis):
-    sigma = np.sqrt(6) * np.nanstd(dis, ddof=1, axis=0) / np.pi
-    mu = np.nanmean(dis, axis=0) - 0.5772 * sigma
-    return sigma, mu
+def
gumbel_parameters_moments(
+    dis: xr.DataArray,
+    dim: str = 'time'
+) -> xr.Dataset:
+    """Calculate the scale (sigma) and location (mu) parameters of the Gumbel distribution using the method of moments.
+
+    The method of moments is a statistical technique used to estimate the parameters of a distribution by equating
+    the theoretical moments of the distribution to the sample moments.
+
+    Parameters:
+    -----------
+    dis: xarray.DataArray
+        An xarray DataArray containing the data over which to calculate the Gumbel parameters.
+        The function computes the parameters along the dimension 'dim', typically time.
+    dim: string
+        Dimension in 'dis' over which statistics will be computed
+
+    Returns:
+    --------
+    parameters: xarray.Dataset
+        An xarray Dataset containing the estimated scale (sigma) and location (mu) parameters of the Gumbel distribution.
+        The dataset includes two data variables:
+        - 'sigma': xarray DataArray of the estimated scale parameters
+        - 'mu': xarray DataArray of the estimated location parameters
+    """
+    # estimate parameters
+    sigma = np.sqrt(6) / np.pi * dis.std(dim=dim, ddof=1, skipna=True)
+    mu = dis.mean(dim=dim, skipna=True) - np.euler_gamma * sigma
+
+    # 'sigma' and 'mu' are DataArrays that already carry the spatial
+    # dimensions and coordinates of 'dis', so they can be combined directly
+    parameters = xr.Dataset({'sigma': sigma, 'mu': mu})
+
+    return parameters
+
+
+def gumbel_parameters_lmoments(dis_max: xr.DataArray, dim: str = 'time') -> xr.Dataset:
+    """Calculate the scale (sigma) and location (mu) parameters of the Gumbel distribution using the method of L-moments.
+
+    Parameters:
+    -----------
+    dis_max: xarray.DataArray
+        An xarray DataArray containing the annual maximum series over which to calculate the Gumbel parameters.
+    dim: string
+        Temporal dimension in 'dis_max'
+
+    Returns:
+    --------
+    parameters: xarray.Dataset
+        An xarray Dataset containing the estimated scale (sigma) and location (mu) parameters of the Gumbel distribution.
+        The dataset includes two data variables:
+        - 'sigma': xarray DataArray of the estimated scale parameters
+        - 'mu': xarray DataArray of the estimated location parameters
+    """
-def gumbel_parameters_lmoments(dis):
-    lambda_coef = lmoments(dis)
+
+    # estimate parameters; L-moments are computed along the temporal axis (axis 0)
+    lambda_coef = lmoments(dis_max.transpose(dim, ...).values)
     sigma = lambda_coef[1] / np.log(2)
-    mu = lambda_coef[0] - sigma * 0.5772
-    return sigma, mu
-
-
-def gumbel_function(y, sigma, mu):
-    return mu - sigma * np.log(np.log(y / (y - 1)))
+    mu = lambda_coef[0] - sigma * np.euler_gamma
+
+    # wrap the estimates with the spatial dimensions and coordinates of 'dis_max'
+    dims = [d for d in dis_max.dims if d != dim]
+    coords = {name: values for name, values in dis_max.coords.items() if dim not in values.dims}
+    parameters = xr.Dataset({
+        'sigma': xr.DataArray(sigma, dims=dims, coords=coords),
+        'mu': xr.DataArray(mu, dims=dims, coords=coords)
+    })
+
+    return parameters
+
+
+def gumbel_function(
+    return_period: Union[np.ndarray, int],
+    sigma: Union[np.ndarray, float],
+    mu: Union[np.ndarray, float]
+) -> Union[np.ndarray, float]:
+    """Compute the value of a Gumbel distribution associated with given return periods.
+
+    This function is equivalent to scipy.stats.gumbel_r.ppf(1 - 1 / return_period, loc=mu, scale=sigma)
+
+    Parameters:
+    -----------
+    return_period: integer or numpy.ndarray
+        return period
+    sigma: float or numpy.ndarray
+        scale parameter of the Gumbel distribution
+    mu: float or numpy.ndarray
+        location parameter of the Gumbel distribution
+
+    Returns:
+    --------
+    float or numpy.ndarray
+        Value of the Gumbel distribution associated with the return period
+    """
+    return mu - sigma * np.log(np.log(return_period / (return_period - 1)))
 
 
 def find_main_var(ds, path):
@@ -141,22 +233,51 @@
     return ds_rp
 
 
-def compute_thresholds_gumbel(dis_max, return_periods):
-    mask = np.isfinite(dis_max.isel(time=0).values)
-    dis_max_masked = dis_max.values[:, mask]
+def compute_thresholds_gumbel(
+    dis_max: xr.DataArray,
+    return_periods: List,
+    method: Literal['l-moments', 'moments'] = 'l-moments',
+    dim: str = 'time'
+) -> xr.Dataset:
+    """Fits the parameters of the right-skewed Gumbel distribution and estimates the discharge associated with the return periods.
+
+    Parameters:
+    -----------
+    dis_max: xarray.DataArray
+        An xarray DataArray containing the annual maximum series over which to calculate the Gumbel parameters
+    return_periods: List
+        Return periods for which the associated discharge will be calculated
+    method: string
+        Method used to estimate the parameters of the Gumbel distribution
+    dim: string
+        Temporal dimension in 'dis_max'
+
+    Returns:
+    --------
+    xr.Dataset
+        An xarray Dataset containing the estimated Gumbel distribution parameters and discharge levels for the specified return periods.
+        The dataset includes:
+        - 'sigma': xarray DataArray of the estimated scale parameter of the Gumbel distribution.
+        - 'mu': xarray DataArray of the estimated location parameter of the Gumbel distribution.
+        - 'rp_X': xarray DataArray(s) of the discharge levels corresponding to each specified return period X.
+    """
+
+    # filter points with NaN in the first value
+    mask = np.isfinite(dis_max.isel({dim: 0}))
+    dis_max_masked = dis_max.where(mask, drop=True)
 
     print("Computing Gumbel coefficients")
-    sigma, mu = gumbel_parameters_lmoments(dis_max_masked)
+    if method.lower() == 'l-moments':
+        parameters = gumbel_parameters_lmoments(dis_max_masked, dim=dim)
+    elif method.lower() == 'moments':
+        parameters = gumbel_parameters_moments(dis_max_masked, dim=dim)
+    else:
+        raise ValueError(f"Invalid method '{method}'. Expected 'l-moments' or 'moments'.")
 
     print("Computing return periods")
-    thresholds = []
-    for y in return_periods:
-        dis = gumbel_function(y, sigma, mu)
-        thresholds.append(dis)
+    thresholds = xr.Dataset({f'rp_{rp}': gumbel_function(rp, parameters.sigma, parameters.mu) for rp in return_periods})
 
-    ds_rp = create_dataset(dis_max, return_periods, mask, thresholds, sigma, mu)
-
-    return ds_rp
+    return xr.merge([parameters, thresholds])
 
 
 def main(argv=sys.argv):
diff --git a/tests/data/lfcoords/EFAS/ldd_1min.nc b/tests/data/lfcoords/EFAS/ldd_1min.nc
new file mode 100644
index 0000000..e40a692
Binary files /dev/null and b/tests/data/lfcoords/EFAS/ldd_1min.nc differ
diff --git a/tests/data/lfcoords/EFAS/uparea_1min.nc b/tests/data/lfcoords/EFAS/uparea_1min.nc
new file mode 100644
index 0000000..39214b1
Binary files /dev/null and b/tests/data/lfcoords/EFAS/uparea_1min.nc differ
diff --git a/tests/data/lfcoords/MERIT/ldd_3sec.tif b/tests/data/lfcoords/MERIT/ldd_3sec.tif
new file mode 100644
index 0000000..106cd5f
Binary files /dev/null and b/tests/data/lfcoords/MERIT/ldd_3sec.tif differ
diff --git a/tests/data/lfcoords/MERIT/uparea_3sec.tif b/tests/data/lfcoords/MERIT/uparea_3sec.tif
new file mode 100644
index 0000000..af0a2e1
Binary files /dev/null and b/tests/data/lfcoords/MERIT/uparea_3sec.tif differ
diff --git a/tests/data/lfcoords/config.yml b/tests/data/lfcoords/config.yml
new file mode 100644
index 0000000..fddd8ba
--- /dev/null
+++ b/tests/data/lfcoords/config.yml
@@
-0,0 +1,13 @@
+input:
+  points: data/lfcoords/points.csv
+  ldd_fine: data/lfcoords/MERIT/ldd_3sec.tif
+  upstream_fine: data/lfcoords/MERIT/uparea_3sec.tif
+  ldd_coarse: data/lfcoords/EFAS/ldd_1min.nc
+  upstream_coarse: data/lfcoords/EFAS/uparea_1min.nc
+
+output_folder: data/lfcoords/shapefiles/
+
+conditions:
+  min_area: 25 # km2
+  abs_error: 50 # km2
+  pct_error: 5 # %
diff --git a/tests/data/lfcoords/expected.csv b/tests/data/lfcoords/expected.csv
new file mode 100644
index 0000000..ea455f3
--- /dev/null
+++ b/tests/data/lfcoords/expected.csv
@@ -0,0 +1,4 @@
+ID,area,area_1min,area_3sec,lat,lat_1min,lat_3sec,lon,lon_1min,lon_3sec
+2648,2506,2530,2498,43.476871,43.475,43.476667,-6.721875,-6.725,-6.721667
+2651,2292,2308,2286,43.385655,43.391667,43.385833,-6.827365,-6.825,-6.8275
+2653,1770,1782,1762,43.234752,43.241667,43.235,-6.849185,-6.841667,-6.849167
diff --git a/tests/data/lfcoords/points.csv b/tests/data/lfcoords/points.csv
new file mode 100644
index 0000000..b566583
--- /dev/null
+++ b/tests/data/lfcoords/points.csv
@@ -0,0 +1,4 @@
+ID,lon,lat,area
+2648,-6.721875,43.476871,2506
+2651,-6.827365,43.385655,2292
+2653,-6.849185,43.234752,1770
diff --git a/tests/test_lfcoords.py b/tests/test_lfcoords.py
new file mode 100644
index 0000000..a93c7e2
--- /dev/null
+++ b/tests/test_lfcoords.py
@@ -0,0 +1,29 @@
+import unittest
+from pathlib import Path
+
+import pandas as pd
+import pandas.testing as pdt
+
+from lisfloodutilities.lfcoords import Config, read_input_files
+from lisfloodutilities.lfcoords.finer_grid import coordinates_fine
+from lisfloodutilities.lfcoords.coarser_grid import coordinates_coarse
+
+
+class TestLFcoords(unittest.TestCase):
+
+    path = Path('tests/data/lfcoords')
+
+    def test_lfcoords(self):
+
+        # compute test values
+        cfg = Config(self.path / 'config.yml')
+        inputs = read_input_files(cfg)
+        points_HR, polygons_HR = coordinates_fine(
+            cfg,
+            points=inputs['points'],
+            ldd_fine=inputs['ldd_fine'],
+            upstream_fine=inputs['upstream_fine'],
+            save=False
+        )
+        test, _ = coordinates_coarse(
+            cfg,
+            points_fine=points_HR,
+            polygons_fine=polygons_HR,
+            ldd_coarse=inputs['ldd_coarse'],
+            upstream_coarse=inputs['upstream_coarse'],
+            save=False
+        )
+
+        # load expected values
+        expected = pd.read_csv(self.path / 'expected.csv', index_col='ID')
+
expected.index = expected.index.astype(test.index.dtype)
+
+        # check (let any AssertionError propagate so the test can actually fail)
+        pdt.assert_frame_equal(test, expected, check_dtype=False)
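The Gumbel quantile used in `thresholds.py` (`gumbel_function`) can be sanity-checked in isolation. The sketch below re-implements the same formula under a hypothetical name (`gumbel_quantile`; the `sigma` and `mu` values are made up for illustration) and verifies that the quantile inverts the Gumbel CDF at the non-exceedance probability 1 - 1/T:

```python
import numpy as np

def gumbel_quantile(return_period, sigma, mu):
    # Quantile of the right-skewed Gumbel distribution for a return period T,
    # equivalent to scipy.stats.gumbel_r.ppf(1 - 1 / T, loc=mu, scale=sigma);
    # mirrors gumbel_function in thresholds.py
    return mu - sigma * np.log(np.log(return_period / (return_period - 1.0)))

# made-up parameters, for illustration only
sigma, mu = 120.0, 450.0
T = np.array([2.0, 10.0, 100.0])
q = gumbel_quantile(T, sigma, mu)

# evaluating the Gumbel CDF at the quantiles must give back 1 - 1/T
cdf = np.exp(-np.exp(-(q - mu) / sigma))
print(np.allclose(cdf, 1 - 1 / T))  # → True
```

Because the CDF round-trip is exact by construction, this check holds for any positive `sigma` and any return period greater than 1, which is why the function raises no special cases in `compute_thresholds_gumbel`.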