Collating all open-source datasets, software tools and deployment platforms related to open-source agriculture.

geezacoleman/OpenSourceAgriculture

Welcome to the OSA repository for all things open-source in agricultural technology (agritech) development. This accompanies the OpenSourceAg newsletter, which you can sign up to here.

The idea behind this repository is to collate all open-source datasets and projects in agtech in one place for easy reference and to get a better picture of what is out there.

Contributing

If you see that a dataset is missing or you find an error in the tables, please submit a pull request or open an issue detailing the changes.

Overview

Datasets

Annotated image data is the backbone of precision agricultural operations such as site-specific weed control. This data is essential for training algorithms that can find weeds, detect insects and count fruit on the tree. A summary of the datasets in each domain is provided below. Click on a drop-down list to find out more.

Weeds

Open-access image datasets of weeds
| Dataset | Task | Image Number | Class Number | Species | Description |
| --- | --- | --- | --- | --- | --- |
| Agriculture-Vision | Instance Segmentation | | | | Aerial images for detecting weeds in various agricultural fields. |
| Carrot-Weed | Segmentation | 39 | 2 | carrot (Daucus carota ssp. sativus), unspecified weeds | |
| Corn/Lettuce/Radish | Classification | 7,200 | 8 | maize (Zea mays), Canada thistle (Cirsium arvense), fat hen (Chenopodium album), bluegrass (Poa spp.), lettuce, radish | |
| CottonWeeds | Classification | 5,187 | 15 | morningglory (Ipomoea spp.), carpetweed (Mollugo verticillata), Palmer amaranth (Amaranthus palmeri), waterhemp (Amaranthus tuberculatus), purslane (Portulaca spp.), nutsedge (Cyperus spp.), eclipta (Eclipta prostrata), sicklepod (Senna obtusifolia), spotted spurge (Euphorbia maculata), ragweed (Ambrosia spp.), goosegrass (Eleusine indica), prickly sida (Sida spinosa), crabgrass (Digitaria spp.), swinecress (Lepidium spp.), spurred anoda (Anoda cristata) | |
| CornWeed | Object Detection | 3,574 | 2 | Zea mays, weeds | The CornWeed dataset was collected on farm machines for evaluating weed detection in corn crops. A conference paper is available. |
| CottonWeedDet12 | Object Detection | 5,648 (9,370 instances) | 12 | | |
| CropAndWeed | Object Detection/segmentation/stem localization | 8,034 (111,953 instances) | 74 | See supplementary for full list | An extensive collection of 74 crop and weed species gathered over four years in Europe. Annotated for bounding-box detection, segmentation and plant centroid detection. |
| CWF-788 | Segmentation | 788 | 1 | cauliflower (Brassica oleracea var. botrytis) | |
| CWFID | Segmentation | 60 | 2 | carrot, unspecified weeds | |
| CWD30 | Classification, Segmentation | 219,778 | 20 weed, 10 crop | Asian flatsedge (Cyperus microiria), Asiatic dayflower (Commelina communis), Bean (Phaseolus vulgaris), Bloodscale sedge (Carex haematostoma), Cockspur grass (Echinochloa crus-galli), Copperleaf (Acalypha spp.), Corn (Zea mays), Early barnyard grass (Echinochloa oryzoides), Fall panicum (Panicum dichotomiflorum), Finger grass (Digitaria sanguinalis), Foxtail millet (Setaria italica), Goosefoot (Chenopodium album), Great millet (Sorghum bicolor), Green foxtail (Setaria viridis), Green gram (Vigna radiata), Henbit (Lamium amplexicaule), Indian goosegrass (Eleusine indica), Korean dock (Rumex crispus), Livid pigweed (Amaranthus lividus), Nipponicus sedge (Carex nipponica), Peanut (Arachis hypogaea), Perilla (Perilla frutescens), Poa annua (Poa annua), Proso millet (Panicum miliaceum), Purslane (Portulaca oleracea), Red bean (Phaseolus angularis), Redroot pigweed (Amaranthus retroflexus), Sesame (Sesamum indicum), Smooth pigweed (Amaranthus hybridus), White goosefoot (Chenopodium album) | From the paper: Extensive crop-weed dataset with multi-view and multi-stage plant images. The repository includes pretrained models for transfer learning. |
| GrassClover | Segmentation | 8,000 | 5 | white clover (Trifolium repens), red clover (Trifolium pratense), shepherd’s purse (Capsella bursa-pastoris), unspecified thistle, dandelion (Taraxacum officinale) | |
| iNatAg | Classification | 4,720,903 | 2,959 | see dataset card and preprint | A curated collection of images from the iNaturalist database for crop-weed detection training. Implemented through the AgML project. |
| LincolnBeet | Bounding box | 4,402 | 2 | sugar beet (Beta vulgaris var. altissima), unspecified weeds | |
| Moving Fields Weed Dataset | Bounding box, segmentation | 94,321 | 36 | maize varieties (2), sorghum varieties (6), weed species (28) | Images collected within a fully automated high-throughput phenotyping facility under controlled conditions with high spatial (2456×2058) and temporal resolution. GitHub (dataset download) |
| Plant Seedling Dataset | Segmentation | 5,539 | 12 | maize, wheat (Triticum aestivum), sugar beet, scentless mayweed (Matricaria perforata), common chickweed (Stellaria media), shepherd’s purse, cleavers (Galium aparine), charlock (Sinapis arvensis), fat hen, small-flowered cranesbill (Geranium pusillum), blackgrass (Alopecurus myosuroides), loose silky-bent (Apera spica-venti) | |
| Precision Sustainable Ag 2021 OpenCV Competition | Bounding box | 727 | 7 | grass species (Poaceae spp.), horseweed (Erigeron canadensis), cowpea (Vigna unguiculata), crimson clover (Trifolium incarnatum), goosefoot (Chenopodium album), velvetleaf (Abutilon theophrasti), sunflower (Helianthus annuus) | |
| RoboWeedMap | Bounding box | 1,147 | 2 | Unspecified monocotyledonous, Unspecified dicotyledonous | |
| Sandplain Lupins | Segmentation | 795 (7,989 instances) | 1 | Sandplain lupin (Lupinus cosentinii) | This repository contains five datasets collected in the field by a DJI Phantom 4 or smartphone in the northern wheatbelt of Western Australia. |
| Soybean/Grass/Broadleaf/Soil | Segmentation | 15,336 | 3 | soybean (Glycine max), grass weeds, broadleaf weeds | |
| Sugar beets | Segmentation | 300 | 10 | sugar beet, nine unspecified weed species | |
| Weed-AI | All | | | | Hosting platform. Includes over 30,000 images with bounding box annotations sourced from datasets across the internet. |
| WeedMap | Segmentation | 10,196 | 2 | sugar beet | |
| WeedNet | Segmentation | 155 | 2 | sugar beet, unspecified weeds | |
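
Three rows in the weeds table report both an image count and an instance count. As a rough guide to annotation density, the sketch below divides one by the other. The figures are copied from the table; the `annotations_per_image` and `density_table` helpers and the dictionary layout are illustrative assumptions and are not part of any dataset's own tooling.

```python
# (images, annotated instances) as listed in the weeds table.
COUNTS = {
    "CottonWeedDet12": (5648, 9370),
    "CropAndWeed": (8034, 111953),
    "Sandplain Lupins": (795, 7989),
}


def annotations_per_image(images: int, instances: int) -> float:
    """Mean number of annotated instances per image."""
    if images <= 0:
        raise ValueError("images must be positive")
    return instances / images


def density_table(counts):
    """Return {dataset name: mean instances per image}."""
    return {
        name: annotations_per_image(images, instances)
        for name, (images, instances) in counts.items()
    }


if __name__ == "__main__":
    for name, density in density_table(COUNTS).items():
        print(f"{name}: {density:.1f} instances per image")
```

On these figures CropAndWeed averages roughly 14 instances per image, against fewer than 2 for CottonWeedDet12, which hints at how crowded the scenes in each dataset are.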

Insects

Open-access image datasets of insects
| Dataset | Task | Image Number | Classes | Description |
| --- | --- | --- | --- | --- |
| IP102 | Classification/object detection | Classification: >75,000, bounding box: 19,000 | 102 | A very large open-source dataset of insect pests. IP102 is annotated with a hierarchical taxonomy: insect pests that mainly affect one specific agricultural product are grouped into the same upper-level category. The full class list is available. |
| BIOSCAN-1M | Classification | 1,128,308 | 16 | The BIOSCAN-1M Insect dataset consists of specimens mostly collected from three countries (Costa Rica, Canada, and South Africa) using Malaise traps. RGB images of the organisms were taken with a Keyence VHX-7000 microscope. |

Diseases

Open-access image datasets of plant diseases
| Dataset | Task | Image Number | Classes | Description |
| --- | --- | --- | --- | --- |
| PlantVillage | Image Classification | 54,306 | 14 crop species, 26 diseases | Dataset with a focus on plant disease detection. |
| Dhan-Shomadhan: A Dataset of Rice Leaf Disease Classification for Bangladeshi Local Rice | Image Classification | 1,106 | 5 diseases (Brown Spot, Leaf Scaled, Rice Blast, Rice Turngo, Steath Blight) | An image classification dataset for five diseases in Bangladeshi rice production, with in-field and white backgrounds. |

Crop Phenotyping

Open-access image datasets for crop phenotyping
| Dataset | Task | Image Number | Classes | Description |
| --- | --- | --- | --- | --- |
| Global Wheat Head Dataset | Object detection/segmentation | GWHD2020 - 4,700, GWHD2021 - 6,422 | wheat heads | A field-collected dataset with wheat heads annotated with either bounding boxes (2020) or segmentation (2021). The GWHD2021 builds on the GWHD2020 by adding 1,722 images and segmentation-level annotations. Both can be downloaded from the link provided. |
| ImAg4Wheat | Pre-training foundation models | 2.5 M | Unlabelled | Comprises 2.5 million high-resolution images collected over a decade from breeding and experimental fields, spanning more than 2,000 genotypes and 500 distinct environmental conditions across 30 global sites. |

Forestry

Open-access image datasets for the forestry industry
| Dataset | Task | Image Number | Classes | Description |
| --- | --- | --- | --- | --- |
| TimberVision | Object detection/segmentation/tracking | 2,023 images, 51,338 trunk components | trunk, trunk components | A field-collected dataset and framework for tree-trunk detection and tracking based on RGB images. |
| SynthTree43K | Segmentation/depth | >43,000 synthetic RGB + depth images, >162,000 trees | tree trunks | A synthetic dataset of tree trunks developed with the Unity game engine. |

Fruit Counting

Open-access image datasets for fruit counting and yield estimation
| Dataset | Task | Image Number | Classes | Description |
| --- | --- | --- | --- | --- |
| KFuji RGB-DSM dataset | Object Detection | 967 (12,839 instances) | 1 (fuji apples) | RGB and depth images of apple trees for fruit detection and counting. |
| MinneApple | Object detection/segmentation | 1,000 (41,000 instances) | 1 (apples) | A comprehensive dataset for developing apple detection and segmentation algorithms. Representative results are provided for yield estimation. |

Post Harvest

Open-access image datasets for post harvest management (sorting, inspection, counting etc.) of produce and crops
| Dataset | Task | Image Number | Classes | Description |
| --- | --- | --- | --- | --- |
| SemanticSugarBeets | Instance segmentation | 952 (2,920 individual beets) | 6 (sugarbeet, cut, leaf, soil, damage, rot) | Monocular RGB images in .jpg format (2120x1192 px) of post-harvest and post-storage sugar beet. |

Text Datasets

Open-access text and multimodal datasets
| Dataset | Task | Description |
| --- | --- | --- |
| Agronomy Resources | Text | A collection of agronomy textbooks and guides from university extension programs. |

Large Language Models

Tools (and models) related to use, analysis, development of large language (and vision) models.
| Project Name | Task | Description |
| --- | --- | --- |
| Hugging Face | Collaboration platform for ML | A platform for community-driven development around ML/LLMs. All popular open-source LLMs are hosted here. The Hugging Face API is widely used for deployment/development. |
| Agronomy Arena | LLM comparison tool for agronomy | Provide an agricultural/plant science question; two randomly selected AI models answer it, and you then vote for the response you find most helpful. |
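
The arena-style comparison described for Agronomy Arena (two randomly chosen models answer, the user votes) can be sketched in a few lines. This is only an illustration of the general idea: the source does not say how Agronomy Arena selects models or aggregates votes, so the `pick_pair` and `win_rates` helpers, the model names and the simple win-rate tally are all assumptions.

```python
import random
from collections import Counter


def pick_pair(models, rng):
    """Select two distinct models at random to answer the same question."""
    if len(models) < 2:
        raise ValueError("need at least two models")
    return tuple(rng.sample(models, 2))


def win_rates(votes):
    """Tally (winner, loser) votes into a win rate per model."""
    wins, games = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {model: wins[model] / games[model] for model in games}


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the demo is repeatable
    models = ["model-a", "model-b", "model-c"]  # hypothetical names
    print("Pair for this question:", pick_pair(models, rng))
    # Each vote records which model's answer the user preferred.
    votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-a")]
    print(win_rates(votes))
```

A plain win rate ignores the strength of the opponent in each comparison; rating schemes that account for it are a common alternative.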

Foundation Models

Open-access foundation models for agriculture
| Model Name | Task | Training Approach |
| --- | --- | --- |
| FoMo4Wheat | Wheat image analysis | ViT-based, 2-stage. Pre-train ViT-G on all data with DINOv2 initial weights. Teacher-student training for L/B models. Freeze backbone, train lightweight adapter head on labelled data for specific tasks. |
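
The last step listed for FoMo4Wheat (freeze the backbone, train a lightweight head on labelled data) follows a general pattern that a toy example can illustrate. The sketch below is not the FoMo4Wheat code and uses no vision model: a fixed three-feature function stands in for the pre-trained backbone, and a logistic-regression head is fitted on synthetic data. Every name and number in it is a hypothetical chosen for illustration.

```python
import math
import random

# Hypothetical stand-in for a pre-trained backbone: fixed weights, never updated.
FROZEN_WEIGHTS = ((1.0, 0.5), (-0.5, 1.0), (0.3, 0.3))


def backbone(x):
    """Map a 2-D input to three features using the frozen weights."""
    return [math.tanh(w0 * x[0] + w1 * x[1]) for w0, w1 in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(inputs, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head on frozen features with plain SGD."""
    # The backbone is frozen, so its features can be computed once and reused.
    features = [backbone(x) for x in inputs]
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * v for w, v in zip(weights, f)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    """Classify one input with the frozen backbone plus the trained head."""
    logit = sum(w * v for w, v in zip(weights, backbone(x))) + bias
    return 1 if logit > 0 else 0


def make_toy_data(n=200, seed=0):
    """Synthetic labelled data: class 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    inputs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    labels = [1 if a + b > 0 else 0 for a, b in inputs]
    return inputs, labels


if __name__ == "__main__":
    inputs, labels = make_toy_data()
    weights, bias = train_head(inputs, labels)
    correct = sum(predict(x, weights, bias) == y for x, y in zip(inputs, labels))
    print(f"training accuracy: {correct / len(inputs):.2f}")
```

Only the head's three weights and bias are updated during training, which is why this style of adaptation is cheap compared with retraining the backbone itself.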

Geospatial Tools

Tools for ag-relevant geospatial analyses.
| Project Name | Task | Description |
| --- | --- | --- |
| OpenET FARMS Platform | Landscape-scale evapotranspiration data analysis | The Farm and Ranch Management Support (FARMS) system provides easy access to, and use (analysis, reports) of, evapotranspiration (ET) data from OpenET. Limited to the western USA. |

Software Development

Agriculture-specific tools for developing software.
| Project Name | Task | Description |
| --- | --- | --- |
| font.ag | Ag-specific icons | Font.AG is an open-source agricultural icon font, designed to provide scalable vector icons for modern agricultural applications. |
| Lex Icons | Food systems icons | A peer-reviewed visual language of terms and machine-readable icons. |

Hardware Development

Tools for developing hardware and integrating into agricultural machinery.
| Project Name | Task | Description |
| --- | --- | --- |
| AgIsoStack++ | ISOBUS Integration | AgIsoStack++ is a free and open-source library that provides easy and robust ISO 11783 and J1939 CAN communication functionality using C++. |
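
To give a feel for the kind of CAN traffic such a library handles, the sketch below decodes a 29-bit J1939-style identifier (the scheme ISO 11783 also builds on) into priority, parameter group number (PGN), source address and, for destination-specific messages, destination address. It is written in Python for illustration and does not use or mirror the AgIsoStack++ API; `J1939Id` and `decode_j1939_id` are hypothetical names.

```python
from typing import NamedTuple, Optional


class J1939Id(NamedTuple):
    priority: int
    pgn: int
    source_address: int
    destination_address: Optional[int]  # None for broadcast (PDU2) messages


def decode_j1939_id(can_id: int) -> J1939Id:
    """Split a 29-bit extended CAN identifier into its J1939 fields."""
    if not 0 <= can_id <= 0x1FFFFFFF:
        raise ValueError("identifier must fit in 29 bits")
    priority = (can_id >> 26) & 0x7       # bits 26-28
    page_bits = (can_id >> 24) & 0x3      # extended data page + data page
    pdu_format = (can_id >> 16) & 0xFF    # PF
    pdu_specific = (can_id >> 8) & 0xFF   # PS
    source_address = can_id & 0xFF
    if pdu_format < 240:
        # PDU1: PS carries the destination address and is not part of the PGN.
        pgn = (page_bits << 16) | (pdu_format << 8)
        destination = pdu_specific
    else:
        # PDU2: broadcast message, PS is a group extension and part of the PGN.
        pgn = (page_bits << 16) | (pdu_format << 8) | pdu_specific
        destination = None
    return J1939Id(priority, pgn, source_address, destination)


if __name__ == "__main__":
    for raw in (0x18EEFF80, 0x0CF00400):
        print(f"0x{raw:08X} ->", decode_j1939_id(raw))
```

For example, `0x18EEFF80` decodes to priority 6, PGN `0xEE00` (address claim), destination `0xFF` (global) and source `0x80`.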

Algorithm Development

Tools for improving the algorithm development process.
| Project Name | Task | Description |
| --- | --- | --- |
| Project AgML | ML Pipeline | Standardising the development of ML algorithms, specific to agricultural data. |
| RootPainter | Custom segmentation | RootPainter is a GUI-based software tool for the rapid, corrective training of deep neural networks for use in biological image analysis. RootPainter uses a client-server architecture, allowing it to be used on a standard laptop with access to Google Colab or to be installed and run locally. |
| Segment-Anything Model (SAM) | Zero-shot segmentation | A tool from Meta Research for zero-shot segmentation of images. Whilst not trained on agricultural data (though one plant dataset is used), the algorithm learns the concept of objects and can extrapolate well into unseen areas. |

In-Field Deployment

Open-source hardware projects for field use.
| Project Name | Task | Description |
| --- | --- | --- |
| AgOpenGPS | GPS Guidance | A globally popular open-source GPS guidance system for tractors and implements, with a large user base and development community. Its user interface also covers additional features such as variable rate and mapping. |
| OpenWeedLocator (OWL) | Site-specific weed control | A DIY weed detection device based around the Raspberry Pi and Google Coral. Complete instructions for building and deploying. |
| Twisted Fields - Acorn | Robotic platform | Acorn is a solar-powered, lightweight and open-source Precision Farming Rover (PFR) for in-field use. |
| Insect Detect | Insect monitoring | Build your own insect-detecting camera trap for automated monitoring. |
| StickyPi | Insect monitoring | A high-frequency smart insect trap to study daily activity in the field. |
| Low Cost Water Quality Sampler | Water quality monitoring | A low-cost, automated, IoT-connected water sampler for near-real-time water quality research, developed by the Colorado State University Agricultural Water Quality Program. |
| Mothbox | Insect monitoring | A low-cost, high-performance insect monitor based on the Raspberry Pi and an Arducam 64MP camera, with automatic image collection and analysis. |
| Laudando & Associates L&Aser | Laser weeding | An open-source implementation of a beta version of the L&Aser. |
| FarmBot | Gardening robot | A gantry-style robot for monitoring and maintaining a raised garden bed. Purchasable as a kit or DIY. |
