Welcome to the OSA repository for all things open-source in agricultural technology (agritech) development. This accompanies the OpenSourceAg newsletter, which you can sign up to here.
The idea behind this repository is to collate all open-source datasets and projects in agtech in one place for easy reference and to get a better picture of what is out there.
If you see a dataset is missing or you find an error in the tables, please submit a pull request or issue detailing the changes.
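Before opening a pull request, it can help to check that every table row has as many cells as its header. The snippet below is a minimal sketch of such a check, not part of this repository: the function names `split_row` and `find_ragged_rows` are hypothetical, and it assumes plain pipe-delimited Markdown tables with no escaped `\|` characters inside cells.

```python
def split_row(line):
    """Split a pipe-delimited Markdown table row into stripped cell strings."""
    text = line.strip()
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    return [cell.strip() for cell in text.split("|")]


def find_ragged_rows(markdown):
    """Return (line_number, cells_found, cells_expected) for rows that differ from their header."""
    problems = []
    expected = None  # column count of the table currently being read
    for number, line in enumerate(markdown.splitlines(), start=1):
        if not line.lstrip().startswith("|"):
            expected = None  # any non-table line ends the current table
            continue
        cells = split_row(line)
        if expected is None:
            expected = len(cells)  # the first row of a table is its header
        elif len(cells) != expected:
            problems.append((number, len(cells), expected))
    return problems


if __name__ == "__main__":
    # Hypothetical sample: the last row is missing its third cell.
    sample = (
        "| Dataset | Task | Image Number |\n"
        "|---|---|---|\n"
        "| A | Segmentation | 10 |\n"
        "| B | Object Detection |\n"
    )
    for number, found, wanted in find_ragged_rows(sample):
        print(f"line {number}: {found} cells, expected {wanted}")
```

The check only counts cells, so it will not notice a value placed in the wrong column; those still need a visual review.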
- Datasets
- Large language models
- Foundation models
- Geospatial tools
- Software Development
- Hardware Development
- Algorithm Development
- In-field Deployment
Annotated image data is the backbone of precision agricultural operations such as site-specific weed control. This data is essential for training algorithms that can find weeds, detect insects and count fruit on the tree. A summary of datasets from each domain is provided below. Click on the drop-down list to find out more.
Open-access image datasets of weeds
| Dataset | Task | Image Number | Class Number | Species | Description |
|---|---|---|---|---|---|
| Agriculture-Vision | Instance Segmentation | | | | Aerial images for detecting weeds in various agricultural fields. |
| Carrot-Weed | Segmentation | 39 | 2 | carrot (Daucus carota ssp. sativus), unspecified weeds | |
| Corn/Lettuce/Radish | Classification | 7200 | 8 | maize (Zea mays), Canada thistle (Cirsium arvense), fat hen (Chenopodium album), bluegrass (Poa spp.), lettuce, radish | |
| CottonWeeds | Classification | 5,187 | 15 | morningglory (Ipomoea spp.), carpetweed (Mollugo verticillata), Palmer amaranth (Amaranthus palmeri), waterhemp (Amaranthus tuberculatus), purslane (Portulaca spp.), nutsedge (Cyperus spp.), eclipta (Eclipta prostrata), sicklepod (Senna obtusifolia), spotted spurge (Euphorbia maculata), ragweed (Ambrosia spp.), goosegrass (Eleusine indica), prickly sida (Sida spinosa), crabgrass (Digitaria spp.), swinecress (Lepidium spp.), spurred anoda (Anoda cristata) | |
| CornWeed | Object Detection | 3,574 | 2 | Zea mays, weeds | The CornWeed dataset was collected on farm machines for evaluating weed detection in corn crops. A conference paper is available. |
| CottonWeedDet12 | Object Detection | 5,648 (9,370 instances) | 12 | | |
| CropAndWeed | Object Detection/segmentation/stem localization | 8,034 (111,953 instances) | 74 | See supplementary for full list | An extensive collection of 74 crop and weed species gathered over four years in Europe. Annotated for bounding-box detection, segmentation and plant centroid detection. |
| CWF-788 | Segmentation | 788 | 1 | cauliflower (Brassica oleracea var. botrytis) | |
| CWFID | Segmentation | 60 | 2 | carrot, unspecified weeds | |
| CWD30 | Classification, Segmentation | 219,778 | 20 weed, 10 crop | Asian flatsedge (Cyperus microiria), Asiatic dayflower (Commelina communis), Bean (Phaseolus vulgaris), Bloodscale sedge (Carex haematostoma), Cockspur grass (Echinochloa crus-galli), Copperleaf (Acalypha spp.), Corn (Zea mays), Early barnyard grass (Echinochloa oryzoides), Fall panicum (Panicum dichotomiflorum), Finger grass (Digitaria sanguinalis), Foxtail millet (Setaria italica), Goosefoot (Chenopodium album), Great millet (Sorghum bicolor), Green foxtail (Setaria viridis), Green gram (Vigna radiata), Henbit (Lamium amplexicaule), Indian goosegrass (Eleusine indica), Korean dock (Rumex crispus), Livid pigweed (Amaranthus lividus), Nipponicus sedge (Carex nipponica), Peanut (Arachis hypogaea), Perilla (Perilla frutescens), Poa annua (Poa annua), Proso millet (Panicum miliaceum), Purslane (Portulaca oleracea), Red bean (Phaseolus angularis), Redroot pigweed (Amaranthus retroflexus), Sesame (Sesamum indicum), Smooth pigweed (Amaranthus hybridus), White goosefoot (Chenopodium album) | From the paper: Extensive crop-weed dataset with multi-view and multi-stage plant images. The repository includes pretrained models for transfer learning |
| GrassClover | Segmentation | 8000 | 5 | white clover (Trifolium repens), red clover (Trifolium pratense), shepherd’s purse (Capsella bursa-pastoris), unspecified thistle, dandelion (Taraxacum officinale) | |
| iNatAg | Classification | 4,720,903 | 2,959 | see dataset card and preprint | A curated collection of images from the iNaturalist database for crop-weed detection training. Implemented through the AgML project. |
| LincolnBeet | Bounding box | 4,402 | 2 | sugar beet (Beta vulgaris var. altissima), unspecified weeds | |
| Moving Fields Weed Dataset | Bounding box, segmentation | 94,321 | 36 | maize varieties (2), sorghum varieties (6), weed species (28) | Images collected within a fully automated high throughput phenotyping facility under controlled conditions with high spatial (2456×2058) and temporal resolution. Github (dataset download) |
| Plant Seedling Dataset | Segmentation | 5,539 | 12 | maize, wheat (Triticum aestivum), sugar beet, scentless mayweed (Matricaria perforata), common chickweed (Stellaria media), shepherd’s purse, cleavers (Galium aparine), charlock (Sinapis arvensis), fat hen, small-flowered cranesbill (Geranium pusillum), blackgrass (Alopecurus myosuroides), loose silky-bent (Apera spica-venti) | |
| Precision Sustainable Ag 2021 OpenCV Competition | Bounding box | 727 | 7 | grass species (Poaceae spp.), horseweed (Erigeron canadensis), cowpea (Vigna unguiculata), crimson clover (Trifolium incarnatum), goosefoot (Chenopodium album), velvetleaf (Abutilon theophrasti), sunflower (Helianthus annuus) | |
| RoboWeedMap | Bounding box | 1147 | 2 | Unspecified monocotyledonous, Unspecified dicotyledonous | |
| Sandplain Lupins | Segmentation | 795 (7989 instances) | 1 | Sandplain lupin (Lupinus cosentinii) | This repository contains five datasets collected in the field by a DJI Phantom 4 or smartphone in the northern wheatbelt of Western Australia. |
| Soybean/Grass/Broadleaf/Soil | Segmentation | 15,336 | 3 | soybean (Glycine max), grass weeds, broadleaf weeds | |
| Sugar beets | Segmentation | 300 | 10 | sugar beet, Nine unspecified weed species | |
| Weed-AI | All | Hosting platform | | | Includes over 30,000 images with bounding box annotations sourced from datasets across the internet. |
| WeedMap | Segmentation | 10,196 | 2 | sugar beet | |
| WeedNet | Segmentation | 155 | 2 | sugar beet, unspecified weeds | |
Open-access image datasets of insects
| Dataset | Task | Image Number | Classes | Description |
|---|---|---|---|---|
| IP102 | Classification/ object detection | Classification: >75,000, bounding box: 19,000 | 102 | A very large open-source dataset of insect pests. The IP102 is annotated with a hierarchical taxonomy and the insect pests which mainly affect one specific agricultural product are grouped into the same upper-level category. The full class list |
| BIOSCAN-1M | Classification | 1,128,308 | 16 | The BIOSCAN-1M Insect dataset consists of specimens mostly collected from three countries (Costa Rica, Canada, and South Africa) using Malaise traps. RGB images of the organisms were taken with a Keyence VHX-7000 microscope. |
Open-access image datasets of plant diseases
| Dataset | Task | Image Number | Classes | Description |
|---|---|---|---|---|
| PlantVillage | Image Classification | 54,306 | 14 crop species, 26 diseases | Dataset with a focus on plant disease detection. |
| Dhan-Shomadhan: A Dataset of Rice Leaf Disease Classification for Bangladeshi Local Rice | Image Classification | 1,106 | 5 diseases (Brown Spot, Leaf Scald, Rice Blast, Rice Tungro, Sheath Blight) | An image classification dataset covering five diseases in Bangladeshi rice production, with images taken in the field and against white backgrounds. |
Open-access image datasets for crop phenotyping
| Dataset | Task | Image Number | Classes | Description |
|---|---|---|---|---|
| Global Wheat Head Dataset | Object detection/segmentation | GWHD2020 - 4,700, GWHD2021 - 6,422 | wheat heads | A field-collected dataset with wheat heads annotated with either bounding boxes (2020) or segmentation (2021). The GWHD2021 builds on the GWHD2020 by adding 1722 images and segmentation level annotations. Both can be downloaded from the link provided. |
| ImAg4Wheat | Pre-training foundation models | 2.5 M | Unlabelled | Comprises 2.5 million high-resolution images collected over a decade from breeding and experimental fields, spanning more than 2,000 genotypes and 500 distinct environmental conditions across 30 global sites. |
Open-access image datasets for the forestry industry
| Dataset | Task | Image Number | Classes | Description |
|---|---|---|---|---|
| TimberVision | Object detection/segmentation/tracking | 2,023 images, 51,338 trunk components | trunk, trunk components | A field-collected dataset and framework for tree-trunk detection and tracking based on RGB images. |
| SynthTree43K | Segmentation/depth | >43,000 synthetic RGB + depth images, >162,000 trees | tree trunks | A synthetic dataset of tree trunks developed with the Unity game engine. |
Open-access image datasets for fruit counting and yield estimation
| Dataset | Task | Image Number | Classes | Description |
|---|---|---|---|---|
| KFuji RGB-DSM dataset | Object Detection | 967 (12,839 instances) | 1 (fuji apples) | RGB and Depth images of apple trees for fruit detection and counting. |
| MinneApple | Object detection/ segmentation | 1000 (41,000 instances) | 1 (apples) | A comprehensive dataset for developing apple detection and segmentation algorithms. Representative results are provided for yield estimation. |
Open-access image datasets for post harvest management (sorting, inspection, counting etc.) of produce and crops
| Dataset | Task | Image Number | Classes | Description |
|---|---|---|---|---|
| SemanticSugarBeets | Instance segmentation | 952 (2920 individual beets) | 6 (sugarbeet, cut, leaf, soil, damage, rot) | Monocular RGB in .jpg format (2120x1192 px) of post harvest and post storage sugarbeet. |
Open-access text and multimodal datasets
| Dataset | Task | Description |
|---|---|---|
| Agronomy Resources | Text | A collection of agronomy textbooks and guides from university extension programs. |
Tools (and models) related to use, analysis, development of large language (and vision) models.
| Project Name | Task | Description |
|---|---|---|
| Hugging Face | Collaboration platform for ML | A platform for community driven development around ML/LLMs. All popular open-source LLMs are hosted here. The Hugging Face API is widely used for deployment/development. |
| Agronomy Arena | LLM comparison tool for agronomy | Submit an agricultural or plant science question and two randomly selected AI models will answer it. You then vote on which response is the most helpful. |
Open-access foundation models for agriculture
| Model Name | Task | Training Approach |
|---|---|---|
| FoMo4Wheat | Wheat image analysis | ViT-based, 2-stage. Pre-train ViT-G on all data with DinoV2 init weights. Teacher-student training for L/B models. Freeze backbone, train lightweight adapter head on labelled data for specific tasks. |
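The final step in the FoMo4Wheat row (freeze the backbone, train only a lightweight head on labelled data) can be illustrated with a toy sketch. This is not the FoMo4Wheat code: the "backbone" here is a fixed random projection standing in for a pretrained ViT, the head is a plain logistic-regression layer rather than the authors' adapter, and every name and hyperparameter is hypothetical.

```python
import math
import random


def make_frozen_backbone(in_dim, feat_dim, seed=0):
    """Return a fixed random feature extractor standing in for a pretrained backbone."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(feat_dim)]

    def backbone(x):
        # tanh(Wx); the weights are closed over and never updated, i.e. frozen
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

    return backbone


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head on precomputed features with batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for i in range(dim):
                grad_w[i] += err * f[i]
            grad_b += err
        # Only the head parameters (w, b) are updated; the backbone is untouched.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(backbone, head, x):
    """Classify one input as 0 or 1 using the frozen backbone and the trained head."""
    w, b = head
    f = backbone(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
```

Because the backbone never changes, its features can be computed once and reused across epochs, which is what keeps this kind of task-specific training cheap compared with fine-tuning the whole model.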
Tools for ag-relevant geospatial analyses.
| Project Name | Task | Description |
|---|---|---|
| OpenET FARMS Platform | Landscape-scale evapotranspiration data analysis | The Farm and Ranch Management Support (FARMS) system enables easy access to and use (analysis, reports) of evapotranspiration (ET) data from OpenET. Limited to the western USA. |
Agriculture-specific tools for developing software.
| Project Name | Task | Description |
|---|---|---|
| font.ag | Ag-specific icons | Font.AG is an open-source agricultural icon font, designed to provide scalable vector icons for modern agricultural applications. |
| Lex Icons | Food systems icons | A peer-reviewed visual language made up of terms and machine-readable icons. |
Tools for developing hardware and integrating into agricultural machinery.
| Project Name | Task | Description |
|---|---|---|
| AgIsoStack++ | ISOBUS Integration | AgIsoStack++ is a free and open-source library that provides easy and robust ISO 11783 and J1939 CAN communication functionality using C++. |
Tools for improving the algorithm development process.
| Project Name | Task | Description |
|---|---|---|
| Project AgML | ML Pipeline | Standardising the development of ML algorithms, specific to agricultural data. |
| RootPainter | Custom segmentation | RootPainter is a GUI-based software tool for the rapid, corrective training of deep neural networks for use in biological image analysis. RootPainter uses a client-server architecture, allowing it to be used on a standard laptop with access to Google Colab or to be installed and run locally. |
| Segment-Anything Model (SAM) | Zero-shot segmentation | A recently released tool for zero-shot segmentation of images from Meta Research. Whilst not trained on agricultural data (though one plant dataset is used), the algorithm learns the concept of objects and can extrapolate well into unseen areas. |
Open-source hardware projects for field use.
| Project Name | Task | Description |
|---|---|---|
| AgOpenGPS | GPS Guidance | A globally popular open-source GPS guidance system for tractors and implements, with a large user base and development community. AgOpenGPS has an extensive user interface that supports additional features such as variable rate and mapping. |
| OpenWeedLocator (OWL) | Site-specific weed control | A DIY weed detection device based around the Raspberry Pi and Google Coral. Complete instructions for building and deploying. |
| Twisted Fields - Acorn | Robotic platform | Acorn is a solar-powered, light-weight, and open source Precision Farming Rover (PFR) for in-field use. |
| Insect Detect | Insect monitoring | Build your own insect-detecting camera trap for automated monitoring |
| StickyPi | Insect monitoring | A high-frequency smart insect trap to study daily activity in the field |
| Low Cost Water Quality Sampler | Water quality monitoring | A low-cost, automated water sampler over IoT for near-real-time water quality research developed by the Colorado State University Agricultural Water Quality Program |
| Mothbox | Insect monitoring | A low-cost, high-performance insect monitor based on the Raspberry Pi and an Arducam 64MP camera, with automatic image collection and analysis. |
| Laudando & Associates L&Aser | Laser weeding | An open-source implementation of a beta version of the L&Aser |
| FarmBot | Gardening robot | A gantry-style robot for monitoring and maintaining a raised garden bed. Purchasable as a kit or DIY |
