List of useful data augmentation resources. Here you will find some uncommon techniques, libraries, links to GitHub repos, papers, and more.
Updated Aug 14, 2024
Easy data preparation with the latest LLM-based operators and pipelines.
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.
[CVPR 2020--Oral] CycleISP: Real Image Restoration via Improved Data Synthesis
Official Repository of "LLM × DATA" Survey Paper
Computer vision utils for Blender (generate instance annotations, depth, and 6D pose with one line of code)
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
[CVPR 2023] Label-Free Liver Tumor Segmentation
[CVPR 2024] Generalizable Tumor Synthesis - Realistic Synthetic Tumors in Liver, Pancreas, and Kidney
[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks
[ICLR 2025] Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
[EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs
Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]
Official Code for “EarthSynth: Generating Informative Earth Observation with Diffusion Models”
Repository for the results of my master's thesis on the generation and evaluation of synthetic data using GANs
Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
Source code for LDPTrace: Locally Differentially Private Trajectory Synthesis. VLDB 2023.
[Preprint] Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis