
AMAP-ML

DreamX Team @ Amap (Alibaba)

We are the DreamX team at Amap (Alibaba). We drive cutting-edge research and production AI systems across large language models, reinforcement learning, agent systems, multimodal understanding, generative AI (image/video), world models, autonomous driving, and intelligent mobility. Our 30+ open-source research projects have earned 6,000+ GitHub stars, and our work has been published at top-tier venues including ICLR, CVPR, ACL, AAAI, SIGGRAPH, ICCV, EMNLP, and ACM MM.

We are always looking for talented interns and full-time researchers with strong coding skills and research experience. Please email us at cxxgtxy@gmail.com if you are interested.


🔥 News

  • 2026.05.12 🎉 CoEvolve is accepted by ACL 2026 -- Training LLM Agents via Agent-Data Mutual Evolution.
  • 2026.05.12 🎉 Thinking-with-Map is accepted by ACL 2026 Findings -- Reinforced Parallel Map-Augmented Agent for Geolocalization.
  • 2026.05.11 💻 We released DreamX-World 5B-Cam model and inference code -- A General-Purpose Interactive World Model.
  • 2026.04.22 💻 We open-sourced DCW -- Elucidating the SNR-t Bias of Diffusion Probabilistic Models (CVPR 2026).
  • 2026.04.22 💻 We open-sourced EMF -- Extending One-Step Image Generation from Class Labels to Text (CVPR 2026).
  • 2026.04.10 💻 We open-sourced SkillClaw -- Let Skills Evolve Collectively with Agentic Evolver.
  • 2026.04.10 💻 We open-sourced DreamX-World -- A General-Purpose Interactive World Model.
  • 2026.04.01 🎉 MACE-Dance is accepted by SIGGRAPH 2026 -- Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation.
  • 2026.03.23 💻 We open-sourced Omni-WorldBench -- A Comprehensive Benchmark for Evaluating Interactive Response Capabilities of World Models.
  • 2026.03.20 💻 We open-sourced AutoDrive-R2 -- Incentivizing Reasoning and Self-Reflection for VLA in Autonomous Driving.
  • 2026.03.18 💻 We open-sourced Video-STAR -- Reinforcing Open-Vocabulary Action Recognition with Tools.
  • 2026.03.11 💻 We open-sourced RL3DEdit -- Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing.
  • 2026.03.01 🎉 FE2E is accepted by CVPR 2026 -- Beyond Generation: Advancing Image Editing Priors for Depth and Normal Estimation.
  • 2026.02.28 🎉 FASA is accepted by ICLR 2026 -- Frequency-Aware Sparse Attention.
  • 2026.02.27 🎉 Eevee is accepted by Findings of CVPR 2026 -- Towards Close-up High-resolution Video-based Virtual Try-on.
  • 2026.02.06 💻 We open-sourced MobilityBench -- A Scalable Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios.
  • 2026.02.06 🎉 Video-STAR is accepted by ICLR 2026 -- Reinforcing Open-Vocabulary Action Recognition with Tools.
  • 2026.02.06 🎉 AutoDrive-R2 is accepted by ICLR 2026 -- Incentivizing Reasoning and Self-Reflection for VLA in Autonomous Driving.
  • 2026.02.06 🎉 SpatialGenEval is accepted by ICLR 2026 -- Benchmarking Spatial Intelligence of Text-to-Image Models.
  • 2026.02.06 🎉 Tree-GRPO is accepted by ICLR 2026 -- Tree Search for LLM Agent Reinforcement Learning.
  • 2026.02.06 🎉 S2-Guidance is accepted by ICLR 2026 -- Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models.
  • 2026.02.05 🎉 MathForge is accepted by ICLR 2026 -- Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation.
  • 2026.02.04 💻 We open-sourced Code2World -- A GUI World Model via Renderable Code Generation.
  • 2026.02.04 🎉 GPG is accepted by ICLR 2026 -- A Simple and Strong Reinforcement Learning Baseline for Model Reasoning.
  • 2026.02.04 🎉 NarrLV is accepted by ICLR 2026 -- A Comprehensive Narrative-Centric Evaluation for Long Video Generation Models.
  • 2026.02.04 🎉 EPG is accepted by ICLR 2026 -- Advancing End-To-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training.
  • 2026.02.04 🎉 Omni-Effects is accepted by AAAI 2026 -- Unified and Spatially-Controllable Visual Effects Generation.
  • 2026.02.04 🎉 ImagerySearch is accepted by AAAI 2026 -- Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints.
  • 2026.02.02 🎉 VMBench is accepted by ICCV 2025 -- A Benchmark for Perception-Aligned Video Motion Generation.
  • 2026.01.31 🎉 SocioReasoner is accepted by ICLR 2026 -- Urban Socio-Semantic Segmentation with Vision-Language Reasoning.
  • 2026.01.07 💻 We open-sourced Thinking-with-Map -- Reinforced Parallel Map-Augmented Agent for Geolocalization.
  • 2025.10.22 💻 We open-sourced Taming-Hallucinations -- Boosting MLLMs' Video Understanding via Counterfactual Video Generation.
  • 2025.06.20 💻 We open-sourced FluxText -- A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing.
  • 2025.05.21 💻 We open-sourced UniVG-R1 -- Reasoning Guided Universal Visual Grounding with Reinforcement Learning.
  • 2025.04.07 💻 We open-sourced RealQA -- Realistic Image Quality and Aesthetic Scoring with Multimodal LLM.

📚 Research Areas

🧠 LLM Reasoning & Agent Systems

| Repository | Description | Venue |
| --- | --- | --- |
| SkillClaw | A framework enabling LLM agent skills to evolve collectively from real interactions, with automatic deduplication, improvement, and verification across sessions, agents, and devices. | - |
| Tree-GRPO | Adopts tree-search rollouts in place of independent chain-based rollouts for LLM agent RL, achieving superior performance with only a quarter of the rollout budget. | ICLR 2026 |
| GPG | A minimalist RL approach (Group Policy Gradient) that directly optimizes the original RL objective, eliminating critic/reference models and KL constraints while outperforming GRPO. | ICLR 2026 |
| MathForge | Proposes difficulty-aware GRPO and multi-aspect question reformulation to boost math reasoning by targeting harder questions from both algorithmic and data perspectives. | ICLR 2026 |
| CoEvolve | A framework for training LLM agents via agent-data mutual evolution, using RL with failure-signal-driven task synthesis under changing training distributions. | ACL 2026 |
| FASA | Frequency-aware sparse attention that identifies and preserves critical frequency components to achieve efficient and accurate sparse decoding. | ICLR 2026 |
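
Several projects in this area (Tree-GRPO, GPG, MathForge) refer to GRPO, which scores each sampled response relative to the other responses drawn for the same prompt. As a rough illustration only, and not the implementation used in any repository listed here, the group-normalized advantage commonly associated with GRPO-style methods can be sketched as follows. The function name, the `eps` constant, and the example rewards are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per response sampled for the same
    prompt. `eps` (an assumed constant) avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four responses: two rewarded, two not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive a positive advantage and the rest a negative one, so no separate critic model is needed to provide a baseline.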

🎨 Image Generation & Editing

| Repository | Description | Venue |
| --- | --- | --- |
| FluxText | A novel text editing framework for multi-line scene text in complex visual scenarios, with Condition Injection LoRA module and regional text perceptual loss. | - |
| FE2E | Leveraging image editing priors from diffusion models for accurate monocular depth and normal estimation. | CVPR 2026 |
| RL3DEdit | An RL-based single-pass 3D scene editing framework using VGGT as geometry-aware reward model and GRPO to anchor 2D editing priors onto the 3D consistency manifold. | CVPR 2026 |
| S2-Guidance | Leverages stochastic block-dropping to construct sub-networks for training-free guidance, surpassing CFG on text-to-image and text-to-video generation. | ICLR 2026 |
| EPG | Advancing end-to-end pixel-space generative modeling via self-supervised pre-training, eliminating the need for a separate VAE. | ICLR 2026 |
| Omni-Effects | A unified framework for prompt-guided and spatially controllable composite visual effects generation, using LoRA-MoE and spatial-aware prompts. | AAAI 2026 |
| SpatialGenEval | A benchmark with 1,230 information-dense prompts and 12,300 multi-choice questions to evaluate complex spatial intelligence in text-to-image models. | ICLR 2026 |
| DCW | Elucidating the SNR-t bias of diffusion probabilistic models and proposing a differential correction method to improve generation quality across various diffusion models. | CVPR 2026 |
| EMF | Extending one-step image generation from class labels to text via discriminative text representation. | CVPR 2026 |
| USP | Unified self-supervised pretraining via masked latent modeling in VAE space, significantly improving diffusion model convergence and generation quality. | ICCV 2025 |

🎬 Video Generation & Understanding

| Repository | Description | Venue |
| --- | --- | --- |
| MACE-Dance | A cascaded expert framework explicitly decoupling motion generation and appearance synthesis for high-quality music-driven dance video generation, with 70K-clip MA-Data dataset. | SIGGRAPH 2026 |
| Video-STAR | Combines contextual sub-motion decomposition with tool-augmented reinforcement learning for open-vocabulary action recognition using GRPO with hierarchical rewards. | ICLR 2026 |
| NarrLV | The first benchmark to comprehensively evaluate narrative expression capabilities of long video generation models, inspired by film narrative theory. | ICLR 2026 |
| ImagerySearch | A prompt-guided adaptive test-time search strategy that dynamically adjusts search space and reward for imaginative video generation with long-distance semantic dependencies. | AAAI 2026 |
| Eevee | A high-resolution dataset and benchmark for video-based virtual try-on, supporting both full-shot and close-up garment detail views. | Findings of CVPR 2026 |
| VMBench | A perception-aligned video motion benchmark with human-aligned metrics achieving 35.3% improvement in Spearman's correlation over baselines. | ICCV 2025 |
| Taming-Hallucinations | Introduces DualityForge, a controllable diffusion framework generating counterfactual videos for contrastive training, reducing MLLM video hallucinations by 24%. | - |

🌍 World Models & Interactive AI

| Repository | Description | Venue |
| --- | --- | --- |
| Code2World | A VLM-based GUI world model that predicts dynamic transitions via renderable code generation, boosting Gemini-2.5-Flash by +9.5% on AndroidWorld navigation. | - |
| DreamX-World | A general-purpose world model for interactive world simulation, generating diverse, high-fidelity worlds that users can explore, control, and transform with event prompts. | - |
| Omni-WorldBench | A comprehensive benchmark specifically designed to evaluate the interactive response capabilities of world models across diverse scenarios. | arXiv 2026 |

👁️ Multimodal & Vision-Language

| Repository | Description | Venue |
| --- | --- | --- |
| AutoDrive-R2 | A vision-language-action model using rule-based RL to elicit reasoning and self-reflection for autonomous driving trajectory prediction with physics-grounded rewards. | ICLR 2026 |
| SocioReasoner | A vision-language reasoning framework for urban socio-semantic segmentation that simulates human annotation via cross-modal recognition and multi-stage RL-based reasoning. | ICLR 2026 |
| UniVG-R1 | Reasoning guided universal visual grounding with reinforcement learning. | CVPR 2026 |
| RealQA | A 14,715-image UGC dataset with 10 fine-grained attributes for realistic image quality and aesthetic scoring; achieves SOTA on 5 public IQA/IAA benchmarks using next-token prediction. | - |

🗺️ Maps, Mobility & Spatial Intelligence

| Repository | Description | Venue |
| --- | --- | --- |
| Thinking-with-Map | A map-augmented agent that conducts reasoning with real-world maps for geolocalization, trained via reinforcement learning. | ACL 2026 Findings |
| MobilityBench | A scalable benchmark for evaluating route-planning agents in real-world mobility scenarios. | arXiv 2026 |

Pinned

  1. FluxText Public

    Implementation of "FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing"

    Python · 448 stars · 32 forks

  2. Tree-GRPO Public

    [ICLR 2026] Tree Search for LLM Agent Reinforcement Learning

    Python · 354 stars · 34 forks

  3. Code2World Public

    Code2World: A GUI World Model via Renderable Code Generation

    Python · 318 stars · 17 forks

  4. FE2E Public

    [CVPR 2026] Beyond Generation: Advancing Image Editing Priors for Depth and Normal Estimation

    Python · 237 stars · 8 forks

  5. RL3DEdit Public

    HTML · 199 stars · 8 forks

  6. GPG Public

    [ICLR26]GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning

    Python · 182 stars · 5 forks

Repositories

Showing 10 of 42 repositories
