Repositories list
29 repositories
MMIR (Public)
MSSBench (Public): [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
ProbMed (Public): "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
Mojito (Public): Official repo for the paper "Mojito: Motion Trajectory and Intensity Control for Video Generation"
MiniGPT-5 (Public): Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
- Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"
edit-room.github.io (Public)
llm_coordination (Public): Code repository for the NAACL 2025 paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"
swap-anything (Public): Official implementation of the ECCV paper "SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing"
MMWorld (Public): Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
ComCLIP (Public): Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
Screen-Point-and-Read (Public): Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
via-video (Public)
R2H (Public): Official implementation of the EMNLP 2023 paper "R2H: Building Multimodal Navigation Helpers that Respond to Help Requests"
ViCor (Public)
- A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
Discffusion (Public): Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
MultipanelVQA (Public)
Naivgation-as-wish (Public): Official implementation of the NAACL 2024 paper "Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning"
minigpt-5.github.io (Public)
photoswap (Public): Official implementation of the NeurIPS 2023 paper "Photoswap: Personalized Subject Swapping in Images"
PECTVLM (Public)
T2IAT (Public)
PEViT (Public): Official implementation of AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers"
VLMbench (Public): NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation"
- Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arxiv.org/abs/2109.05433
CPL (Public): Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
ACLToolBox (Public)
FedVLN (Public): [ECCV 2022] Official pytorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"