UCSC ERIC Lab

29 repositories

MMIR
Public
"Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models"
Python
•0•7•0•0•Updated Feb 25, 2025
MSSBench
Public
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
safety ai-agents situational-awareness ai-assistant large-language-models multimodal-large-language-models
Python
•
MIT License
•1•12•1•0•Updated Feb 24, 2025
ProbMed
Public
"Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
evaluation vision-and-language medical-vqa medical-diagnosis llms large-multimodal-models
Python
•1•15•1•0•Updated Feb 21, 2025
Mojito
Public
Official repo for the paper "Mojito: Motion Trajectory and Intensity Control for Video Generation"
motion-control video-generation diffusion-models controllable-generation text-to-video-generation
0•3•0•0•Updated Feb 10, 2025
MiniGPT-5
Public
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
transformers diffusion-models multimodal-generation multimodal-llm
Python
•
Apache License 2.0
•52•864•6•0•Updated Dec 12, 2024
Aerial-Vision-and-Dialog-Navigation
Public
Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"
navigation aerial-imagery drone-navigation vision-and-language vln
Python
•6•50•3•0•Updated Nov 4, 2024
edit-room.github.io
Public
JavaScript
•0•0•1•0•Updated Oct 18, 2024
llm_coordination
Public
Code repository for the NAACL 2025 paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"
multiagent llms coordination-game agent-coordination
Python
•
MIT License
•2•29•0•0•Updated Oct 13, 2024
swap-anything
Public
Official implementation of the ECCV paper "SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing"
image-editing personalization diffusion-models subject-driven-generation photoswapping swap-anything
Python
•
MIT License
•10•243•4•0•Updated Oct 10, 2024
MMWorld
Public
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
evaluation video-understanding video-dataset multi-disciplinary multimodal-large-language-models world-model
Python
•
MIT License
•1•25•0•0•Updated Sep 21, 2024
ComCLIP
Public
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
causality clip svo slip vision-and-language compositionality flickr8k-dataset image-text-matching flickr30k image-text-retrieval
Python
•
MIT License
•3•35•0•1•Updated Aug 18, 2024
Screen-Point-and-Read
Public
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
screen-reader ai-agents grounding gui-agents tree-of-lens layout-understanding
Python
•2•24•0•0•Updated Jul 31, 2024
via-video
Public
0•24•1•0•Updated Jun 20, 2024
R2H
Public
Official implementation of the EMNLP 2023 paper "R2H: Building Multimodal Navigation Helpers that Respond to Help Requests"
helper navigation dialogue multimodal embodied-agent response-generation ai-agent
Python
•1•4•0•0•Updated Jun 19, 2024
ViCor
Public
This is the implementation of the ACL 2024 Findings paper "ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models"
0•3•0•0•Updated Jun 11, 2024
awesome-vision-language-navigation
Public
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
navigation vision-and-language embodied-agent vision-and-language-navigation
MIT License
•23•448•1•0•Updated May 2, 2024
Discffusion
Public
Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
vision-and-language few-shot-learning discriminative-learning diffusion-models
Python
•
MIT License
•3•28•1•0•Updated Apr 27, 2024
MultipanelVQA
Public
Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"
vqa vlm mllm screen-ai multipanel-understanding
Jupyter Notebook
•
MIT License
•0•7•0•0•Updated Apr 11, 2024
Naivgation-as-wish
Public
Official implementation of the NAACL 2024 paper "Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning"
robustness attack-defense federated-learning embodied-agent vision-and-language-navigation
Python
•
MIT License
•0•5•0•0•Updated Apr 10, 2024
minigpt-5.github.io
Public
JavaScript
•1•0•0•0•Updated Apr 3, 2024
photoswap
Public
Official implementation of the NeurIPS 2023 paper "Photoswap: Personalized Subject Swapping in Images"
image-editing personalization diffusion-models generative-ai photoswap
Jupyter Notebook
•
MIT License
•23•349•5•0•Updated Feb 28, 2024
PECTVLM
Public
Code implementation for the Findings of EMNLP 2023 paper "Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment"
Smalltalk
•
MIT License
•0•7•0•0•Updated Oct 17, 2023
T2IAT
Public
T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation
Python
•
MIT License
•0•7•0•0•Updated Aug 15, 2023
PEViT
Public
Official implementation of AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers"
pytorch image-classification fine-tuning vision-transformer parameter-efficient-tuning
Python
•
MIT License
•5•104•9•0•Updated Aug 7, 2023
VLMbench
Public
NeurIPS 2022 Paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation"
language-grounding vision-and-language robotic-manipulation compositionality embodied-ai
Python
•
MIT License
•9•90•5•0•Updated Mar 5, 2023
Mitigate-Gender-Bias-in-Image-Search
Public
Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arxiv.org/abs/2109.05433
image-search multimodality gender-bias fairness-ml vision-language
Python
•
MIT License
•1•11•2•0•Updated Feb 6, 2023
CPL
Public
Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
vqa image-classification causal-inference vision-and-language image-text-retrieval counterfactual-reasoning prompt-tuning
Python
•
MIT License
•5•33•6•0•Updated Dec 5, 2022
ACLToolBox
Public
Python
•
MIT License
•1•8•0•0•Updated Nov 15, 2022
FedVLN
Public
[ECCV 2022] Official pytorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"
federated-learning privacy-preserving-machine-learning vision-and-language-navigation
C++
•
MIT License
•2•14•0•0•Updated Oct 8, 2022