clownrat6/README.md

Hi there 👋

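I am a fourth-year direct-entry Ph.D. student in Computer Science and Technology at Peking University (expected to graduate in 2026). I received my bachelor's degree from the School of Electronic and Information Engineering at South China University of Technology (class of 2021).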

Motto: unite knowledge with action and study things to gain knowledge; aim high while keeping both feet on the ground.

📌 Research Interests

My research focuses on multimodal large models and image/video understanding, specifically:

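  • Multimodal large models (video understanding), including:
    • General video understanding: Qwen2.5-VL core contributor
    • Audio-visual understanding: VideoLLaMA2; CMM
    • Streaming video understanding: VideoLLaMA3
    • Long video understanding: Inf-CL (CVPR 2025 Highlight)
    • Fine-grained video understanding: VideoRefer (CVPR 2025)
  • Image/video segmentation, including:
    • Weakly supervised segmentation: OCR (CVPR 2023)
    • Video instance segmentation: TAR (ICCV 2025)
    • Multimodal segmentation: WiCo (IJCAI 2023, Neurocomputing 2024); PVD (AAAI 2024); BriVIS (AAAI 2025)
    • Medical image segmentation: Fused U-Net (Medical Physics 2021)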

📈 Academic Output

I have published 20+ papers to date. My total Google Scholar citation count is shown in the Citations badge.

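The open-source projects I have contributed to have attracted wide attention. GitHub star counts for representative projects are shown below: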

[Star badges: VideoLLaMA2, VideoLLaMA3, Inf-CL, CMM, VideoRefer]

💬 Contact

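If you are interested in my research, feel free to reach out to discuss collaboration or to offer internship / full-time opportunities 🙏🙏. My contact email: [email protected]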

📎 Homepages

🔥 News

  • 2021.03: I joined SenseTime in Shenzhen as a research intern to develop the MMSegmentation toolkit.

💻 Selected Research Papers

My full paper list is available on my personal homepage.

Pinned

  1. DAMO-NLP-SG/VideoLLaMA2 (Public)

    VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

    Python · 1.2k stars · 81 forks

  2. DAMO-NLP-SG/VideoLLaMA3 (Public)

    Frontier Multimodal Foundation Models for Image and Video Understanding

    Jupyter Notebook · 897 stars · 67 forks

  3. DAMO-NLP-SG/Inf-CLIP (Public)

    [CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficiency CLIP training sc…

    Python · 260 stars · 11 forks

  4. DAMO-NLP-SG/CMM (Public)

    ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

    Python · 46 stars · 2 forks

  5. OpenVIS (Public)

    [AAAI 2025] Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use.

    Python · 23 stars

  6. mmsegmentation (Public)

    Forked from open-mmlab/mmsegmentation

    OpenMMLab Semantic Segmentation Toolbox and Benchmark.

    Python · 4 stars · 2 forks