I am Yu Zhang (张彧).
I earned my Ph.D. from the College of Computer Science and Technology, Zhejiang University (浙江大学计算机学院), under the supervision of Prof. Zhou Zhao (赵洲). Previously, I graduated from Chu Kochen Honors College, Zhejiang University (浙江大学竺可桢学院), with dual bachelor's degrees in Computer Science and Automation. I have also been a visiting scholar at the University of Rochester with Prof. Zhiyao Duan and at the University of Massachusetts Amherst with Prof. Przemyslaw Grabowicz.
My research focuses primarily on multimodal generative AI, specifically spatial audio, music, singing, and speech. I have published first-author papers at top international AI conferences such as NeurIPS, ACL, and AAAI. I am currently working on spatial audio generation with multimodal prompts and on streaming voice conversion.
I am actively seeking research collaborations. Please feel free to contact me via email at [email protected].
- Personal Pages: https://aaronz345.github.io (updated recently🔥)
- LinkedIn: www.linkedin.com/in/yuzhang34
- Google Scholar: https://scholar.google.com/citations?user=kA9A6LsAAAAJ
- DBLP: https://dblp.org/pid/50/671-126.html
*denotes co-first authors
- [ACMMM 2025] ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting, Yu Zhang, Wenxiang Guo, Changhao Pan, et al.
- [ACMMM 2025] A Multimodal Evaluation Framework for Spatial Audio Playback Systems: From Localization to Listener Preference, Changhao Pan*, Wenxiang Guo*, Yu Zhang*, et al.
- [Preprint] Versatile Framework for Song Generation with Prompt-based Control, Yu Zhang, Wenxiang Guo, Changhao Pan, et al.
- [ACL 2025] TCSinger 2: Customizable Multilingual Singing Voice Synthesis, Yu Zhang, Wenxiang Guo, Changhao Pan, et al.
- [EMNLP 2024] TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control, Yu Zhang, Ziyue Jiang, Ruiqi Li, et al.
- [NeurIPS 2024 Spotlight] GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks, Yu Zhang, Changhao Pan, Wenxiang Guo, et al.
- [AAAI 2024] StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis, Yu Zhang, Rongjie Huang, Ruiqi Li, et al.
- [ACL 2025] STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation, Wenxiang Guo*, Yu Zhang*, Changhao Pan*, et al.