Cheng Shi (石骋)
PhD Student
Department of Computer Science |
I am currently a first-year Ph.D. student in the Department of Computer Science at The University of Hong Kong, where I have the privilege of being supervised by Prof. Yizhou Yu (ACM/IEEE Fellow). Previously, I obtained my Master's degree from ShanghaiTech University in 2024, where I was advised by Prof. Sibei Yang, and my Bachelor's degree from ShanghaiTech University in 2022.
My research interests lie at the intersection of computer vision, natural language processing, and multimodal AI. My current research focuses on open-world visual perception and vision foundation models.
* denotes equal contribution; † denotes corresponding author.
Vision Transformer Needs More Than Register
In submission, 2025
|
|
Vision Function Layer in Multimodal LLMs
NeurIPS 2025
|
|
Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video
NeurIPS 2025
|
|
Discovering Compositional Hallucination in LVLMs
NeurIPS 2025
|
|
Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
ECCV 2024
|
Plain-DNet: A Plain Multi-Dataset Object Detector
ECCV 2024
|
Zip-Your-CLIP: CLIP Itself is a Good Object-detector
ICLR 2024
|
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
NeurIPS 2023
|
|
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models
ICCV 2023
|
|
EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
ICCV 2023
|
|
Contrastive Grouping with Transformer for Referring Image Segmentation
CVPR 2023
|
|
DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance
SIGGRAPH 2023
|
|
Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding
ECCV 2022
|