Advancing multimodal reasoning through structured chain-of-thought inference, cross-modal alignment, and knowledge-aware modeling, enabling evidence-grounded, interpretable, and reliable reasoning across vision, language, and structured knowledge.
Fine-grained human-centric perception for embodied intelligence, including 2D/3D pose estimation, 3D/4D human reconstruction, and anatomical understanding, enabling human–robot interaction and embodied medical applications.
Embodied intelligence for medicine, enabling active perception, robotic control, and medical image understanding for autonomous diagnosis and decision-making, with applications such as robotic ultrasound.