Announcements
2026 Openings:
We are looking for motivated students in speech processing, multimodal learning, and computer vision.
Please refer to this page for details on open positions and how to apply.
About MINT Lab
In daily life, humans naturally interact with the world by seeing and hearing. At MINT Lab, we aim to bridge the gap between humans and AI by developing technologies that enable natural communication. Our research focuses on speech processing and multimodal AI, working toward a future where AI can engage with human environments just as we do.
Research Highlights
Speech Synthesis
Kim et al., "CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation," IEEE Trans. Audio Speech Lang. Process., 2025
Voice Conversion
Kim et al., "AdaptVC: High Quality Voice Conversion with Adaptive Learning," Proc. ICASSP, 2025
Video-to-Speech
Kim et al., "From Faces to Voices: Learning Hierarchical Representations for High-Quality Video-to-Speech," Proc. CVPR (Highlight), 2025
Audio-Visual Speech Enhancement
Jung et al., "FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching," Proc. Interspeech, 2024
Talking Face Generation
Jang et al., "Faces that Speak: Jointly Synthesising Talking Face and Speech from Text," Proc. CVPR, 2024
Interactive Head Generation
Kim et al., "TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation," arXiv preprint, 2025
News
[Mar. 2026] MINT Lab @ Chung-Ang University opens!
84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Korea
© 2026 MINT Lab @ CAU. All rights reserved.