Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio? In this work, we concentrate on the Audio-Visual ...
From classrooms and corporate boardrooms to live concerts and weddings, audio visual (AV) technology plays a critical role in how we communicate and experience events. The integration of sound and ...
Abstract: In real-world physiological and psychological scenarios, there often exists a robust complementary correlation between audio and visual signals. Audio-Visual Event Localization (AVEL) aims ...
Abstract: Audio-Visual Speech Recognition (AVSR) is a promising approach to improving the accuracy and robustness of speech recognition systems with the assistance of visual cues in challenging ...