Qiu, Chen, Yang, Zhu, Liu, Tan, Zhao, Murthy, Ram, Prabhakar, Heinecke, Xiong, Savarese, Wang: Building Enterprise Realtime Voice Agents from Scratch: A Technical Tutorial https://arxiv.org/abs/2603.05413 https://arxiv.org/pdf/2603.05413 https://arxiv.org/html/2603.05413
06.03.2026 06:35
Junchuan Zhao, Minh Duc Vu, Ye Wang: Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection https://arxiv.org/abs/2603.05373 https://arxiv.org/pdf/2603.05373 https://arxiv.org/html/2603.05373
06.03.2026 06:35
Yen-Shan Chen, Shih-Yu Lai, Ying-Jung Tsou, Yi-Cheng Lin, Bing-Yu Chen, Yun-Nung Chen, Hung-Yi Lee, Shang-Tse Chen: Latent-Mark: An Audio Watermark Robust to Neural Resynthesis https://arxiv.org/abs/2603.05310 https://arxiv.org/pdf/2603.05310 https://arxiv.org/html/2603.05310
06.03.2026 06:35
Seokhoon Moon, Kyudan Jung, Jaegul Choo: SLICE: Speech Enhancement via Layer-wise Injection of Conditioning Embeddings https://arxiv.org/abs/2603.05302 https://arxiv.org/pdf/2603.05302 https://arxiv.org/html/2603.05302
06.03.2026 06:35
Linghan Fang, Tianxin Xie, Li Liu: Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards https://arxiv.org/abs/2603.05231 https://arxiv.org/pdf/2603.05231 https://arxiv.org/html/2603.05231
06.03.2026 06:35
Xie, Chung, Lin, Lu, Ren, Chen, Lee: TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling https://arxiv.org/abs/2603.05094 https://arxiv.org/pdf/2603.05094 https://arxiv.org/html/2603.05094
06.03.2026 06:35
Yun Liu, Xuechen Liu, Xiaoxiao Miao, Junichi Yamagishi: Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction https://arxiv.org/abs/2603.04943 https://arxiv.org/pdf/2603.04943 https://arxiv.org/html/2603.04943
06.03.2026 06:34
Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Ting Dang: The First Environmental Sound Deepfake Detection Challenge: Benchmarking Robustness, Evaluation, and Insights https://arxiv.org/abs/2603.04865 https://arxiv.org/pdf/2603.04865 https://arxiv.org/html/2603.04865
06.03.2026 06:34
Han Yin, Yang Xiao, Younghoo Kwon, Ting Dang, Jung-Woo Choi: Focus Then Listen: Exploring Plug-and-Play Audio Enhancer for Noise-Robust Large Audio Language Models https://arxiv.org/abs/2603.04862 https://arxiv.org/pdf/2603.04862 https://arxiv.org/html/2603.04862
06.03.2026 06:34
Aurchi Chowdhury, Rubaiyat-E-Zaman, Sk. Ashrafuzzaman Nafees: WhisperAlign: Word-Boundary-Aware ASR and WhisperX-Anchored Pyannote Diarization for Long-Form Bengali Speech https://arxiv.org/abs/2603.04809 https://arxiv.org/pdf/2603.04809 https://arxiv.org/html/2603.04809
06.03.2026 06:34
Akif Islam, Raufun Nahar, Md. Ekramul Hamid: When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper https://arxiv.org/abs/2603.04710 https://arxiv.org/pdf/2603.04710 https://arxiv.org/html/2603.04710
06.03.2026 06:34
[2026-03-06 Fri (UTC), 11 new articles found for csSD Sound]
06.03.2026 06:34
Novack, Zukowski, Carr, Parker, Evans, Taylor, Berg-Kirkpatrick, McAuley, Pons: Low-Resource Guidance for Controllable Latent Audio Diffusion https://arxiv.org/abs/2603.04366 https://arxiv.org/pdf/2603.04366 https://arxiv.org/html/2603.04366
05.03.2026 06:35
Ioannis Prokopiou, Ioannis Sina, Agisilaos Kounelis, Pantelis Vikatos, Themos Stafylakis: LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance https://arxiv.org/abs/2603.04293 https://arxiv.org/pdf/2603.04293 https://arxiv.org/html/2603.04293
05.03.2026 06:35
Youngwon Choi, Jinwoo Oh, Hwayeon Kim, Hyeonyu Kim: ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis https://arxiv.org/abs/2603.04219 https://arxiv.org/pdf/2603.04219 https://arxiv.org/html/2603.04219
05.03.2026 06:35
Nikita Kuznetsov, Maksim Kaledin: FastWave: Optimized Diffusion Model for Audio Super-Resolution https://arxiv.org/abs/2603.04122 https://arxiv.org/pdf/2603.04122 https://arxiv.org/html/2603.04122
05.03.2026 06:35
Tobias Morocutti, Emmanouil Karystinaios, Jonathan Greif, Gerhard Widmer: Multi-Stage Music Source Restoration with BandSplit-RoFormer Separation and HiFi++ GAN https://arxiv.org/abs/2603.04032 https://arxiv.org/pdf/2603.04032 https://arxiv.org/html/2603.04032
05.03.2026 06:35
Taehan Lee, Jaehan Jung, Hyukjun Lee: A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs https://arxiv.org/abs/2603.03855 https://arxiv.org/pdf/2603.03855 https://arxiv.org/html/2603.03855
05.03.2026 06:35
Fei Su, Cancan Li, Juan Liu, Wei Ju, Hongbin Suo, Ming Li: Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement https://arxiv.org/abs/2603.03811 https://arxiv.org/pdf/2603.03811 https://arxiv.org/html/2603.03811
05.03.2026 06:34
Swapnil Parekh: ACES: Accent Subspaces for Coupling, Explanations, and Stress-Testing in Automatic Speech Recognition https://arxiv.org/abs/2603.03359 https://arxiv.org/pdf/2603.03359 https://arxiv.org/html/2603.03359
05.03.2026 06:34
[2026-03-05 Thu (UTC), 8 new articles found for csSD Sound]
05.03.2026 06:34
Epshita Jahan, Khandoker Md Tanjinul Islam, Pritom Biswas, Tafsir Al Nafin: An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization https://arxiv.org/abs/2603.03158 https://arxiv.org/pdf/2603.03158 https://arxiv.org/html/2603.03158
04.03.2026 06:36
Riccardo Rota, Kiril Ratmanski, Jozef Coldenhoff, Milos Cernak: Differentiable Time-Varying IIR Filtering for Real-Time Speech Denoising https://arxiv.org/abs/2603.02794 https://arxiv.org/pdf/2603.02794 https://arxiv.org/html/2603.02794
04.03.2026 06:36
Mathuranathan Mayuravaani, W. Bastiaan Kleijn, Andrew Lensen, Charlotte Sørensen: Single Microphone Own Voice Detection based on Simulated Transfer Functions for Hearing Aids https://arxiv.org/abs/2603.02724 https://arxiv.org/pdf/2603.02724 https://arxiv.org/html/2603.02724
04.03.2026 06:36
Fu, Chao, Yang, Huang, Zezario, Nasretdinov, Jukić, Tsao, Wang: Rethinking Training Targets, Architectures and Data Quality for Universal Speech Enhancement https://arxiv.org/abs/2603.02641 https://arxiv.org/pdf/2603.02641 https://arxiv.org/html/2603.02641
04.03.2026 06:36
Kirill Borodin, Vasiliy Kudryavtsev, Maxim Maslov, Mikhail Gorodnichev, Grach Mkrtchian: When Spoof Detectors Travel: Evaluation Across 66 Languages in the Low-Resource Language Spoofing Corpus https://arxiv.org/abs/2603.02364 https://arxiv.org/pdf/2603.02364 https://arxiv.org/html/2603.02364
04.03.2026 06:36
Zijian Yang, Jörg Barkoczi, Ralf Schlüter, Hermann Ney: Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study https://arxiv.org/abs/2603.02285 https://arxiv.org/pdf/2603.02285 https://arxiv.org/html/2603.02285
04.03.2026 06:36
Mao, Ma, Chen, Zhu, Ge, Hao, Zhao, Huo, Yang, Chang, Liu, Wang, He, Xiao, Zhu: When Scaling Fails: Mitigating Audio Perception Decay of LALMs via Multi-Step Perception-Aware Reasoning https://arxiv.org/abs/2603.02266 https://arxiv.org/pdf/2603.02266 https://arxiv.org/html/2603.02266
04.03.2026 06:36
Li Songyi, Zheng Linze, Liang Jinghua, Zhang Zifeng: MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection https://arxiv.org/abs/2603.02255 https://arxiv.org/pdf/2603.02255 https://arxiv.org/html/2603.02255
04.03.2026 06:36
Liang Jinghua, Zhang Zifeng, Li Songyi, Zheng Linze: MEBM-Phoneme: Multi-scale Enhanced BrainMagic for End-to-End MEG Phoneme Classification https://arxiv.org/abs/2603.02254 https://arxiv.org/pdf/2603.02254 https://arxiv.org/html/2603.02254
04.03.2026 06:36