
arXiv cs.SD Sound

@cssd-bot

Unofficial bot by @vele.bsky.social with http://github.com/so-okada/bXiv. Source: https://arxiv.org/list/cs.SD/new. List: https://bsky.app/profile/vele.bsky.social/lists/3lim7ccweqo2j. ModList: https://bsky.app/profile/vele.bsky.social/lists/3lim3qnexsw2g

33 Followers, 1 Following, 5,290 Posts, Joined 16.02.2025

Latest posts by arXiv cs.SD Sound @cssd-bot

Qiu, Chen, Yang, Zhu, Liu, Tan, Zhao, Murthy, Ram, Prabhakar, Heinecke, Xiong, Savarese, Wang: Building Enterprise Realtime Voice Agents from Scratch: A Technical Tutorial https://arxiv.org/abs/2603.05413 https://arxiv.org/pdf/2603.05413 https://arxiv.org/html/2603.05413

06.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Junchuan Zhao, Minh Duc Vu, Ye Wang: Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection https://arxiv.org/abs/2603.05373 https://arxiv.org/pdf/2603.05373 https://arxiv.org/html/2603.05373

06.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Yen-Shan Chen, Shih-Yu Lai, Ying-Jung Tsou, Yi-Cheng Lin, Bing-Yu Chen, Yun-Nung Chen, Hung-Yi Lee, Shang-Tse Chen: Latent-Mark: An Audio Watermark Robust to Neural Resynthesis https://arxiv.org/abs/2603.05310 https://arxiv.org/pdf/2603.05310 https://arxiv.org/html/2603.05310

06.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Seokhoon Moon, Kyudan Jung, Jaegul Choo: SLICE: Speech Enhancement via Layer-wise Injection of Conditioning Embeddings https://arxiv.org/abs/2603.05302 https://arxiv.org/pdf/2603.05302 https://arxiv.org/html/2603.05302

06.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Linghan Fang, Tianxin Xie, Li Liu: Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards https://arxiv.org/abs/2603.05231 https://arxiv.org/pdf/2603.05231 https://arxiv.org/html/2603.05231

06.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Xie, Chung, Lin, Lu, Ren, Chen, Lee: TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling https://arxiv.org/abs/2603.05094 https://arxiv.org/pdf/2603.05094 https://arxiv.org/html/2603.05094

06.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Yun Liu, Xuechen Liu, Xiaoxiao Miao, Junichi Yamagishi: Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction https://arxiv.org/abs/2603.04943 https://arxiv.org/pdf/2603.04943 https://arxiv.org/html/2603.04943

06.03.2026 06:34 👍 0 🔁 0 💬 0 📌 0

Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Ting Dang: The First Environmental Sound Deepfake Detection Challenge: Benchmarking Robustness, Evaluation, and Insights https://arxiv.org/abs/2603.04865 https://arxiv.org/pdf/2603.04865 https://arxiv.org/html/2603.04865

06.03.2026 06:34 👍 0 🔁 0 💬 0 📌 0

Han Yin, Yang Xiao, Younghoo Kwon, Ting Dang, Jung-Woo Choi: Focus Then Listen: Exploring Plug-and-Play Audio Enhancer for Noise-Robust Large Audio Language Models https://arxiv.org/abs/2603.04862 https://arxiv.org/pdf/2603.04862 https://arxiv.org/html/2603.04862

06.03.2026 06:34 👍 0 🔁 0 💬 0 📌 0

Aurchi Chowdhury, Rubaiyat -E-Zaman, Sk. Ashrafuzzaman Nafees: WhisperAlign: Word-Boundary-Aware ASR and WhisperX-Anchored Pyannote Diarization for Long-Form Bengali Speech https://arxiv.org/abs/2603.04809 https://arxiv.org/pdf/2603.04809 https://arxiv.org/html/2603.04809

06.03.2026 06:34 👍 0 🔁 0 💬 0 📌 0

Akif Islam, Raufun Nahar, Md. Ekramul Hamid: When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper https://arxiv.org/abs/2603.04710 https://arxiv.org/pdf/2603.04710 https://arxiv.org/html/2603.04710

06.03.2026 06:34 👍 0 🔁 0 💬 0 📌 0

[2026-03-06 Fri (UTC), 11 new articles found for csSD Sound]

06.03.2026 06:34 👍 0 🔁 0 💬 0 📌 0

Novack, Zukowski, Carr, Parker, Evans, Taylor, Berg-Kirkpatrick, McAuley, Pons: Low-Resource Guidance for Controllable Latent Audio Diffusion https://arxiv.org/abs/2603.04366 https://arxiv.org/pdf/2603.04366 https://arxiv.org/html/2603.04366

05.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Ioannis Prokopiou, Ioannis Sina, Agisilaos Kounelis, Pantelis Vikatos, Themos Stafylakis: LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance https://arxiv.org/abs/2603.04293 https://arxiv.org/pdf/2603.04293 https://arxiv.org/html/2603.04293

05.03.2026 06:35 👍 1 🔁 0 💬 0 📌 0

Youngwon Choi, Jinwoo Oh, Hwayeon Kim, Hyeonyu Kim: ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis https://arxiv.org/abs/2603.04219 https://arxiv.org/pdf/2603.04219 https://arxiv.org/html/2603.04219

05.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Nikita Kuznetsov, Maksim Kaledin: FastWave: Optimized Diffusion Model for Audio Super-Resolution https://arxiv.org/abs/2603.04122 https://arxiv.org/pdf/2603.04122 https://arxiv.org/html/2603.04122

05.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Tobias Morocutti, Emmanouil Karystinaios, Jonathan Greif, Gerhard Widmer: Multi-Stage Music Source Restoration with BandSplit-RoFormer Separation and HiFi++ GAN https://arxiv.org/abs/2603.04032 https://arxiv.org/pdf/2603.04032 https://arxiv.org/html/2603.04032

05.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Taehan Lee, Jaehan Jung, Hyukjun Lee: A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs https://arxiv.org/abs/2603.03855 https://arxiv.org/pdf/2603.03855 https://arxiv.org/html/2603.03855

05.03.2026 06:35 👍 0 🔁 0 💬 0 📌 0

Fei Su, Cancan Li, Juan Liu, Wei Ju, Hongbin Suo, Ming Li: Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement https://arxiv.org/abs/2603.03811 https://arxiv.org/pdf/2603.03811 https://arxiv.org/html/2603.03811

05.03.2026 06:34 👍 0 🔁 0 💬 0 📌 0

Swapnil Parekh: ACES: Accent Subspaces for Coupling, Explanations, and Stress-Testing in Automatic Speech Recognition https://arxiv.org/abs/2603.03359 https://arxiv.org/pdf/2603.03359 https://arxiv.org/html/2603.03359

05.03.2026 06:34 👍 0 🔁 0 💬 0 📌 0

[2026-03-05 Thu (UTC), 8 new articles found for csSD Sound]

05.03.2026 06:34 👍 0 🔁 0 💬 0 📌 0

Epshita Jahan, Khandoker Md Tanjinul Islam, Pritom Biswas, Tafsir Al Nafin: An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization https://arxiv.org/abs/2603.03158 https://arxiv.org/pdf/2603.03158 https://arxiv.org/html/2603.03158

04.03.2026 06:36 👍 0 🔁 0 💬 0 📌 0

Riccardo Rota, Kiril Ratmanski, Jozef Coldenhoff, Milos Cernak: Differentiable Time-Varying IIR Filtering for Real-Time Speech Denoising https://arxiv.org/abs/2603.02794 https://arxiv.org/pdf/2603.02794 https://arxiv.org/html/2603.02794

04.03.2026 06:36 👍 0 🔁 0 💬 0 📌 0

Mathuranathan Mayuravaani, W. Bastiaan Kleijn, Andrew Lensen, Charlotte Sørensen: Single Microphone Own Voice Detection based on Simulated Transfer Functions for Hearing Aids https://arxiv.org/abs/2603.02724 https://arxiv.org/pdf/2603.02724 https://arxiv.org/html/2603.02724

04.03.2026 06:36 👍 0 🔁 0 💬 0 📌 0

Fu, Chao, Yang, Huang, Zezario, Nasretdinov, Jukić, Tsao, Wang: Rethinking Training Targets, Architectures and Data Quality for Universal Speech Enhancement https://arxiv.org/abs/2603.02641 https://arxiv.org/pdf/2603.02641 https://arxiv.org/html/2603.02641

04.03.2026 06:36 👍 0 🔁 0 💬 0 📌 0

Kirill Borodin, Vasiliy Kudryavtsev, Maxim Maslov, Mikhail Gorodnichev, Grach Mkrtchian: When Spoof Detectors Travel: Evaluation Across 66 Languages in the Low-Resource Language Spoofing Corpus https://arxiv.org/abs/2603.02364 https://arxiv.org/pdf/2603.02364 https://arxiv.org/html/2603.02364

04.03.2026 06:36 👍 0 🔁 0 💬 0 📌 0

Zijian Yang, Jörg Barkoczi, Ralf Schlüter, Hermann Ney: Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study https://arxiv.org/abs/2603.02285 https://arxiv.org/pdf/2603.02285 https://arxiv.org/html/2603.02285

04.03.2026 06:36 👍 0 🔁 0 💬 0 📌 0

Mao, Ma, Chen, Zhu, Ge, Hao, Zhao, Huo, Yang, Chang, Liu, Wang, He, Xiao, Zhu: When Scaling Fails: Mitigating Audio Perception Decay of LALMs via Multi-Step Perception-Aware Reasoning https://arxiv.org/abs/2603.02266 https://arxiv.org/pdf/2603.02266 https://arxiv.org/html/2603.02266

04.03.2026 06:36 👍 0 🔁 0 💬 0 📌 0

Li Songyi, Zheng Linze, Liang Jinghua, Zhang Zifeng: MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection https://arxiv.org/abs/2603.02255 https://arxiv.org/pdf/2603.02255 https://arxiv.org/html/2603.02255

04.03.2026 06:36 👍 0 🔁 0 💬 0 📌 0

Liang Jinghua, Zhang Zifeng, Li Songyi, Zheng Linze: MEBM-Phoneme: Multi-scale Enhanced BrainMagic for End-to-End MEG Phoneme Classification https://arxiv.org/abs/2603.02254 https://arxiv.org/pdf/2603.02254 https://arxiv.org/html/2603.02254

04.03.2026 06:36 👍 0 🔁 0 💬 0 📌 0