Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim, Kwanghee Choi, Eunjung Yeo, Ryan Soh-Eun Shim, Hanyu Zhou, Brendon Boldt, Karen Rosero Jacome, Kalvin Chang, Darsh Agrawal, Keer Xu, ...
PRiSM: Benchmarking Phone Realization in Speech Models
https://arxiv.org/abs/2601.14046
21.01.2026 09:30
Bharadwaj, Li, Kim, Choi, Yeo, Shim, Zhou, Boldt, Jacome, Chang, Agrawal, Xu, Yang, Zhu, Watanabe, Mortensen: PRiSM: Benchmarking Phone Realization in Speech Models https://arxiv.org/abs/2601.14046 https://arxiv.org/pdf/2601.14046 https://arxiv.org/html/2601.14046
21.01.2026 06:32
Can we make discrete speech units lightweight 🪶 and streamable? Excited to share our new #Interspeech2025 paper: On-device Streaming Discrete Speech Units arxiv.org/abs/2506.01845 (1/n)
15.08.2025 20:44
Meows, music, murmurs and more - we trained a general purpose audio encoder and open sourced the code, checkpoint and evaluation toolkit.
22.07.2025 03:36
We've open-sourced NatureLM-audio, the first audio-language foundation model for #bioacoustics.
Trained on large-scale animal vocalization, human speech & music datasets, the model enables zero-shot classification, detection & querying across diverse species & environments.
24.04.2025 15:54
Resources for ESPnet-SDS:
Codebase (part of ESPnet): github.com/espnet/espnet
README & User Guide: github.com/espnet/espne...
Demo Video: www.youtube.com/watch?v=kI_D...
17.03.2025 14:29
New #NAACL2025 demo! Excited to introduce ESPnet-SDS, a new open-source toolkit for building unified web interfaces for both cascaded & end-to-end spoken dialogue systems, providing real-time evaluation, and more!
Paper: arxiv.org/abs/2503.08533
Live Demo: huggingface.co/spaces/Siddh...
17.03.2025 14:29
New #ICLR2025 Paper Alert!
Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️
We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵
Paper: arxiv.org/abs/2503.01174
05.03.2025 16:03
Wait, I thought The Rock was named Dwayne Johnson
06.02.2025 13:29
gpu poverty is real
28.01.2025 05:10
Happy New Year
02.01.2025 23:21
Philip Whittington, Gregor Bachmann, Tiago Pimentel
Tokenisation is NP-Complete
https://arxiv.org/abs/2412.15210
20.12.2024 05:18
Today, we're introducing NatureLM-audio: the first large audio-language model tailored for understanding animal sounds. arxiv.org/abs/2411.07186 🧵
05.12.2024 00:45
Announcing FineWeb2: A sparkling update with 1000s of 🗣️ languages.
We applied the same data-driven approach that led to SOTA English performance in FineWeb to thousands of languages.
FineWeb2 has 8TB of compressed text data and outperforms other datasets.
08.12.2024 09:19
WAVLab is up on bsky!
06.12.2024 19:15
🙋‍♂️
30.11.2024 16:55
I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA
(Self-)nominations welcome!
19.11.2024 11:13
[Images: examples from the dataset, a world map surrounded by spectrograms of animal sounds from different regions of the world; a scatter plot of sound datasets (number of categories vs. duration in hours) showing iNatSounds as the largest on both axes]
iNatSounds: new dataset from folks @inaturalist.bsky.social & co-authors; looks to be one of the largest public datasets of animal sounds
openreview.net/forum?id=QCY...
github.com/visipedia/in...
#prattle 💬
#bioacoustics
29.11.2024 03:30
🙋‍♂️
24.11.2024 23:49
🙋‍♂️
24.11.2024 23:44
🙋‍♂️
23.11.2024 00:36
We're here too now! 🥳
22.11.2024 14:42
Me (shikharb@bsky.social) and our lab bsky.app/profile/wavl...
22.11.2024 23:09