
Justin Salamon

@justinsalamon

Head of Sound Design AI Research at Adobe. Machine learning and signal processing for audio & video. Musician. He/him. www.justinsalamon.com

351
Followers
125
Following
15
Posts
20.11.2024
Joined

Latest posts by Justin Salamon @justinsalamon

FLAM: Frame-Wise Language-Audio Modeling
Recent multi-modal audio-language models (ALMs) excel at text-audio retrieval but struggle with frame-wise audio understanding. Prior works use temporal-aware labels or unsupervised training to improv...

To learn more please check out our ICML'25 paper: "FLAM: Frame-Wise Language-Audio Modeling"
arxiv.org/abs/2505.053...

Big congratulations to Yusong Wu, @tsirif.bsky.social and the whole team from @adobe.com research, @mit.edu and @mila-quebec.bsky.social

24.06.2025 19:30 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

FLAM is trained jointly on instance (global) and frame-wise (local) objectives.

The secret sauce: a memory-efficient, calibrated frame-wise objective with logit adjustment to counter spurious correlations, such as event dependencies and label imbalance, during training.

24.06.2025 19:27 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
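The logit-adjusted frame-wise objective mentioned above can be sketched roughly as follows. This is a minimal NumPy illustration only; the function name, tensor shapes, and the log-prior offset form are my assumptions, not FLAM's actual implementation:

```python
import numpy as np

def frame_wise_loss(frame_logits, frame_labels, class_priors, tau=1.0):
    """Per-frame binary cross-entropy with logit adjustment (illustrative).

    Offsetting each logit by the log of its class prior keeps frequent
    events from dominating the loss and rare events from being ignored.

    frame_logits: (T, C) raw audio-text scores per frame and prompt
    frame_labels: (T, C) binary frame-level activity targets
    class_priors: (C,) empirical frequency of each event in training data
    """
    adjusted = frame_logits - tau * np.log(class_priors)  # logit adjustment
    probs = 1.0 / (1.0 + np.exp(-adjusted))               # sigmoid
    eps = 1e-9
    bce = -(frame_labels * np.log(probs + eps)
            + (1 - frame_labels) * np.log(1 - probs + eps))
    return bce.mean()
```

With `tau=0` the offset vanishes and this reduces to plain per-frame BCE; the paper is the authoritative source for the exact objective.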

Enter FLAM: Frame-Wise Language-Audio Modeling.

A model trained to produce a calibrated likelihood for *any* text prompt.

FLAM outperforms prior self-supervised models on both closed-set and open-set SED, while preserving strong retrieval and zero-shot classification accuracy.

24.06.2025 19:27 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
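What calibration buys you in practice: a single global threshold turns per-frame scores into event segments, for *any* prompt. A minimal sketch (function name, threshold, and hop size are illustrative, not FLAM's):

```python
import numpy as np

def detect_events(frame_probs, threshold=0.5, hop_s=0.02):
    """Turn per-frame probabilities for one text prompt into
    (onset, offset) segments in seconds, assuming the scores are
    calibrated so the same threshold is meaningful across prompts."""
    active = frame_probs >= threshold
    events, start = [], None
    for t, on in enumerate(active):
        if on and start is None:
            start = t                                  # event onset
        elif not on and start is not None:
            events.append((start * hop_s, t * hop_s))  # event offset
            start = None
    if start is not None:                              # still active at end
        events.append((start * hop_s, len(active) * hop_s))
    return events
```

With uncalibrated scores the `threshold` would have to be re-tuned per prompt, which is exactly what breaks open-vocabulary detection.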

Our goal is for the model to detect *any* sound via free-form text queries.

"So use CLAP", some of you will say.

The problem: its output likelihoods aren't calibrated across different prompts :(

That's fine for ranked retrieval, but for detection it's a no-go.

24.06.2025 19:27 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
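A tiny sketch of why ranking survives but detection doesn't with a CLAP-style model: cosine similarity is scale-invariant, so the argmax (ranking) is stable, but the raw scores aren't probabilities, so a fixed cutoff has no consistent meaning across prompts. Names and random embeddings here are purely illustrative:

```python
import numpy as np

def cosine_scores(audio_emb, text_embs):
    """Cosine similarity between one audio embedding and several text
    prompt embeddings, as in a CLAP-style retrieval model (sketch)."""
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return t @ a

rng = np.random.default_rng(0)
audio = rng.normal(size=64)
prompts = rng.normal(size=(3, 64))
scores = cosine_scores(audio, prompts)
# argmax gives a usable ranking, but the absolute values are bounded
# similarities, not calibrated likelihoods: thresholding them at, say,
# 0.5 means something different for every prompt.
best = int(np.argmax(scores))
```

Hence the need for a model whose per-prompt outputs are calibrated before detection-by-thresholding can work.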

Sound Event Detection (SED) models, i.e. models that find sounds in audio/video recordings, are typically constrained to a predefined "closed" set of sounds, like the (old!) urban sound detection model below.

It has some applications, but it doesn't address general purpose sound search.

24.06.2025 19:26 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

I think we finally cracked it? FLAM can detect *any* sound via text prompts

arXiv (ICML'25): arxiv.org/abs/2505.053...
demos: flam-model.github.io

Led by Yusong Wu, with @tsirif.bsky.social Ke Chen, Cheng-Zhi Anna Huang, Aaron Courville, @urinieto.bsky.social @pseeth.bsky.social

24.06.2025 19:26 ๐Ÿ‘ 10 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 1

Generative Extend in Premiere Pro just won *five* awards at NAB 2025, including the NAB Show Product of the Year award! SODA, our group, created the audio GenAI model in charge of audio extensions in the feature. Couldn't be more proud of the team!
w/ @urinieto.bsky.social @pseeth.bsky.social

11.04.2025 05:16 ๐Ÿ‘ 4 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 1
Generative Extend | Premiere Pro 2025 Updates | Adobe Video (YouTube video by Adobe Video & Motion)

Generative Extend just released in Premiere Pro! Use GenAI to extend your video *and audio* clips for a perfectly timed edit.

The audio model was built by our team, the Sound Design AI (SODA) group at Adobe Research w/ @pseeth.bsky.social and @urinieto.bsky.social ๐Ÿ™Œ

www.youtube.com/watch?v=_Bv5...

02.04.2025 17:56 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 2

We didn't expect this... our Sketch2Sound demo video has gone viral on IG with more than 5.2 million views ๐Ÿคฏ

Amazing job @hugofloresgarcia.bsky.social @pseeth.bsky.social @urinieto.bsky.social

I should've done my hair...
www.instagram.com/reel/DEEBRhd...

22.02.2025 01:28 ๐Ÿ‘ 13 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
AI is changing how we study bird migration
After decades of frustration, machine-learning tools are unlocking a treasure trove of acoustic data for ecologists.

Really cool to see our AI for bioacoustics work in the MIT Technology Review! BirdVox was a fantastic project that, beyond posing a fascinating research problem, introduced me to the world of bird watching and deepened my appreciation of the natural world.
www.technologyreview.com/2024/12/18/1...

19.12.2024 22:02 ๐Ÿ‘ 10 ๐Ÿ” 2 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

@waspaa.com is on Bluesky!
Big changes are coming for WASPAA 2025!

14.12.2024 04:03 ๐Ÿ‘ 11 ๐Ÿ” 6 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 1

Sketch2Sound is out!

It takes a text prompt + a vocal (or sonic) imitation and generates sound effects that match the energy and dynamics of your voice.

It's an extremely intuitive (and fun!) way to create SFX that are perfectly timed to your video.

Led by @hugofloresgarcia.bsky.social ๐Ÿ‘

12.12.2024 17:36 ๐Ÿ‘ 7 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

I'm hiring a Research Engineer for my team at Adobe Research.

Full details in the link in the post below, DM me if interested.

10.12.2024 19:31 ๐Ÿ‘ 11 ๐Ÿ” 2 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Here's another example of work from our group:

MultiFoley, a Video-to-Audio model that generates perfectly synced audio for video at 48 kHz and supports multimodal conditioning.

More on MultiFoley here: bsky.app/profile/czya...

09.12.2024 19:04 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

๐Ÿ“ข Audio AI Job opportunity at Adobe!

The Sound Design AI Group (SODA) is looking for an exceptional research engineer to join us in building the future of AI-assisted audio and video creation.

Strong ML background, GenAI experience a plus.

Details: adobe.wd5.myworkdayjobs.com/external_exp...

09.12.2024 19:00 ๐Ÿ‘ 11 ๐Ÿ” 3 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 3

New model from the SODA group at Adobe and UMich!

MultiFoley generates perfectly synced audio for video at full 48 kHz and supports multimodal conditioning.

You can define the generated sound via a text prompt, an example SFX, or audio you want to extend.

Led by our intern @czyang.bsky.social ๐Ÿ‘‡

27.11.2024 07:42 ๐Ÿ‘ 12 ๐Ÿ” 1 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0