Jonas

@jonasgeiping

ML research, safety & efficiency

286 Followers · 145 Following · 11 Posts · Joined 28.05.2023

Latest posts by Jonas @jonasgeiping

Finally, this project was made possible by the DOE's INCITE program, which sponsored our compute on the OLCF Frontier supercomputer. Without them, we could not have done open research at this scale!

10.02.2025 16:47

Thank you to all of my collaborators, @sean-mcleish.bsky.social, Neel Jain, @jwkirchenbauer.bsky.social, Siddharth Singh, Brian Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, and especially Tom Goldstein, for doing this.

This really was a long project for us, which first got started back in summer '23!

10.02.2025 16:47
tomg-group-umd/huginn-0125 · Hugging Face

You can find the model here: huggingface.co/tomg-group-u...
the code here: github.com/seal-rg/recu...
and the tech report here: www.arxiv.org/abs/2502.05171

10.02.2025 16:47
Post image

What is it doing when it thinks longer?

We find evidence for pretty advanced structures in latent space, such as the tendency to use orbitals (see picture) for arithmetic tasks and for reasoning about sentence structure.

So, this model really is rotating shapes in a high-dimensional space?

10.02.2025 16:47
Post image

What is pretty exciting is that, simply by training with our architecture and objective, a separation emerges with scale: the model's latents converge more quickly for some tokens in a sentence than for others.

In this figure, the model takes more time to think about the key parts of the text.

10.02.2025 16:47
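The thread doesn't say how convergence is measured. As a rough illustration only, here is a toy sketch that counts, per token, how many recurrent steps pass before the latent state stops moving (the norm of the update drops below a tolerance). The contraction `latent_step`, the per-token `rate` values, and the zero initialization are all hypothetical stand-ins, not the published model's mechanics.

```python
import math
from typing import List

# Toy illustration of per-token latent convergence in a recurrent loop.
# Everything here is hypothetical: a real model iterates a learned block,
# not this hand-written contraction.


def latent_step(state: List[float], target: List[float], rate: float) -> List[float]:
    """One hypothetical recurrent step: move the latent toward `target`.
    A larger `rate` (in [0, 1)) means slower convergence."""
    return [rate * s + (1.0 - rate) * t for s, t in zip(state, target)]


def steps_to_converge(target: List[float], rate: float,
                      tol: float = 1e-3, max_steps: int = 64) -> int:
    """Count steps until the update norm ||s_k - s_{k-1}|| falls below `tol`."""
    state = [0.0] * len(target)  # zero init, chosen only to keep the sketch deterministic
    for step in range(1, max_steps + 1):
        new_state = latent_step(state, target, rate)
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_state, state)))
        state = new_state
        if delta < tol:
            return step
    return max_steps  # budget exhausted without meeting the tolerance


def per_token_steps(targets: List[List[float]], rates: List[float]) -> List[int]:
    """Apply the convergence count independently at each token position."""
    return [steps_to_converge(t, r) for t, r in zip(targets, rates)]


# Hypothetical example: the second token gets a slow rate, standing in for a "key" token.
targets = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
rates = [0.2, 0.9, 0.2]
print(per_token_steps(targets, rates))
```

The stopping rule (stop once the latent stops changing) is the part this sketch is meant to show; in a trained model the speed of convergence would come from the learned block and the input, not from a hand-set `rate`.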
Post image

We only had enough compute for a single shot at a large-scale training run (and that is the model we've published).

On reasoning tasks like GSM8K, the model is pretty competitive, even compared to other open-source pretrained models, even though we did no post- or mid-training...

10.02.2025 16:47
Post image

First, the model (3.5B params), even though trained semi-optimally and for 800B tokens, is competitive with 7B open-source models trained on 2-3T tokens (OLMo-v1), but we can't beat the new OLMo data recipe (yet).

This is pretty exciting for our first large-scale run!

10.02.2025 16:47

...has something for everyone: a new model architecture, optimizer details, AMD training (we trained on 4096 AMD GPUs), our data pipeline, and lots of analysis!

Here are a few of my highlights:

10.02.2025 16:47

Ok, so I can finally talk about this!

We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale.

The model has an internal latent space in which it can adaptively spend more compute to think longer.

I think the tech report ...🐦‍⬛

10.02.2025 16:47
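To make "recurrent depth" a bit more concrete, here is a minimal toy sketch, assuming a three-stage layout: an embedding stage, one block that is iterated `num_steps` times on a latent state, and a readout. The functions `prelude`, `core_step`, and `coda`, the random-noise initial state, and the 0.5 mixing factor are illustrative assumptions, not the architecture from the tech report. The point is only that the number of iterations is chosen at test time, so thinking longer means looping longer rather than emitting more tokens.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

# Hypothetical toy pipeline; real stages would be transformer layers.


def prelude(token_ids: List[int], dim: int = 4) -> Matrix:
    """Hypothetical embedding stage: a fixed vector per token id."""
    return [[math.sin(t * (j + 1)) for j in range(dim)] for t in token_ids]


def core_step(state: Matrix, embedded: Matrix) -> Matrix:
    """Hypothetical recurrent block, re-applied at every step.
    It re-injects the embedded input and halves the distance to it."""
    return [[0.5 * s + 0.5 * e for s, e in zip(s_row, e_row)]
            for s_row, e_row in zip(state, embedded)]


def coda(state: Matrix) -> List[float]:
    """Hypothetical readout: one scalar per token position."""
    return [sum(row) for row in state]


def forward(token_ids: List[int], num_steps: int, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    embedded = prelude(token_ids)
    # Assumption of this sketch: the latent state starts as Gaussian noise.
    state = [[rng.gauss(0.0, 1.0) for _ in row] for row in embedded]
    for _ in range(num_steps):  # num_steps is the test-time compute knob
        state = core_step(state, embedded)
    return coda(state)


tokens = [3, 1, 4]
for steps in (1, 4, 32):
    print(steps, forward(tokens, num_steps=steps))
```

Because the loop count is just an argument, the same weights can be run with few or many steps. This toy block is a contraction, so more steps always settle down; a learned block carries no such guarantee.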
Post image

New open source reasoning model!

Huginn-3.5B reasons implicitly in latent space 🧠

Unlike o1 and R1, latent reasoning doesn’t need special chain-of-thought training data, and doesn't produce extra CoT tokens at test time.

We trained on 800B tokens 👇

10.02.2025 15:58
Post image

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach 🚀🚀🚀

arxiv.org/abs/2502.05171

10.02.2025 07:14
Principal Investigators (m/f/d) as Hector Endowed Fellows of the ELLIS Institute Tübingen

institute-tue.ellis.eu/en/jobs/PI2025

11.12.2024 01:36

I'm at NeurIPS in Vancouver right now! Feel free to reach out to talk about anything in LLM safety or efficiency research.

Also, our new ELLIS Institute Tübingen is hiring new faculty, and the deadline is next week. Reach out to us in person or at our booth for more info 🇪🇺🇪🇺🇪🇺

11.12.2024 01:36