Jonathan Bragg

@jbragg

Leading agents R&D at AI2. AI & HCI research scientist. https://jonathanbragg.com

258 Followers · 24 Following · 7 Posts
Joined 05.10.2023

Latest posts by Jonathan Bragg @jbragg

Brooke Vlahos, Peter Clark, Doug Downey, @yoavgo.bsky.social, Ashish Sabharwal, Daniel S. Weld

06.11.2025 17:01

Amanpreet Singh, Harshit Surana, Aryeh Tiktinsky, Rosni Vasu, @guywiener.bsky.social, Chloe Anastasiades, Stefan Candra, Jason Dunkelberger, Dan Emery, Rob Evans, Malachi Hamada, Regan Huff, Rodney Kinney, Matt Latzke, Jaron Lochner, Ruben Lozano-Aguilera, Cecile Nguyen, Smita Rao, Amber Tanaka...

06.11.2025 17:01

๐Ÿ™ Many thanks to my @ai2.bsky.social teammatesโ€”Mike Dโ€™Arcy @nbalepur.bsky.social Dan Bareket, Bhavana Dalvi @sergeyf.bsky.social Dany Haddad, Jena D. Hwang, @peterjansen-ai.bsky.social Varsha Kishore, Bodhisattwa Majumder @arnaik19.bsky.social Sigal Rahamimov, Kyle Richardson...

06.11.2025 17:01

We tested 22 agent classes, more *kinds* than other benchmarks

🤖 AgentBaselines makes them reusable, including our SOTA science agents: github.com/allenai/agent-baselines

📚 Blog: allenai.org/blog/astabench
📄 Paper: arxiv.org/abs/2510.21652
📊 Leaderboard: huggingface.co/spaces/allenai/asta-bench-leaderboard

06.11.2025 17:01

๐Ÿ› ๏ธAstaBench is the first to provide reproducible (date-limited) large-scale search toolsโ€”plus a full scientific research environment for agents.

📊 Our leaderboard highlights agents that use these tools, enabling more controlled measurement of *AI*. (We measure LLM costs too.)

06.11.2025 17:01
AstaBench with abstract measurement icons

Agent benchmarks don't measure true *AI* advances

We built one that's hard & trustworthy:
👉 AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems
👉 SOTA results across 22 agent *classes*
👉 AgentBaselines agents suite

🆕 arxiv.org/abs/2510.21652

🧵👇

06.11.2025 17:01

@kylelo.bsky.social your gifs are an unapproved manipulation of my human attention

09.10.2025 21:06