Great to see BALROG on @bsky.app as well!
25.11.2024 15:00
Tired of saturated benchmarks? Want scope for a significant leap in capabilities?
🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games!
BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.
1/🧵