#macrobench — Bluesky Posts — bluesky.baby

#macrobench

Posts tagged #macrobench on Bluesky

@getnews-me.bsky.social

5 months ago

MacroBench: New Benchmark for LLM‑Driven Web Automation Scripts

MacroBench, a new benchmark covering 681 web‑automation tasks, reports GPT‑4o‑Mini reaching a 96.8 % success rate while all models scored 0 % on complex multi‑step workflows. Read more: getnews.me/macrobench-new-benchmark... #macrobench #webautomation #llm
