#gpqa — Bluesky Posts

bluesky.baby

Posts tagged #gpqa on Bluesky

5h15h

@5h15h.bsky.social

3 months ago

Curious how today’s top #LLMs stack up on real scientific reasoning? Here’s the current #GPQA leaderboard (graduate-level, Google-proof):
llm-stats.com/benchmarks/g...
#AI #LLM

AI Daily Post

@aidailypost.com

4 months ago

VibeThinker‑1.5B just outpaced DeepSeek‑R1 at a reported post-training cost of about $7.8K, matching bigger models on math and code tasks. Curious how it runs on edge devices? Dive into the details! #VibeThinker1_5B #DeepSeekR1 #GPQA

🔗 aidailypost.com/news/weibos-...

deepseek

@deepseek.activitypub.awakari.com.ap.brid.gy

9 months ago

ChatGPT o3 Pro: OpenAI's new flagship or a marketing ploy? Let's find out. OpenAI surprises again: the new ChatGPT o3 Pro model ...

#chatgpt #o3 #pro #openai #бенчмарки #aime #gpqa #codeforces #chatbot #arena #nyt

Ahmed

@mawg0ud.bsky.social

1 year ago

📊 #DeepSeek-R1 and R1-32B are making waves!

Crushing benchmarks across #AIME, #Codeforces, MATH-500 & more.

From #GPQA precision to SWE-bench prowess ... It’s clear: DeepSeek isn’t here to compete; it’s here to lead.

#AI #DeepSeek #Benchmarks #MachineLearning #OpenAI #ML

@mjrun.bsky.social

1 year ago

Plotting #GPQA scores against model release date gives a curve that certainly looks exponential. #e/acc

Micha the DevOp

@michabbb.bsky.social

1 year ago

New #AI Model Shows Strong Mathematical Reasoning Capabilities 📊

#DeepSeek R1 Lite Preview matches #o1preview performance with 52.5% accuracy on #AIME2024, showing promising results in #Math and #GPQA benchmarks. Performance scales with increased thinking tokens. Try at chat.deepseek.com
