#Benchmarking — Bluesky Posts

@dhutchinson.bsky.social

1 day ago

Many thanks to the editors of @up_johd and the peer reviewers for everything that went into bringing this article to the finish line! 8/8

#digitalhumanities #llm #benchmarking #AI #digitalhistory

LLMs

@llms.activitypub.awakari.com.ap.brid.gy

1 day ago

Safety Evals: 12 Questions Before You Trust the Pass Rate

A sharper way to read AI safety evaluation results before a reassuring percentage turns into false confidence. Published on Medium.

#llm-evaluation #ai-safety #mlops #benchmarking #machine-learning

Thilo Muth

@drmuth.bsky.social

4 days ago

🔬 New benchmarking study for the proteomics community!
From variability to consensus: PSM rescoring harmonizes peptide identification across search engines and datasets.
Preprint:
doi.org/10.64898/202...

#TeamMassSpec #Proteomics #MassSpectrometry #OpenScience #Benchmarking

TMLR Published Papers

@tmlr-pub.bsky.social

1 week ago

There are no Champions in Supervised Long-Term Time Series Forecasting

Lorenzo Brigato, Rafael Morand, Knut Joar Strømmen et al.

Action editor: Devendra Dhami

https://openreview.net/forum?id=yO1JuBpTBB

#benchmarking #forecasting #benchmark

CESGA-HPC

@cesga-hpc.bsky.social

1 week ago

Evaluating the performance of quantum devices

Diego Andrade, associate professor at the University of A Coruña and researcher at CITIC, leads research lines focused on quantum computing, AI, and high-perform...

⚛️📈 How do we measure quantum progress?

📊 Our new benchmark suite with @udc.gal enables systematic evaluation of quantum platforms.

https://www.youtube.com/watch?v=Mv_qfJAXG0A

#QuantumComputing #Benchmarking #PCCC

HGPU group

@hgpu.bsky.social

1 week ago

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Recent studies have demonstrated the potential of Large Language Models (LLMs) in generating GPU Kernels. Current benchmarks focus on the translation of high-level languages into CUDA, overlooking …

#CUDA #LLM #Benchmarking #Package

hgpu.org?p=30630

FunctionalProgramming

@functionalprogramming.activitypub.awakari.com.ap.brid.gy

1 week ago

Original post on hgpu.org

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation Recent studies have demonstrated the potential of Large Language Models (LLMs) in generating GPU Kernels. Current benchmarks focus on the tr...

#Computer #science #CUDA #paper #Benchmarking #LLM #nVidia #nVidia #A40 #nVidia #GeForce […]

Kubernetes

@kubernetes.activitypub.awakari.com.ap.brid.gy

2 weeks ago

Original post on franksworld.com

How Enterprises Measure LLM Performance and Cost

Imagine trying to gauge the performance of an engine in real-world conditions. You wouldn’t just rev it up in a static environment and call it a d...

#AI #Large #Language #Models #Red #Hat #AI #benchmarking #AI #performance #evaluation

roxsross

@roxsross.bsky.social

2 weeks ago

📊 Why we no longer evaluate with SWE-bench Verified

Contamination and mismeasurement of progress in frontier coding.

openai.com/index/why-we-no-longer-e...

#Benchmarking #AIEngineering #CodeGen #RoxsRoss

The Information, Advice and Support Services Network (IASSN)

@iass-network.councilfordisabledchildren.org.uk

2 weeks ago

Minimum Standards Benchmarking Report 2025–26 📊

A snapshot of how SENDIAS services are meeting national minimum standards. It highlights national trends and supports continuous improvement across SENDIAS.

🔗 councilfordisabledchildren.org.uk/about-us-0/n...

#SENDIAS #SEND #Benchmarking

NimblePros

@nimblepros.com

2 weeks ago

Gathering benchmarks for your .NET app and aren't sure if you're comparing the right things? In this post and video, Phil will talk you through validating your benchmarks in .NET: https://bit.ly/3Yyg80F

#dotnet #benchmarking

Miguel Filipe

@mfilipe.bsky.social

3 weeks ago

I benchmarked 8 local LLMs writing Go on my Framework 13 AMD Strix Point

#Benchmarking Local LLMs for coding in Go on my Framework 13 AMD Strix Point laptop...
msf.github.io/blogpost/ben...

The Gregory Lab @ DukeU

@gregorylab.bskyverified.social

3 weeks ago

Work from the #DukeMGC will be on display at #AGBT2026:

Tuesday 1:30-3:30, poster #401

Wednesday 4:45-6:15, poster #472

Come find us to chat about our data! 🧬

#AGBT #SpatialTranscriptomics #SingleCell #Benchmarking #LongReadSequencing

TMLR Published Papers

@tmlr-pub.bsky.social

3 weeks ago

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

Jialin Yang, Dongfu Jiang, Tony He et al.

Action editor: Frederic Sala

https://openreview.net/forum?id=buDwV7LUA7

#structured #benchmarking #formats

Christian Klass

@christianklass.bsky.social

3 weeks ago

Small pre-announcement from today: The Procyon team is working on a new browser-focused benchmark. More about it soon. #Benchmarking

Chinballs Gaming

@chinballs.tv

3 weeks ago

9070XT Does it need a better CPU?

YouTube video by Chinballs Gaming

Is your CPU holding back your 9070XT? #benchmarking #AMD #UltraWide #9070XT

ceph.io

@ceph.io

3 weeks ago

CLAY vs JErasure in Ceph, what’s the real performance story?
Part 4 of this CBT benchmarking series explains why CLAY incurs a write hit but can reduce recovery network traffic by ~50%.

Read more: t.ly/CLAYvsJErasure
#Ceph #Storage #OpenSource #Benchmarking

Juan Sanchez

@juanyobluesky.bsky.social

3 weeks ago

Advancing AI benchmarking with Game Arena

We’re expanding Game Arena with Poker and Werewolf, while Gemini 3 Pro and Flash top our chess leaderboard.

🎮📊 Game Arena: improvements for AI benchmarking and model evaluation. #Benchmarking #DeepMind

LLMs

@llms.activitypub.awakari.com.ap.brid.gy

1 month ago

I Changed One String and My Model’s Score Dropped 70 Points

Understanding LLM evaluation by experimenting with different stop sequences. Published on Towards AI.

#machine-learning #llm #mlops #artificial-intelligence #benchmarking

Alessio Xompero

@axompi.bsky.social

1 month ago

✨ Your Opportunities

• Test on UR5 and Franka Emika Panda robots on the competition site based on requests and availability

• Benchmark against state-of-the-art solutions and advance these robotic tasks in real-world conditions

• Win cash prizes for top performance

#benchmarking #openscience

Ardea International

@ardeaint.bsky.social

1 month ago

Finding, Fixing, and Preventing: Insights from the 2025 Modern Slavery Benchmarks

At the Ardea International Modern Slavery Conference, Dr Martin Buttle presented the latest findings from the CCLA Modern Slavery Benchmark.

New Article - Finding, Fixing, and Preventing: Insights from the 2025 Modern Slavery Benchmarks: www.ardeainternational.com/thinking/ins...

#Benchmarking #EndModernSlavery

Ubuntu

@ubuntu.activitypub.awakari.com.ap.brid.gy

1 month ago

Maravel-Framework 10.61.9 Benchmarks vs Lumen and Laravel

Maravel-Framework 10.61.9

Thanks to https://github.com/myaaghubi/PHP-Frameworks-Bench I was able to benchmark Maravel Micro-Framework 10.52...

#Software #benchmark #benchmarking #maravel #maravelith #prodsens #live

Sarthak Makhija

@msarthak.bsky.social

1 month ago

Ever wondered what go test -bench actually measures? 🕵️‍♂️

I dissected Go’s internals to show how the Compiler, CPU, and Framework interact.

tech-lessons.in/en/blog/diss...

#golang #benchmarking

LLMs

@llms.activitypub.awakari.com.ap.brid.gy

1 month ago

Linux b4 Kernel Develops AI Agent for Code Review Using Dog Fooding

Analysis of Source Material

1. Core Topic & Intended Audience: The core topic is the integration of AI-assisted code review i...

#Technology #Desktop #Linux #Linux #benchmarking #Linux […]

[Original post on archynewsy.com]

Chinballs Gaming

@chinballs.tv

1 month ago

Desktop Bazzite vs Windows 11

YouTube video by Chinballs Gaming

5 systems, 8 benchmarks, and 2 OSes add up to a lot of data. Hopefully this gives you a good idea of the performance to expect when considering a switch from Windows to Bazzite. #bazzite #benchmarking @bazzite.gg #Linux #gamingOnLinux

The Gregory Lab @ DukeU

@gregorylab.bskyverified.social

1 month ago

We are thrilled to share the 1st pub out of the #DukeMGC. Congrats to Lab Analyst Ellora Haukenfrers on your 1st first-author paper!

We present 'A platform-agnostic evaluation of non-formalin fixed #singlecell RNA technologies'

#benchmarking #scRNAseq
www.biorxiv.org/content/10.6...

deepseek

@deepseek.activitypub.awakari.com.ap.brid.gy

1 month ago

PhysProver: Advancing Automatic Theorem Proving for Physics

The combination of verifiable languages and LLMs has significantly influenced both the mathematical and computer science communities because it provides a rigorous foundation for theorem proving. R…

#Computer #science #paper #Physics #Benchmarking #LLM #Package

Javier Santoyo

@jsantoyo.bsky.social

1 month ago

Evaluating the practical aspects and performance of commercial single-cell RNA sequencing technologies. #SingleCell #scRNAseq #Benchmarking #NARgenomicsAndBioinformatics 🧪🧬 🖥️
academic.oup.com/nargab/artic...

llm-d

@llm-d.ai

1 month ago

Community Demo: Verified & Reproducible LLM Benchmarks | llm-d Project

In the llm-d open-source project, we believe a supported guide is only as good as the data backing it. In this community demo, the SIG-benchmarking team showcases the benchmarking suite that brings…

A huge shoutout to the contributors in SIG-benchmarking for making performance transparency a core pillar of the llm-d project!

🚀 Check out the full demo here: youtu.be/TNYXjZpLCN4

#AI #Kubernetes #Benchmarking
