#Benchmark
Posts tagged #Benchmark on Bluesky

⚡ PNNL and OpenAI partner to streamline federal permitting

They introduce DraftNEPABench, a benchmark for speeding up infrastructure reviews with AI.

openai.com/index/pacific-northwest-...

#AIcoding #NEPA #Benchmark #RoxsRoss

BCIE cuts 95 basis points over three years after raising 2 billion in historic Benchmark issue The Central American Bank for Economic Integration (BCIE) will apply a third consecutive 15-basis-point cut to its interest rates from June 1, 2026, bringing the cumulative reduction to between 80 and 95 basis points over three years. The cut benefits national budgets through efficiencies achieved after the bank raised 2 billion dollars in the largest Benchmark issue in its history. This article, "BCIE reduce 95 puntos básicos en tres años tras emitir 2,000 millones en Benchmark histórico", was first published in Diario El Mundo | Noticias de Honduras y el Mundo.

#Economía #Presupuestos #Benchmark BCIE cuts 95 basis points over three years after raising 2 billion in historic Benchmark issue

Early Benchmarks Show Apple's MacBook Neo Outperforming Top x86 CPUs in Single-Core Tests Benchmark results from Notebookcheck reveal that the new Apple MacBook Neo, powered by the A18 Pro chip, delivers record-breaking single-core performance that surpasses all current x86 processors from Intel and AMD. In Cinebench 2024 testing, the A18 Pro achieved 147 points while consuming only 3.5 to 4 watts. This efficiency is noteworthy, as the test itself lasts roughly ten minutes and taxes a CPU core consistently during the process. The performance figure places Apple’s chip ahead of even high-end desktop CPUs such as Intel’s Core Ultra 9 285K and AMD’s Ryzen 9 9950X3D—not to mention every modern mobile chip from AMD, Intel, and Qualcomm. The A18 Pro also tops Apple’s previous M3 generation, cementing the company’s continued lead in single-core efficiency. Despite these impressive results, the article notes that Apple’s architectural design includes specialized accelerators that favor workload types optimized for its ecosystem, meaning the raw benchmark may not represent typical real-world usage outside macOS or Apple-optimized software. Notebookcheck suggests that Apple’s tight integration between hardware and software provides a unique advantage versus general-purpose processors. Industry reactions are mixed; some applaud the innovation, while others label the coverage as overly promotional. Regardless, the results signal a new level of competition between Apple’s ARM-based systems and the traditional x86 giants, Intel and AMD.

Early Benchmarks Show Apple's MacBook Neo Outperforming Top x86 CPUs in Single-Core Tests

🤖 AI: It's clickbait ⚠️
👥 Users: It's clickbait ⚠️

#apple #benchmark #cpu
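
The efficiency claim in the summary above can be made concrete with a small calculation. This is only a sketch: it divides the quoted Cinebench 2024 single-core score (147 points) by the quoted power range (3.5 to 4 watts). The function name `points_per_watt` is hypothetical, and no comparison figures for other chips are included because the post does not give any.

```python
def points_per_watt(score: float, watts: float) -> float:
    """Return benchmark points delivered per watt of power draw."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return score / watts


SCORE = 147.0                # single-core score quoted in the post
POWER_RANGE_W = (3.5, 4.0)   # power draw range quoted in the post

# Lower power draw at the same score means higher efficiency.
best = points_per_watt(SCORE, POWER_RANGE_W[0])
worst = points_per_watt(SCORE, POWER_RANGE_W[1])

print(f"{worst:.2f} to {best:.2f} points per watt")
# prints "36.75 to 42.00 points per watt"
```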

Researchers Develop a Comprehensive Benchmark to Evaluate AI Expertise As AI systems increasingly excelled at traditional academic benchmarks, researchers recognized the need for more challenging tests. In response, an international team of nearly 1,000 experts developed Humanity's Last Exam (HLE), a 2,500-question assessment covering mathematics, humanities, natural sciences, ancient languages, and other highly specialized fields. Each question was carefully crafted so that current AI models could not solve it, with any solvable questions removed from the final exam. Early testing revealed that even the most advanced AI models struggle significantly, with scores ranging from roughly 2.7% to around 50% for the most capable systems. Dr. Tung Nguyen from Texas A&M University emphasized that the goal is not to defeat AI but to identify gaps in AI knowledge and provide a durable benchmark for measuring AI progress. The exam demonstrates that high performance on traditional human-focused tests does not equate to genuine intelligence, as AI systems still lack deep, contextual understanding and specialized expertise. Humanity's Last Exam also highlights the importance of human expertise and the value of global, interdisciplinary collaboration in evaluating AI capabilities.

Researchers Develop a Comprehensive Benchmark to Evaluate AI Expertise

🤖 AI: It's clickbait ⚠️
👥 Users: It's clickbait ⚠️

#ai #benchmark #research


mSOP-765k: A Benchmark For Multi-Modal Structured Output Predictions

Bianca Lamm, Janis Keuper

Action editor: Mohammad Ghavamzadeh

https://openreview.net/forum?id=H7eYL4yFZS

#benchmark #advertisements #modal

Evaluating AI models against nonsensical questions BullshitBench is a benchmark designed to evaluate how artificial intelligence models respond to questions that are nonsensical or based on false premises. The test checks whether models detect these flawed premises, whether they call out the nonsense directly, and whether they avoid confidently continuing with invalid assumptions. The platform lets users filter results by different criteria, such as model visibility and the reasoning technique used. It also ranks models by their ability to clearly reject nonsensical questions, showing each version's improvement as percentages of correct answers and of detected errors. The data is color-coded by response type: green for clear responses, amber for partial responses, red for accepting the nonsense, plus errors that indicate failures. The tool is useful for developers and researchers who want to understand the limitations of current language models and improve their critical reasoning, so that models do not give incorrect answers with confidence. BullshitBench also allows models to be compared with one another and their progress tracked over time, providing valuable information on how artificial intelligence is evolving in complex reasoning and in detecting invalid information.

Evaluating AI models against nonsensical questions

🤖 AI: Not clickbait ✅
👥 Users: Not clickbait ✅

#ia #modelosdelenguaje #benchmark


#Google: #AI agents learn to cooperate on their own - no hardcoded #orchestration needed. Train them against a diverse pool of #opponents and #cooperation emerges as a property of #training.

#Benchmark:
Iterated Prisoner's Dilemma.

Result: stable collaboration

#AI #MultiAgent #MachineLearning
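
For readers unfamiliar with the benchmark named in the post, here is a minimal sketch of an Iterated Prisoner's Dilemma match. It is not the training setup the post describes: the payoff values are the conventional textbook ones (an assumption, not from the post), and the two hand-written strategies, `tit_for_tat` and `always_defect`, are illustrative stand-ins for learned agents.

```python
from typing import Callable, List, Tuple

Move = str  # "C" (cooperate) or "D" (defect)
Strategy = Callable[[List[Move], List[Move]], Move]

# Conventional payoffs (assumed): (payoff to first player, payoff to second player)
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own: List[Move], other: List[Move]) -> Move:
    """Cooperate first, then copy the opponent's previous move."""
    return other[-1] if other else "C"


def always_defect(own: List[Move], other: List[Move]) -> Move:
    """Defect on every round regardless of history."""
    return "D"


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play a fixed number of rounds and return the two total scores."""
    hist_a: List[Move] = []
    hist_b: List[Move] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))      # (30, 30): stable mutual cooperation
print(play(tit_for_tat, always_defect))    # (9, 14): exploited once, then mutual defection
print(play(always_defect, always_defect))  # (10, 10)
```

Under these assumed payoffs, two cooperating players each score more over ten rounds than either player in a match involving a defector, which is the property that makes stable collaboration a meaningful result in this game.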


LLMs hallucinate – but not at the same rate. AA-Omniscience data reveals major differences between models and domains.

Well structured and worth checking out: https://artificialanalysis.ai/evaluations/omniscience

#AI #LLM #benchmark


📰 Intel Core Ultra 5 250K Plus benchmark leaks, hinting at Arrow Lake Refresh performance

👉 Read the full article here: ahmandonk.com/2026/03/09/intel-core-ul...

#arrowLake #benchmark #cpu #intel

Geekbench 6 benchmark results showing iPhone 17e with A19 chip performance compared to iPhone 17.


The first Geekbench 6 benchmarks show that the iPhone 17e with the A19 chip is on par with the iPhone 17 for CPU performance. The 17e's 4-core GPU shows a slight drop in graphics performance compared with the 17's 5-core GPU. 📱📊
#iphone17e #benchmark #chipa19


There are no Champions in Supervised Long-Term Time Series Forecasting

Lorenzo Brigato, Rafael Morand, Knut Joar Strømmen et al.

Action editor: Devendra Dhami

https://openreview.net/forum?id=yO1JuBpTBB

#benchmarking #forecasting #benchmark


New #J2C Certification:

Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing ...

Siwei Yang, Mude Hui, Bingchen Zhao, Yuyin Zhou, Nataniel Ruiz, Cihang Xie

https://openreview.net/forum?id=lL1JR6dxG8

#editing #instruction #benchmark


MacBook Neo benchmark:
CPU close to the iPhone 16 Pro; A18 Pro chip with a cut-down GPU.

Data:
Neo: 3461/8668/31286
iPhone 16 Pro: 3445/8624/32575
M4 Air: 3696/14730/54630

Hardware performance analysis 💻📊

#apple #macbookneo #benchmark
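
The three score triplets above are easier to compare as percentages of a baseline device. The sketch below computes them column by column. The post does not label the three columns, so the code treats them only as positions; reading them as single-core, multi-core, and GPU scores would be an assumption. The helper name `relative_to` is hypothetical.

```python
# Scores copied from the post, in the order given there (columns are unlabeled).
SCORES = {
    "Neo": (3461, 8668, 31286),
    "iPhone 16 Pro": (3445, 8624, 32575),
    "M4 Air": (3696, 14730, 54630),
}


def relative_to(baseline: str, device: str) -> tuple:
    """Return the device's scores as a percentage of the baseline's, per column."""
    return tuple(
        round(100 * d / b, 1)
        for d, b in zip(SCORES[device], SCORES[baseline])
    )


print(relative_to("iPhone 16 Pro", "Neo"))  # (100.5, 100.5, 96.0)
print(relative_to("M4 Air", "Neo"))         # (93.6, 58.8, 57.3)
```

The first two columns for the Neo are within about half a percent of the iPhone 16 Pro, while the third is about 4% lower, which matches the post's reading of a near-identical CPU with a reduced GPU.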

MacBook Neo performance Single Core - Geekbench


MacBook Neo performance Multi-Core - Geekbench


The MacBook Neo is the big new product of this #AppleLaunch
Performance-wise, it sits somewhere between the M1 chip and the M4 chip depending on the workload. Can't wait to see how it does in real-world conditions! 🤩
#MacBookNeo #Geekbench #benchmark


#BYD has unveiled its second-gen blade battery, setting a new #benchmark in fast‑charging technology.

At a launch event in Shenzhen, the company demonstrated charging speeds from 10% to 70% in just five minutes, and up to 97% in nine minutes, comparable to refueling a car.
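
The quoted charging figures imply an average charging speed that can be worked out directly. The sketch below expresses it in percentage points of charge per minute. It assumes both timings start from the same 10% state of charge, which the post implies but does not state outright; the function name `avg_rate` is hypothetical.

```python
def avg_rate(start_pct: float, end_pct: float, minutes: float) -> float:
    """Average charging speed in percentage points of charge per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return (end_pct - start_pct) / minutes


# Figures from the post; the shared 10% starting point is an assumption.
first_phase = avg_rate(10, 70, 5)        # 10% -> 70% in five minutes
second_phase = avg_rate(70, 97, 9 - 5)   # 70% -> 97% in the remaining four minutes
overall = avg_rate(10, 97, 9)            # 10% -> 97% in nine minutes

print(f"first phase:  {first_phase:.2f} points/min")   # 12.00
print(f"second phase: {second_phase:.2f} points/min")  # 6.75
print(f"overall:      {overall:.2f} points/min")       # 9.67
```

Under that assumption, the average speed after 70% is a little over half of the speed before it, so the charge rate tapers as the battery fills.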

Awakari App

The Price Per Million Tokens Is Lying to You. About 9 months ago, I was building a RAG system; for those who don't know, it's a kind of enhanced memory system for AI agents. One of the… Continue r...

#benchmark #ai #developer-tools #llm #machine-learning

The Price Per Million Tokens Is Lying to You. About 9 months ago, I was building a RAG system; for those who don't know, it's a kind of enhanced memory system for AI agents. One of the agentic flo...

#ai #llm #benchmark #devtools

48. The cartographers of the financial world For our forty-eighth episode, we're exploring a company that is the ultimate "tollbooth" business. Imagine you're a massive pension fund or an asset manager. How do you measure your performance? How do you decide how to invest in international stocks? [pause] You need a benchmark, a universal standard, a map of the financial world. Our company today creates those maps. And for the privilege of using them, they collect a small, recurring fee on trillions of dollars of global assets. We are talking about the financial data and index powerhouse... MSCI. When you hear the name MSCI ($MSCI), you probably think of their world-renowned stock market indices, like the MSCI World or MSCI Emerging Markets benchmarks. They are the creators of the yardsticks that a huge portion of the global investment industry, including countless ETFs and mutual funds, measure themselves against. But the real story behind MSCI is its evolution into a deeply embedded financial data and analytics powerhouse. The indices are just the beginning. The company operates a powerful, recurring-revenue "toll road" model, collecting fees based on the assets tied to its benchmarks. Furthermore, its suite of mission-critical risk analytics and ESG data tools are woven into the daily operations of the world's largest asset managers, creating incredibly high switching costs. But with the stock perpetually trading at a premium valuation, is the price of admission too high? We're running the numbers to determine if MSCI's formidable competitive moat makes it a must-own compounder or if its high valuation presents too much risk in a cyclical market.

📣 New Podcast! "48. The cartographers of the financial world" on @Spreaker #analytics #assetmanagement #benchmark #blackrock #compounder #data #esg #etf #finance #financial #index #investing #moat #msci #portfolio #recurring #risk #royalty #stock #valuation

How Well Does Agent Development Reflect Real-World Work? AI agents are increasingly developed and evaluated on benchmarks relevant to human work, yet it remains unclear how representative these benchmarking efforts are of the labor market as a whole. In thi...

Current AI agent benchmarks are poorly aligned with real-world human work. They are heavily skewed toward programming-centric tasks. Domains where most people work and contribute value are underrepresented in how we measure AI progress.

arxiv.org/abs/2603.01203
#ai #benchmark


I Benchmarked Java on Single-Board Computers: Orange Pi 5 Ultra and Raspberry Pi 5 Lead the Pack Table of Contents: Benchmark Tool; BenchmarkRunner.java - The User Tool; SummarizeReports.java - The Auto...

#Embedded #Java #Java #Core #JBang #Performance #Raspberry #Pi […]

[Original post on foojay.io]


Leveraging the True Depth of LLMs

Ramón Calvo González, Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret

Action editor: Changyou Chen

https://openreview.net/forum?id=JccJ6YfWd4

#llms #llm #benchmark

Coolest Stores: Boss Flagship on Passeig de Gràcia | invidis Barcelona | Boss brings a striking blend of digital signage and Catalan design heritage to its new flagship on Barcelona’s Passeig de Gràcia. The expansive store combines architectural craftsmanship, ...

Boss brings a striking blend of #digitalsignage to its new flagship. The expansive store combines architectural craftsmanship, natural light, and immersive brand experiences to set a new #benchmark for modern #Retail.

invidis.com/news/2026/02...


Cognix v0.2.5 released.

Benchmark vs Claude Code & Aider (3 runs, same LLM):
- Exec: 100% (= Claude Code, > Aider 87.5%)
- Lint: 0.00 (best in class)
- claude-opus-4.6 support added

Report on Zenn/Dev.to soon.

pipx install cognix
cognix-dev.github.io/cognix/

#Claude #Aider #Benchmark


#Gemini 3.1 is here.

another day another #benchmark drop.

Gemini 3.1 is here.

stats look pretty good honestly.

look at that #ARC-AGI-2 jump!

#BrowseComp also through the roof, so it should have a really good agentic search function.

We hid backdoors in binaries — Opus 4.6 found 49% of them This blog post was authored by Piotr Grabowski, Rafał Strzaliński, Michał Kowalczyk, Piotr Migdał, and Jacek Migdal. Claude can ...

#ai #benchmark #security


Jack Altman joins Benchmark as General Partner, bringing his Alt Capital team along. A significant shift in the VC landscape! #VentureCapital #Benchmark #JackAltman Link: thedailytechfeed.com/jack-altman-...

The strategic lever that is almost always underestimated Why is the choice of cleaning provider crucial for CSR? Environmental impact, social well-being, and brand image: learn how to choose a responsible partner for your offices…

Cleaning and upkeep of office or industrial premises is a sometimes neglected purchase... running a fresh #benchmark to reconsider your options can help you score some fairly easy points. #greentech #impact #achatresponsable #respect #stratégieRSE #nettoyage yveszieba.me/2026/02/18/l...


Xiaomi 17 Ultra Leica Edition arrives in Europe
#Android #Android16 #Benchmark #Cameraphone #Flagship #Geekbench #Leak #LeicaEdition #Prestazioni #Smartphone #Snapdragon8EliteGen5 #TechNews #Tecnologia #Xiaomi17Ultra
www.ceotech.it/xiaomi-17-ul...

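The language battle in AI infrastructure: why did we drop Python and choose Go when building an LLM gateway? - Tony Bai Permanent link to this article - https://tonybai.com/2026/02/18/why-we-chose-go-over-python-for-llm-gateways Hello everyone, I'm Tony Bai. Today, in 2026, artificial intelligence has long since left the lab and become a core driver of enterprise applications. Python, by virtue of

The language battle in AI infrastructure: why did we drop Python and choose Go when building an LLM gateway? Permanent link to this article – tonybai.com/2026/02/18/why-we-chose-...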

#技术志 #AgenticCoding #AIInfrastructure #AI基础设施 #benchmark #ConcurrencyModel #ContextSwitching #GIL #GMPScheduling #GMP调度 #Go


Galaxy S26 Ultra close to iPhone 17 Pro Max in benchmarks
#A19Pro #Apple #Benchmark #Confronto #Fotocamera #Fotografia #GalaxyS26Ultra #Geekbench #iPhone17ProMax #Nightography #Prestazioni #Samsung #SamsungGalaxy #Snapdragon8EliteGen5
www.ceotech.it/galaxy-s26-u...
