
@braintelligence

Believer in inclusive democracy. Posting mostly about AI/ML and tech if I can help it.

373 Followers
866 Following
2,288 Posts
Joined 08.11.2024

Latest posts by @braintelligence

Indonesia to Block Children Under 16 From Social Media
The ban is to take effect March 28, according to a government minister, but details about how it would be carried out were scarce.

Indonesia said that it would bar anyone under the age of 16 from access to social media, joining a growing list of countries that are enacting such restrictions in a bid to safeguard the well-being of children.

07.03.2026 02:40 👍 171 🔁 31 💬 7 📌 7

GPT-5.4 Pro (xhigh) also improved CritPt record from Gemini 3.1 Pro's 17% to 30%. OpenAI appears to have an edge on the hardest math and physics reasoning tasks.

"CritPt evaluates language models on solving unpublished, frontier-level physics problems that require genuine research-scale reasoning."

06.03.2026 20:16 👍 0 🔁 1 💬 0 📌 0

Wow

07.03.2026 02:29 👍 2 🔁 0 💬 0 📌 0
The image is a benchmark comparison infographic titled "Qwen3.5-4B vs GPT-4o." It compares the Qwen3.5-4B open-weight model (released March 2026) against OpenAI's GPT-4o (from May 2024).
Summary of Results
 * Total Wins: Qwen3.5-4B wins 5 out of 7 benchmarks; GPT-4o wins 2 out of 7.
 * Average Advantage: Qwen has a +9.6 average advantage over GPT-4o across the categories shown.
Benchmark Performance (Bar Chart)
The bar chart displays percentage scores across seven specific benchmarks, with Qwen represented in light blue and GPT-4o in gold/brown.
| Benchmark | Leader |
|---|---|
| GPQA Diamond | Qwen3.5-4B (Significant lead) |
| MMLU-Pro | Qwen3.5-4B |
| MATH-500 | Qwen3.5-4B (Largest lead, nearly 95%) |
| MMMU-Pro | Qwen3.5-4B |
| Video-MME | Qwen3.5-4B |
| MMMLU | GPT-4o (Slight lead) |
| MMLU | GPT-4o (Slight lead) |
Key Takeaway
The graphic highlights that the much smaller 4B parameter Qwen model from 2026 outperforms the older 2024 flagship GPT-4o in specialized reasoning and math tasks, while GPT-4o maintains a narrow edge in general knowledge benchmarks like MMLU and MMMLU.
Would you like me to analyze the specific percentage gaps for any of these individual benchmarks?


at least on benchmarks, Qwen3.5 4B beats GPT-4o

GPTQ 4-bit quant means it fits into 2 GB

06.03.2026 23:51 👍 46 🔁 5 💬 5 📌 0
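
The "fits into 2 GB" figure in the post above can be checked with back-of-the-envelope arithmetic. This is a minimal sketch, not a measurement: it assumes "4B" means exactly 4e9 parameters and counts only the 4-bit weights, ignoring GPTQ's per-group scales and zero-points, any layers left unquantized, and runtime memory such as the KV cache. The helper name `quantized_weight_bytes` is hypothetical.

```python
def quantized_weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Raw storage for the weights alone, in bytes (no quantization metadata)."""
    return num_params * bits_per_weight / 8


# Assumption: "4B" is taken as exactly 4e9 parameters.
params = 4_000_000_000

size_bytes = quantized_weight_bytes(params, 4)  # 4 bits per weight
size_gb = size_bytes / 1e9                      # decimal gigabytes
size_gib = size_bytes / 2**30                   # binary gibibytes

print(f"{size_gb:.2f} GB ({size_gib:.2f} GiB)")  # prints "2.00 GB (1.86 GiB)"
```

A real GPTQ checkpoint would likely come out somewhat larger than this lower bound, since the scales and zero-points are stored alongside the packed weights.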

You can tell they never read or studied anything by the people who lived through WWII or Vietnam

07.03.2026 01:19 👍 0 🔁 0 💬 0 📌 0

Rapidly rebranding all my search benchmarks as eval awareness benchmarks

06.03.2026 19:33 👍 104 🔁 14 💬 6 📌 2

I curse at it but only after I ask it to summarize the current relevant files and functions… then I start a new conversation

I think we’re screwed if they give the tools the ability to remember…

06.03.2026 20:28 👍 2 🔁 0 💬 0 📌 0

This is pretty amazing. Could flip vast swaths of rural America to EVs

Imagine building a modest solar farm and some battery and capacitor banks… and the rural residents could indefinitely power their vehicles with a short stop, and never have to truck in gasoline: completely self-sufficient

06.03.2026 16:40 👍 1 🔁 0 💬 0 📌 0

Does anyone know exactly how the new interrupt modes work on the latest models? Are they just interrupting the context and appending the interruption with special tags or something?

06.03.2026 04:38 👍 2 🔁 1 💬 0 📌 0
A line graph titled "GPT-5.4: 1M Context Reality Check" showing needle-in-a-haystack accuracy (MRCR v2, 8-needle) across different context window ranges. The accuracy starts at 97.3% for the 4-8K range and remains relatively high until 128-256K, where it begins a sharp decline. In the final two ranges, highlighted in red as the "1M context" zone, the accuracy drops significantly to 57.5% (labeled as a "40pt drop") at 256-512K and falls to 36.6% at the 512K-1M range. The source is cited as OpenAI GPT-5.4 eval table, dated March 5, 2026.


GPT-5.4 has 1M token context! wow!

reality:

06.03.2026 00:58 👍 82 🔁 3 💬 5 📌 0

Does local law clarify, though, that it’s being used to circumvent another nation’s rights protections? Seems like they would have had grounds to oppose this if they knew

05.03.2026 22:12 👍 1 🔁 0 💬 1 📌 0
Proton Mail Helped FBI Unmask Anonymous ‘Stop Cop City’ Protestor

A court record reviewed by 404 Media shows privacy-focused email provider Proton Mail handed over payment data related to a Stop Cop City email account to the Swiss government, which handed it to the FBI.

05.03.2026 21:15 👍 208 🔁 87 💬 16 📌 26
OpenAI building GitHub alternative after frequent platform outages and disruptions - a public OpenAI code repository would directly compete with one of its biggest investors
The project could eventually be sold to customers and would put OpenAI in direct competition with one of its biggest investors.

I was wondering when someone was going to tackle this. There’s a trove of AI-centric features yet to be developed

www.tomshardware.com/tech-industr...

05.03.2026 20:25 👍 0 🔁 0 💬 0 📌 0

The Very Large Array (VLA) in New Mexico is open to visitors 7 days a week from 9 AM – 4 PM. Come check out our telescopes on a self-guided walking tour, and don't forget to stop by the Visitor Center & Gift Shop! #VisitVLA

Admission: https://public.nrao.edu/visit/very-large-array/

05.03.2026 20:05 👍 16 🔁 5 💬 1 📌 0
ALT: a cartoon woman with a flower crown on her head is driving a boat with netflix written on the bottom
05.03.2026 20:01 👍 0 🔁 0 💬 0 📌 0

Had early access to GPT-5.4 and Pro. The stats are very good and so are the models.

One fun illustration of progress, this is the prompt "the book Piranesi as a p5js 3d space. do it for me," back in 2024 in GPT-4 (which took multiple corrections) and in GPT-5.4 Pro, which did it in one prompt.

05.03.2026 19:22 👍 42 🔁 2 💬 2 📌 0

A year ago, releasing complete source code was necessary for the production of working object code.

Today releasing complete documentation is starting to be sufficient for the production of working object code.

Golly.

05.03.2026 15:43 👍 13 🔁 1 💬 1 📌 0

I don't know who needs to know this... but...

A VPN is not a privacy application. It doesn't hide your location data, specifically. It doesn't encrypt your data, specifically. All it does is route your traffic through a single server somewhere else.

Let me explain. Or don't, this is the internet.

04.03.2026 16:39 👍 2679 🔁 420 💬 117 📌 24
ALT: a woman with curly hair is laughing in front of a sign that says prime video

Yann LeCun

05.03.2026 05:34 👍 1 🔁 0 💬 0 📌 0
Mark Zuckerberg is 'done with' the Meta’s highest-paid employee as company’s reorganisation proves - The Times of India
Tech News News: Meta CEO Mark Zuckerberg has quietly begun dismantling the power structure he built around Alexandr Wang, his $14 billion bet to lead the company's AI.

Interesting development… I guess Alexandr Wang is on the way out. That was a bit quicker than I expected. I would have thought he’d be given at least a year of runway.

timesofindia.indiatimes.com/technology/t...

04.03.2026 19:52 👍 12 🔁 3 💬 2 📌 1
Glaze by Raycast. Desktop apps, reimagined by you. Create software for you and your team. Lives on your Mac, connects to your files, tools and hardware.

There’s a lot of info missing, but this is potentially a really cool local app builder:

www.glazeapp.com

05.03.2026 05:33 👍 0 🔁 0 💬 0 📌 0
Anthropic’s AI tool Claude central to U.S. campaign in Iran, amid a bitter feud
Anthropic’s AI tool Claude is playing a key role in the U.S. military’s campaign in Iran, amid a bitter fight with the Pentagon over the terms of its use in war.

Wapo is reporting that Claude was used in target selection

“To strike 1,000 targets in 24 hours in Iran, the U.S. military leveraged AI

Anthropic’s Claude partnered with the military’s Maven Smart System, suggesting targets and issuing precise location coordinates. wapo.st/4rN5sa1”

05.03.2026 05:17 👍 1 🔁 0 💬 0 📌 0

The A18 Pro in the MacBook Neo is 19% faster than the M2 Ultra in the Mac Pro in single-core performance (Geekbench 6).

The MacBook Neo starts at $599.

The Mac Pro, which is still for sale, starts at $6,999.

04.03.2026 16:11 👍 401 🔁 47 💬 60 📌 8

Look i know ai is "not sentient" but like, if you went back in time to the 90s and told someone about this, they would tell you you had a sentient robot inside your computer

05.03.2026 00:24 👍 118 🔁 12 💬 11 📌 1

I think one of the things that bothers me the most about the way Polymarket presents itself to the world is that it's adopted the language of journalism to make itself sound more legitimate. Like look at this post from yesterday morning incorrectly "projecting" the winner.

04.03.2026 15:10 👍 3841 🔁 671 💬 57 📌 49
Toyota and Stellantis withdraw from CO₂ pool with Tesla - electrive.com
Until now, the European CO₂ pool centered around Tesla has been a major conglomerate of automakers, featuring the likes of Ford, Mazda, Honda, Toyota, and

Toyota and Stellantis withdraw from CO₂ pool with Tesla

However, for 2026, two major manufacturers, Toyota and Stellantis, are withdrawing from this pool - likely taking two of Tesla's largest financial contributors with them.

03.03.2026 15:55 👍 206 🔁 43 💬 15 📌 4

Over 40% of global shipping by volume exists to move fossil fuels from one place to another.

A huge share of the world's maritime infrastructure has been built around a system that is going to change dramatically as renewable energy and electrification displace fossil fuels.

04.03.2026 11:05 👍 763 🔁 354 💬 34 📌 33

Ah, yes, the invaluable wisdom of the markets.

04.03.2026 12:00 👍 3629 🔁 507 💬 79 📌 81