James MacGlashan's Avatar

James MacGlashan

@jmac-ai

Ask me about Reinforcement Learning. Research @ Sony AI. AI should learn from its experiences, not copy your data. My website for answering RL questions: https://www.decisionsanddragons.com/ Views and posts are my own.

2,397
Followers
1,426
Following
849
Posts
28.09.2024
Joined

Latest posts by James MacGlashan @jmac-ai

I agree, and I think you can make this stronger by pointing out that there are _many_ different AI technologies.

For example, many of the ethical concerns about AI tech involve training on people's data.

But there are also AI technologies that learn from self-generated experiences.

06.03.2026 14:46 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

I need to stop reading news in the morning.

06.03.2026 14:43 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Nice write up!

Maybe you can answer this. I believe associative scans prefer the sequence dim first, but attention usually puts it later.

DeltaNets are a little different, but I think they still have a chunked associative scan? So do hybrid models suffer from conflicting dimension-ordering preferences?
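For concreteness, here is the layout mismatch the question is getting at, as a toy JAX sketch (the shapes are invented; whether the transposes actually cost anything in practice is compiler- and hardware-dependent):

```python
import jax
import jax.numpy as jnp

# Attention code commonly lays activations out as (batch, seq, heads, dim),
# while an associative scan naturally wants the scanned (sequence) axis leading.
batch, seq, heads, dim = 2, 128, 4, 16
x = jnp.ones((batch, seq, heads, dim))

# Option 1: scan over the sequence axis in place via the axis argument.
y1 = jax.lax.associative_scan(jnp.add, x, axis=1)

# Option 2: move the sequence axis to the front, scan, and move it back.
xt = jnp.moveaxis(x, 1, 0)  # (seq, batch, heads, dim)
y2 = jnp.moveaxis(jax.lax.associative_scan(jnp.add, xt), 0, 1)

assert jnp.allclose(y1, y2)  # same result either way; only the layout differs
```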

06.03.2026 00:25 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Wow. This is mind numbingly stupid.

05.03.2026 21:00 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0
Florence ICE detainee dead after untreated tooth infection, official says An ICE detainee, who had been at a Florence detention center for four months, died Monday following an untreated tooth infection.

An ICE detainee in Arizona has died of a TOOTH INFECTION after it went untreated for weeks, a local official says. He was a Haitian asylum seeker imprisoned in Florence, Arizona. @emilybregel.bsky.social reports.
tucson.com/news/local/b...

04.03.2026 16:24 ๐Ÿ‘ 2916 ๐Ÿ” 1772 ๐Ÿ’ฌ 166 ๐Ÿ“Œ 337

[TW: graphic fracture, sound of breaking bone]

Sen Tim Sheehy (R-Montana) badly breaking the arm of a Marine veteran protesting the war with Iran.

04.03.2026 22:05 ๐Ÿ‘ 9194 ๐Ÿ” 4496 ๐Ÿ’ฌ 1045 ๐Ÿ“Œ 1615

Absolutely grotesque.

04.03.2026 22:57 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Jayapal to Noem: "I want to introduce you to just four of the US citizens unlawfully detained by ICE ... they're in this room with us."

04.03.2026 17:26 ๐Ÿ‘ 3992 ๐Ÿ” 1173 ๐Ÿ’ฌ 76 ๐Ÿ“Œ 59

Man, "rewrite it in rust" is being made pretty easy with llms.

(Ideally there'd be more significant design changes for a difference in language, but still...)

04.03.2026 20:03 ๐Ÿ‘ 10 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

What a fucking psychopath.

04.03.2026 15:59 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Indeed! I have a preview of it!

04.03.2026 15:04 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

TIL Yann and I have similar ideas about how to change publishing.

The additions I'd make are
- Make conferences smaller w/ presentations by invite (similar to @togelius.bsky.social's suggestion)
- Build social-media tooling for sharing/discussion, and paper metadata to support it.

04.03.2026 14:45 ๐Ÿ‘ 6 ๐Ÿ” 1 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Once again, a long post with strong opinions. It's probably twice as long as it should be; it's also repetitive and written in affect. And you probably disagree with my argument. So maybe you shouldn't read it. On the other hand, most things worth reading are written in affect.

02.03.2026 01:48 ๐Ÿ‘ 35 ๐Ÿ” 7 ๐Ÿ’ฌ 3 ๐Ÿ“Œ 4

The loss of "agent" is personally painful. I used to use that term to differentiate the kind of AI research I do from genAI. "I work on intelligent agents."

Now people assume that means "I wire up APIs to LLMs."

04.03.2026 00:10 ๐Ÿ‘ 5 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

A clip from the moment Hillary Clinton found out Lauren Boebert took an unauthorized photo of her deposition: 'I'm done!'

02.03.2026 21:44 ๐Ÿ‘ 14666 ๐Ÿ” 3896 ๐Ÿ’ฌ 858 ๐Ÿ“Œ 620

I cannot sufficiently convey how horrifying this graph is.

02.03.2026 15:30 ๐Ÿ‘ 16 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

I'd be happy to have it with you should the opportunity ever arise! :)

02.03.2026 06:34 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

I opined because decision-making agents are literally my area of expertise. It is of deep importance to me to build machines with human-like cognitive abilities & I am painfully aware of how our current methods are lacking.

I don't need to be a philosopher to have a relevant view on it.

02.03.2026 06:30 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

If you really want to insist on embracing vague and useless definitions, have fun I guess.

It doesn't change the fact that there are meaningful distinctions you can draw for properties people possess. I guess you'll have to come up with a new name for these important differences.

02.03.2026 06:22 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

I'm not concerned with philosophers because they often insist on vague definitions as a matter of course.

Computer scientists and mathematicians have been able to separate agency/decision making straightforwardly for decades and made progress on them as a result.

The definition is clean, and useful.

02.03.2026 06:18 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 0

A running average is not sufficient for the class of problem I described. It lacks a goal in the environment and an optimization process over the model toward that goal. Similarly, as I pointed out elsewhere, deploying a frozen policy from RL loses the agency. Same with a classifier.
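A toy sketch of that distinction (the 3-armed bandit and all its numbers are invented for illustration): the running averages alone are just a summary of data; the goal-directed part is choosing actions by optimizing against them.

```python
import jax
import jax.numpy as jnp

TRUE_MEANS = jnp.array([0.1, 0.5, 0.9])  # hidden payoffs of a toy 3-armed bandit

def run(key, steps=500):
    counts, totals = jnp.zeros(3), jnp.zeros(3)
    for _ in range(steps):
        key, sub = jax.random.split(key)
        # The running averages by themselves are only a model of the environment...
        estimates = totals / jnp.maximum(counts, 1)
        # ...behavior becomes goal-directed when an action is chosen by
        # optimizing against that model (here with a simple exploration bonus).
        action = jnp.argmax(estimates + 1.0 / jnp.sqrt(jnp.maximum(counts, 1)))
        reward = TRUE_MEANS[action] + 0.1 * jax.random.normal(sub)
        counts = counts.at[action].add(1)
        totals = totals.at[action].add(reward)
    return totals / jnp.maximum(counts, 1)

print(run(jax.random.PRNGKey(0)))  # estimates concentrate on the best arm
```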

02.03.2026 06:18 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 0
Rule 110 - Wikipedia

Likewise, that the conditions for some form of agency are simple does not mean everything possesses agency.

(And if you're not familiar with my analogy to rule 110, see here. It's a super cool result.)
en.wikipedia.org/wiki/Rule_110

01.03.2026 23:35 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Yes, it is easy to define programs with limited agency. RL research started with toy problems that were exactly that! Nevertheless, agency remains a distinct property.

Analogously, rule 110 is crazy simple and Turing Complete. Despite that modest requirement, most CAs (or programs generally) are still not Turing Complete.
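To see just how modest the requirement is, the whole of rule 110 fits in a few lines. A minimal sketch (the grid size and display loop are arbitrary choices; the 8-entry lookup table is the entire rule):

```python
import jax.numpy as jnp

RULE = 110  # the rule number's binary digits are the whole update table

def step(cells):
    # Index each cell's neighborhood as 4*left + 2*center + right (wrapping),
    # then look the new value up in rule 110's 8-entry table.
    idx = 4 * jnp.roll(cells, 1) + 2 * cells + jnp.roll(cells, -1)
    table = jnp.right_shift(RULE, jnp.arange(8)) & 1
    return table[idx]

cells = jnp.zeros(64, dtype=jnp.int32).at[-1].set(1)
for _ in range(10):
    print("".join(".#"[int(c)] for c in cells))
    cells = step(cells)
```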

01.03.2026 23:35 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Sutton & Barto Book: Reinforcement Learning: An Introduction

I would start w/ the Sutton, Barto RL book. The second edition is free online here:
incompleteideas.net/book/the-boo...

Video interviews with Rich Sutton might be a helpful, gentler intro too.

This post on my RL website here might also be helpful:
www.decisionsanddragons.com/posts/should...

01.03.2026 23:26 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

That kind of system is where people might argue LLMs capture some form of agency implicitly. But it's weak, and the lack of a clear objective during inference really hinders it and is why it's susceptible to prompt injection attacks.

01.03.2026 23:21 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Human-Timescale Adaptation in an Open-Ended Task Space Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (...

Meta-RL, where you deploy a model that has internalized an RL optimization process with environment interaction for a well-defined and regularly measured objective, can be said to retain agency. For example, see the AdA work (arxiv.org/abs/2301.07608)
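A minimal sketch of the mechanism (the single-layer RNN and all sizes are hypothetical stand-ins; AdA itself uses a transformer-based memory): the weights are frozen at deployment, but feeding the previous action and reward back in lets an RL-like adaptation process keep running inside the recurrent state.

```python
import jax
import jax.numpy as jnp

OBS, ACT, HID = 8, 4, 32  # toy sizes, purely illustrative

def init_params(key):
    k1, k2, k3 = jax.random.split(key, 3)
    return {"W_h": 0.1 * jax.random.normal(k1, (HID, HID)),
            "W_x": 0.1 * jax.random.normal(k2, (HID, OBS + ACT + 1)),
            "W_o": 0.1 * jax.random.normal(k3, (ACT, HID))}

def meta_policy(params, hidden, obs, prev_action, prev_reward):
    # prev_action/prev_reward feed back in, so adaptation to the current
    # environment happens inside `hidden` even though the weights are frozen.
    inp = jnp.concatenate([obs, prev_action, jnp.array([prev_reward])])
    hidden = jnp.tanh(params["W_h"] @ hidden + params["W_x"] @ inp)
    return hidden, params["W_o"] @ hidden  # next hidden state, action logits

params = init_params(jax.random.PRNGKey(0))
h, logits = meta_policy(params, jnp.zeros(HID), jnp.zeros(OBS),
                        jnp.zeros(ACT), 0.0)
```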

01.03.2026 23:21 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Contrast that with classic classification and its deployed models, which do not possess that. There is no internal model being computed and optimized in a deployed classifier.

Similarly, if you deploy a frozen policy learned from RL, you've lost the agent part. Still useful, but not an agent anymore.

01.03.2026 23:21 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Agency is best described as behavior governed by optimizations of models of an environment in which the system is embedded. This is principally what classic RL focuses on.

01.03.2026 23:21 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 2 ๐Ÿ“Œ 0

People and animals certainly. Probably insects in limited ways. I'd be surprised to find much if any in plants/fungi, but the biological world is often more complex than it appears, and I'm not a biologist, so I won't comment on that.

01.03.2026 23:21 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Really? Do you think RNA computes models of its environment, and solutions that maximize objectives of that environment, which govern its behavior?

01.03.2026 23:10 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0