You're welcome!
This is also in my reading list, as an application of IL in offline-to-online learning
- Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning arxiv.org/abs/2509.26605
- Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning arxiv.org/abs/2407.15007
And this "invitation" was also very intuitive
- An Invitation to Imitation www.ri.cmu.edu/publications...
I was at an event on AI for science yesterday, a panel discussion here at NeurIPS. The panelists discussed how they plan to replace humans at all levels of the scientific process. So I stood up and protested that what they are doing is evil.
Full post:
togelius.blogspot.com/2025/12/plea...
📣 #ICML tutorials: We want to know what *you* would like to learn. This year, Adam White and I are calling for nominations of topics and/or presenters.
Until December 7th, you can send us your suggestions, and we will use them to shape the program.
icml.cc/Conferences/...
🚨The Formalism-Implementation Gap in RL research🚨
Lots of progress in RL research over the last 10 years, but too much of it is performance-driven => overfitting to benchmarks (like the ALE).
1️⃣ Let's advance the science of RL
2️⃣ Let's be explicit about how benchmarks map to formalism
1/X
I am happy to share that our paper "Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning" has been accepted at NeurIPS 2025!
Endless thanks to my amazing co-authors @claireve.bsky.social and @keggensperger.bsky.social
Read it on arXiv: arxiv.org/abs/2505.05226
(1/3)
cvoelcker.de/blog/2025/re...
I finally gave in and made a nice blog post about my most recent paper. This was a surprising amount of work, so please be nice and go read it!
Thanks a lot! That was lightning fast
Maybe a blog post would also help =)
Could you add me please?
Definition of dynamic programming in RL, from Csaba Szepesvári's RL theory lecture notes (Lecture 2, "Planning in MDPs")
Definition of dynamic programming, from Puterman's Markov Decision Processes, chapter 1.
I came across a couple of other definitions that might be helpful to mention (apologies if you're already considering these).
The first one is from Csaba Szepesvári's RL theory lecture notes (lecture 2, planning in MDPs), and the second one is from Puterman's MDP book (chapter 1).
What are we talking about when we talk about Dynamic Programming?
#ReinforcementLearning
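For a concrete anchor to the definitions above: value iteration is the textbook instance of dynamic programming in RL, repeatedly applying the Bellman optimality backup until the value function stops changing. A minimal sketch on a made-up 2-state, 2-action MDP (all transition probabilities, rewards, and the discount factor below are illustrative assumptions, not from any of the cited texts):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (numbers chosen for illustration only).
P = np.array([  # P[a, s, s'] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.0, 1.0]],  # action 1
])
R = np.array([[1.0, 0.0], [0.0, 2.0]])  # R[a, s] = expected immediate reward
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup:
    # V(s) <- max_a [ R(a,s) + gamma * sum_s' P(a,s,s') V(s') ]
    Q = R + gamma * (P @ V)      # shape (actions, states)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break                    # converged to the optimal value function
    V = V_new

print(V)  # optimal state values
```

The point of calling this "dynamic programming" in both Szepesvári's and Puterman's sense: the backup reuses the previously computed values of successor states instead of re-enumerating futures, and the contraction property of the backup guarantees convergence.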
What if all mathematicians had great visualization skills, tools, and public notes!
Onno and I will be presenting our poster at #W1005 tomorrow (Wed) morning.
He made a great thread about it, come chat with us about POMDP theory :)
I will not be at #ICML2025 this year, but 3 of my PhD students at 🤖 Adage (Adaptive Agents Lab) 🤖 are, presenting 3 papers.
- Avery Ma
- Claas Voelcker (cvoelcker.bsky.social)
- Tyler Kastner
Meet them to talk about Model-based RL, Distributional RL, and Jailbreaking LLMs.
Levine's take on the success of LLMs compared to video models is interesting, but I'll expand on how efforts toward AI could take two different paths, and why I think AI and NeuroAI could take different approaches moving forward. 🧵
🧠🤖 #MLSky
Preprint Alert
Can we simultaneously learn transformation-invariant and transformation-equivariant representations with self-supervised learning?
TL;DR Yes! This is possible via simple predictive learning & architectural inductive biases β without extra loss terms and predictors!
🧵 (1/10)
cleanrl is amazing (github.com/vwxyzjn/clea...) and its structure makes sense for teaching, but an actual research codebase should not inherit this style! You do not want this amount of code duplication.
rlhfbook also available on arxiv for SEO. happy friday
arxiv.org/abs/2504.12501
Recorded a recent "talk" / rant about RL fine-tuning of LLMs for a guest lecture in Stanford CSE234: youtube.com/watch?v=NTSY.... Covers some of my lab's recent work on personalized RLHF, as well as some mild Schmidhubering about my own early contributions to this space
PQN puts Q-learning back on the map and now comes with a blog post + Colab demo! Also, congrats to the team for the spotlight at #ICLR2025
Happy #Nowruz and the beginning of spring!
I wanted to send you the link just now but hopefully you have found it =)
Sure *_*
Looking forward to it :)
Not yet. Just the classical claim that they're trying to learn the distribution of the return =))
Do you have any insights?
I was reading about ways to enhance the performance of DQN on a real-world problem. One of the candidates was C51, but I haven't implemented it yet because of the computational costs. It was interesting because I hadn't read the papers before.
I didn't know until last week that using it with DQN can give a huge performance boost.
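For anyone curious what C51 actually adds on top of DQN: the computational core is the categorical projection step. The Bellman target distribution lives on the shifted support r + γz, and its probability mass has to be redistributed onto the fixed atoms z. A rough sketch of that projection (atom count and support range follow the paper's commonly cited defaults of 51 atoms on [-10, 10]; the uniform next-state distribution at the end is just a placeholder input):

```python
import numpy as np

N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0
z = np.linspace(V_MIN, V_MAX, N_ATOMS)        # fixed categorical support
dz = (V_MAX - V_MIN) / (N_ATOMS - 1)          # spacing between atoms

def project(p_next, r, gamma):
    """Project probabilities p_next over support r + gamma*z back onto z."""
    tz = np.clip(r + gamma * z, V_MIN, V_MAX)  # shifted/shrunk target atoms
    b = (tz - V_MIN) / dz                      # fractional bin index of each atom
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    m = np.zeros(N_ATOMS)
    # Split each atom's mass between its two neighbouring bins
    # (np.add.at accumulates correctly when indices repeat).
    np.add.at(m, lo, p_next * (hi - b))
    np.add.at(m, hi, p_next * (b - lo))
    # When b lands exactly on a bin, lo == hi and both terms above are zero,
    # so put that atom's mass back in full.
    np.add.at(m, lo, p_next * (lo == hi))
    return m

p = np.full(N_ATOMS, 1.0 / N_ATOMS)           # placeholder next-state distribution
m = project(p, r=1.0, gamma=0.99)             # projected target, sums to 1
```

The training loss is then the cross-entropy between this projected target and the network's predicted distribution; the extra cost over vanilla DQN is mostly this projection plus the N_ATOMS-wide output head per action.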
I've put together a short list of opportunities for early career academics willing to come to Europe: www.cvernade.com/miscellaneou...
This mostly covers France and Germany for now, but I'm willing to extend it. I build on @ellis.eu resources and my own knowledge of these systems.
RL is so back!
(well, for some of us, it never really left)
awards.acm.org/about/2024-t...