
Harley Wiltzer

@harwiltz

PhD student at Mila / McGill. Studying distributional RL for transfer across risk-sensitive utilities, and for long-horizon high-frequency decision-making.

65 Followers · 179 Following · 14 Posts · Joined 09.12.2024

Latest posts by Harley Wiltzer @harwiltz


E65: NeurIPS 2024 – Posters and Hallways 3

- Claire Bizon Monroc of Inria: WFCRL for Wind Farm Control
- Andrew Wagenmaker of @ucberkeleyofficial.bsky.social: Leveraging Simulation to Bridge the Sim-to-Real Gap
- @harwiltz.bsky.social of @mila-quebec.bsky.social: Multivariate Distributional RL
(cont)

10.03.2025 17:21 👍 2 🔁 2 💬 1 📌 0

Thanks so much @patrickshafto.bsky.social!!

12.12.2024 22:44 👍 1 🔁 0 💬 0 📌 0

There's an Easter egg after the 1024th iteration

09.12.2024 22:31 👍 1 🔁 0 💬 1 📌 0

Thanks a lot :D

09.12.2024 19:20 👍 1 🔁 0 💬 0 📌 0

For feature dimensions larger than 1, things get tricky: projecting distributions onto finite representations can be expensive, and sample-based updates can be biased. We present new methods using *randomized projections* and *signed measures* to overcome these issues.

09.12.2024 15:30 👍 0 🔁 0 💬 0 📌 0
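A toy sketch of the randomized-projection idea (illustrative only, not the paper's algorithm): projecting a d-dimensional empirical return distribution onto random unit directions yields 1-D distributions that are cheap to summarize, e.g. by a handful of quantiles each, instead of gridding the full d-dimensional space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy empirical return distribution in d = 3 dimensions:
# n particles, each a d-dimensional return sample.
n, d = 500, 3
returns = rng.normal(size=(n, d))

def random_projections(samples, num_dirs, rng):
    """Project d-dim samples onto random unit directions, giving 1-D clouds."""
    d = samples.shape[1]
    dirs = rng.normal(size=(num_dirs, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit vectors
    return samples @ dirs.T  # shape (n, num_dirs)

proj = random_projections(returns, num_dirs=8, rng=rng)

# Each column is a 1-D distribution, summarized here by m quantiles:
m = 16
quantiles = np.quantile(proj, np.linspace(0.03, 0.97, m), axis=0)
print(quantiles.shape)  # (m, num_dirs): m * num_dirs numbers in total
```

The storage cost grows linearly in the number of directions rather than exponentially in the feature dimension, which is the basic appeal of projecting onto random 1-D slices.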

This is closely related to our recent work on the Distributional Successor Measure (arxiv.org/abs/2402.08530). We strengthen the analysis to tractable projected DP and TD algorithms, and provide convergence rates as a function of the return distribution resolution & feature dim.

09.12.2024 15:30 👍 1 🔁 0 💬 1 📌 0

We learn the joint distribution over SFs in RL. Whereas SFs enable 0-shot transfer of value functions across a finite-dimensional class of reward functions, distributional SFs enable 0-shot generalization of return *distribution* functions across the class.

09.12.2024 15:30 👍 0 🔁 1 💬 1 📌 0
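For context, the standard expected-value SF transfer that the post builds on can be sketched with toy numbers (not from the paper): when rewards are linear in a known feature map, r(s,a) = φ(s,a)·w, the successor features ψ give the value of a policy under *any* reward weights w by a single dot product. The distributional version transfers the whole return distribution rather than this scalar.

```python
import numpy as np

# Toy zero-shot transfer with successor features (SFs).
# psi is a (pretend, pre-learned) SF vector for one state-action pair:
# psi = E[sum_t gamma^t phi(s_t, a_t)].
psi = np.array([2.0, 0.5, 1.0, 3.0])

# Two tasks, each defined by reward weights w, with r = phi . w:
w_taskA = np.array([1.0, 0.0, 0.0, 0.0])
w_taskB = np.array([0.0, 0.0, 1.0, -1.0])

q_taskA = psi @ w_taskA  # value under task A's reward
q_taskB = psi @ w_taskB  # value under task B's reward, zero-shot
print(q_taskA, q_taskB)  # 2.0 -2.0
```

No re-learning is needed for task B: the same ψ serves every reward in the span of the features, which is the finite-dimensional class the post refers to.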
Foundations of Multivariate Distributional Reinforcement Learning In reinforcement learning (RL), the consideration of multivariate reward signals has led to fundamental advancements in multi-objective decision-making, transfer learning, and representation learning....

This was joint work with legendary collaborators: @jessefarebro.bsky.social, @arthurgretton.bsky.social, and Mark Rowland.

Paper: arxiv.org/abs/2409.00328
#NeurIPS2024 poster: neurips.cc/virtual/2024...

09.12.2024 15:30 👍 1 🔁 1 💬 1 📌 0

How can you 0-shot transfer predictions of long-term performance across reward functions *and* risk-sensitive utilities?

We can do this via Distributional Successor Features. Our recent work introduces the 1st tractable & provably convergent algos for learning DSFs.

#NeurIPS2024 #6704
12 Dec, 11-2

09.12.2024 15:30 👍 16 🔁 4 💬 3 📌 2

The rescaled superiority also preserves consistent action rankings for any distortion risk measure. We design DRL algorithms from these insights, and demonstrate that they are much more robust in a high-frequency option trading domain, *especially* with risk-sensitive utilities.

09.12.2024 14:46 👍 1 🔁 0 💬 0 📌 0

By *rescaling* the superiority, we can preserve *distributional action gaps* at high frequency. However, these gaps collapse at a slower sqrt(h) rate! Consequently, we discover that Baird's rescaled advantage has unbounded variance, making it tough to estimate in stochastic MDPs.

09.12.2024 14:46 👍 1 🔁 0 💬 1 📌 0

Towards solving this problem, we define the *superiority* as a probabilistic analogue of the advantage. Our axiomatic characterization of the superiority admits a simple and natural representation, despite the fact that superiority samples cannot be observed.

09.12.2024 14:46 👍 1 🔁 0 💬 1 📌 0

Q-Learning at high frequency fails, since action values differ by a quantity proportional to h, the amount of time between actions.

What about return distributions? We show that action-conditioned distributions also collapse, but different statistics collapse at different rates.

09.12.2024 14:46 👍 1 🔁 0 💬 1 📌 0
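The vanishing action gap can be seen in a toy discretization (assumed setup, not the paper's model): with time step h, the per-step reward is h·r(s,a) and the discount is γ^h, so two actions with different instantaneous reward rates but the same successor state have Q-values differing by exactly h·(r1 − r2), which goes to 0 as h → 0.

```python
# Toy illustration of the vanishing action gap at high decision frequency.
gamma = 0.99
r1, r2 = 1.0, 0.5  # instantaneous reward rates of two actions
v_next = 10.0      # value of the (shared) successor state

def q(h, r):
    """One-step discretized Q-value with time step h."""
    return h * r + gamma ** h * v_next

for h in [1.0, 0.1, 0.01]:
    gap = q(h, r1) - q(h, r2)
    print(f"h={h:5.2f}  action gap={gap:.4f}")  # gap = h * (r1 - r2)
```

The continuation term γ^h·v_next cancels in the difference, so the gap shrinks linearly in h, and a value-based learner's noise eventually swamps it.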

This was the result of a fantastic collaboration with the OT wizard Yash Jhaveri, Marc G. Bellemare, David Meger, and @patrickshafto.bsky.social.

Paper: arxiv.org/abs/2410.11022
#NeurIPS2024 poster: neurips.cc/virtual/2024...

09.12.2024 14:46 👍 1 🔁 0 💬 1 📌 0

In value-based RL, when decisions are made at high frequency, all hell breaks loose.

Our paper "Action Gaps & Advantages in Continuous-Time Distributional RL" shows how Distributional RL sheds light on this, enabling high-frequency model-free risk-sensitive RL.

#NeurIPS2024 #6410
13 Dec, 11-2

09.12.2024 14:46 👍 6 🔁 0 💬 1 📌 1

Distributional SFs: enable 0-shot generalization of return *distribution* functions across a finite-dimensional reward function class

"Foundations of Multivariate Distributional Reinforcement Learning"

#NeurIPS2024 #6704
12 Dec 11am-2pm
neurips.cc/virtual/2024...

Wiltzer Farebrother Rowland

08.12.2024 22:43 👍 5 🔁 2 💬 0 📌 0