
Aldo Pacchiano

@aldopacchiano

AI research at Broad Institute and Boston University. Reinforcement Learning / Bandits / Experiment Design. Mexicano 🇲🇽

176
Followers
73
Following
13
Posts
11.11.2024
Joined

Latest posts by Aldo Pacchiano @aldopacchiano

Our work "On the Hardness of Bandit Learning", co-authored with Nataly Brukhim, Miro Dudik and Robert Schapire, was accepted to COLT 2025!

03.05.2025 19:23 👍 6 🔁 0 💬 0 📌 0

3. Feasible Action Search for Bandit Linear Programs via Thompson Sampling - led by Aditya Gangrade and joint with Clayton Scott and Venkatesh Saligrama.

More info to follow. All (mostly) BU authors!

(2/2)
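For readers unfamiliar with Thompson sampling, here is a minimal, generic sketch for a plain linear bandit (purely illustrative; this is not the feasibility-search algorithm of the paper, and the parameter values and action set below are invented): maintain a Gaussian posterior over the unknown reward parameter, sample from it each round, and act greedily with respect to the sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown reward parameter and a small action set (made-up numbers).
theta_star = np.array([1.0, -0.5])
actions = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])

d = 2
lam = 1.0
V = lam * np.eye(d)   # ridge-regularized precision matrix
b = np.zeros(d)       # running sum of action * observed reward

for t in range(500):
    mean = np.linalg.solve(V, b)
    cov = np.linalg.inv(V)
    theta_tilde = rng.multivariate_normal(mean, cov)   # posterior sample
    x = actions[np.argmax(actions @ theta_tilde)]      # greedy w.r.t. sample
    r = x @ theta_star + 0.1 * rng.standard_normal()   # noisy linear reward
    V += np.outer(x, x)                                # Bayesian update
    b += x * r

theta_hat = np.linalg.solve(V, b)  # posterior mean estimate of theta_star
```

Acting on posterior samples balances exploration and exploitation automatically; per its title, the paper adapts Thompson sampling to searching for feasible actions in bandit linear programs rather than to plain reward maximization.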

02.05.2025 16:47 👍 3 🔁 0 💬 0 📌 0

Three works accepted to ICML:

1. Multiple-policy Evaluation via Density Estimation - led by PhD student Yilei Chen, joint with Ioannis Paschalidis.

2. Adaptive Exploration for Multi-Reward Multi-Policy Evaluation - led by Alessio Russo.

(1/2)

02.05.2025 16:47 👍 2 🔁 0 💬 1 📌 0

New paper "Pure Exploration with Feedback Graphs" with BU postdoc Alessio Russo @alessiorusso.bsky.social and BU PhD student Yichen Song. Accepted for oral presentation @ AISTATS 2025.

12.03.2025 22:04 👍 3 🔁 0 💬 0 📌 0

[5/5] In "ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization" we introduce algorithms that, in combination with LLM reward generation, can find useful reward shapings using online model selection strategies.
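A toy illustration of the online-model-selection idea (this is not ORSO itself; the candidate values and noise model below are invented for the sketch): treat each candidate reward shaping as an arm of a bandit and allocate training rounds to the shapings that look most promising, e.g. with a UCB rule.

```python
import math
import random

random.seed(0)

# Toy stand-in for "one round of policy optimization under shaping i":
# each candidate shaping has a hidden usefulness, and a training round
# returns a noisy estimate of the task performance it induces.
TRUE_USEFULNESS = [0.2, 0.5, 0.8]  # e.g. three LLM-generated shapings

def train_one_round(arm):
    """Noisy evaluation of candidate reward shaping `arm`."""
    return TRUE_USEFULNESS[arm] + random.gauss(0, 0.1)

def ucb_select(num_rounds=2000):
    """Allocate training rounds across shapings with UCB."""
    n_arms = len(TRUE_USEFULNESS)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, num_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play each shaping once first
        else:
            # Optimism: pick the shaping with the highest upper
            # confidence bound on its estimated usefulness.
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = train_one_round(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return counts, means

counts, means = ucb_select()
best = max(range(len(counts)), key=counts.__getitem__)  # most-trained shaping
```

The point of the online formulation is that most of the training budget ends up spent on the shaping that actually helps, instead of evaluating every candidate to completion.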

30.01.2025 00:25 👍 3 🔁 0 💬 0 📌 0

[4/5] "A Theoretical Framework for Partially-Observed Reward States in RLHF" develops and analyzes a model for RLHF where we posit that the human feedback is generated by a stateful labeler. @mircomutti.bsky.social

30.01.2025 00:25 👍 4 🔁 1 💬 1 📌 0

[3/5] "Second Order Bounds for Contextual Bandits with Function Approximation" proposes algorithms for contextual bandit problems whose regret bounds scale with the variance of the measurement noise.
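For context, this is the generic shape of such guarantees (an illustrative form, not the paper's exact statement): a worst-case bound scales with the horizon $T$, while a second-order bound scales with the cumulative noise variance, so it is much tighter when the measurements are nearly noiseless.

```latex
% Worst-case (first-order) regret over horizon T:
R_T \;=\; \tilde{O}\!\left(\sqrt{T}\right)
% Second-order regret, scaling with the cumulative noise variance:
R_T \;=\; \tilde{O}\!\left(\sqrt{\textstyle\sum_{t=1}^{T} \sigma_t^2}\right)
% With \sigma_t^2 \le 1 we have \sum_{t=1}^{T} \sigma_t^2 \le T,
% so the second-order bound is never worse and can be far smaller.
```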

30.01.2025 00:25 👍 3 🔁 0 💬 1 📌 0

[2/5] and "ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization" with Chen Bo Calvin Zhang, Zhang-Wei Hong and Pulkit Agrawal.

30.01.2025 00:25 👍 0 🔁 0 💬 1 📌 0

[1/5] Happy to share a bit about 3 papers accepted to ICLR 2025:

"Second Order Bounds for Contextual Bandits with Function Approximation", authored by me.

"A Theoretical Framework for Partially-Observed Reward States in RLHF" with Chinmaya Kausik, Mirco Mutti and Ambuj Tewari

30.01.2025 00:25 👍 2 🔁 0 💬 1 📌 0

Introducing 🧞 Genie 2 🧞 - our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠.

04.12.2024 16:01 👍 235 🔁 61 💬 15 📌 30

It is great to see the Decision Pretrained Transformer used in creative ways: "HVAC-DPT: A Decision Pretrained Transformer for HVAC Control"!

arxiv.org/pdf/2411.19746

04.12.2024 16:32 👍 4 🔁 0 💬 0 📌 0

It's very early in the year!

02.12.2024 20:01 👍 5 🔁 0 💬 1 📌 0
Preview: DPhil in Engineering Science | University of Oxford — "The DPhil in Engineering Science will offer you the opportunity to develop in-depth knowledge, understanding and expertise in your chosen field of engineering research."

🚨 PSA 🚨 Deadline to apply for your dream PhD in ML
@FLAIR_Ox
is coming up on the 2nd of December AOE. We work on compute-only scaling of LLMs, (meta/multi-agent) RL at the Hyperscale, Human-AI coordination, opponent-shaping for vaccine design, GenAI for finance & much more…

29.11.2024 19:45 👍 16 🔁 3 💬 1 📌 0

paper link?

30.11.2024 09:20 👍 0 🔁 0 💬 1 📌 0

The secret to doing good research is always to be a little underemployed. You waste years by not being able to waste hours. - Amos Tversky

19.11.2024 18:57 👍 38 🔁 3 💬 0 📌 1

I have just opened a profile here!

11.11.2024 13:48 👍 1 🔁 0 💬 0 📌 0