Our work "On the Hardness of Bandit Learning," co-authored with Nataly Brukhim, Miro Dudik, and Robert Schapire, was accepted to COLT 2025!
3. Feasible Action Search for Bandit Linear Programs via Thompson Sampling - led by Aditya Gangrade and joint with Clayton Scott and Venkatesh Saligrama.
More info to follow. All (mostly) BU authors!
(2/2)
Three works accepted to ICML:
1. Multiple-policy Evaluation via Density Estimation - led by PhD student Yilei Chen, joint with Ioannis Paschalidis.
2. Adaptive Exploration for Multi-Reward Multi-Policy Evaluation - led by Alessio Russo.
(1/2)
New paper "Pure Exploration with Feedback Graphs" with BU postdoc Alessio Russo @alessiorusso.bsky.social and BU PhD student Yichen Song. Accepted for oral presentation at AISTATS 2025.
[5/5] In "ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization" we introduce algorithms that, in combination with LLM reward generation, can find useful reward shapings via online model selection strategies.
[4/5] "A Theoretical Framework for Partially-Observed Reward States in RLHF" develops and analyzes a model for RLHF in which we posit that human feedback is generated by a stateful labeler. @mircomutti.bsky.social
[3/5] "Second Order Bounds for Contextual Bandits with Function Approximation" proposes algorithms for contextual bandit problems whose regret bounds scale with the variance of the measurement noise.
[2/5] and "ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization" with Chen Bo Calvin Zhang, Zhang-Wei Hong and Pulkit Agrawal.
[1/5] Happy to share a bit about 3 papers accepted to ICLR 2025:
"Second Order Bounds for Contextual Bandits with Function Approximation" authored by me
"A Theoretical Framework for Partially-Observed Reward States in RLHF" with Chinmaya Kausik, Mirco Mutti and Ambuj Tewari
Introducing Genie 2 - our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents.
It is great to see the Decision Pretrained Transformer used in creative ways: "HVAC-DPT: A Decision Pretrained Transformer for HVAC Control"!
arxiv.org/pdf/2411.19746
It's very early in the year!
PSA: Deadline to apply for your dream PhD in ML
@FLAIR_Ox
is coming up on the 2nd of December AOE. We work on compute-only scaling of LLMs, (meta/multi-agent) RL at the Hyperscale, Human-AI coordination, opponent-shaping for vaccine design, GenAI for finance & much more...
paper link?
The secret to doing good research is always to be a little underemployed. You waste years by not being able to waste hours. - Amos Tversky
I have just opened a profile here!