π§βπ¬ Oumayma Mahjoub and Wiem Khilfi will be presenting Sable at #ICML2025.
ποΈWednesday, 16 July, 4:30 PM PDT
πWest Exhibition Hall B2-B3, Poster Number W-820
#MARL #AI #ICML2025
π§βπ¬ Oumayma Mahjoub and Wiem Khilfi will be presenting Sable at #ICML2025.
ποΈWednesday, 16 July, 4:30 PM PDT
πWest Exhibition Hall B2-B3, Poster Number W-820
#MARL #AI #ICML2025
If you are interested, have a look at the full paper and code:
πPaper: arxiv.org/abs/2410.01706
π§βπ»Code: bit.ly/4eMUXhn
πWebsite/Data: sites.google.com/view/sable-m...
(7/N)
π A massive thank you to my incredible co-authors Oumayma Mahjoub, Ruan De Kock, Wiem Khlifi, Simon Du Toit, Jemma Daniel, Louay Ben Nessir, Louise Beyers, Claude Formanek & Arnu Pretorius
(6/N)
β‘Despite its power, Sable is remarkably efficient. It scales to over 1000 agents with linear memory increase and boasts 7x better GPU memory efficiency and up to a 6.5x improvement in throughput compared to MAT (previous SOTA).
(5/N)
π¬In a benchmark across 45 diverse tasks (the largest in the literature), Sable substantially outperformed existing methods, ranking best 11 times more often than previous SOTA methods.
(4/N)
πͺ Our solution? Sable adapts the retention mechanism from Retentive Networks (RetNets) and achieves centralised learning advantages without the associated drawbacks. This allows for efficient, long-term memory and impressive scalability.
(3/N)
π€ The challenge? Centralised training in MARL performs well but cannot scale, limiting its use to scenarios with only a few agents. This creates a trade-off between performance and agent scalability.
(2/N)
Sable graphed against the Multi-Agent Transformer (MAT), showing Sable outperforms MAT in performance, throughput and GPU memory usage
π¨ Thrilled to share our #ICML2025 paper: "Sable: a Performant, Efficient and Scalable Sequence Model for MARL"!
We introduce a new SOTA cooperative Multi-Agent Reinforcement Learning algorithm that delivers the advantages of centralised learning without its drawbacks.
(1/N)
Please add me π
Totally agree with you on the filtering point, but we're all pretty bad at predicting which papers will be useful in future, e.g. PPO was rejected.
So maybe only reviewing for soundness is a good thing?
Can you add me π
End-to-end compiling RL algorithms and envs and running everything across multiple TPU cores/GPUs, so that you never have to communicate anything with the CPU. This gives ridiculous speed-ups, on the order of 100x depending on the environment. I don't think torch is there yet.
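The idea above can be sketched in JAX. This is a minimal, hypothetical toy (the environment, policy, and parameters are all made up for illustration, not Sable's actual code): the whole rollout, env steps included, is JIT-compiled into a single XLA program, so no data bounces back to the CPU between steps.

```python
import jax
import jax.numpy as jnp


def env_step(state, action):
    # Toy environment: the state drifts by the action; reward is
    # negative distance from the origin.
    new_state = state + action
    reward = -jnp.abs(new_state)
    return new_state, reward


def policy(params, state):
    # Trivial linear policy, purely illustrative.
    return params * state


@jax.jit  # env + policy compile together into one XLA program
def rollout(params, init_state):
    def body(state, _):
        action = policy(params, state)
        state, reward = env_step(state, action)
        return state, reward

    final_state, rewards = jax.lax.scan(body, init_state, None, length=100)
    return final_state, rewards.sum()


final_state, total_reward = rollout(jnp.float32(-0.5), jnp.float32(1.0))
```

Because `jax.lax.scan` keeps the whole loop on the accelerator, the host only sees the final result; `jax.vmap` or `jax.pmap` can then fan the same compiled rollout across many environments or devices.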