#offpolicy
Posts tagged #offpolicy on Bluesky
Original post on hachyderm.io

"Mediators, confounders, colliders – a crash course in causal inference"
by Florian Hartig (2019): theoreticalecology.wordpress.com/2019/04/14/mediators-con...

#offPolicy #causality #causalInference #stats #statistics #counterFactuals […]

Off-Policy RL with Stale Data Boosts LLM Training

Second‑Moment Trust Policy Optimization (M2PO) cut the share of clipped tokens from ~1.22% to 0.06% and matched on‑policy results on LLMs of up to 32B parameters, even with training data stale by 256 updates. getnews.me/off-policy-rl-with-stale... #offpolicy #llm

Off‑Policy Max‑Entropy RL with Future Visitation Rewards

An arXiv paper posted in December 2024 proposes an intrinsic reward based on the KL divergence between future state‑action visitation and a reference, enabling off‑policy learning from replay buffers. getnews.me/off-policy-max-entropy-r... #maxentropyrl #offpolicy

Group‑Relative REINFORCE Revealed as an Off‑Policy Method for LLM Training

Group‑Relative REINFORCE can act as an off‑policy method, enabling reuse of existing data and cutting costly rollouts. The paper was submitted in September 2025. getnews.me/group-relative-reinforce... #grouprelativereinforce #offpolicy

Data Rewriting Improves Stability of Off‑Policy LLM Fine‑Tuning

The data‑rewriting approach for fine‑tuning was tested on five mathematical reasoning datasets and outperformed supervised fine‑tuning. Code and data will be released on GitHub. Read more: getnews.me/data-rewriting-improves-... #offpolicy #ml
