#offpolicy — Bluesky Posts

bluesky.baby

Posts tagged #offpolicy on Bluesky

Eric Maugendre about data

@maugendre.hachyderm.io.ap.brid.gy

3 weeks ago

Original post on hachyderm.io

"Mediators, confounders, colliders – a crash course in causal inference"
by Florian Hartig (2019): theoreticalecology.wordpress.com/2019/04/14/mediators-con...

#offPolicy #causality #causalInference #stats #statistics #counterFactuals […]

GetNews.me

@getnews-me.bsky.social

5 months ago

Off-Policy RL with Stale Data Boosts LLM Training

Second‑Moment Trust Policy Optimization (M2PO) cut the share of clipped tokens from about 1.22% to 0.06% and matched on‑policy results on LLMs of up to 32B parameters, even with data stale by 256 updates. getnews.me/off-policy-rl-with-stale... #offpolicy #llm

GetNews.me

@getnews-me.bsky.social

5 months ago

Off‑Policy Max‑Entropy RL with Future Visitation Rewards

An arXiv paper posted in December 2024 proposes an intrinsic reward based on the KL divergence between future state‑action visitation and a reference, enabling off‑policy learning from replay buffers. getnews.me/off-policy-max-entropy-r... #maxentropyrl #offpolicy

GetNews.me

@getnews-me.bsky.social

5 months ago

Group‑Relative REINFORCE Revealed as an Off‑Policy Method for LLM Training

Group‑Relative REINFORCE can act as an off‑policy method, enabling reuse of existing data and cutting costly rollouts. The paper was submitted in September 2025. getnews.me/group-relative-reinforce... #grouprelativereinforce #offpolicy

GetNews.me

@getnews-me.bsky.social

5 months ago

Data Rewriting Improves Stability of Off‑Policy LLM Fine‑Tuning

A data‑rewriting approach to fine‑tuning was tested on five mathematical reasoning datasets and outperformed supervised fine‑tuning. Code and data will be released on GitHub. Read more: getnews.me/data-rewriting-improves-... #offpolicy #ml
