
Robin Jia

@robinjia

Assistant Professor in Computer Science at USC | NLP, ML

2,455
Followers
266
Following
9
Posts
10.10.2023
Joined

Latest posts by Robin Jia @robinjia

Hubble is finally out! We used 200k GPU hours from NAIRR and NVIDIA to build a comprehensive resource for the scientific study of LLM memorization. Fully open-source models & data up to 8B params + 500B tokens, with controlled data insertion to study memorization risks 🔭✨

24.10.2025 18:36 👍 7 🔁 1 💬 0 📌 0
Hubble Suite logo (cloth patch with names of key organizations involved: USC, MPI, NVIDIA)


Announcing 🔭 Hubble, a suite of open-source LLMs to advance the study of memorization!

Pretrained 1B/8B param models, with controlled insertion of texts designed to emulate key memorization risks: copyright (e.g., book passages), privacy (e.g., synthetic biographies), and test set contamination

24.10.2025 18:21 👍 7 🔁 4 💬 1 📌 2

I had a lot of fun contemplating memorization questions at the @l2m2workshop.bsky.social panel yesterday together with Niloofar Mireshghallah and Reza Shokri, moderated by @pietrolesci.bsky.social, who did a fantastic job!
#ACL2025

02.08.2025 15:04 👍 12 🔁 2 💬 1 📌 1
Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics Improvements in large language models have led to increasing optimism that they can serve as reliable evaluators of natural language generation outputs. In this paper, we challenge this optimism by th...

Paper link: arxiv.org/abs/2501.14883

30.07.2025 08:16 👍 0 🔁 0 💬 0 📌 0

Automatic metrics for assessing factuality are easy to run and commonly used, but do they work? In < 1 hour, come find the answer at poster 349 in Hall X4, where I’ll be presenting @ameyagodbole.bsky.social’s work uncovering inconsistencies, errors, and biases of factuality metrics!

30.07.2025 08:15 👍 2 🔁 0 💬 1 📌 0

I’ll be at ACL 2025 next week where my group has papers on evaluating evaluation metrics, watermarking training data, and mechanistic interpretability. I’ll also be co-organizing the first Workshop on LLM Memorization @l2m2workshop.bsky.social on Friday. Hope to see lots of folks there!

25.07.2025 16:36 👍 2 🔁 0 💬 0 📌 0
LLMs can propose plans and generate action semantics, but struggle with state tracking. Symbolic planners leverage specialized search algorithms, but require predefined action semantics for the environment.
PSALM integrates the strengths of both.


Come by @naaclmeeting.bsky.social Poster 6 in Hall 3 from 4:00-5:30pm today to see @billzhu.bsky.social's and Ishika Singh's work with me and @robinjia.bsky.social on PSALM: autonomously inducing symbolic pre- and post-conditions of actions with LLMs, symbolic planning, and text environment interaction!

01.05.2025 17:39 👍 6 🔁 1 💬 1 📌 0
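The symbolic-planner half of the PSALM post above can be illustrated with a toy STRIPS-style search (a minimal sketch; the environment, action names, and semantics here are invented for illustration and are not the paper's actual code). PSALM's point is that an LLM induces the pre-/post-conditions below from environment interaction rather than requiring them to be hand-written.

```python
from collections import deque

# Toy action semantics: preconditions, add-list, delete-list per action.
# (Illustrative only; in PSALM these are induced by an LLM, not predefined.)
ACTIONS = {
    "pick_key":  ({"at_door", "key_on_floor"}, {"has_key"},   {"key_on_floor"}),
    "open_door": ({"at_door", "has_key"},      {"door_open"}, set()),
    "walk_in":   ({"door_open"},               {"inside"},    set()),
}

def plan(start, goal):
    """BFS over sets of facts; returns a shortest list of actions or None."""
    start = frozenset(start)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:           # goal facts all hold
            return steps
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= state:        # action applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None

print(plan({"at_door", "key_on_floor"}, {"inside"}))
# ['pick_key', 'open_door', 'walk_in']
```

Once the semantics are learned, search like this gives sound plans with state tracking for free, which is exactly the strength the post says LLMs alone lack.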

Check out @billzhu.bsky.social’s excellent work on combining LLMs with symbolic planners at NAACL on Thursday! I will also be at NAACL Friday-Sunday, looking forward to chatting about LLM memorization, interpretability, evaluation, and more

30.04.2025 19:46 👍 3 🔁 0 💬 0 📌 0

At @naaclmeeting.bsky.social this week! I’ll be presenting our work on LLM domain induction with @thomason.bsky.social on Thu (5/1) at 4pm in Hall 3, Section I.

Would love to connect and chat about LLM planning, reasoning, AI4Science, multimodal stuff, or anything else. Feel free to DM!

30.04.2025 18:38 👍 4 🔁 3 💬 0 📌 1
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compare...

Sounds like arxiv.org/abs/2102.07033

17.02.2025 18:07 👍 1 🔁 0 💬 0 📌 0

Excited to share that my intern work at Meta GenAI is accepted to @iclr-conf.bsky.social #ICLR2025

Introducing TLDR: Token-Level Detective Reward Model For Large Vision Language Models.

TLDR provides fine-grained annotations to each text token.

🔗 arXiv: arxiv.org/abs/2410.04734

08.02.2025 05:29 👍 5 🔁 1 💬 1 📌 1

Our workshop on LLM Memorization is coming to ACL 2025! The call for papers is out; please submit both archival and non-archival papers (work in progress or already published).

27.01.2025 23:23 👍 8 🔁 3 💬 0 📌 0
Pre-trained Large Language Models Use Fourier Features to Compute Addition Pre-trained large language models (LLMs) exhibit impressive mathematical reasoning capabilities, yet how they compute basic arithmetic, such as addition, remains unclear. This paper shows that pre-tra...

Links & presentation times:
1. Fourier Features: arxiv.org/abs/2406.03445 Thu, 4:30pm
2. TF + ICL: arxiv.org/abs/2310.17086 Fri, 11am
3. Backdoor detection: arxiv.org/abs/2409.00399 Sat, 1:44pm at AdvML Frontiers
4. LLMs + PDDL: arxiv.org/abs/2406.02791 Sun, 2:30pm at OWA workshop

09.12.2024 22:21 👍 0 🔁 0 💬 0 📌 0

I'll be at #NeurIPS2024! My group has papers analyzing how LLMs use Fourier Features for arithmetic and how TFs learn higher-order optimization for ICL (led by @deqing.bsky.social), plus workshop papers on backdoor detection and LLMs + PDDL (led by @billzhu.bsky.social)

09.12.2024 22:21 👍 23 🔁 3 💬 1 📌 1
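The Fourier-features finding mentioned in the post above can be illustrated with a toy sketch (my own illustration, not the paper's code or the model's actual mechanism): represent an integer as phases on circles of several periods, so that adding two integers corresponds to multiplying their phases, and decode by matching. The periods here are arbitrary assumptions.

```python
import numpy as np

# Arbitrary illustrative periods; numbers are unique modulo lcm = 100.
PERIODS = [2, 5, 10, 100]

def encode(n):
    # One complex phase per period: exp(2*pi*i*n/T)
    return np.array([np.exp(2j * np.pi * n / T) for T in PERIODS])

def add(feat_a, feat_b):
    # Integer addition becomes elementwise phase multiplication
    return feat_a * feat_b

def decode(feat, max_n=99):
    # Pick the integer whose encoding best aligns with the features
    scores = [np.real(np.vdot(encode(n), feat)) for n in range(max_n + 1)]
    return int(np.argmax(scores))

print(decode(add(encode(23), encode(45))))  # 68
```

The appeal of such a representation is that addition needs no carry logic: each frequency component is updated independently.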

A starter pack for #NLP #NLProc researchers! 🎉

go.bsky.app/SngwGeS

04.11.2024 10:01 👍 251 🔁 99 💬 45 📌 13

USC NLP folks are on Bluesky!
Follow my amazing colleagues here

go.bsky.app/KUwSZ6W

12.11.2024 17:44 👍 17 🔁 5 💬 3 📌 2

Started a SoCal AI/ML/NLP researchers starter pack! It's a bit sparse right now, and perhaps more NLP heavy, but hey, nominate yourself and others! go.bsky.app/6QckPj9

19.11.2024 15:28 👍 43 🔁 8 💬 17 📌 1