
awni.bsky.social

@awni

phd student @ yale statistics & data science studying the foundations of machine intelligence awni.xyz

10
Followers
43
Following
10
Posts
18.01.2024
Joined

Latest posts by awni.bsky.social @awni

Check out the full paper with Omar Montasser & John Lafferty!

* Paper: arxiv.org/pdf/2505.15927
* Blog: awni.xyz/cot-info/

And come by our poster at NeurIPS in San Diego: neurips.cc/virtual/2025...

#NeurIPS2025 #MachineLearningTheory #LLM #ChainOfThought
[10/10]

25.11.2025 04:27 👍 0 🔁 0 💬 0 📌 0

🔭 Implications for LLM research

* When designing annotation pipelines, investing in rich reasoning traces boosts data efficiency.
* A trace's value depends on how much internal computation it reveals.
* The framework enables measuring “trace quality” through an information-theoretic lens.

[9/n]

25.11.2025 04:27 👍 0 🔁 0 💬 1 📌 0

🧪 Theory meets Practice

We empirically validate our theory's predictions in simple settings where the CoT information can be computed exactly.

We find that the theory closely predicts the sample-efficiency gains.
[8/n]

25.11.2025 04:27 👍 1 🔁 0 💬 1 📌 0
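In the same spirit as the validation described above, here is a self-contained toy simulation (my own setup and numbers, not the paper's experiments): hypotheses map an input to a (trace, label) pair, and we count how many random examples it takes until every surviving hypothesis has zero end-to-end error, with and without trace supervision.

```python
import random

random.seed(1)

# Toy simulation (my own construction, not the paper's experiments):
# each hypothesis maps x in 0..7 to a (trace, label) pair.
def make_h(a, b):
    return lambda x: (x ^ a, ((x ^ a) + b) % 8)

hypotheses = [make_h(a, b) for a in range(8) for b in range(8)]
truth = make_h(3, 5)

def bad(h):
    # h makes at least one end-to-end (label) error against the truth
    return any(h(x)[1] != truth(x)[1] for x in range(8))

def samples_needed(use_trace, trials=300):
    # Average number of random examples until no erroneous hypothesis survives
    total = 0
    for _ in range(trials):
        alive = list(hypotheses)
        n = 0
        while any(bad(h) for h in alive):
            x = random.randrange(8)
            t, y = truth(x)
            alive = [h for h in alive
                     if (h(x) == (t, y) if use_trace else h(x)[1] == y)]
            n += 1
        total += n
    return total / trials

print(samples_needed(use_trace=False), samples_needed(use_trace=True))
```

In this toy class a single traced example pins down both parameters, while label-only supervision must rule out spurious hypotheses one disagreement at a time, so it needs strictly more samples on average.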

🔒 CoT Information is fundamental

Our theory provides both upper and lower bounds, showing that CoT information is a fundamental measure of the power of CoT supervision.
[7/n]

25.11.2025 04:27 👍 1 🔁 0 💬 1 📌 0

๐Ÿ” Interpretation

For many reasoning tasks, CoT-Info(ε) ≫ ε, yielding much faster learning.

The CoT information CoT-Info(ε) captures the statistical advantage of CoT data.

CoT-Info(ε) / ε can be interpreted as the relative value of a CoT sample compared to an end-to-end sample.
[6/n]

25.11.2025 04:27 👍 0 🔁 0 💬 1 📌 0
Mathematical result on the sample complexity of learning with CoT supervision.


🧮 The Theory

To distinguish between hypotheses with error ε, classical theory tells us we need roughly O(1/ε) samples.

We prove that under CoT supervision, the sample complexity improves to O(1/CoT-Info(ε)).
[5/n]

25.11.2025 04:27 👍 0 🔁 0 💬 1 📌 0
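As a back-of-the-envelope illustration of the two rates (with made-up numbers, since CoT-Info(ε) is problem-dependent and not stated in the thread):

```python
import math

# Hypothetical values for illustration only: a target error eps, and an
# assumed per-sample CoT information that is much larger than eps.
eps = 1e-3
cot_info = 0.05  # assumed value of CoT-Info(eps) for this sketch

n_e2e = math.ceil(1 / eps)       # classical ~ O(1/eps) sample complexity
n_cot = math.ceil(1 / cot_info)  # CoT-supervised ~ O(1/CoT-Info(eps))

print(n_e2e, n_cot)
```

Under these assumed values, each CoT-labelled example is worth cot_info / eps = 50 end-to-end examples, matching the ratio interpretation of CoT-Info(ε) / ε given elsewhere in the thread.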
Mathematical definition of the CoT information.


🧠 The Insight: CoT supervision doesn't just tell the model what to predict; it constrains how it thinks.

We formalize this by introducing the “CoT Information”: a measure of the extra discriminative power gained by observing the reasoning trace, not just the label.
[4/n]

25.11.2025 04:27 👍 0 🔁 0 💬 1 📌 0
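As a toy illustration of this extra discriminative power (my own construction, not the paper's formal definition): over a small finite hypothesis class, a single example eliminates more hypotheses when the reasoning trace is observed alongside the label.

```python
# Toy setting (illustrative only): inputs are ints, and a hypothesis maps
# x to a (trace, label) pair, where the trace is an intermediate value.
def make_hypothesis(a, b):
    return lambda x: ((x + a) % 4, (((x + a) % 4) * b) % 4)

hypotheses = [make_hypothesis(a, b) for a in range(4) for b in range(4)]
truth = hypotheses[5]  # the hypothesis with a=1, b=1

x = 3
trace, label = truth(x)

# Hypotheses consistent with the label alone vs. with (trace, label)
label_consistent = [h for h in hypotheses if h(x)[1] == label]
full_consistent  = [h for h in hypotheses if h(x) == (trace, label)]

print(len(hypotheses), len(label_consistent), len(full_consistent))
```

Here the traced example leaves 4 consistent hypotheses out of 16, versus 8 for the label alone; that gap is the kind of extra discriminative power the CoT information is meant to quantify.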

💡 Core Problem: Training a model on only Input→Output (end-to-end) is like teaching a student math by showing them only the final answers.

To learn complex reasoning this way, you need a massive amount of data to rule out all the “wrong ways” to get the “right answer.”
[3/n]

25.11.2025 04:27 👍 1 🔁 0 💬 1 📌 0

Large language models have been transformed by the shift from “learn to predict the final answer” to “learn to predict the reasoning process” via chain-of-thought supervision.

Can we understand why this works from a statistical lens and quantify the advantage?
[2/n]

25.11.2025 04:27 👍 0 🔁 0 💬 1 📌 0
A screenshot of the neurips conference page for this paper.


🌟🔗 Spotlight #NeurIPS2025 Paper on the Foundations of Chain-of-Thought Learning 🔗🌟

Excited to share our work developing a learning-theoretic account of the statistical advantage of chain-of-thought supervision in reasoning systems!

Blog: awni.xyz/cot-info

👇🧵
[1/n]

25.11.2025 04:27 👍 2 🔁 0 💬 1 📌 0