
Spandan Karma Mishra

@spandyie

AI @ PANW, Statistical learning, Bayesian, Rock climber, history buff & Nepali

248
Followers
1,724
Following
13
Posts
06.02.2024
Joined

Latest posts by Spandan Karma Mishra @spandyie

Post image

Huh. Looks like Plato was right.

A new paper shows all language models converge on the same "universal geometry" of meaning. Researchers can translate between ANY model's embeddings without seeing the original text.

Implications for philosophy and vector databases alike. arxiv.org/pdf/2505.12540

23.05.2025 02:44 👍 254 🔁 45 💬 9 📌 13
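The translation idea above can be sketched with a much simpler supervised baseline than the paper's unsupervised method: orthogonal Procrustes alignment between two toy embedding spaces. Everything below is illustrative; the rotation-plus-noise model is an assumption, not the paper's setup.

```python
import numpy as np

# Toy stand-ins for two models' embedding spaces: space B is a random
# rotation of space A plus noise, mimicking a shared latent geometry.
rng = np.random.default_rng(0)
n, d = 200, 16
emb_a = rng.normal(size=(n, d))
true_rot, _ = np.linalg.qr(rng.normal(size=(d, d)))
emb_b = emb_a @ true_rot + 0.01 * rng.normal(size=(n, d))

# Orthogonal Procrustes: the rotation W minimizing ||A W - B||_F is
# W = U V^T, where U S V^T is the SVD of A^T B.
u, _, vt = np.linalg.svd(emb_a.T @ emb_b)
w = u @ vt

# Translate A's embeddings into B's space and check cosine alignment.
translated = emb_a @ w
cos = np.sum(translated * emb_b, axis=1) / (
    np.linalg.norm(translated, axis=1) * np.linalg.norm(emb_b, axis=1)
)
print(cos.mean())  # close to 1.0 when the geometries match
```

The paper's harder setting drops the paired examples; this sketch only shows why a shared geometry makes translation possible at all.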
Post image

AI Agents vs. Agentic #AI: A Conceptual Taxonomy, Applications and Challenges (preprint) arxiv.org/abs/2505.10468

17.05.2025 13:55 👍 6 🔁 4 💬 0 📌 0
Preview
The White House has begun the process of looking for a new secretary of defense, according to a U.S. official who was not authorized to speak publicly.

BREAKING NEWS: The White House has begun the process of looking for a new secretary of defense, according to a U.S. official who was not authorized to speak publicly.

21.04.2025 17:25 👍 37469 🔁 7786 💬 3808 📌 3104
Post image

They show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. They call this new paradigm “reasoning to learn”.

27.03.2025 03:54 👍 42 🔁 7 💬 1 📌 1
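The bootstrapping loop described above can be sketched at the stub level: synthesize a thought per document, keep it only if it makes the document more predictable, and collect the survivors as extra training data. `propose_thought` and `log_likelihood` are hypothetical stand-ins for LM calls, not the paper's implementation.

```python
# Stub-level sketch of "reasoning to learn"-style bootstrapping.
def propose_thought(doc):
    # A real LM would synthesize latent reasoning about `doc`.
    return f"this passage is about {doc.split()[0]}"

def log_likelihood(doc, thought=None):
    # Pretend a relevant thought makes the document easier to predict.
    bonus = 5 if thought and doc.split()[0] in thought else 0
    return -len(doc) + bonus

corpus = ["gravity pulls objects together", "prime numbers have two divisors"]
training_pairs = []
for doc in corpus:
    thought = propose_thought(doc)
    if log_likelihood(doc, thought) > log_likelihood(doc):  # thought helps?
        training_pairs.append((thought, doc))               # keep for training
```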
Post image

PapersChat – Chat with Research Papers

PapersChat provides an agentic AI interface for querying papers, retrieving insights from ArXiv & PubMed, and structuring responses efficiently.

github.com/AstraBert/Pa...

10.03.2025 04:47 👍 35 🔁 4 💬 0 📌 1
Preview
GitHub - matiasmolinas/evolving-agents: a production-grade environment for orchestrating, evolving, and managing AI agents

Show HN: Evolving Agents Framework
https://github.com/matiasmolinas/evolving-agents

https://news.ycombinator.com/item?id=43310963

10.03.2025 00:45 👍 2 🔁 1 💬 1 📌 0
Video thumbnail

French Senator Claude Malhuret:

"Washington has become Nero’s court, with an incendiary emperor, submissive courtiers and a jester high on ketamine... We were at war with a dictator, we are now at war with a dictator backed by a traitor."

05.03.2025 15:47 👍 81747 🔁 25786 💬 1670 📌 2826
Post image

A few words on DeepSeek's new releases. Links are:
- github.com/deepseek-ai/...
- github.com/deepseek-ai/...
- github.com/deepseek-ai/...
and the Ultra-Scale Playbook at huggingface.co/spaces/nanot...

27.02.2025 13:41 👍 51 🔁 5 💬 0 📌 1
Post image

Just read the s1: Simple Test-Time Scaling paper. Super interesting approach to improving reasoning models!

TL;DR:
1. SFT on 1k curated examples w/ reasoning traces.
2. Control response length w/ budget forcing:
"Wait" tokens → longer reasoning/self-correction.
"Final Answer:" → enforce stopping.

07.02.2025 13:59 👍 38 🔁 6 💬 2 📌 1
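The two budget-forcing rules in the TL;DR can be sketched as a decoding loop around a stubbed generator. `generate_step` stands in for a real LLM decode call (here it always tries to stop thinking, so the budget logic is easy to see); the token strings are illustrative.

```python
# Sketch of s1-style budget forcing with a stubbed generator.
def generate_step(context):
    return "</think>"

def budget_forced_decode(prompt, min_steps=3, max_steps=6):
    """Suppress early stops with "Wait"; force an answer past the budget."""
    trace = []
    while len(trace) < max_steps:
        tok = generate_step(prompt + " " + " ".join(trace))
        if tok == "</think>":
            if len(trace) < min_steps:
                trace.append("Wait")   # budget not met: force more reasoning
                continue
            break                      # budget met: allow the stop
        trace.append(tok)
    trace.append("Final Answer:")      # enforce stopping / answer emission
    return trace
```

With the stub above, a call like `budget_forced_decode("2+2?")` yields three forced "Wait" continuations before the appended "Final Answer:".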

Maybe a hot take, but what about the following advice to the next gen:
Don't get an AI degree; the curriculum will be outdated before you graduate. Instead, study math, stats, or physics as your foundation, and stay current with AI through code-focused books, blogs, and papers.

09.02.2025 15:36 👍 147 🔁 22 💬 12 📌 7
A herd of bison stretching off into the distance on a snowy prairie.

Bison should be allowed to roam free and cattle should be restricted to private land.
All abandoned barbed wire should be removed from public land.
The money today being wasted on public lands grazing should go into building wildlife overpasses and installing wildlife safe guide fencing.

07.02.2025 16:46 👍 7667 🔁 1302 💬 167 📌 67

no pun intended but “Attention is all you need”

29.01.2025 17:52 👍 1 🔁 0 💬 0 📌 0

Not one VC would ever fund a startup to do the kind of hardcore optimization work that DeepSeek did.

Every VC firm should be asking themselves why.

28.01.2025 05:00 👍 105 🔁 11 💬 5 📌 2

Haven’t we been doing the same to Google and Facebook for the past 15 years?

27.01.2025 03:02 👍 2 🔁 0 💬 0 📌 0
Preview
Scaling Laws for Pre-training Agents and World Models The performance of embodied agents has been shown to improve by increasing model parameters, dataset size, and compute. This has been demonstrated in domains from robotics to video games, when generat...

Finally finally finally some scaling curves for imitation learning in the large-scale-data regime: arxiv.org/abs/2411.04434

20.01.2025 14:48 👍 54 🔁 8 💬 2 📌 0
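Scaling curves of this kind are typically fit as power laws, which are linear in log-log space. A minimal sketch with synthetic data follows; the constants are made up for illustration, not taken from the paper.

```python
import numpy as np

# Synthetic "scaling curve": loss following L = a * N^(-b), the functional
# form scaling-law papers typically fit, with small multiplicative noise.
rng = np.random.default_rng(1)
n_params = np.logspace(6, 10, 20)                  # model sizes, 1e6..1e10
a_true, b_true = 400.0, 0.08
loss = a_true * n_params ** (-b_true) * np.exp(0.005 * rng.normal(size=20))

# A power law is linear in log-log space: log L = log a - b * log N.
slope, intercept = np.polyfit(np.log(n_params), np.log(loss), 1)
b_est, a_est = -slope, np.exp(intercept)
print(b_est)  # recovers roughly 0.08
```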
Preview
GPT2-Nepali (Pretrained from scratch) · rasbt LLMs-from-scratch · Discussion #485 Hi everyone! 👋 I’m excited to share my recent project: GPT2-Nepali, a GPT-2 model pretrained from scratch for the Nepali language. This project builds upon the GPT-2 model training code detailed in...

And here's a great reader project that trained a tokenizer from scratch on Nepali: github.com/rasbt/LLMs-f...

19.01.2025 16:37 👍 6 🔁 1 💬 0 📌 0
Preview
Foundations of Large Language Models This is a book about large language models. As indicated by the title, it primarily focuses on foundational concepts rather than comprehensive coverage of all cutting-edge technologies. The book is st...

Nice and fresh content to understand how Large Language Models work: arxiv.org/abs/2501.09223 #LLM #NLP

19.01.2025 15:17 👍 9 🔁 2 💬 0 📌 0
Preview
Mastering Tensor Dimensions in Transformers A Blog post by Hafedh Hichri on Hugging Face

This is a wonderfully simple blog on how tensors flow through a transformer model.

Covering:
- Tokenize
- Embed
- Positional Encoding
- Decoder
- Multi-Head Attention
- Add and normalize
- Feed-Forward
- Model Head
- Cross-Attention

Blog:

14.01.2025 13:00 👍 30 🔁 4 💬 1 📌 0
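The shape bookkeeping that blog walks through can be traced in a few lines of numpy. The sizes below are arbitrary, the weights are zero placeholders, and softmax/scaling are omitted; only the tensor shapes are of interest here.

```python
import numpy as np

# Shape walk-through for one transformer block (illustrative sizes only).
batch, seq, d_model, n_heads = 2, 10, 64, 8
d_head = d_model // n_heads                       # 64 / 8 = 8

x = np.zeros((batch, seq, d_model))               # after embed + pos-encoding

# Multi-head attention: project, split heads, attend, merge back.
q = x @ np.zeros((d_model, d_model))              # (2, 10, 64)
q_heads = q.reshape(batch, seq, n_heads, d_head).transpose(0, 2, 1, 3)
scores = q_heads @ q_heads.transpose(0, 1, 3, 2)  # (2, 8, 10, 10)
out = (scores @ q_heads).transpose(0, 2, 1, 3).reshape(batch, seq, d_model)

# Feed-forward expands then contracts: d_model -> 4*d_model -> d_model.
w1 = np.zeros((d_model, 4 * d_model))
w2 = np.zeros((4 * d_model, d_model))
ff = np.maximum(out @ w1, 0) @ w2                 # (2, 10, 64)

# Model head maps hidden states to vocabulary logits.
vocab = 1000
logits = ff @ np.zeros((d_model, vocab))          # (2, 10, 1000)
```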

Free Our Feeds! What is it! @freeourfeeds.com

F.O.F. is an independent group with the goal of running THIS 👇 social network totally outside of Bluesky.

It's not us. It's a fully independent version of the network. All the same users and posts. Running cooperatively with us and others.

13.01.2025 21:02 👍 1842 🔁 414 💬 56 📌 55

If you’re an AI startup, or interviewing w/ one, ask:

What are you the best in the world at?

Do you offer a service, formula, or delivery method you invented?

Is there something you do that’s patentable or a unique user experience?

Have you identified and isolated a market segment?

If not, walk

05.01.2025 22:33 👍 22 🔁 3 💬 0 📌 0

Happy new year 2025

01.01.2025 18:53 👍 2 🔁 0 💬 0 📌 0
Post image

Very interesting paper by Ananda Theertha Suresh et al.

For categorical/Gaussian distributions, they derive the rate at which a sample is forgotten to be 1/k after k rounds of recursive training (hence model collapse happens more slowly than intuitively expected).

27.12.2024 23:35 👍 35 🔁 5 💬 1 📌 0
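The slow-forgetting result can be illustrated with a small Monte Carlo: recursively refit a categorical distribution on its own samples and track how often a given symbol survives each generation. This is a toy version of the setting, not the paper's derivation, and the sample sizes are arbitrary.

```python
import numpy as np

# Toy Monte Carlo of recursive training on a categorical distribution:
# each generation draws n samples from the current model and refits by
# empirical frequencies.
rng = np.random.default_rng(2)
m, n, rounds, trials = 10, 30, 30, 400   # symbols, samples, generations, runs

survived = np.zeros(rounds)
for _ in range(trials):
    p = np.full(m, 1.0 / m)              # start from the true uniform dist.
    for k in range(rounds):
        counts = rng.multinomial(n, p)   # sample from the current model
        p = counts / n                   # refit on its own samples
        survived[k] += p[0] > 0          # is symbol 0 still represented?
survival = survived / trials             # fraction of runs not yet collapsed

# Survival decays with each generation, but gradually rather than abruptly.
print(survival[0], survival[-1])
```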

lol wait until they realize Vivek is Indian as well

26.12.2024 17:04 👍 1 🔁 0 💬 0 📌 0
Preview
Aranym/40-million-bluesky-posts Β· Datasets at Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Releasing a dataset of 40 million Bluesky posts!

It was collected using the Firehose API; I hope people do some cool ML with it.

Anonymized with a data removal mechanism and includes text, language predictions, and image data.

#ai #ml #NLP

huggingface.co/datasets/Ara...

17.12.2024 15:25 👍 6 🔁 1 💬 1 📌 0

only bay area residents have an exclusive right to refer to SF as the city

24.12.2024 20:13 👍 2 🔁 0 💬 0 📌 0
Eugene Vinitsky

A short list of tips for keeping a clean, organized ML codebase for new researchers: eugenevinitsky.com/posts/quick-...

18.12.2024 20:00 👍 135 🔁 30 💬 12 📌 3
Preview
LLM Research Papers: The 2024 List A curated list of interesting LLM-related research papers from 2024, shared for those looking for something to read over the holidays.

Hey all, I've been a bit quiet the last couple of weeks as I am recovering from an accident & injury.

Unfortunately, I couldn’t write my yearly AI research review this year, but here’s at least a list of bookmarked papers you might find useful: magazine.sebastianraschka.com/p/llm-resear...

22.12.2024 14:02 👍 109 🔁 9 💬 15 📌 1
Title card: Alignment Faking in Large Language Models by Greenblatt et al.

New work from my team at Anthropic in collaboration with Redwood Research. I think this is plausibly the most important AGI safety result of the year. Cross-posting the thread below:

18.12.2024 17:46 👍 126 🔁 29 💬 5 📌 11

it really depends on the type of spice: cumin or coriander pre-sauté; if it’s Garam masala, post-sauté to preserve the aroma

10.12.2024 01:23 👍 6 🔁 0 💬 0 📌 0
Post image

LLMs might secretly be world models of the internet!

By treating LLMs as simulators that can predict "what would happen if I click this?", the authors built an AI that can navigate websites by imagining outcomes before taking action, performing 33% better than baseline. arxiv.org/pdf/2411.06559

03.12.2024 02:00 👍 87 🔁 9 💬 3 📌 1
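The "imagine before acting" loop can be sketched with stubs in place of the LLM world model and value function. `simulate`, `score`, and the page/action strings are all hypothetical illustrations, not the paper's interface.

```python
# Stub sketch of "LLM as world model" action selection: before clicking,
# imagine each action's outcome and score it against the goal.
def simulate(state, action):
    # A real system would ask the LLM what page `action` leads to.
    return f"{state} -> {action}"

def score(imagined_state, goal):
    # Value stub: count goal keywords appearing in the imagined page.
    return sum(word in imagined_state for word in goal.split())

def choose_action(state, actions, goal):
    """Pick the action whose imagined outcome best matches the goal."""
    return max(actions, key=lambda a: score(simulate(state, a), goal))

best = choose_action(
    state="search results page",
    actions=["click checkout", "click add-to-cart", "click home"],
    goal="add-to-cart",
)
print(best)  # 'click add-to-cart'
```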