“Hi, Tomasz or Tomasz’s agent.”

I’ve started receiving emails that begin this way. A byproduct, I suppose, of having written so much about AI. People now assume my inbox is monitored by robots.

Which raises an odd question: what does it mean to write to someone when you expect a machine to answer?

Gmail suggests my reply before I’ve thought it. “Sounds good!” “Thanks for sending!” “Let’s circle back next week.” The machine knows what I’d say. Sometimes I click it. Sometimes I wonder if the person on the other end can tell.

Every customer support call is now with an AI agent. The voice sounds real. The agent is infinitely knowledgeable. The responses are fast. Does it matter that it’s not a person?

A friend sends voice memos instead of texts now. “So you know it’s actually me,” he said. But how do you know? ElevenLabs can clone a voice from thirty seconds of audio. The ums, the pauses, the little laugh—all reproducible.

But maybe the people writing “Hi, Tomasz or Tomasz’s agent” have it right. They’re not being rude. They’re being realistic. They’ve adapted to a world where the answer might come from either side of the curtain, & they’ve decided not to care which.

Does it matter? The polite thing now is to assume the robot. The intimate thing is to be surprised when it’s not.

tomtunguz.com/is-this-toma...
I burned 84 million tokens on February 28th. Researching companies, drafting memos, running agents.

That’s running Kimi K2.5, a serverless model via API. At Claude or OpenAI rates — roughly $9 per million tokens blended — equivalent usage would cost $756 for a single day’s work.

My peak days top 80 million tokens. My average days run 20 million. Cloud inference at frontier-model pricing adds up fast.

This week, Alibaba released Qwen3.5-9B, an open-source model that matches Claude Opus 4.1 from December 2025. It runs locally on 12GB of RAM. Three months ago, this capability required a data center. Now it requires a power outlet.

A $5,000 laptop — a MacBook Pro with enough memory to run Qwen locally — pays for itself after 556 million tokens. At my average of 20 million tokens per day, that’s about four weeks.

After payback, the marginal cost drops to electricity.
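The payback math is simple enough to sanity-check. A minimal sketch, using only the figures above; the daily rates are my usage, not universal constants:

```python
# Back-of-envelope buy-vs-rent math for local inference.
# Figures from above: $5,000 laptop, ~$9 blended per million
# tokens at frontier-API rates, ~20M tokens/day average usage.

HARDWARE_COST = 5_000  # MacBook Pro with enough RAM for the 9B model
CLOUD_RATE = 9.00      # $ per million tokens, blended

def breakeven_tokens(hardware_cost: float, cloud_rate: float) -> float:
    """Millions of tokens at which local hardware matches cloud spend."""
    return hardware_cost / cloud_rate

def payback_days(daily_tokens_m: float) -> float:
    """Days to amortize the laptop at a given daily volume (millions of tokens)."""
    return breakeven_tokens(HARDWARE_COST, CLOUD_RATE) / daily_tokens_m

if __name__ == "__main__":
    print(f"Breakeven: {breakeven_tokens(HARDWARE_COST, CLOUD_RATE):.0f}M tokens")  # ~556M
    print(f"At 20M tokens/day: {payback_days(20):.0f} days")   # ~28 days
    print(f"At 84M tokens/day: {payback_days(84):.0f} days")   # ~7 days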
It isn’t an intelligence compromise. Reasoning, coding, agentic workflows, document processing, instruction following: the 9B model matches December’s frontier across the board.

What changes when frontier intelligence runs locally? Everything I send to cloud APIs today — drafting emails, researching companies, writing code, analyzing documents — stays on my machine. No API logs. No third-party retention. No outages. No rate limits.

The tradeoff is parallelization. Cloud APIs handle thousands of concurrent requests. A laptop runs one inference at a time. For simple tasks — summarization, drafting, Q&A — that’s fine. Queue them up. Let them run overnight. For complex agentic workflows that spawn dozens of parallel threads, local inference may not be worth the wait. The economics favor depth over breadth: fewer tasks, run longer, run cheaper.
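What “queue them up” looks like in practice, as a minimal sketch: a plain loop against a local OpenAI-compatible endpoint (llama.cpp, Ollama, and LM Studio all expose one). The URL assumes Ollama’s default port, the qwen3.5:9b tag is hypothetical, & the task list is illustrative:

```python
# Overnight queue: one local model, tasks run sequentially (depth over breadth).
import json
import urllib.request

LOCAL_API = "http://localhost:11434/v1/chat/completions"  # Ollama's default port
MODEL = "qwen3.5:9b"  # hypothetical tag for the 9B model discussed above

tasks = [
    "Summarize the attached call notes in five bullets.",
    "Draft a follow-up email to the founder.",
    "List open questions for the investment memo.",
]

def run(prompt: str) -> str:
    """Send one chat-completion request to the local server, return the text."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        LOCAL_API, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# A laptop runs one inference at a time, so a plain loop *is* the scheduler.
for i, task in enumerate(tasks, 1):
    print(f"--- task {i} ---")
    print(run(task))
```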
Three months from data center to laptop. The buy-vs-rent math just changed.

tomtunguz.com/qwen-9b-matc...
I hate to micromanage & I’ve been micromanaging AI.

A few months ago, I’d use Claude for a familiar workflow: capturing notes from a meeting, drafting a follow-up email, updating the CRM, writing the investment memo.

Micromanagement at 10x speed. The agent would finish a step, then wait. I’d scan the output, type the next instruction, wait again. Prompt, response, prompt, response. I was the bottleneck in my own system.

A year ago, this was necessary. The models couldn’t hold a complex task in their heads. Now they can.

But this leverage requires planning. Now I sketch the workflow before I touch the machine. I anticipate the decision branches: what if the company isn’t in the CRM? What if the website is down or the call transcript isn’t available? I flag the gaps before the agent encounters them.
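Translated from notebook to text, one of those sketches looks roughly like this. A hypothetical blueprint, not a real schema; the step names & fallback keys are illustrative:

```python
# A workflow blueprint with the decision branches flagged up front.
# Illustrative only: steps and fallbacks are hypothetical.
blueprint = {
    "goal": "post-meeting follow-up + investment memo",
    "steps": [
        {"do": "pull company record from CRM",
         "if_missing": "create a stub record & flag for review"},
        {"do": "scrape company website for recent news",
         "if_down": "fall back to cached notes & press releases"},
        {"do": "summarize call transcript",
         "if_unavailable": "ask me for the recording before proceeding"},
        {"do": "draft follow-up email"},
        {"do": "draft investment memo with sources"},
    ],
}
```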
This morning’s notebook page: I took a photo, shared it with Claude, & walked away. Workflows as images work beautifully.

The agents ran in the background. The memo sat in my inbox, formatted, sourced, ready to send.

Not prompts. Blueprints.

tomtunguz.com/filling-the-...
Pharma companies spend billions developing a molecule, then enjoy 20 years of patent protection to recoup R&D costs before generics flood the market. AI follows the same pattern: massive R&D costs upfront, then commoditization. But the timeline is compressed.

In pharma, the generic window opens after two decades. In AI, it opens in weeks. DeepSeek V3 costs $0.14 per million tokens. GPT-5.2 costs $1.75. Same capability. Different label. The 90% discount isn’t coming. It’s here.

Three forces compressed the window.

First, distillation commoditizes capability. Anthropic accused DeepSeek, Minimax & Moonshot AI of conducting “industrial-scale campaigns” to extract knowledge from Claude. OpenAI made similar accusations to Congress.

Second, hyperscalers subsidize AI to win cloud customers. Alibaba Cloud cut LLM pricing by up to 97%. Baidu, ByteDance & Tencent spent $1.1B on AI subsidies during Chinese New Year 2026 alone.

Third, DeepSeek set the floor. They trained V3 for $6 million versus OpenAI’s $100 million+ for GPT-4, priced at $0.14 per million input tokens, & hit $220 million ARR with 122 employees.

In the US, Chinese models also price at a discount. Together AI charges $1.25 per million input tokens for DeepSeek V3. DeepInfra offers $0.21 per million. DeepSeek’s own API charges $0.14, 12x less than GPT-5.2.
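The multiples check out against the prices quoted above (the “90%” rounds a ~92% discount down):

```python
# Price ratios from the per-million-token rates above.
gpt_5_2 = 1.75       # $/M tokens, frontier label
deepseek_api = 0.14  # $/M tokens, DeepSeek's own API

print(f"{gpt_5_2 / deepseek_api:.1f}x cheaper")      # 12.5x (the "12x" above)
print(f"{1 - deepseek_api / gpt_5_2:.0%} discount")  # 92% (the "90%" above)
```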
The question: how to protect an asset that takes hundreds of millions to develop when it can be copied in a month?

tomtunguz.com/white-label-...