Rapidly rebranding all my search benchmarks as eval awareness benchmarks
Rapidly rebranding all my search benchmarks as eval awareness benchmarks
What I learned from being a CTO: there is always someone waiting for me to do something that I haven't done yet
Just over 50% of Gradient Labs is builders now, we wrote up a bit about our ways of working as we navigate this super competitive & ever-changing space
blog.gradient-labs.ai/p/creating-a...
To date, HCI researchers have had no support on signaling their paper's relevance to AI, esp. when that connection is tenuous at best. We introduce a systematic framework to ensure LLMs are mentioned at every stage of paper reportingβfrom framing, to evaluation, to implications.
Progress in customer software has been about adopting the next wave and pushing forward, rather than _being_ the next wave.
Every company that was mobile-first a decade ago leapfrogged everyone else and invented new experiences (but they did not need to build mobile phones themselves)
The second hardest is βwhat would help my users achieve their goal [faster, better, more safely]?β
Product features are useful because they donβt need you to know to ask for them; they empower you to think beyond your base need
The βvibeβ influx is being presented in absolute terms (build with AI or buy), but almost all technology impacts *what* and *how* you build, not whether you build at all
That one means: just because you can, doesnβt mean you should
The hardest problem in product has never been "what should we build" - it remains "what is the highest leverage thing we should build next" (and, critically, "what should we absolutely not build")
Worlds colliding: itβs now easier than ever for a provider to amend their product to your needs, and they should be actively thinking about enabling you to do that
(the prior world being: βyour feature request is in the backlogβ)
To be fair to the vibe camp: the bar for what constitutes a viable βbuyβ idea is moving up
Vibe vs buy: we write software to empower others (what is the *userβs* goal?) and majority of the time the builders and users goals are different
Vibe vs buy, 5 years from now: company (1) is advertising jobs to vibe-build their in-house CRM, company (2) is hiring for their most distinctive AI product. What kind of talent do each one attract?
Vibe vs buy: are the hands-on people in that company's department responsible for (1) optimising their own role for impact, or (2) just getting the work done?
In camp (2) it's technically not their job to be iterating on the tools & products they use
Vibe vs buy: what is the gap between (a) whoever would materialise the software into existence, and (b) whoever would use it? If there is none, maybe the vibed approach is preferred
Vibe vs buy: the point of software products is that they evolve and update, as we learn and grow. Even the most excellent AI generated code is a point-in-time outcome
Vibe vs buy: you can see Anthropic's C-compiler as both (1) a wonderful feat of science & engineering, because it succeeded, and (2) an epic waste of tokens, because we have plenty of these already
Vibe vs buy: when you buy an AI product, particularly with outcome-based pricing, you're also buying accountability (the AI product needs to be good)
Vibe vs buy: if company (a) vibe codes everything they need, across their whole company, and company (b) vehemently focuses on the uniqueness of the product they're building - which one is more likely to succeed?
Vibe vs buy: even if you can code, you're still buying _something_. The choice you're making is the level of abstraction (remember, you can still buy a server rack)
Vibe vs buy: you've never been paying for the code; you're paying for "someone else has thought this problem through in much more depth that I'm willing or able to"
it's been weeks and I haven't seen a single mention of the SaaS-pocalyse here
Okay, this is awkward. Two year old PRs on the docs repo. Completely missed.
For context: this repo had not really been updated since Python 3.8 and 3.9 reached their EOL
The idea that coding agents prove "learn to code was a mistake" is kind of analogous to thinking the printing press meant "psych! y'all wasted a lot of time learning how to write."
Computational thinking is getting easier to pick up, and also providing more leverage than before.
Fascinated that my old modelstore library is still getting love (PRs, downloads), am throwing our friend Claude at some improvements
github.com/operatorai/m...
We started working with Wise many months ago, am super glad thatβs now public π
We've launched our outbound agent today. It's been great seeing this come to life; an AI initiated conversation is so different from the reverse
gradient-labs.ai/product/outb...
Cold reach out with: "I'd like to grab a coffee and understand where your business is headed"
What do I even say to this
I mean if you care more about speed than fidelity, there are model choices you could make to speed things up... theonion.com/ai-chatbot-t...