But for most non-work use such as drafting, summarizing, and everyday “think with me” tasks, it’s more than sufficient. Btw, just to be clear: this isn’t sponsored :)
The pricing is also straightforward: roughly $10 per month, which is refreshingly reasonable for something you’ll likely open dozens of times a day. The main trade-off is the smaller context window. If you’re doing heavy, document-sized workflows, you may notice the limits sooner than you’d like.
I’ve recently started using duck.ai in place of ChatGPT, and it’s been a pleasantly practical switch. It supports a solid lineup of recent models, and the privacy posture feels more reassuring than most—enough that I’d be comfortable recommending it.
Honestly, a traditional Chinese music performance or some live classical music might’ve fit the mood better; something that vibes with the researcher energy, you know? Anyway, just my personal take; huge thanks to the EMNLP 2025 organizers for all their hard work overall! 🙌
Maybe next time a different kind of performer would work better — or maybe by gala time, everyone’s just too tired to dance or cheer anyway. The hall was massive, the music was loud, and the vibe was... chaotic. Makes you wonder who thought that setup was a good idea.
And finally — here’s a short clip from the EMNLP 2025 gala.
They brought in a DJ… which was, well, kind of hilarious. Almost everyone was just standing there filming or staring at the stage, while the poor DJ was giving it their all with zero crowd reaction. 😂
All in all, it was a whirlwind — exciting, chaotic, and a reminder of how fast the field (and crowd) is growing.
Last thought — maybe EMNLP should eventually just pick one permanent venue instead of hopping cities every year. Would save costs, make setup easier, and we could finally get stable facilities. But hey, there’s probably some politics behind the scenes I don’t know about. 😅
At this point, maybe it’s time to bring in professionals from the event industry to handle large-scale logistics — there’s so much potential to make it smoother.
Oh, and let’s talk logistics. Despite the organizers’ hard work (and seriously, props to them), there were way too few power outlets and Wi-Fi spots. The expo hall was huge, yet no place to sit. Same story for meals and even the gala.
Because of that, you see people from every possible field jumping in, which is both fascinating and exciting to watch. Still, it feels like core NLP and LM research is getting a bit buried under a mountain of application papers. Not complaining; it just feels like we could use a bit more balance again.
While flipping through the papers, I realized just how broad things have gotten. If your work even remotely uses a language model or touches NLP, it basically qualifies for EMNLP these days (mine included 😅).
Honestly, EMNLP feels like a full-blown computer science mega-conference now. The Whova app showed about 200 Korean attendees, which means there were probably thousands overall — wild. Back in my PhD days, the whole conference might’ve had 500 people tops. The scale (and hype) is on another level now.
Just got back from EMNLP 2025 in Suzhou. Didn’t make it to the workshops or tutorials — too little time, too many cities (had to hop over to Shanghai and Nanjing for work). Busy, busy week.
My dad, explaining AI to my mom: “It doesn’t think. It copies stuff wrong.”
That’s why even today, Vietnamese words like sinh viên (student), thạc sĩ (master’s), and tiến sĩ (doctor) still carry that old imperial-exam flavor — total Confucian vibes.
Btw, got this idea from this post: bsky.app/profile/hear...
Back then, this hierarchy came from the Confucian exam systems in China, Korea, and Vietnam.
China ended it in 1905. Korea (Joseon) ended it in 1894. Vietnam kept it until 1919 (!).
So, in summary:
Saengwon (생원) = undergrad vibe
Seoksa (석사) = master’s vibe
Jinsa (진사) = Ph.D. + civil servant flex
3. Kim Jinsa — the Doctor (進士; tiến sĩ)
→ title for those who passed the highest imperial examination. The elite scholar-officials; like having a Ph.D. and a government job rolled into one.
2. Kim Seoksa — the Master (碩士; thạc sĩ)
→ literally means “great scholar.” In the old exam system, a learned person without an official post yet. Kinda the “still studying, but smart” phase.
If your surname’s Kim and we went back to the Joseon Dynasty era, people in Korea would’ve addressed you like this —
1. Kim Saengwon — the Bachelor (生員; sinh viên)
→ used to mean a student who passed the first level of the imperial exam. Basically, a recognized scholar-in-training.
Honestly, with the explosion of papers these days, I kind of feel like we need an “IMDb for research papers” — people just upload everything to arXiv, and then there’s a site purely for reviews and ratings afterward. Wouldn’t that make life so much easier? Just dropping by to say I’m still alive.
Makes me wonder… does anyone actually manage to read all of them? Or even keep up? Anyway, it’s a fascinating time to be in this field. I wasn’t originally an NLP person, but somehow work keeps pulling me deeper into it — resistance is futile, I guess.
It’s been a while since I last went to conferences, but this year I’m finally heading to EMNLP 2025. Apparently, there are tons of papers being presented. I haven’t checked the exact number, but it’s easily in the thousands once you count the ones from the Findings track.
Overall, it’s kind of like working with a super smart PhD student… who also has terrible memory 🙃
That said, once the context starts to slip, it still struggles to solve problems — and that hasn’t really changed. I’m not sure if that’s just an inherent limitation of autoregressive models, but it doesn’t feel like a fix is coming anytime soon.
After trying out GPT-5, I’ve gotta say — its coding skills are miles ahead of the o3/o4 models. It doesn’t just spit out short snippets either; it’ll happily generate full, runnable code, even for really niche, domain-specific problems. The intelligence is impressive.
xjdr (@_xjdr): Ok, initial GPT5 / Opus 4.1 eval is over. I could spend the rest of the week on this but i have other things to do sadly. The setup was to have the combination of these 2 models implement DeepSeek's NSA from scratch, which has little to no prior art besides the whitepaper to go off of (which is why i like it as an eval). For this test, i did not write a single line of code or docs to the repo, just prompts. There were no elaborate scaffolds or rules, i just vibe prompted everything. I refrained from telling either model what to do or how to do it; i just forcefully directed it back to the whitepaper as the spec and had it create tests and rules to accomplish its goal. The only thing omitted from this commit are hundreds of little tests that are mostly represented in the larger tests that were committed. The setup ended up being pretty simple: GPT5 would generate the specs and do reviews (via codex) and Opus 4.1 / Sonnet would write the code (via claude code). To keep things simple, i just had them each up in tmux and used capture-pane to share context between the 2.
Result: Is this production ready? Absolutely not. Is this better than i would get from most researchers / engineers in 48 hours? Absolutely. Is this scalable / could this be done unsupervised? Absolutely not. This would not have worked without me driving it hard and steering it in the right directions. Is GPT5 a good model? It is super autistic, so subtle prompts and inferred context are completely lost on it, but it is a superhuman peer reviewer and sysadmin. I can concur that in the coding and math domains, hallucination has been significantly reduced, but that feels like it came at the expense of creativity (probably a fine tradeoff for most people). The intelligence gap between GPT5 and Opus via Claude Code is noticeably wide (GPT5 is smarter), but the Sonnet / Opus (ultrathink) combo is hard to beat as a worker bee (codex is not even close). I've tried every other CLI and agent harness and, for me, CC is leading in UX by a country mile.
I'd feel pretty confident saying the output in this repo is at the pareto frontier of what is possible for a combination of AI models to produce (with expert human supervision) as of today. It's not slop but it's not deployable. In the end, both models got stuck on an illegal memory issue (CUDA gunna CUDA) which will most likely require me to dive in and fix the glaring holes in this code (thus negating the test results). That said, if a junior or mid level engineer brought this to me after a weekend's worth of work as a starting point for a new research project, i'd be pretty happy and impressed. I get slightly annoyed when people talk about AI coding and don't share the code (i am very guilty of this), and as this is something i can finally share, i thought i'd be the change i want to see in the world.
I am most likely not going to work on this anymore as i have a working NSA implementation for titan (probably OSS that soon as well), but i guess i am curious what people think about the 100% AI-written code? Impressive or terrible? Slop cannon or HOLY SHIT?! github.com/Noumena-Networ...
GPT-5 (codex-cli) vs Opus 4.1 (Claude Code)
@xjdr.bsky.social did an experiment contrasting these two:
GPT-5 is: “super autistic, so subtle prompts[..] are lost on it, but it is a superhuman peer reviewer and sysadmin”
github (code produced): github.com/Noumena-Netw...
OP: x.com/_xjdr/status...
It’s just matrix multiplication
huh
foundation model architecture shift coming?