
Hyeongsik Kim

@yy20716

The official account of Hyeongsik Kim. I enjoy reading rejected papers — they’re full of brilliant ideas that just needed a bit more time (or better reviewer luck). LinkedIn: https://www.linkedin.com/in/hskim0/

14
Followers
34
Following
40
Posts
01.12.2024
Joined

Latest posts by Hyeongsik Kim @yy20716

But for most non-work use such as drafting, summarizing, and everyday “think with me” tasks, it’s more than sufficient. Btw, just to be clear: this isn’t sponsored :)

19.02.2026 00:48 👍 1 🔁 0 💬 0 📌 0

The pricing is also straightforward: roughly $10 per month, which is refreshingly reasonable for something you’ll likely open dozens of times a day. The main trade-off is the smaller context window. If you’re doing heavy, document-sized workflows, you may notice the limits sooner than you’d like.

19.02.2026 00:48 👍 0 🔁 0 💬 1 📌 0
DuckDuckGo AI Chat at DuckDuckGo. Privacy, Simplified.

I’ve recently started using duck.ai in place of ChatGPT, and it’s been a pleasantly practical switch. It supports a solid lineup of recent models, and the privacy posture feels more reassuring than most—enough that I’d be comfortable recommending it.

19.02.2026 00:48 👍 1 🔁 0 💬 2 📌 0

Honestly, a traditional Chinese music performance or some live classical music might’ve fit the mood better; something that vibes with the researcher energy, you know? Anyway, just my personal take; huge thanks to the EMNLP 2025 organizers for all their hard work overall! 🙌

10.11.2025 00:55 👍 0 🔁 0 💬 0 📌 0

Maybe next time a different kind of performer would work better — or maybe by gala time, everyone’s just too tired to dance or cheer anyway. The hall was massive, the music was loud, and the vibe was... chaotic. Makes you wonder who thought that setup was a good idea.

10.11.2025 00:55 👍 0 🔁 0 💬 1 📌 0

And finally — here’s a short clip from the EMNLP 2025 gala.
They brought in a DJ… which was, well, kind of hilarious. Almost everyone was just standing there filming or staring at the stage, while the poor DJ was giving it their all with zero crowd reaction. 😂

10.11.2025 00:55 👍 0 🔁 0 💬 1 📌 0

All in all, it was a whirlwind — exciting, chaotic, and a reminder of how fast the field (and crowd) is growing.

10.11.2025 00:48 👍 0 🔁 0 💬 1 📌 0

Last thought — maybe EMNLP should eventually just pick one permanent venue instead of hopping cities every year. Would save costs, make setup easier, and we could finally get stable facilities. But hey, there’s probably some politics behind the scenes I don’t know about. 😅

10.11.2025 00:48 👍 0 🔁 0 💬 1 📌 0

At this point, maybe it’s time to bring in professionals from the event industry to handle large-scale logistics — there’s so much potential to make it smoother.

10.11.2025 00:48 👍 0 🔁 0 💬 1 📌 0

Oh, and let’s talk logistics. Despite the organizers’ hard work (and seriously, props to them), there were way too few power outlets and Wi-Fi spots. The expo hall was huge, yet there was nowhere to sit. Same story for meals and even the gala.

10.11.2025 00:48 👍 0 🔁 0 💬 1 📌 0

Because of that, you see people from every possible field jumping in, which is both fascinating and exciting to watch. Still, it feels like core NLP and LM research is getting a bit buried under a mountain of application papers. Not complaining; it just feels like we could use a bit more balance again.

10.11.2025 00:48 👍 0 🔁 0 💬 1 📌 0

While flipping through the papers, I realized just how broad things have gotten. If your work even remotely uses a language model or touches NLP, it basically qualifies for EMNLP these days (mine included 😅).

10.11.2025 00:48 👍 0 🔁 0 💬 1 📌 0

Honestly, EMNLP feels like a full-blown computer science mega-conference now. The Whova app showed about 200 Korean attendees, which means there were probably thousands overall — wild. Back in my PhD days, the whole conference might’ve had 500 people tops. The scale (and hype) is on another level now.

10.11.2025 00:48 👍 0 🔁 0 💬 1 📌 0

Just got back from EMNLP 2025 in Suzhou. Didn’t make it to the workshops or tutorials — too little time, too many cities (had to hop over to Shanghai and Nanjing for work). Busy, busy week.

10.11.2025 00:48 👍 0 🔁 0 💬 1 📌 0

My dad, explaining AI to my mom: “It doesn’t think. It copies stuff wrong.”

27.10.2025 00:54 👍 4098 🔁 1006 💬 15 📌 0

That’s why even today, Vietnamese words like sinh viên (student), thạc sĩ (master’s), and tiến sĩ (doctor) still carry that old imperial-exam flavor — total Confucian vibes.

Btw, got this idea from this post: bsky.app/profile/hear...

27.10.2025 19:13 👍 0 🔁 0 💬 0 📌 0

Back then, this hierarchy came from the Confucian exam systems in China, Korea, and Vietnam.

China ended it in 1905. Korea (Joseon) ended it in 1894. Vietnam kept it till 1918 (!).

27.10.2025 19:13 👍 0 🔁 0 💬 1 📌 0

So in summary:

Saengwon (생원) = undergrad vibe
Seoksa (석사) = master’s vibe
Jinsa (진사) = Ph.D. + civil servant flex

27.10.2025 19:13 👍 0 🔁 0 💬 1 📌 0

3. Kim Jinsa — the Doctor (進士; tiến sĩ)
→ title for those who passed the highest imperial examination. The elite scholar-officials; like having a Ph.D. and a government job rolled into one.

27.10.2025 19:13 👍 0 🔁 0 💬 1 📌 0

2. Kim Seoksa — the Master (碩士; thạc sĩ)
→ literally means “great scholar.” In the old exam system, a learned person without an official post yet. Kinda the “still studying, but smart” phase.

27.10.2025 19:13 👍 0 🔁 0 💬 1 📌 0

If your surname’s Kim and we went back to the Joseon Dynasty era, people in Korea would’ve addressed you like this —

1. Kim Saengwon — the Bachelor (生員; sinh viên)
→ used to mean a student who passed the first level of the imperial exam. Basically, a recognized scholar-in-training.

27.10.2025 19:13 👍 1 🔁 0 💬 1 📌 0

Honestly, with the explosion of papers these days, I kind of feel like we need an “IMDB for research papers” — people just upload everything to arXiv, and then there’s a site purely for reviews and ratings afterward. Wouldn’t that make life so much easier? Just dropping by to say I’m still alive.

27.10.2025 05:11 👍 0 🔁 0 💬 0 📌 0

Makes me wonder… does anyone actually manage to read all of them? Or even keep up? Anyway, it’s a fascinating time to be in this field. I wasn’t originally an NLP person, but somehow work keeps pulling me deeper into it — resistance is futile, I guess.

27.10.2025 05:11 👍 0 🔁 0 💬 1 📌 0

It’s been a while since I last went to conferences, but this year I’m finally heading to EMNLP 2025. Apparently, there are tons of papers being presented. I haven’t checked the exact number, but it’s easily in the thousands once you count the ones from the Findings track.

27.10.2025 05:11 👍 0 🔁 0 💬 1 📌 0

Overall, it’s kind of like working with a super smart PhD student… who also has terrible memory 🙃

11.08.2025 17:44 👍 0 🔁 0 💬 0 📌 0

That said, once the context starts to slip, it still struggles to solve problems — and that hasn’t really changed. I’m not sure if that’s just an inherent limitation of autoregressive models, but it doesn’t feel like a fix is coming anytime soon.

11.08.2025 17:44 👍 0 🔁 0 💬 1 📌 0

After trying out GPT-5, I’ve gotta say — its coding skills are miles ahead of the o3/o4 models. It doesn’t just spit out short snippets either; it’ll happily generate full, runnable code, even for really niche, domain-specific problems. The intelligence is impressive.

11.08.2025 17:44 👍 0 🔁 0 💬 1 📌 0
xjdr
@_xjdr
Ok, initial GPT5 / Opus 4.1 eval is over. I could spend the rest of the week on this but i have other things to do sadly. The setup was to have the combination of these 2 models implement DeepSeek's NSA from scratch, which has little to no prior art besides the whitepaper to go off (which is why i like it as an eval).
For this test, i did not write a single line of code or docs to the repo, just prompts. There were no elaborate scaffolds or rules, i just vibe prompted everything.
I refrained from telling either model what to do or how to do it, i just forcefully directed it back to the whitepaper as the spec and had it create tests and rules to accomplish its goal. the only thing omitted from this commit are hundreds of little tests that are mostly represented in the larger tests that were committed.
the setup ended up being pretty simple, GPT5 would generate the specs and do reviews via


codex) and Opus 4.1 / Sonnet would write the code (via claude code). to keep things simple, i just had them each up in tmux and used capture-pane to share context between the 2.
Result:
Is this production ready? Absolutely not.
Is this better than i would get from most researchers / engineers in 48 hours? Absolutely.
Is this scalable / could this be done unsupervised? Absolutely not. this would have not worked without me driving it hard and steering it in the right directions.
Is GPT5 a good model? It is super autistic so subtle prompts and inferred context is completely lost on it, but it is a superhuman peer reviewer and sysadmin. I can concur that in the coding and math domains, hallucination has been significantly reduced but that feels like it came at the expense of creativity (probably a fine tradeoff for most people). The intelligence gap between GPT5 and Opus via ClaudeCode is noticeably wide (GPT5 is smarter) but the Sonnet / Opus (ultrathink) combo is hard to beat as a worker bee (codex is not even close).


I've tried every other CLI and agent harness and, for me, CC is leading in UX by a country mile.
I'd feel pretty confident saying the output in this repo is at the pareto frontier of what is possible for a combination of AI models to produce (with expert human supervision) as of today. It's not slop but it's not deployable. In the end, both models got stuck on an illegal memory issue (CUDA gunna CUDA) which will most likely require me to dive in and fix the glaring holes in this code (thus negating the test results). That said, if a junior or mid level engineer brought this to me after a weekend worth of work as a starting point for a new research project, i'd be pretty happy and impressed.
I get slightly annoyed when people talk about AI coding and don't share the code (i am very guilty of this) and as this is something i can finally share, i thought i'd be the change i want to see in the world.
I am most likely not going to work on this anymore as i have a working NSA implementation for titan (probably OSS that soon as well) but i guess i am curious what people think about the 100% AI written code?


impressive or terrible? Slop cannon or HOLY SHIT?!
github.com/Noumena-Networ...


GPT-5 (codex-cli) vs Opus 4.1 (Claude Code)

@xjdr.bsky.social did an experiment contrasting these two:

GPT-5 is: “super autistic, so subtle prompts[..] are lost on it, but it is a superhuman peer reviewer and sysadmin”

github (code produced): github.com/Noumena-Netw...

OP: x.com/_xjdr/status...

11.08.2025 17:19 👍 13 🔁 2 💬 1 📌 2
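The workflow in the quoted eval — one model writing specs and reviews, the other writing code, with context shared by copying terminal output between tmux panes — boils down to a simple alternating loop over a shared transcript. Here is a toy Python sketch of that loop; the `reviewer` and `worker` functions are hypothetical stand-ins for the real model CLIs, not their actual behavior:

```python
# Toy sketch of the two-agent loop described above: one agent produces
# specs/reviews, the other produces code, and a shared transcript stands in
# for the tmux capture-pane handoff. All logic here is a hypothetical stub.

def reviewer(transcript: list[str]) -> str:
    """Stand-in for the spec/review model: reacts to the latest code."""
    last = transcript[-1] if transcript else ""
    if last.startswith("code"):
        return "review: looks plausible, add tests"
    return "spec: implement the module per the whitepaper"

def worker(transcript: list[str]) -> str:
    """Stand-in for the coding model: reacts to the latest spec or review."""
    return "code: draft based on " + transcript[-1].split(":")[0]

def run_loop(rounds: int) -> list[str]:
    transcript: list[str] = []  # shared context, analogous to captured pane output
    for _ in range(rounds):
        transcript.append(reviewer(transcript))  # spec or review turn
        transcript.append(worker(transcript))    # code turn in response
    return transcript

if __name__ == "__main__":
    for line in run_loop(2):
        print(line)
```

The point of the sketch is the shape of the protocol: neither agent calls the other directly; each only ever sees the accumulated transcript, which is exactly what copying pane contents back and forth achieves.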

It’s just matrix multiplication

10.08.2025 01:56 👍 302 🔁 49 💬 7 📌 11

huh
foundation model architecture shift coming?

09.08.2025 23:26 👍 13 🔁 1 💬 1 📌 0