Of course the LLM will reinforcement-learn its way towards cheating your test suite somehow, so you'll need to stay vigilant of this. Maybe something like what we do in automated student assessment - e.g., a combination of seen and unseen test cases.
13.02.2026 08:25
π 1
π 0
π¬ 2
π 0
(Some form of) TDD seems to be a fairly natural fit for vibe coding. You and the LLM both need a way to validate what the AI has implemented. A good, complete, and ideally simple test suite written upfront seems to be a natural way to provide that.
13.02.2026 08:23
π 1
π 0
π¬ 2
π 0
I got interested and asked Claude to do a topic map of ICSE 2016 vs 2026:
11.02.2026 08:34
π 0
π 0
π¬ 1
π 1
The older I get, the more convinced I become that our entire way of organizing capitalism (through stock markets and large international corporations) is fundamentally flawed and will, soon, flop on its belly. I just hope we are not standing beyond it when it does (but we likely will).
29.01.2026 15:15
π 2
π 0
π¬ 1
π 0
βAI job lossesβ is just a code word for βGlobal recession driven by the USA criminally unregulated financial market job lossesβ.
29.01.2026 12:22
π 7
π 4
π¬ 2
π 0
Redirecting
Some holiday news - paper accepted in Future Generation Computing Systems (FGCS):
doi.org/10.1016/j.fu...
30.12.2025 08:56
π 1
π 0
π¬ 0
π 0
Clair Obscur: Expedition 33 takes home an absurd 9 wins at The Game Awards, more than Baldur's Gate 3 in 2023
This year's Game Awards GOTY (and almost everything else) is the popular French RPG Clair Obscur: Expedition 33.
Don't get me wrong, it's hard not to root for Expedition 33 and their super humble and nice team, but some of these awards are a bit sus. Is it even an indie game? Best art direction in a year with freaking Silksong?
www.pcgamer.com/games/clair-...
12.12.2025 23:06
π 1
π 0
π¬ 0
π 0
Still, I found it fun to observe how PC Gamer dealt with their reviewing snafu over time. In the weeks after release, when it became clear that E33 will be a *big deal*, they took it with humor, but as time went on they adopted more of a policy of "the review shall not be mentioned henceforth".
12.12.2025 22:53
π 0
π 0
π¬ 0
π 0
(Not actual criticism of PC Gamer. Reviewing is inherently subjective, and honestly Expedition 33 is kind of a weird and difficult-to-review game.)
12.12.2025 22:50
π 0
π 0
π¬ 1
π 0
Clair Obscur: Expedition 33 review
Clair Obscur: Expedition 33 is a stylish riff on the JRPG, but its real-time-infused combat is rarely as fun as it looks.
Remember when @pcgamer.com gave #expedition33 70% in their review, calling the game "rarely as fun as it looks"?
Yeah, that didn't age great. #expedition33 is now officially the most decorated game at the Game Awards, ever.
www.pcgamer.com/games/rpg/cl...
12.12.2025 22:48
π 0
π 0
π¬ 1
π 0
Vacancies
Reminder - I still have an opening for a postdoc in my lab (closing date is in one week):
www.chalmers.se/en/about-cha...
11.12.2025 09:31
π 0
π 0
π¬ 0
π 1
It goes against common sense, everything we know about economics tells us it shouldn't work, there is no serious data that suggests it works, and yet it forms the basis of all economic decision making in the west.
Why? Because it would be awfully convenient for the people in power if it *did* work.
09.12.2025 09:50
π 3
π 0
π¬ 0
π 0
At some point in the future, people will read about trickle-down economics and have the same confused reaction that we have when learning how universally accepted catholic indulgences were in the Middle Ages.
09.12.2025 09:48
π 7
π 3
π¬ 1
π 0
I'm nowhere close to a financial expert, but how these things usually go is that nobody is *obviously* overleveraged, but everyone depends on everyone and once the first domino pieces start to fall it triggers a chain reaction at whose end the bank's "safe investments" suddenly appear hazardous.
03.12.2025 09:54
π 0
π 0
π¬ 0
π 0
That's a fairly common "playing both sides" argument. Productivity+++, but really nobody needs to worry for their jobs. These things are not likely to be true at the same time.
25.11.2025 09:46
π 2
π 0
π¬ 0
π 0
I have a new job ad for a postdoc out:
www.chalmers.se/en/about-cha...
Application deadline: Dec. 18th
Find out more about the work of my lab: icet-lab.eu
19.11.2025 12:13
π 1
π 1
π¬ 0
π 0
I heard the term "spec-based programming" from a colleague for the paradigm where you really only provide and refine requirements, and do not care at all about the code. I don't think the tools I am using are there yet.
06.11.2025 11:38
π 1
π 0
π¬ 0
π 0
IDK. My definition of vibe coding is "coding based almost exclusively on prompts, without or with minimal manual editing afterwards". Not sure if that is a standard definition, but it feels right.
06.11.2025 11:27
π 1
π 0
π¬ 1
π 0
(8) An interesting mind shift happens when you vibe code a lot. Code turns into a kind of transient artifact that you just aren't very attached to. Is the code messy? Who cares (as long as it works), you aren't looking very much at it anyway.
This has strong implications for security, safety, etc.
06.11.2025 07:50
π 0
π 0
π¬ 1
π 0
(7) Overall, the final system turns kind of messy, but realistically so did all other research prototypes I implemented by hand. But now, nobody, not even me, really understands the messy system.
06.11.2025 07:47
π 0
π 0
π¬ 1
π 0
(6) Planning mode is great. Claude is surprisingly good at creating, updating, and evaluating a plan of what to do. Complex changes became much more feasible once I started working more with planning mode upfront.
06.11.2025 07:44
π 1
π 0
π¬ 1
π 0
(5) For non-trivial code, you'll still need decent understanding of the solution space. I feel like some of the more hairy implementation issues I could only solve because I implemented similar systems in the past, and could prompt the AI with *very* fine-grained designs.
06.11.2025 07:41
π 1
π 0
π¬ 1
π 0
(4) Somewhat relatedly, AI loves to generate tests alongside changes (good) but they are often not very useful. They often stub out all business logic, turning them into classic "Python isn't broken" kind of tests. Getting it to write (and keep!) useful end-to-end tests seems surprisingly hard.
06.11.2025 07:37
π 1
π 0
π¬ 1
π 0
(3) Do.Not. Trust. the AI when it declares success. Whether something actually *works* you need to check yourself. I'll leave this example here - the AI broken 25 tests, and decided after fixing one of the failures that the rest probably isn't their fault.
06.11.2025 07:33
π 2
π 0
π¬ 1
π 0
(2) Validation is king, but also very hard. Again, these tools produce a lot of code. I quickly realized that reviewing it line-by-line is unrealistic. It may be more realistic when doing small changes in an established system, but in greenfield dev you have to go with the flow.
06.11.2025 07:30
π 2
π 0
π¬ 1
π 0
Lesson Learned (1): you feel more productive than you truly are. These tools produce *a lot* of code in short time, but if you take a step back after a few weeks you notice than a fair bit of it wasn't actually that useful. It still takes time to build something that actually works, and not just 75%
06.11.2025 07:27
π 2
π 0
π¬ 1
π 0
I initially used Gemini (in the console), but eventually moved on to Claude Console. They seem similar, but results from Claude where subjectively better, the tooling seems more mature, and the rates allowed me to work without much interruption. I am using the Pro subscription for USD 25 / month.
06.11.2025 07:25
π 1
π 0
π¬ 1
π 0
For the last couple of weeks I have been trying to vibe-code a relatively complicated research system in the area of Java microbenchmarking in my spare time.
I am slowly reaching the point where the system does something useful, so here are some initial impressions:
06.11.2025 07:22
π 1
π 0
π¬ 1
π 0
People talk a lot about echo chambers on here, but I think it's important to remember that you are not entitled to anybody's attention, independently of how important you or your cause are.
05.11.2025 08:50
π 1
π 0
π¬ 0
π 0
People are saying that AI will transform the way we teach and learn. It has already transformed the way students cheat and, to my surprise, how they apologize for cheating.
30.10.2025 10:50
π 23
π 7
π¬ 0
π 2