Also, I'm on the job market and really interested in industry positions around network security, measurement, and software engineering (remote or based in Toronto). Please feel free to reach out if you have any leads. Thank you!
Lastly, a shoutout to Andrej Karpathy for the excellent writeup on "The Unreasonable Effectiveness of RNNs". It was a really fascinating read during my undergrad (esp. the part on Linux source code). The idea for this project took shape when I read about fuzzing in grad school.
For full details, and to understand the security implications of our findings, please find a preprint here (talhaparacha.com/icse_preprin...). We'll be at #icse2026 in April to discuss results in person. 10/n
We also experiment with the temperature parameter to control the sampling strategy. We find that sampling conservatively makes more instances valid, at the expense of diversity in features, while the other extreme adds too much randomness, which also hurts testing. 9/n
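To make the temperature trade-off concrete, here is a minimal toy sketch (not from the paper) of temperature-scaled sampling over a model's output logits. `sample_with_temperature` is a hypothetical helper; low temperature pushes the distribution toward the most likely token, high temperature flattens it toward uniform.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample one token index from logits scaled by temperature.

    temperature << 1: conservative, near-greedy sampling (more valid
    instances, less diversity); temperature >> 1: near-uniform sampling
    (more randomness, more malformed instances).
    """
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()            # softmax over scaled logits
    return rng.choice(len(probs), p=probs)

# With a very low temperature, the highest logit wins essentially always.
idx = sample_with_temperature([1.0, 5.0, 2.0], temperature=0.01)
```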
An example of a useful TLS certificate generated by our pipeline is shown here: its date is set to June 31, 2037 (June has only 30 days!). This certificate is rejected by all TLS libraries except one (another, similar discrepancy was due to the use of leap seconds). 8/n
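As a toy illustration of why June 31 trips up validators (this is not the libraries' actual code), a strict calendar parse of an ASN.1-style GeneralizedTime string rejects the impossible date, while a lenient implementation might silently normalize or accept it:

```python
from datetime import datetime

def is_valid_generalized_time(ts: str) -> bool:
    """Strictly parse a GeneralizedTime-like string (YYYYMMDDHHMMSSZ).

    Python's strptime enforces real calendar dates, so an impossible
    day-of-month such as June 31 raises ValueError.
    """
    try:
        datetime.strptime(ts, "%Y%m%d%H%M%SZ")
        return True
    except ValueError:
        return False

# June 30, 2037 is a real date; June 31, 2037 is not.
is_valid_generalized_time("20370630235959Z")  # valid
is_valid_generalized_time("20370631235959Z")  # invalid (June 31)
```

A library that accepts the second string while its peers reject it is exactly the kind of discrepancy our testing surfaces.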
and (c) LLMs do not necessarily outperform RNNs in our experiments. We find the last part particularly interesting, given that RNNs have been available for over two decades and require substantially fewer resources. 7/n
(b) several models outperform Transcert, the current state-of-the-art (with the main model used in our paper generating 30% more distinct discrepancies), 6/n
We find that (a) our language models trigger a significant number of unique discrepancies (26 out of a maximum possible 30) -- a discrepancy occurs when one TLS library accepts a certificate while the others reject it, indicating a potential bug, 5/n
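The discrepancy check at the heart of differential testing can be sketched in a few lines. This is a simplified illustration, not our actual harness; `validators` stands in for the TLS libraries under test, each reduced to a hypothetical accept/reject function:

```python
def find_discrepancy(cert, validators):
    """Run one certificate through several validators and flag disagreement.

    validators: dict mapping library name -> function(cert) -> bool.
    A discrepancy exists when at least one validator accepts the
    certificate while at least one other rejects it.
    """
    verdicts = {name: check(cert) for name, check in validators.items()}
    accepted = {name for name, ok in verdicts.items() if ok}
    rejected = set(verdicts) - accepted
    return verdicts, bool(accepted and rejected)

# Toy example: two fake "libraries" that disagree on short inputs.
validators = {
    "libA": lambda cert: True,            # accepts everything
    "libB": lambda cert: len(cert) > 3,   # rejects short certificates
}
verdicts, disagree = find_discrepancy("ab", validators)
```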
We train RNNs (small- and medium-sized) and GPTs (fine-tuned and trained from scratch), since it is unclear which approach is better for testing, as opposed to just learning (also highlighted by Godefroid et al. in Learn&Fuzz; see the snippet below for a short discussion). 4/n
Our insight is that language models learn a representation for the textual data they are trained on, and that the learned representation is probabilistic and often imperfect, meaning a sampled instance can considerably break expectations (and may thus help in testing). 3/n
We train language models on datasets of real-world TLS certificates, to generate synthetic instances for use in differential testing. 2/n
Super excited to share that I'll present our latest research at ICSE 2026. Our work (co-authors Kyle Posluns, @kevin.borgolte.me, @lindorfer.in, and @proffnes.discuss.systems.ap.brid.gy) explores the use of language models for software testing in TLS certificate validation logic. 1/n