Also, I'm on the job market and really interested in industry positions around network security, measurement, and software engineering (remote or based in Toronto). Please feel free to reach out if you have any leads. Thank you!
Lastly, a shoutout to Andrej Karpathy for the excellent writeup on "The Unreasonable Effectiveness of RNNs". It was a really fascinating read during my undergrad (esp. the part on Linux source code). The idea for this project took shape when I read about fuzzing in grad school.
For full details, and to understand the security implications of our findings, please find a preprint here (talhaparacha.com/icse_preprin...). We'll be at #icse2026 in April to discuss results in person. 10/n
We also experiment with the temperature parameter to control the sampling strategy. We find that sampling conservatively makes more instances valid, at the expense of diversity in features, while the other extreme adds too much randomness, which also hurts testing. 9/n
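To make the temperature trade-off concrete, here is a minimal toy sketch (not from the paper) of temperature-scaled sampling over a model's output logits. `sample_with_temperature` is a hypothetical helper; low temperature pushes the distribution toward the most likely token, high temperature flattens it toward uniform.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample one token index from logits scaled by temperature.

    temperature << 1: conservative, near-greedy sampling (more valid
    instances, less diversity); temperature >> 1: near-uniform sampling
    (more randomness, more malformed instances).
    """
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()            # softmax over scaled logits
    return rng.choice(len(probs), p=probs)

# With a very low temperature, the highest logit wins essentially always.
idx = sample_with_temperature([1.0, 5.0, 2.0], temperature=0.01)
```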
An example of a useful TLS certificate generated by our pipeline is shown here: its date is set to June 31, 2037 (June has only 30 days!). This certificate is rejected by all TLS libraries except one (another, similar discrepancy was due to the use of leap seconds). 8/n
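As a toy illustration of why June 31 trips up validators (this is not the libraries' actual code), a strict calendar parse of an ASN.1-style GeneralizedTime string rejects the impossible date, while a lenient implementation might silently normalize or accept it:

```python
from datetime import datetime

def is_valid_generalized_time(ts: str) -> bool:
    """Strictly parse a GeneralizedTime-like string (YYYYMMDDHHMMSSZ).

    Python's strptime enforces real calendar dates, so an impossible
    day-of-month such as June 31 raises ValueError.
    """
    try:
        datetime.strptime(ts, "%Y%m%d%H%M%SZ")
        return True
    except ValueError:
        return False

# June 30, 2037 is a real date; June 31, 2037 is not.
is_valid_generalized_time("20370630235959Z")  # valid
is_valid_generalized_time("20370631235959Z")  # invalid (June 31)
```

A library that accepts the second string while its peers reject it is exactly the kind of discrepancy our testing surfaces.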
and (c) LLMs do not necessarily outperform RNNs in our experiments. We find the last part particularly interesting, given that RNNs have been available for over two decades and require substantially fewer resources. 7/n
(b) several models outperform Transcert, the current state-of-the-art (with the main model used in our paper generating 30% more distinct discrepancies), 6/n
We find that (a) our language models trigger a significant number of unique discrepancies (26 out of a maximum possible 30) -- a discrepancy occurs when one TLS library accepts a certificate while the others reject it, indicating a potential bug, 5/n
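The discrepancy check at the heart of differential testing can be sketched in a few lines. This is a simplified illustration, not our actual harness; `validators` stands in for the TLS libraries under test, each reduced to a hypothetical accept/reject function:

```python
def find_discrepancy(cert, validators):
    """Run one certificate through several validators and flag disagreement.

    validators: dict mapping library name -> function(cert) -> bool.
    A discrepancy exists when at least one validator accepts the
    certificate while at least one other rejects it.
    """
    verdicts = {name: check(cert) for name, check in validators.items()}
    accepted = {name for name, ok in verdicts.items() if ok}
    rejected = set(verdicts) - accepted
    return verdicts, bool(accepted and rejected)

# Toy example: two fake "libraries" that disagree on short inputs.
validators = {
    "libA": lambda cert: True,            # accepts everything
    "libB": lambda cert: len(cert) > 3,   # rejects short certificates
}
verdicts, disagree = find_discrepancy("ab", validators)
```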
We train RNNs (small- and medium-sized) and GPTs (fine-tuned and trained from scratch), since it is unclear which approach is better for testing, as opposed to just learning (also highlighted by Godefroid et al. in Learn&Fuzz; see the snippet below for a short discussion). 4/n
Our insight is that language models learn a representation for the textual data they are trained on, and that the learned representation is probabilistic and often imperfect, meaning a sampled instance can considerably break expectations (and may thus help in testing). 3/n
We train language models on datasets of real-world TLS certificates, to generate synthetic instances for use in differential testing. 2/n
Super excited to share that I'll present our latest research at ICSE 2026. Our work (co-authors Kyle Posluns, @kevin.borgolte.me, @lindorfer.in, and @proffnes.discuss.systems.ap.brid.gy) explores the use of language models for software testing in TLS certificate validation logic. 1/n