That looks really cool, Stu. Will be reading it properly later today.
True story.
@jomcinerney.bsky.social proposes that genomes do not encode fixed functions but rather "probability distributions" over functional and phenotypic outcomes, and introduces "genomic perplexity" as a measure of gene-context incompatibility.
doi.org/10.1093/molbev/msag041
#evobio #molbio
@guigau.bsky.social Saw PanGBank earlier. Very nice.
This looks absolutely great. For those of us interested in pangenomes, I am sure this will be a super place to get data and the interface is very clean (plotly). Congrats to the authors (I don't know if they are on bsky): pangbank.genoscope.cns.fr
But it is an eight-year-old version of EndNote, and Word is a similar vintage. I think me and them are fused together at this stage. Late-stage calcification.
Thanks for the tip. I have done what many people do - bought the book and only part-read it. However, I do need to think more deeply about correcting for phylogeny, so this is a great tip. Many thanks.
It's the power ballads that were the hardest part. Nobody needs to listen to Van Halen at 2pm on a Tuesday.
I'm in a coffee shop wrestling with Endnote, while 1980s power ballads blast out from a giant speaker. The sun is shining for the first time this year, but I got paper submission deadlines. I hope nobody sees me cry.
Thanks to The Leverhulme Trust (RF-2023-408) for supporting this work, and to the reviewers and associate editor whose feedback greatly improved the manuscript.
Paper: doi.org/10.1093/molbev/msag041
This connects to real tools. Transformer-based genome models (DNABERT, Evo) can calculate perplexity directly. AlphaFold confidence scores estimate structural perplexity. Flux Balance Analysis handles metabolic perplexity. The framework is testable now.
The practical shift: instead of asking "what does this gene do?" we should ask "what can this gene become?" Synthetic biology, antimicrobial development, and evolutionary prediction all become questions of context engineering rather than gene optimisation alone.
It also explains open vs closed pangenomes. Open pangenomes (like E. coli) arise when large population sizes can detect small fitness advantages and high environmental variability creates many contexts where accessory genes pay off - despite integration costs.
The framework predicts pangenome structure. Core genes = low perplexity across contexts. Rare accessory genes = high perplexity generally, but strong benefits in specific contexts. The U-shaped frequency distribution falls out naturally.
This explains why HGT has a fitness cost - not because transferred genes are broken, but because they arrive into a genome optimised for different statistical patterns. Over time, codon adaptation, regulatory rewiring, and compensatory mutations reduce perplexity. The gene becomes "expected."
Perplexity operates across multiple dimensions: codon usage, protein structure, regulatory compatibility, metabolic integration, protein-protein interactions, chromosomal organisation, and gene co-occurrence patterns. Each contributes to the fitness cost of genomic novelty.
This leads to the concept of "genomic perplexity" - borrowed from information theory. Perplexity measures how "surprised" a model is by a sequence. A horizontally transferred gene landing in a new genome is a high-perplexity token - statistically unexpected in that context.
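To make "surprise" concrete: perplexity is the exponential of the average negative log-probability a model assigns to each token. The toy sketch below is an illustration only, not the method from the paper. It assumes the "model" is nothing more than smoothed codon frequencies counted from a set of host genes, and the names `codon_model` and `perplexity` are hypothetical helpers made up for this example.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet.
ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codons(seq):
    """Split a DNA string into non-overlapping codons, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_model(host_genes, pseudocount=1.0):
    """Hypothetical helper: smoothed codon frequencies estimated from host genes.

    This unigram model is an assumption made for illustration; real genome
    language models condition on far more context.
    """
    counts = Counter(c for gene in host_genes for c in codons(gene))
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def perplexity(gene, model):
    """Perplexity = exp(mean negative log-probability per codon)."""
    observed = [c for c in codons(gene) if c in model]  # skip codons with ambiguous bases
    if not observed:
        raise ValueError("gene contains no scorable codons")
    mean_nll = -sum(math.log(model[c]) for c in observed) / len(observed)
    return math.exp(mean_nll)
```

With a host that mostly uses GCG for alanine, a gene written in GCG scores a lower perplexity than one written in GCT, even though both encode the same amino acid. That is the codon-usage dimension of the idea in miniature.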
I propose that evolution shapes genomes not to encode fixed functions, but to optimise probability distributions of functional outcomes across the contexts organisms actually encounter. Selection acts on these distributions, not on singular gene activities.
Modern language models (transformers) succeed because they learn probability distributions over outcomes given context. They use "attention" - each word's contribution depends on other words in the sequence. Epistasis is the biological equivalent. A gene's effect depends on what else is in the genome.
This is exactly the problem NLP faced for decades. Trying to understand language through fixed word definitions failed. The breakthrough came when researchers stopped assigning fixed meanings and started treating words as things whose meaning emerges from context.
We've long asked "what does this gene do?" But the same gene can be essential in one strain and dispensable in another. SP_0185 in Streptococcus pneumoniae is a magnesium transporter - lethal to lose in some strains, irrelevant in others. Same gene. Different context. Different function.
New paper out in MBE! 🧵
"Genomic Perplexity and the Evolution of Context-Dependent Function"
The big idea: genes don't have fixed functions. Function emerges from context - genomic, cellular, environmental. And we can quantify this. academic.oup.com/mbe/article/...
MENI is back! Join us in Dublin this August 2026 for our 3rd Meeting for Microbial Evolution in Ireland. We are delighted to have @rachelmwheatley.bsky.social @drrebeccajhall.bsky.social @jpjhall.bsky.social and @tweethinking.bsky.social join us as keynote speakers this year. miniurl.com/MENI
You can change the denomination to euro. It's general for people with a finite amount of time and more than one option for what grant to write.
Very interesting paper. www.science.org/doi/10.1126/...
Ever wondered if writing a grant proposal was actually worth your time? Presenting... The Grant Portfolio Evaluator (for entertainment only): mol-evol.github.io/grant-evalua...
PanForest: predicting genes in genomes using random forests academic.oup.com/bioinformati... #jcampubs
Dissecting Phylogenetic Support: Unified Decay Indices, AU Tests, and Branch-Site Specific Visualizations. https://www.biorxiv.org/content/10.64898/2025.12.05.692543v1
Oh nooooo.
It's for "entertainment" purposes only :)