#PubDrop
7/6 As always, this study is end-to-end reproducible (arcadia-science.github.io/2025-phyloge...)
6/6 As the field shifts toward MSA-based architectures, the quality of input alignments becomes paramount. MSA Pairformer encodes evolutionary signal well, but future architectures might benefit from learning when to trust that signal.
Box plot of normalized P@L delta across shuffling start layers showing positive cohort recovers performance as shuffling stops while negative cohort degrades.
5/6 We traced this to layers 5-11, which integrate sequence weights blindly, regardless of whether they help or hurt. When they hurt prediction, we could progressively rescue performance by shuffling weights at earlier layers, saving the model from its own bad intuition.
Histogram of delta P@L showing roughly symmetric distribution around zero with slight positive mean.
4/6 But surprisingly, encoding phylogeny ≠ leveraging it. Comparing learned weights against uniform averaging for contact prediction, the aggregate improvement was marginal. Per-MSA, uniform averaging outperformed learned weights about half the time, sometimes by a large margin.
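For readers unfamiliar with the metric, here is a minimal sketch of a per-MSA delta P@L comparison. It assumes P@L means the precision of the top-L scored residue pairs (L = sequence length) with a minimum sequence separation; the function names, the `min_sep` default, and the toy matrices are illustrative assumptions, not the authors' code or their exact definition.

```python
def precision_at_l(scores, contacts, min_sep=6):
    """Fraction of the top-L scored residue pairs that are true contacts.

    scores and contacts are L x L nested lists; only pairs (i, j) with
    j - i >= min_sep are considered. min_sep=6 is an assumed default.
    """
    length = len(scores)
    pairs = [
        (scores[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Highest-scoring pairs first, then keep at most L of them.
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[:length]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if contacts[i][j])
    return hits / len(top)


def delta_p_at_l(learned_scores, uniform_scores, contacts, min_sep=6):
    """Positive when learned weights beat uniform averaging on this MSA."""
    return (precision_at_l(learned_scores, contacts, min_sep)
            - precision_at_l(uniform_scores, contacts, min_sep))


def upper_matrix(size, values):
    """Hypothetical helper: build a size x size matrix from {(i, j): value}."""
    matrix = [[0.0] * size for _ in range(size)]
    for (i, j), value in values.items():
        matrix[i][j] = value
    return matrix


# Toy 4-residue example (made-up numbers, min_sep=1 so all pairs count).
true_contacts = upper_matrix(4, {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1})
learned = upper_matrix(4, {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7,
                           (0, 3): 0.6, (0, 2): 0.2, (1, 3): 0.1})
uniform = upper_matrix(4, {(0, 2): 0.9, (1, 3): 0.8, (0, 1): 0.7,
                           (1, 2): 0.6, (2, 3): 0.2, (0, 3): 0.1})
print(delta_p_at_l(learned, uniform, true_contacts, min_sep=1))
```

Repeating this over many MSAs and plotting the per-MSA deltas would give a histogram like the one described above.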
Unrooted tree for LytTR query with leaves colored by median sequence weight showing gradient of attention from query.
3/6 The model consistently upweights evolutionarily close sequences and downweights distant ones, despite never being trained on this. There's also a division of labor across layers: layer 11 acts as a strong phylogenetic filter, while others contribute more modestly.
2/6 Their key innovation is query-biased attention, where the model weights MSA sequences by relevance to the query. We tested whether "relevance" maps onto phylogenetic distance by correlating sequence weights with evolutionary distances across thousands of protein families.
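As a rough illustration of the kind of test described here, the sketch below computes a Spearman rank correlation between per-sequence attention weights and evolutionary distances to the query. Spearman is an assumed choice (the post does not say which correlation was used), and the function names and toy numbers are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example (made-up numbers): weights fall as distance from the query grows.
distances_to_query = [0.1, 0.4, 0.9, 1.5]
sequence_weights = [0.50, 0.30, 0.15, 0.05]
rho = spearman(distances_to_query, sequence_weights)
print(rho)
```

A strongly negative value would mean that sequences farther from the query tend to receive lower weight.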
Screenshot of pub title, "MSA-based pLMs encode evolutionary distance but don't reliably exploit it"
1/6 What does MSA Pairformer learn about evolutionary relationships?
@yoakiyama.bsky.social et al achieved impressive contact & variant-fitness predictions w a fraction of ESM2's parameters; we wanted to better understand the mechanism
arcadia-science.github.io/2025-phyloge... 🧪🧬 🖥️
A free, open-access library of high-quality organism illustrations for science communication
All hardware designs, software, and documentation are open-source and available on GitHub. We'd love your feedback on what features would make AutoOpenRaman more useful for your research! [8/8]
research.arcadiascience.com/pub/resource...
This resource is perfect for scientists who want to try Raman spectroscopy without committing to expensive commercial systems, or current OpenRAMAN users who want to add automation to their experiments. [7/8]
We demonstrated AutoOpenRaman's capabilities by automatically collecting 200 Raman spectra from Chlamydomonas reinhardtii colonies in 68 minutes across a 96-well plate format. The system successfully detected β-carotene signatures in the algae. [6/8]
Diagram of AutoOpenRaman software flow and user interface
The software is designed for non-programmers to install and use, built on µManager (micro-manager.org) for broad device support. Users can swap hardware components as long as µManager supports them; no code changes needed. [5/8]
On the hardware side, we added a motorized XY stage, microscope objective, laser shutter, and neon light source for quick calibration. [4/8]
Diagrams of the AutoOpenRaman spectrometer setup
Our solution: AutoOpenRaman adds automation to OpenRAMAN while keeping it open-source and relatively inexpensive (~$8k in total). [3/8]
The problem: Commercial Raman systems offer automation, but cost $200k+. #OpenRAMAN (open-raman.org) is a powerful, inexpensive DIY alternative, but it doesn't support automation. [2/8]
Introducing AutoOpenRaman, an automated, inexpensive microscope setup for Raman spectroscopy! We recapitulated many features of expensive commercial systems using economical hardware and open-source software 🧵 [1/8]
research.arcadiascience.com/pub/resource...
Chlamydomonas cpc1-1 mutant exhibits unexpected growth phenotypes
After many yrs building internally @arcadiascience.com we're now able to partner more with others on cool organisms. You won't hurt my feelings if you don't read my post, but don't miss out on our first organismal spotlight animation featuring sea squirts -- just skip to the bottom! 🧵
Our blog is now on Substack! Check out our first post there, which includes a fun video we made about why you might want to study sea squirts open.substack.com/pub/arcadias...
AI laboratory for lit review
Want to give input to help your fellow scientists?
We've updated our list of preprints in need of feedback: bit.ly/preprintrequests
We're having our own preprint commenting event next Friday. We'll consider reading your work if you add it to the list: bit.ly/submitpreprint
Two Arcadia talks today!
11:15 am – Athena CD – George Sandler
“The potential of machine learning for genomic prediction in a quantitative genetics framework”
11:45 am – Athena F – Ryan York
“Evolution determines what machines can learn”
Calling all scientists! 🚨 Share feedback on preprints that need it! Check out our updated list: bit.ly/preprintrequests
We're hosting our next internal preprint commenting party this Friday, and might review your work! Submit it here: bit.ly/submitpreprint
As #Evol2025 talks wrap up for the day, our mixer begins!
Come upstairs at The Globe any time til 7:30 to learn about opportunities at Arcadia over charcuterie & drinks 🍷🧀
We're looking for a Computational Evolutionary Biologist, Evolutionary Cell Biologist, and more…
📢 Hey #Evol2025 attendees, we're hiring!
If you're interested in #careers at Arcadia, swing by our free mixer upstairs at The Globe today from 5:30 to 7:30. It's a 7-minute walk from the Classic Center and drinks are free 🍻
jobs.lever.co/arcadiascience
First Arcadia talk at @evolmtg.bsky.social! #Evol2025
Today – 3 pm – Athena CD
Austin Patton presents “Graph neural networks: A unifying predictive model architecture for evolutionary applications”