I would expect some signal from RNA profiles as well, especially for variants proximal to the gene body. There is work showing that even pseudo-bulk gene expression count matrices can leak private info, so it should hold for RNA profiles as well.
Do you mean sequence-to-function models like ChromBPNet? Haven't tried it on this task, but I believe they can be made to work for genotype prediction with some tweaks (e.g. log-likelihood of observed coverage under ref and alt profile predictions).
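A minimal sketch of that log-likelihood idea, not something tested on this task. It assumes a multinomial likelihood over per-base coverage and models a heterozygous site as an equal mix of the two normalized profiles; the function names and the toy profiles are hypothetical.

```python
import numpy as np

def multinomial_loglik(counts, profile, eps=1e-9):
    """Log-likelihood (up to a constant) of per-base counts under a predicted profile."""
    p = np.asarray(profile, dtype=float) + eps
    p = p / p.sum()  # normalize the profile to a probability distribution
    return float(np.sum(np.asarray(counts, dtype=float) * np.log(p)))

def call_genotype(counts, ref_profile, alt_profile):
    """Score three candidate genotypes and return the most likely one with all scores."""
    ref = np.asarray(ref_profile, dtype=float)
    alt = np.asarray(alt_profile, dtype=float)
    ref = ref / ref.sum()
    alt = alt / alt.sum()
    candidates = {
        "ref/ref": ref,
        "ref/alt": 0.5 * (ref + alt),  # assumption: equal contribution from both alleles
        "alt/alt": alt,
    }
    scores = {g: multinomial_loglik(counts, p) for g, p in candidates.items()}
    return max(scores, key=scores.get), scores

# Toy profiles standing in for model predictions on the ref and alt sequences.
ref_profile = [1, 2, 6, 2, 1]
alt_profile = [1, 4, 2, 4, 1]
observed = [10, 40, 20, 40, 10]  # coverage shaped like the alt profile
best, scores = call_genotype(observed, ref_profile, alt_profile)
print(best, scores)
```

Mixing normalized profiles ignores any difference in total accessibility between alleles, so a real version would probably need to model total counts too.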
We are hiring a PhD intern for Summer 2026 in ML for regulatory genomics at ReLU/BRAID/Genentech! Work on DNA sequence models for the noncoding genome (e.g. DNA design, models of MPRA and genetic variants)! 🥳
Thanks! Not yet, but we're trying hard to get it out soon!
An artistic reinterpretation of the Nona model's schematics
Introducing Nona! 🧬 @suragnair.bsky.social's brilliant idea to unify siloed genomic AI. Nona learns jointly from DNA seq + functional data, enabling new ways of modeling genomic data!
And finally, we have a postdoc opening in our team. You'll get to work on cutting-edge models while gaining a unique perspective on how these tools can shape the future of AI in genomics. Join us! 15/15
careers.gene.com/us/en/job/20...
Earlier this year, we presented Nona @MIA_at_Broad (www.youtube.com/watch?v=l14F...). It also received the Best Talk Award at RegSys, ISMB 2025. 14/
We are working on releasing the code and hope to get it out very soon. In the meantime, please don't hesitate to reach out if you have any suggestions or questions. 13/
This would not have been possible without my amazing colleagues from BRAID #Genentech: Gökcen, Ehsan, Alex, @johahi.bsky.social, Nate, @avantikalal.bsky.social, Tommaso, Hector, Gabriele
You can find the preprint here: www.biorxiv.org/content/10.1...
12/
Working on Nona has been a great learning experience. Each analysis highlights a different aspect of regulatory biology: from predictive modeling and generation to privacy.
Nona's flexible masking schemes open new directions, and there's much more to explore. 11/
We trained a small fLM on base-resolution ATAC-seq. It can invert the signal to recover genotype information with high accuracy, even with as few as 5 million reads per sample. This has immediate privacy implications for sharing fragment files. 10/
Functional genotyping: scATAC-seq has taken off. Fragment files are the de facto file format. They are treated as privacy-preserving, often shared openly even when raw reads are access-controlled. Using AFGR data, we find that common variants alter base-res ATAC-seq profiles. 9/
fLMs are also discrete diffusion models! Nona fLM can generate DNA under functional constraints, e.g. sequences producing weak, strong, left-skewed, or even double-humped DNase-seq profiles.
They allow parallel decoding, with competitive performance at fewer generation steps. 8/
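A rough sketch of what parallel decoding can look like for a masked discrete diffusion sampler: start fully masked, and at each step commit the most confident positions. This is a generic illustration, not Nona's sampler; `toy_denoiser` is a hypothetical random stand-in for a trained model, and conditioning on a functional track is left out.

```python
import numpy as np

MASK = -1          # sentinel for a position that is still masked
ALPHABET = "ACGT"

def toy_denoiser(tokens, rng):
    """Hypothetical stand-in for a trained model: random per-position base probabilities."""
    logits = rng.normal(size=(len(tokens), 4))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)

def parallel_decode(length, n_steps, denoiser, seed=0):
    """Unmask several positions per step, most confident first."""
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK)
    for step in range(n_steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        probs = denoiser(tokens, rng)
        # Spread the remaining masked positions evenly over the remaining steps.
        k = int(np.ceil(masked.size / (n_steps - step)))
        confidence = probs[masked].max(axis=1)
        chosen = masked[np.argsort(-confidence)[:k]]
        tokens[chosen] = probs[chosen].argmax(axis=1)
    return "".join(ALPHABET[t] for t in tokens)

print(parallel_decode(length=20, n_steps=4, denoiser=toy_denoiser))
```

Fewer steps means more positions are committed at once, which is the speed versus quality trade-off the post refers to.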
Functional language models (fLM): DNA LMs are great at capturing co-evolutionary sequence patterns, but can't connect them to cell-type specific regulation. An fLM conditioned on GM12878 DNase-seq picks up more transcription factor motif features than plain LMs. 7/
The context-aware model also improves predictions of promoter expression across diverse integration sites as measured by TRIP-seq experiments. 6/
Turns out the biggest gains are at loci showing outlier chromatin states. Here's an example of a heterochromatinized locus where the sequence-only model gets the locus wrong, but the context-aware model rescues the local prediction. 5/
Context-aware models: We improve local genomic predictions by providing flanking track measurements (~196 kb) as input. This outperforms sequence-to-function models by up to 13% on the test set. What's driving these improvements? 4/
We highlight 3 novel applications. These are diverse, spanning improved local prediction of functional tracks, conditioning DNA language models on functional tracks, and surprisingly, privacy risks in ATAC-seq fragment files. 3/
Multimodal masking provides a unified approach.
Nona operates on both DNA sequence and functional genomics tracks. Task-specific masking configurations recover familiar model types, and its flexibility enables entirely new approaches! 2/
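One way to picture the masking configurations, as a sketch based only on this thread. `True` marks a position hidden from the model's input. The mode names, the 15% sequence mask rate, and how the plain language model treats the track are all assumptions, not details from the preprint.

```python
import numpy as np

def make_masks(mode, length, center=None, seq_mask_rate=0.15, seed=0):
    """Return (seq_hidden, track_hidden) boolean masks for one input window."""
    rng = np.random.default_rng(seed)
    seq_hidden = np.zeros(length, dtype=bool)
    track_hidden = np.zeros(length, dtype=bool)
    if mode == "seq2func":
        # Sequence fully visible, track fully hidden: predict the track from DNA.
        track_hidden[:] = True
    elif mode == "context_aware":
        # Sequence visible, flanking track visible, central track window hidden.
        start, end = center
        track_hidden[start:end] = True
    elif mode == "language_model":
        # Random sequence positions hidden, no track information given.
        seq_hidden = rng.random(length) < seq_mask_rate
        track_hidden[:] = True
    elif mode == "functional_lm":
        # Random sequence positions hidden, track visible as conditioning.
        seq_hidden = rng.random(length) < seq_mask_rate
    else:
        raise ValueError(f"unknown mode: {mode}")
    return seq_hidden, track_hidden

seq_hidden, track_hidden = make_masks("context_aware", length=12, center=(4, 8))
print(track_hidden.astype(int))
```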
Excited to share Nona: a unifying multimodal masking framework for functional genomics.
Models for DNA have evolved along separate paths: sequence-to-function (AlphaGenome), language models (Evo2), and generative models (DDSM).
Can these be unified under a single paradigm? 1/15
Thanks for sharing, but the link isn't working for me.
[SAVE THE DATE] MLCB 2025 is happening Sept 10-11 at the NY Genome Center in NYC!
Attend the premier conference at the intersection of ML & Bio, share your research and make lasting connections!
Submission deadline: June 1
More details: mlcb.github.io
Help spread the word, please RT! #MLCB2025
Hi Wendy, this internship is for current PhD students only.
We are hiring an intern to work with our team at Genentech next summer, on exciting projects related to deep learning for DNA/RNA sequences. Please share and apply! roche.wd3.myworkdayjobs.com/ROG-A2O-GENE...
An interesting diagnostic application of CRISPR is to activate expression of genes in tissues where they are not normally expressed. This is useful when studying the functional consequences of suspected pathogenic variants in genes whose expression is restricted to inaccessible tissues like the brain or eyes. 1/
Gene regulation involves thousands of proteins that bind DNA, yet comprehensively mapping these is challenging. Our paper in Nature Genetics describes ChIP-DIP, a method for genome-wide mapping of hundreds of DNA-protein interactions in a single experiment.
www.nature.com/articles/s41...
Excited to share our latest preprint on scE2G, a new model to link enhancers to target genes using single-cell data, with state-of-the-art performance across multiple perturbation benchmarks.
biorxiv.org/cgi/content/...
Read more below!
1/12
go.bsky.app/PFpnqeM
By demand, I've created the final starter pack in my ML Personality Starter Pack Series.
I'm uncertain who belongs in this starter pack, so if you think you'd fit better in the Grumpy ML or Unreasonably Upbeat ML starter packs, let me know.
(Self) nominations welcome
go.bsky.app/5Suyk58