Ethan Weinberger

@ethanweinberger

Ph.D. student in Computer Science and Engineering at the University of Washington working with Su-In Lee.

1,113
Followers
358
Following
13
Posts
24.12.2023
Joined

Latest posts by Ethan Weinberger @ethanweinberger

gReLU: a comprehensive framework for DNA sequence modeling and design - Nature Methods gReLU advances deep-learning-based modeling and analysis of DNA sequences with comprehensive toolsets and versatile applications.

I'm happy to share that our gReLU package is now published in Nature Methods!

www.nature.com/articles/s41...

15.10.2025 21:21 👍 20 🔁 6 💬 0 📌 0

scverse turns 3!
What started as a shared vision for interoperable single-cell analysis has become a vibrant, global community.
From AnnData to full multimodal pipelines, we’re building the future of everything single-cell and spatial omics, together.
Here’s to what’s next!

17.05.2025 22:08 👍 17 🔁 5 💬 0 📌 4

scverse conference 2025 Follow us on our channels to learn more details in the coming weeks

📣 Mark your calendars! The 2025 edition of the scverse conference will take place on 17-19 November at Stanford University (US) scverse.org/conference20...

Call for abstracts and registrations coming soon!

12.05.2025 22:47 👍 12 🔁 9 💬 1 📌 2

Programmatic design and editing of cis-regulatory elements The development of modern genome editing tools has enabled researchers to make such edits with high precision but has left unsolved the problem of designing these edits. As a solution, we propose Ledi...

Our preprint on designing and editing cis-regulatory elements using Ledidi is out! Ledidi turns *any* ML model (or set of models) into a designer of edits to DNA sequences that induce desired characteristics.

Preprint: www.biorxiv.org/content/10.1...
GitHub: github.com/jmschrei/led...

24.04.2025 12:59 👍 115 🔁 37 💬 2 📌 3

Zero-shot evaluation reveals limitations of single-cell foundation models - Genome Biology Foundation models such as scGPT and Geneformer have not been rigorously evaluated in a setting where they are used without any further training (i.e., zero-shot). Understanding the performance of mode...

genomebiology.biomedcentral.com/articles/10....

Quite an indictment of some of the current single cell "virtual cell" foundation models. Even for the relatively mundane applications, cell labeling, batch correction etc, they are poor compared to much simpler & cheaper methods.

20.04.2025 16:14 👍 158 🔁 47 💬 6 📌 3

CODEML Workshop @ ICML 2025 Championing Open-source Development in Machine Learning

First-ever CODE ML workshop at ICML!
July 18 or 19, 2025, Vancouver, Canada

Submit papers on OSS libraries, maintenance, best practices & more.
Format: 4-page non-archival papers
Due: May 19

codeml-workshop.github.io/codeml2025/#...

17.04.2025 17:42 👍 10 🔁 3 💬 1 📌 0

Most people haven’t heard of this test, which is available in the US. It accurately predicts Alzheimer’s (not just if there’s a risk, but when). It is modulated by exercise and likely other lifestyle factors.
Here’s (almost) everything we know about it
erictopol.substack.com/p/the-breakt...

14.04.2025 13:47 👍 650 🔁 225 💬 30 📌 18

Deep genomic models of allele-specific measurements Allele-specific quantification of sequencing data, such as gene expression, allows for a causal investigation of how DNA sequence variations influence cis gene regulation. Current methods for analyzin...

Some encouraging news for cross-gene generalization of allele effects in S2F models. www.biorxiv.org/content/10.1...

16.04.2025 01:46 👍 15 🔁 7 💬 1 📌 0

New preprint out!
This is probably my most important paper. To my deep chagrin, it has no math.
XIST is a non-coding RNA exclusive to XX females. It silences one of the X chromosomes.
So what is it doing in male heart Schwann cells?

15.04.2025 19:39 👍 19 🔁 11 💬 1 📌 5

Photo of Anne Carpenter with STATus List 2025 wording

As an academic who works on tech to discover causes and cures of disease, contributing to novel drugs reaching patients has been thrilling.
Thanks to @statnews.com for naming me to STATUS List 2025 honoring leaders in health, medicine, and science!

#STATUSList
www.statnews.com/status-list/...

10.04.2025 11:10 👍 81 🔁 5 💬 5 📌 2

China’s Biotech Advances Threaten U.S. Dominance, Warns Congressional Report Congress should invest at least $15 billion to support biotech research over the next five years and take other steps to bolster manufacturing in the U.S., the report said.

This!!! I hope someone in Washington is listening

www.wsj.com/tech/biotech...

08.04.2025 22:48 👍 305 🔁 108 💬 15 📌 5

Data collected with the new sequencing platform HyDrop v2 is shown. First, a schematic overview of the bead batches of the microfluidic beads is followed by a tSNE and a barplot showing the costs in comparison to 10x Genomics.
Then, a track of mouse data (cortex) is shown together with nucleotide contribution scores in the FIRE enhancer in microglia. Here, the HyDrop and 10x based models show the same contributions.
On the right, the Drosophila embryo collection is explained; in the paper HyDrop v2 and 10x data are compared to sciATAC data. Then, a nucleotide contribution score is also shown, where HyDrop v2 and 10x models show the same contribution, just as in mouse.

Our new preprint is out! We optimized our open-source platform, HyDrop (v2), for scATAC sequencing and generated new atlases for the mouse cortex and Drosophila embryo with 607k cells. Now, we can train sequence-to-function models on data generated with HyDrop v2!
www.biorxiv.org/content/10.1...

04.04.2025 08:52 👍 55 🔁 25 💬 2 📌 2

Nature Biomedical Engineering - Auditing medical machine learning This issue highlights advances in applications of machine learning for diagnosing disease and for sorting and classifying health data, and includes a...

The cover of Nature Biomedical Engineering features work from #UWAllen’s @suinlee.bsky.social on techniques for auditing #AI dermatology image classifiers, one of two projects from the lab highlighted in this issue, alongside a deep learning model for cancer insights. www.nature.com/natbiomedeng...

01.04.2025 21:50 👍 2 🔁 2 💬 0 📌 0

Human Body Single-Cell Atlas of 3D Genome Organization and DNA Methylation https://www.biorxiv.org/content/10.1101/2025.03.23.644697v1

24.03.2025 13:34 👍 3 🔁 2 💬 0 📌 0

Investigating Data Size, Sequence Diversity, and Model Complexity in MPRA-based Sequence-to-Function Prediction We created the MPRA Dataset Collection (MDC), a curated resource of MPRA data from 12 studies comprising over 150 million labeled DNA subsequences. These datasets include both random and natural genom...

Our new pre-print, investigating a few important questions when we train S2F models on different types of MPRA datasets. Congrats to Yilun and @xinmingtu.bsky.social www.biorxiv.org/content/10.1...

15.03.2025 03:02 👍 25 🔁 11 💬 0 📌 0

Wow. "NIH" canceled my co-mentored (with Dave Sulzer) PhD student's F31 funding. His work is on understanding the genetics and neuroscience of language learning disorders. F31 provides no indirect $ to Columbia, just pays his salary. Not that it should matter, but he's an American citizen. W.T.F.

11.03.2025 12:41 👍 521 🔁 219 💬 21 📌 17

Portrait of Su-In Lee looking off to the side, holding a pen in front of a whiteboard with part of a handwritten algorithm visible behind her

Congratulations to #UWAllen professor @suinlee.bsky.social on her election as a Fellow of the International Society for Computational Biology! @iscb.bsky.social honored Lee for her pioneering work on explainable #AI for biology and medicine. www.iscb.org/iscb-news-it... #PopulationHealth #ThisIsUW

12.03.2025 17:53 👍 11 🔁 2 💬 0 📌 0

Modelling and design of transcriptional enhancers - Nature Reviews Bioengineering Enhancers are genomic elements critical for regulating gene expression. In this Review, the authors discuss how sequence-to-function models can be used to unravel the rules underlying enhancer activit...

Awesome summary of the field. An important point is to separate the design method from the oracle model being used. Sometimes, people say they're proposing a new design method but mean a cool new oracle model.

Modelling and design of transcriptional enhancers

www.nature.com/articles/s44...

03.03.2025 18:58 👍 37 🔁 15 💬 1 📌 0

Advances in post-Bayesian methods – workshop2025

Workshop on Advances in Post-Bayesian methods (May 15--16, UCL): postbayes.github.io/workshop2025/

26.02.2025 17:29 👍 11 🔁 2 💬 0 📌 0

A scalable approach to investigating sequence-to-expression prediction from personal genomes A key promise of sequence-to-function (S2F) models is their ability to evaluate arbitrary sequence inputs, providing a robust framework for understanding genotype-phenotype relationships. However, despite strong performance across genomic loci, S2F models struggle with inter-individual variation. Training a model to make genotype-dependent predictions at a single locus, an approach we call personal genome training, offers a potential solution. We introduce SAGE-net, a scalable framework and software package for training and evaluating S2F models using personal genomes. Leveraging its scalability, we conduct extensive experiments on model and training hyperparameters, demonstrating that training on personal genomes improves predictions for held-out individuals. However, the model achieves this by identifying predictive variants rather than learning a cis-regulatory grammar that generalizes across loci. This failure to generalize persists across a range of hyperparameter settings. These findings highlight the need for further exploration to unlock the full potential of S2F models in decoding the regulatory grammar of personal genomes. Scalable software and infrastructure development will be critical to this progress.

Our new paper describing a scalable approach for training sequence-to-function models on personal genomes ("personal genome training"), includes our observations on when this works and its limitations. www.biorxiv.org/content/10.1...
Congrats: Anna, @xinmingtu.bsky.social , @lxsasse.bsky.social

23.02.2025 23:31 👍 31 🔁 15 💬 0 📌 0

My heart goes out to all of the people at the NIH and CDC who were fired recently. These people weren't fired for being bad at their job or a waste of resources -- they were fired because they were easy to fire by outsiders trying to meet a quota. They worked years/decades.. for this?

16.02.2025 13:31 👍 46 🔁 5 💬 1 📌 1

Given that science funding is under attack, it might be as good a time as any to reflect on how we spend our precious dollars. Cutting out expenditure on publishing papers in overpriced journals might be a good thing to seriously consider once again.

11.02.2025 18:06 👍 44 🔁 7 💬 1 📌 1

MLCB is an excellent conference and a great opportunity to meet other people in the field. Highly recommend attending!

05.02.2025 17:05 👍 15 🔁 5 💬 0 📌 0

[SAVE THE DATE] MLCB 2025 is happening Sept 10-11 at the NY Genome Center in NYC!

Attend the premier conference at the intersection of ML & Bio, share your research and make lasting connections!

Submission deadline: June 1
More details: mlcb.github.io

Help spread the word - please RT! #MLCB2025

05.02.2025 02:50 👍 41 🔁 27 💬 1 📌 4

Screenshot of preprint saying "To benchmark the methods, we evaluated the performance of six methods: Linear, Linear-GPT, CellOracle, GEARS, scGPT, and scFoundation (Methods). We also included a basic approach that averages gene expression across all cells within all known perturbations as the prediction of unseen perturbations (referred to as KnownAverage).

The benchmarking results across the 17 datasets were summarized in Fig. 2b. Notably, the KnownAverage method consistently demonstrated some of the best overall performance across all four types of metrics."

Mean of the training data still absolutely crushing it for perturbation prediction.
www.biorxiv.org/content/10.1...

24.01.2025 12:59 👍 37 🔁 7 💬 3 📌 0

Mapping cells through time and space with moscot - Nature Moscot is an optimal transport approach that overcomes current limitations of similar methods to enable multimodal, scalable and consistent single-cell analyses of datasets across spatial and temporal...

Excited to see Moscot (moscot-tools.org) published in @Nature! We scaled Optimal Transport (OT) in single-cell genomics & added multimodality together with spatiotemporal trajectory inference, finding exciting new biology in the pancreas! 🚀 Read at www.nature.com/articles/s41...

22.01.2025 22:46 👍 125 🔁 42 💬 2 📌 2

Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation - Nature Genetics Borzoi adapts the Enformer sequence-to-expression model to directly predict RNA-seq coverage, enabling the in-silico analysis of variant effects across multiple layers of gene regulation.

Congrats to Johannes Linder, David Kelley et al. on the journal publication of Borzoi - a long-context sequence model of RNA-seq coverage profiles with many nice applications for transcriptional & post-transcriptional regulation & variant effect prediction.

www.nature.com/articles/s41... 1/

08.01.2025 22:06 👍 123 🔁 37 💬 2 📌 1

Single cell – ENCODE Homo sapiens clickable body map

Very excited to announce that the single cell/nuc. RNA/ATAC/multi-ome resource from ENCODE4 is now officially public. This includes raw data, processed data, annotations and pseudobulk products. Covers many human & mouse tissues. 1/

www.encodeproject.org/single-cell/...

07.01.2025 21:29 👍 287 🔁 86 💬 6 📌 0

A generative framework for enhanced cell-type specificity in rationally designed mRNAs mRNA delivery offers new opportunities for disease treatment by directing cells to produce therapeutic proteins. However, designing highly stable mRNAs with programmable cell type-specificity remains ...

The first preprint of 2025! Together with Matvei, @halfacrocodile.bsky.social, & our amazing team, we are excited to share PARADE: an AI framework for designing mRNA UTRs with enhanced cell-type specificity & stability. www.biorxiv.org/content/10.1...

02.01.2025 13:10 👍 81 🔁 38 💬 1 📌 5

Our ChromBPNet preprint is out!

www.biorxiv.org/content/10.1...

Huge congrats to Anusri! This was quite a slog (for both of us) but we r very proud of this one! It is a long read but worth it IMHO. Methods r in the supp. materials. Bluetorial coming soon below 1/

25.12.2024 23:48 👍 231 🔁 89 💬 7 📌 5