@dgautheret

Trying to make sense of #RNA information.

126 Followers · 128 Following · 28 Posts
Joined 17.11.2024

Latest posts by @dgautheret

Excited to share this preprint that describes my latest work on using GPUs to accelerate processing of RNA-seq data.

The title says it all: "RNA-seq analysis in seconds using GPUs", now on bioRxiv www.biorxiv.org/content/10.6... and GitHub github.com/pachterlab/k...

Figure 1 shows the key result.

06.03.2026 19:32 👍 181 🔁 86 💬 6 📌 8
Presentation of scientific work on de Bruijn graphs applied to the processing of sequencing data in biology. The picture was taken in the conference room of the University of Venice, where a screen displays a slide introducing de Bruijn graphs, with the speaker standing in front of it. Behind the screen is a large Renaissance painting that spans from floor to ceiling.

I had the opportunity to present nice results on the detection of biological events in de Bruijn graphs at #DSB2026, as part of my PhD work on #Vizitig!

Thanks to the organizers and colleagues for this amazing and super-inspiring event (and @camillemrcht.bsky.social for the picture).

20.02.2026 18:34 👍 16 🔁 7 💬 1 📌 0
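
For readers new to the structure on the slide: in a node-centric de Bruijn graph, each distinct k-mer found in the reads is a node, and k-mer u points to k-mer v when the last k-1 characters of u equal the first k-1 characters of v. Below is a minimal Python sketch of that definition (our own illustration, not Vizitig's implementation); it ignores reverse complements, sample colors, and memory efficiency, and `build_dbg` is a hypothetical name.

```python
from typing import Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_dbg(reads: list[str], k: int) -> dict[str, set[str]]:
    """Node-centric de Bruijn graph: nodes are distinct k-mers, and an edge
    u -> v exists when u[1:] == v[:-1] (an overlap of k-1 characters)."""
    nodes = {km for read in reads for km in kmers(read, k)}
    graph: dict[str, set[str]] = {node: set() for node in nodes}
    for node in nodes:
        suffix = node[1:]
        # Try the four possible one-base extensions of the (k-1)-suffix.
        for base in "ACGT":
            successor = suffix + base
            if successor in nodes:
                graph[node].add(successor)
    return graph


if __name__ == "__main__":
    # Two reads that differ by a single base.
    g = build_dbg(["ACGTTGCA", "ACGATGCA"], k=3)
    for node in sorted(g):
        print(node, "->", sorted(g[node]))
```

Here `ACG` has two successors, `CGA` and `CGT`, whose paths rejoin at `TGC`. Such "bubbles" are the kind of pattern that event-detection methods on de Bruijn graphs typically look for, since a single-base difference between sequences produces one.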

Beautiful caveat section!

15.02.2026 16:01 👍 1 🔁 0 💬 1 📌 0

Regarding the role of pesticides in cancer incidence, look at this instead. Occupational exposures (asbestos, benzene) are in the blue bar on the right, and pesticides do not appear anywhere for lack of sufficient data.
www.nature.com/articles/s41...

11.02.2026 08:58 👍 2 🔁 0 💬 0 📌 0

👇 😨

09.02.2026 12:06 👍 0 🔁 0 💬 0 📌 0

PREPRINT ALERT

I heard you were craving more combinatorics, so here is some more for y'all!

04.02.2026 17:22 👍 5 🔁 4 💬 0 📌 1

Regarding the relative importance of cancer risk factors, look at this instead. The small light-blue area covers all occupational causes: asbestos, arsenic, etc. Pesticides do not appear anywhere for lack of sufficient data.
Source: Fink et al. Nature Medicine, 2026

04.02.2026 14:20 👍 0 🔁 0 💬 0 📌 1

More minimizer papers! 😆

04.02.2026 10:43 👍 3 🔁 2 💬 1 📌 0
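
For context on these papers: a (w, k) minimizer scheme looks at every window of w consecutive k-mers and keeps the smallest one under some order, so that overlapping sequences tend to select the same k-mers. The sketch below is a minimal illustration with plain lexicographic order and leftmost tie-breaking; practical tools generally use a random or hash-based order instead, and the function name `minimizers` is hypothetical.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) for every k-mer that is the minimum of at
    least one window of w consecutive k-mers (lexicographic order,
    leftmost position wins ties)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected: set[int] = set()
    for start in range(len(kmers) - w + 1):
        # Index of the smallest k-mer in the window [start, start + w).
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        selected.add(best)
    return [(i, kmers[i]) for i in sorted(selected)]


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=2))
    # → [(0, 'ACG'), (1, 'CGT'), (2, 'GTA'), (4, 'ACG')]
```

On this input the scheme keeps 4 of the 6 k-mers. That fraction, the density, is one of the central quantities that minimizer research tries to lower.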

Stay tuned: we are now running Metapuccino on SRA's 1 million human transcriptomes.

02.11.2025 10:14 👍 2 🔁 1 💬 0 📌 0

This manuscript covers the full methodology and discusses the limits of NLP and LLMs for NGS metadata completion.

02.11.2025 10:14 👍 0 🔁 0 💬 1 📌 0

Usability was a top priority: Metapuccino runs on regular computers with open-source LLMs, but can also scale up on GPUs for large datasets. All it needs is a list of SRA IDs: no pre-processed tables required.

02.11.2025 10:14 👍 0 🔁 0 💬 1 📌 0

Fiona Hak developed a clever LLM training strategy using the hardest SRA cases, and the fine-tuned model is available on Hugging Face.

02.11.2025 10:14 👍 0 🔁 0 💬 1 📌 0

Metapuccino fills and standardizes 19 key SRA metadata fields in human transcriptomics, using rule-based NLP and a large language model (LLM).

02.11.2025 10:14 👍 0 🔁 0 💬 1 📌 0
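
To make "rule-based NLP" concrete, here is a toy sketch of a single rule of that kind: mapping free-text sample descriptions to a standardized tumor/normal label with regular expressions. The patterns and the name `classify_sample_type` are hypothetical and far simpler than Metapuccino's actual rules; descriptions that no rule matches are left unlabeled, which is the kind of case a model-based step could handle.

```python
import re
from typing import Optional

# Hypothetical rules, for illustration only. "Normal" rules run first so that
# phrases such as "tumor-adjacent normal tissue" are not labeled as tumor.
NORMAL_RULES = re.compile(r"\bnormal\b|\bhealthy\b|\bnon[- ]?tumou?r(al)?\b")
# "carcinoma" has no leading \b so that "adenocarcinoma" also matches.
TUMOR_RULES = re.compile(r"\btumou?r|carcinoma|\bcancer|\bmalignan")


def classify_sample_type(description: str) -> Optional[str]:
    """Map a free-text sample description to 'normal', 'tumor', or None."""
    text = description.lower()
    if NORMAL_RULES.search(text):
        return "normal"
    if TUMOR_RULES.search(text):
        return "tumor"
    return None  # no rule fired: leave for curation or a model-based step


if __name__ == "__main__":
    for d in ["Primary tumor tissue", "Tumor-adjacent normal tissue", "HEK293 cell line"]:
        print(d, "->", classify_sample_type(d))
```

The rule order matters: checking tumor keywords first would mislabel adjacent-normal samples, which is exactly the tumor-vs-normal selection problem mentioned in this thread.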

Even simple tasks, like selecting tumor vs. normal samples for a cancer type, require expert curation across multiple tables, protocols, and abstracts.

02.11.2025 10:14 👍 0 🔁 0 💬 1 📌 0

NCBI's SRA is a fantastic resource for studying the human transcriptome. But its metadata is messy: over 70% of fields are empty, and information is often inconsistent.

02.11.2025 10:14 👍 0 🔁 0 💬 1 📌 0
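
A figure such as "over 70% of fields are empty" depends on what counts as empty. The sketch below computes the fraction of empty cells over a list of metadata records (dicts), treating absent fields, blank strings, and a few placeholder values as missing. The placeholder list and the function names are assumptions on our part, not the definition used in the paper.

```python
from typing import Iterable, Mapping, Optional

# Assumed placeholder values; real SRA/BioSample metadata contain many variants.
PLACEHOLDERS = {"", "na", "n/a", "missing", "not applicable", "not collected", "unknown"}


def is_empty(value: Optional[str]) -> bool:
    """True for an absent field or a blank/placeholder value."""
    return value is None or value.strip().lower() in PLACEHOLDERS


def empty_fraction(records: Iterable[Mapping[str, Optional[str]]], fields: list[str]) -> float:
    """Fraction of (record, field) cells that are absent or hold a placeholder."""
    total = empty = 0
    for record in records:
        for field in fields:
            total += 1
            if is_empty(record.get(field)):
                empty += 1
    return empty / total if total else 0.0


if __name__ == "__main__":
    records = [
        {"tissue": "liver", "disease": "missing", "sex": ""},
        {"tissue": "Not Applicable", "disease": "HCC"},
    ]
    print(empty_fraction(records, ["tissue", "disease", "sex"]))
```

Changing `PLACEHOLDERS` can move the reported percentage noticeably, so any such statistic should state which values it counts as empty.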
Metappuccino: Large Language Model-driven Reconstruction of Sequence Read Archive Metadata for Cancer Research Motivation: High-throughput RNA-sequencing has significantly advanced transcriptomic profiling in oncology. Millions of RNA-seq datasets have accumulated in public databases such as the Sequence Read ...

www.biorxiv.org/cgi/content/...

What's behind Metapuccino? ☕️, by PhD student Fiona Hak, @camillemrcht.bsky.social and Melina Gallopin. A thread 👇

02.11.2025 10:14 👍 5 🔁 2 💬 2 📌 0

My algorithmic friends (@camillemrcht.bsky.social) doing LLM stuff: www.biorxiv.org/content/10.1...! Also, screaming last names in the author list ;P. Given my level of trust in Camille, though, perhaps it's time for me to engage more seriously with these models in research...

01.11.2025 15:09 👍 5 🔁 2 💬 1 📌 0
PostDoc position in bioinformatics and artificial intelligence. PDF available upon request.

Interested in #lncRNA and #ArtificialIntelligence?
As part of our recently launched French-Korean bilateral project DHARP, we are recruiting a postdoc in bioinformatics and artificial intelligence to join our team at @ips2parissaclay.bsky.social.
Application deadline: 01/12/2025

22.10.2025 15:27 👍 3 🔁 1 💬 0 📌 0

PubMed is running on autopilot during the shutdown, but a key independent committee has been abolished www.bmj.com/content/391/... 🧪

22.10.2025 17:55 👍 7 🔁 9 💬 0 📌 2
Illustration of the Burrows-Wheeler Transform and many auxiliary structures built from the input string how$now$brown$cow$#

New tool "bwt-svg" for making illustrations of the BWT and the many auxiliary arrays and other structures related to it. Pyodide-based no-installation-necessary interface here: benlangmead.github.io/bwt-svg/. (H/t to @robert.bio for pointing me to pyodide!) Full repo: github.com/benlangmead/....

14.10.2025 20:48 👍 40 🔁 21 💬 4 📌 1
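
For readers who want to reproduce the basics behind such figures, here is a minimal Python sketch (our own, not code from bwt-svg) that builds the BWT by sorting all rotations and inverts it with the LF mapping. It assumes, as in the example string above, that `#` is a unique terminator at the end of the text that sorts before `$` and the letters.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler Transform by sorting all rotations (O(n^2 log n):
    fine for illustrations, far too slow for genomes)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, terminator: str = "#") -> str:
    """Invert the BWT with the LF mapping. Assumes `terminator` occurs exactly
    once, at the end of the text, and sorts before every other character."""
    # rank[i] = number of earlier occurrences of last[i] in the last column.
    seen: dict[str, int] = {}
    rank = []
    for ch in last:
        rank.append(seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    # first[c] = first row whose rotation starts with c (the "C array").
    first: dict[str, int] = {}
    total = 0
    for ch in sorted(seen):
        first[ch] = total
        total += seen[ch]
    # Row 0 is the rotation starting with the terminator; following LF from
    # there yields the text backwards, one character per step.
    row, chars = 0, []
    for _ in range(len(last) - 1):
        ch = last[row]
        chars.append(ch)
        row = first[ch] + rank[row]
    return "".join(reversed(chars)) + terminator


if __name__ == "__main__":
    text = "how$now$brown$cow$#"
    transformed = bwt(text)
    print(transformed)
    assert inverse_bwt(transformed) == text
```

Sorting rotations gives the same order as sorting suffixes here only because the terminator is unique and smallest; real indexes build the BWT from a suffix array instead of materializing rotations.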

The MSc. Bioinformatics students of U. Paris-Saclay are organizing the Junior Conference on Computational Biology (JC2B) 2025: AI and predictive models in bioinformatics
November 13, 2025 - I2BC, CNRS, Gif-sur-Yvette, France
Register for free: bioi2.i2bc.paris-saclay.fr/jc2b/#regist...

01.10.2025 12:38 👍 2 🔁 2 💬 0 📌 0

🦠🧍‍♀️ From bacterial to human immunity.

We report in @science.org the discovery of a human homolog of SIR2 antiphage proteins that participates in the TLR pathway of animal innate immunity.
Co-led with @enzopoirier.bsky.social, with work by D. Bonhomme and @hugovaysset.bsky.social

www.science.org/doi/10.1126/...

24.07.2025 18:22 👍 262 🔁 122 💬 9 📌 11

Congratulations to Rayan Chikhi (Institut Pasteur), head of the “Sequence Bioinformatics” unit, on securing an ERC Proof of Concept 2025 grant for his project ENZYMINER! 👏

@rayan.chiki.bsky.social

#Bioinformatics

24.07.2025 15:10 👍 60 🔁 13 💬 4 📌 2
How to speed up peer review: make applicants mark one another. ‘Distributed peer review’ of grants makes process more than twice as fast, and includes some cheat-prevention measures.

Looks good, doesn't it?
www.nature.com/articles/d41...

23.07.2025 07:46 👍 1 🔁 1 💬 0 📌 0
K2R: Tinted de Bruijn graphs implementation for efficient read extraction from sequencing datasets. Abstract. Summary: Biological sequence analysis often relies on reference genomes, but producing accurate assemblies remains a challenge. As a result, de nov

Paper Alert!

Our preprint on the K2R index, which efficiently associates k-mers with the reads that contain them, is finally out!

A thread!
academic.oup.com/bioinformati...

05.07.2025 09:30 👍 17 🔁 9 💬 1 📌 0
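
The basic query K2R answers, "which reads contain this k-mer?", can be sketched with a plain hash map from each k-mer to the set of read identifiers. This is a naive stand-in, not the tinted de Bruijn graph K2R uses: it needs far more memory, it indexes forward-strand k-mers only, and the function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reads: list[str], k: int) -> dict[str, set[int]]:
    """Map every k-mer to the indices of the reads that contain it
    (forward strand only, exact matches)."""
    index: dict[str, set[int]] = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(read_id)
    return dict(index)


def reads_containing(index: dict[str, set[int]], kmer: str) -> set[int]:
    """Indices of the reads containing `kmer`; empty set if it is absent."""
    return index.get(kmer, set())


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "TTTTTT"]
    idx = build_kmer_index(reads, k=3)
    print(reads_containing(idx, "GTA"))
```

The memory cost of storing an explicit read set per k-mer is what motivates more compact structures such as the one described in the paper.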

New ENCODE4 long-read RNA-seq transcripts track for hg38 and mm10. Triplets (e.g. [1,1,3]) indicate start site, exon combination, and stop site for each transcript. Enrichment scores show how these change across tissue and cell line samples.

Read more: genome.ucsc.edu/gold...

16.07.2025 18:27 👍 26 🔁 7 💬 0 📌 1
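
As a reading aid for the triplet notation, the sketch below parses a triplet string and reports which components differ between two transcripts. It assumes, as the post describes, that the three numbers index the start site, exon combination, and stop site, and that indices are only comparable between transcripts of the same gene; `Triplet`, `parse_triplet`, and `differing_features` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    start_site: int        # index of the transcript start site
    exon_combination: int  # index of the exon combination
    stop_site: int         # index of the transcript stop site


def parse_triplet(text: str) -> Triplet:
    """Parse a string such as '[1,1,3]' into a Triplet."""
    parts = [int(p) for p in text.strip().strip("[]").split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 integers, got {text!r}")
    return Triplet(*parts)


def differing_features(a: Triplet, b: Triplet) -> list[str]:
    """Names of the features that differ between two transcripts of one gene."""
    names = ["start_site", "exon_combination", "stop_site"]
    return [n for n in names if getattr(a, n) != getattr(b, n)]


if __name__ == "__main__":
    print(differing_features(parse_triplet("[1,1,3]"), parse_triplet("[2,1,3]")))
```

In this example the two transcripts share their exon combination and stop site and differ only in where they start.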

#JOBIM2025 Mathilde Girard ends the session with a simple but effective idea: reorder the reads before applying an off-the-shelf compressor, to improve compression.

10.07.2025 09:34 👍 6 🔁 3 💬 0 📌 0

#JOBIM2025 @bdegardins.bsky.social presents his PhD work on Vizitig, a multi-sample graph exploration tool, with a focus on RNA. This afternoon we'll give a demo on pangenomes with the same tool.

10.07.2025 08:42 👍 10 🔁 5 💬 1 📌 0
OReO: optimizing read order for practical compression. Abstract. Motivation: Recent advances in high-throughput and third-generation sequencing technologies have created significant challenges in storing and mana

Paper alert!
We present OReO, a tool that reorders long-read datasets so that they compress efficiently with ANY universal compressor, such as gz, zstd, xz...
TL;DR: You can get state-of-the-art compression WITHOUT a dedicated compressor/decompressor!
academic.oup.com/bioinformati...
A thread!

03.07.2025 10:52 👍 23 🔁 18 💬 1 📌 1
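
The core idea, reorder the reads and leave the compressor untouched, can be tried with Python's standard library. The sketch below is a naive stand-in rather than OReO's algorithm: it sorts reads by their lexicographically smallest k-mer (a crude minimizer), on the assumption that overlapping reads often share it, and compares zlib output sizes before and after. The function names, k=15, and the simulated error-free reads are hypothetical choices.

```python
import random
import zlib


def min_kmer(read: str, k: int) -> str:
    """Lexicographically smallest k-mer of the read (a crude minimizer)."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def reorder_reads(reads: list[str], k: int = 15) -> list[int]:
    """Hypothetical naive reordering: group reads sharing the same smallest
    k-mer. Returns the permutation (new position -> original index)."""
    return sorted(range(len(reads)), key=lambda i: (min_kmer(reads[i], k), i))


def compressed_size(reads: list[str]) -> int:
    """Bytes after zlib (DEFLATE, 32 KiB window) on newline-joined reads."""
    return len(zlib.compress("\n".join(reads).encode(), 9))


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(100_000))
    starts = [rng.randrange(len(genome) - 2_000) for _ in range(500)]
    reads = [genome[s:s + 2_000] for s in starts]  # error-free simulated reads
    order = reorder_reads(reads)
    print("input order:", compressed_size(reads), "bytes")
    print("reordered:  ", compressed_size([reads[i] for i in order]), "bytes")
```

The permutation is returned rather than discarded because restoring the original read order, when it matters, requires storing it alongside the compressed file.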

Preprint alert from the group 🚨 super fast grep-like sequence selection

02.07.2025 13:38 👍 6 🔁 5 💬 0 📌 0
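
The post gives no details of the tool, so purely as a generic illustration of grep-like sequence selection: the sketch below streams FASTQ records and keeps those whose sequence contains a query motif on either strand. The function names and the four-line FASTQ assumption (no wrapped sequences) are ours.

```python
import io
from typing import Iterator, TextIO

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fastq_records(handle: TextIO) -> Iterator[tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) from a 4-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        sep = handle.readline()
        qual = handle.readline()
        yield header.rstrip("\n"), seq.rstrip("\n"), sep.rstrip("\n"), qual.rstrip("\n")


def grep_fastq(handle: TextIO, motif: str) -> Iterator[tuple[str, str, str, str]]:
    """Keep records whose sequence contains `motif` or its reverse complement."""
    targets = {motif.upper(), reverse_complement(motif.upper())}
    for record in fastq_records(handle):
        seq = record[1].upper()
        if any(t in seq for t in targets):
            yield record


if __name__ == "__main__":
    data = "@r1\nACGTTTGA\n+\nIIIIIIII\n@r2\nGGGGCCCC\n+\nIIIIIIII\n@r3\nTCAAACGG\n+\nIIIIIIII\n"
    for rec in grep_fastq(io.StringIO(data), "TTTGA"):
        print(rec[0])
```

Here `@r3` is selected only because it contains `TCAAA`, the reverse complement of the query; a strand-aware search like this is usually what sequence selection needs, unlike a plain text grep.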