Florian Trigodet

@floriantrigodet

Computational microbiologist. Senior scientist at the Helmholtz Institute for Functional Marine Biodiversity, Oldenburg. Working in Meren lab.

498
Followers
273
Following
23
Posts
16.11.2023
Joined

Latest posts by Florian Trigodet @floriantrigodet

Troubleshooting common errors in assemblies of long-read metagenomes - Nature Biotechnology Long-read sequence assemblies from metagenomes contain frequent errors.

Troubleshooting common errors in assemblies of long-read metagenomes www.nature.com/articles/s41... #jcampubs

02.01.2026 20:30 👍 35 🔁 13 💬 0 📌 0

Troubleshooting common errors in assemblies of long-read metagenomes - @merenbey.bsky.social @banfieldlab.bsky.social go.nature.com/44P7nSm

02.01.2026 16:39 👍 43 🔁 20 💬 0 📌 2

Really important read for people working with long-read MAGs.
www.nature.com/articles/s41...

06.01.2026 06:05 👍 22 🔁 8 💬 0 📌 1

Now published in Nature Biotechnology:
go.nature.com/44P7nSm
If you missed it, the TL;DR is in my April thread below

06.01.2026 09:38 👍 59 🔁 35 💬 1 📌 0
We have a date for the free-to-attend #anvio workshop and ECR Symposium for 2026, and we look forward to meeting you at the @hifmb.de in Oldenburg, Germany!

Please find more information on the venue, program, and the application form here, and spread the word 😇

anvio.org/workshops/20...

20.11.2025 18:42 👍 27 🔁 25 💬 1 📌 1
An exercise on metabolic reconstruction in anvi'o A tutorial on how to run metabolism estimation and enrichment in anvi'o

See this tutorial by Iva Veseli: anvio.org/tutorials/fm...
It contains everything you described above

08.10.2025 07:50 👍 1 🔁 0 💬 1 📌 0

With anvi'o you can annotate genomes with multiple annotation sources, including KEGG KOfams.
Anvi'o also includes a set of tools to compute metabolic module completeness and copy numbers (useful for metagenomics).
Better: there is a program to compare functional and metabolic enrichment

08.10.2025 07:49 👍 0 🔁 0 💬 1 📌 0

I briefly used myloasm on a small project and I found the same read clipping issues as we reported for other assemblers like metaMDBG. Haven't had the time to run a full scale analysis like we did in our preprint.

24.09.2025 10:46 👍 0 🔁 0 💬 1 📌 0
home | GlobDB

I'm happy to announce the latest release of the GlobDB, available at globdb.org.

The GlobDB is a database of "species dereplicated" microbial genomes, and as of release 226 contains twice as many species-representative genomes (306,260) as the latest GTDB release.

10.06.2025 11:20 👍 114 🔁 62 💬 3 📌 4
PostDoc in ecology and evolution of plasmids in polar waters at HIFMB (f/d/m)

We have a new 3-year postdoc position in our group at the @hifmb.de to study plasmids and plasmid systems of the marine environment to survey their utility in microbial responses to environmental change.

Please see the official job ad here, and spread the word:

jobs.awi.de/Vacancies/20...

05.06.2025 08:09 👍 79 🔁 85 💬 2 📌 5
April 30, 2025 See you next week at MVIF! Skin microbiome Large-scale skin metagenomics reveals extensive prevalence, coordination, and functional adaptation of skin microbiome dermotypes across body sites – Che…

New #MicrobiomeDigest: microbiomedigest.com/2025/04/30/a...

• Herrgårds cheese @jrotwitguez.bsky.social
• Asian Skin Microbiome Program @cherrychengchenli.bsky.social
• long-read assemblers @floriantrigodet.bsky.social
• argNorm @vedanthramji.bsky.social

CU next week at @microbiomevif.bsky.social!

30.04.2025 08:11 👍 4 🔁 5 💬 0 📌 0

In the end, the truth should be in the reads, and if multiple long (or short) reads support the joining of two genomic sequences with different GC content, skew, etc., then I would be inclined to trust that the junction is real

29.04.2025 15:51 👍 0 🔁 0 💬 0 📌 0

(if all of them happen at the same genomic locus, I would have no doubt that it is a case of a chimeric contig)

29.04.2025 15:49 👍 0 🔁 0 💬 1 📌 0

But all of them can occur naturally: recent genomic rearrangements create shifts in GC skew; GC content can change with HGT or non-coding genes like rRNA; non-specific read recruitment and hypervariable regions (insertion/deletion of genes) create drops in coverage

29.04.2025 15:47 👍 2 🔁 0 💬 3 📌 0

GC skew is a great idea and I think a combination of GC skew, a sharp change in GC content and a drop in coverage would be great indicators to find chimeric sequences. Especially if they all occur at the same genomic location

29.04.2025 15:45 👍 2 🔁 0 💬 1 📌 0
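
The combination of signals described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the paper: the function names, the window size and all thresholds (`min_gc_delta`, `max_cov_fraction`, `flank`) are hypothetical, and the per-base `depths` list is assumed to come from a read-mapping step that is not shown here.

```python
from statistics import median


def window_stats(seq, window=1000):
    """Return (start, gc_content, gc_skew) for each full, non-overlapping window."""
    seq = seq.upper()
    stats = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        stats.append((start, gc_content, gc_skew))
    return stats


def candidate_breakpoints(seq, depths, window=1000, min_gc_delta=0.10,
                          max_cov_fraction=0.5, flank=50):
    """Return window boundaries where all three signals coincide.

    A boundary is reported only if (1) GC content jumps by at least
    `min_gc_delta`, (2) GC skew changes sign, and (3) the minimum depth
    within `flank` bases of the boundary is at most `max_cov_fraction`
    times the median depth of the contig. All thresholds are assumptions.
    """
    stats = window_stats(seq, window)
    typical_depth = median(depths)
    hits = []
    for prev, cur in zip(stats, stats[1:]):
        boundary = cur[0]
        gc_jump = abs(cur[1] - prev[1]) >= min_gc_delta
        skew_flip = prev[2] * cur[2] < 0
        local_min = min(depths[max(0, boundary - flank):boundary + flank])
        cov_drop = local_min <= max_cov_fraction * typical_depth
        if gc_jump and skew_flip and cov_drop:
            hits.append(boundary)
    return hits


# Toy contig: a G-rich half joined to a C-rich half, with a coverage dip at the junction.
toy_seq = "GGCA" * 250 + "CCGATTTT" * 125
toy_depths = [30] * 2000
toy_depths[990:1010] = [2] * 20
breakpoints = candidate_breakpoints(toy_seq, toy_depths)
print(breakpoints)
```

As the following posts in this thread point out, each signal can also arise naturally, so a hit from a sketch like this is a reason to inspect the reads at that position, not proof of a chimera.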
A reproducible workflow for Trigodet et al., 2025 A bioinformatics workflow for our study long-read assemblers

Thanks a lot for going through it in your journal club! The details of my blast search can be found here: tinyurl.com/ynxwsvwc
In short, I removed the DUST filter and asked BLAST to report only the first hit. I don't see how it would report no hits when there are too many hits.

29.04.2025 15:37 👍 1 🔁 0 💬 1 📌 0
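
For readers who want to try a comparable search, here is a minimal sketch of how such a BLAST+ call could be assembled from Python. The exact command used in the study is in the linked workflow and is not reproduced here: the file names are hypothetical, and mapping "only report the first hit" to `-max_target_seqs 1` is an assumption on my part. `-dust no` is the standard `blastn` option for disabling the low-complexity filter.

```python
import shlex


def build_blastn_command(query_fasta, reads_db, out_tsv):
    """Build a blastn argument list with DUST filtering off and one target per query.

    `reads_db` is assumed to be a nucleotide BLAST database built beforehand
    from the long reads (for example with makeblastdb).
    """
    return [
        "blastn",
        "-query", query_fasta,
        "-db", reads_db,
        "-dust", "no",             # disable low-complexity (DUST) masking of the query
        "-max_target_seqs", "1",   # assumption: stands in for "only report the first hit"
        "-outfmt", "6",            # tabular output
        "-out", out_tsv,
    ]


# Hypothetical file names, for illustration only.
cmd = build_blastn_command("zero_coverage_regions.fa", "long_reads_db", "hits.tsv")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed; the sketch only prints the command so that it can be inspected first.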

Genomes from long-read metagenomic assemblies contain rampant errors, highlighting the pressing need for stricter evaluation methods in long-read assembly algorithms. Read more in our paper with the Eren group. @floriantrigodet.bsky.social @merenbey.bsky.social

28.04.2025 21:29 👍 42 🔁 16 💬 0 📌 2

Misreporting non-circular elements as circular can quickly deteriorate public genome databases. We hope we can work together to ensure assemblers include stricter checks, or offer modes that prioritize caution. We would love to hear your experiences or thoughts!

28.04.2025 08:07 👍 6 🔁 1 💬 1 📌 0

We hope to help the community to understand potential issues they may run into, and help the developers to see different perspectives. We have a fully reproducible bioinformatics workflow, and it is easy to add one more assembler to it, or new datasets:

merenlab.org/data/benchma...

28.04.2025 08:07 👍 7 🔁 0 💬 1 📌 0

We are aware that developing assembly algorithms, especially for metagenomes, is a notoriously complex and difficult task, and we have a deep appreciation of those who invest their time and skills in creating and maintaining them. A heartfelt THANK YOU. We're here to help, nothing more.

28.04.2025 08:07 👍 9 🔁 2 💬 1 📌 0
And we observed an astonishing number of repeats in the results. Repeats are common in nature and the improved ability to resolve repeats is one of the strengths of long-read sequencing. But the repeats we found didn't look convincing and likely pointed to other underlying issues.

28.04.2025 08:07 👍 5 🔁 4 💬 2 📌 0
Figure 4. Prototypical mapping artifacts and their putative origin. (A) A chimeric sequence assembled from two subpopulations. At a conserved locus, two subpopulations existed, each with its own distinct sequence. The assembled contig contains all or part of each subpopulation-specific sequence, resulting in a chimeric construct. (B) Another example of a variable genomic site, but in this example the contig sequence contains the sequence of a very minor subpopulation, supported by only one read. (C) Duplicated sequence only found in the assembly, not supported by any long reads. (D) Two contigs assembled from metaMDBG (left) and metaFlye (right) presenting large regions with no coverage. We blasted these regions back to the long reads and found no hits. Coverage visualization was exported from the anvi'o interactive interface (left) or the IGV software (right) and the read mapping visualization was from IGV as well. Indels smaller than 150 bp as well as mismatches are not displayed in the mapping. Red markers at the end of reads indicate read clipping.

Accurate reconstruction of genomic variation is essential to associate within-population structural differences to ecological or evolutionary phenotypes. But we observed serious haplotyping errors, where assemblers created chimeric constructs or did things biologists wouldn't expect.

28.04.2025 08:07 👍 4 🔁 2 💬 1 📌 0
Figure 3. Premature circularization of a Methanothrix genome. (A) Frequency of circular contigs under 500 kbp with a minimum of 3 ribosomal proteins. Each point represents one assembly. (B) A pangenomic analysis of all publicly available Methanothrix genomes from the NCBI's RefSeq database completed with the so-called circular genome of Methanothrix assembled from the sample AD Sludge by hifiasm-meta (light blue), as well as a contig from the same assembly which corresponds to the rest of the missing Methanothrix genome (medium blue) and the combination of these two contigs (dark blue). (C) KEGG metabolic module completion of all genomes and contigs in (B). (D) A schematic representation of the reads mapping over a transposase gene in the prematurely circularized contig (light blue in B and C) showing the lack of read support around the gene. The full figure is available in Supplementary Figure 2.

We observed cases of premature circularization. VERY OFTEN. Catching premature circularization can be easy in some cases, but very difficult in others.

28.04.2025 08:07 👍 4 🔁 3 💬 1 📌 0
Figure 2. Multi-domain and multi-phyla contigs. Six contigs from metaMDBG, metaFlye and hifiasm-meta. For each contig, we display the GC content, the coverage in the metagenomic reads used for its assembly, and the gene-level taxonomy. For each assembly breakpoint, we display a zoomed-in detail of the read mapping from IGV. In these subplots, red arrows at the end of a mapped read indicate clipping, the coloring at the end of these reads indicates that the following portion of the read mapped to another contig, and similar colors indicate that multiple reads continue to map on the same contig. Blue markers indicate large indels (>150 bp).

We observed chimeric contigs where the assembly software reported a single contig that brought together sequences from two distinct taxa, sometimes three or more, and even sequences that belonged to distinct domains of life.

28.04.2025 08:07 👍 3 🔁 4 💬 1 📌 0

Unlike traditional evaluations of assembly software, our evaluation made quite heavy use of unassembled long reads to quantify how well the assembled sequences matched the long reads. The assemblers generally worked great, but then sometimes they didn't at all. Here are a few things we observed:

28.04.2025 08:07 👍 1 🔁 1 💬 1 📌 0
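
One way to picture this kind of read-based check is to look for positions where many mapped long reads are clipped at the same place on a contig. The sketch below is not the pipeline from the paper; it parses SAM-style `POS` and CIGAR values in plain Python, and the function names and the thresholds `min_clip`, `bin_size` and `min_reads` are hypothetical choices made for illustration.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_sites(pos, cigar, min_clip=100):
    """Return 1-based reference positions where an alignment is clipped.

    `pos` is the SAM POS field (leftmost aligned base). Soft (S) and hard (H)
    clips shorter than `min_clip` bases are ignored.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    sites = []
    if ops and ops[0][1] in "SH" and ops[0][0] >= min_clip:
        sites.append(pos)                # clipped before the first aligned base
    if len(ops) > 1 and ops[-1][1] in "SH" and ops[-1][0] >= min_clip:
        sites.append(pos + ref_len - 1)  # clipped after the last aligned base
    return sites


def clipping_hotspots(alignments, bin_size=50, min_reads=3, min_clip=100):
    """Return sorted (bin_start, count) pairs where at least `min_reads` clips fall in one bin."""
    counts = Counter()
    for pos, cigar in alignments:
        for site in clip_sites(pos, cigar, min_clip):
            counts[site // bin_size * bin_size] += 1
    return sorted((b, n) for b, n in counts.items() if n >= min_reads)


# Toy alignments as (POS, CIGAR): three reads are clipped around position 5000.
toy_alignments = [
    (1, "5000M800S"),
    (101, "4900M1200S"),
    (5001, "900S3000M"),
    (2000, "4000M"),
]
hotspots = clipping_hotspots(toy_alignments)
print(hotspots)
```

Comparing each hotspot count against the local read depth would show whether clipping spans most or all of the coverage at that position, which is the pattern highlighted in the figures of this thread.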
A schematic representation of long reads mapping to a contig with multiple types of read disagreement with the reference, including indel and single nucleotide variants representing more than half or all the coverage, and clipping events spanning the entire coverage

But then we learned that Jill Banfield's group was dealing with similar issues in soil samples. At that point we decided to take a deeper look at our long-read assemblers, and used the datasets they used, and added some novel ones into the mix to re-evaluate them.

28.04.2025 08:07 👍 3 🔁 0 💬 1 📌 0

Our pangenomes were certainly raising some red flags. But we were not sure if fractions of genomes as circular elements were a feature of nature that we missed due to the years of short-read assemblies. With changing technology, you sometimes learn things you didn't even know you were missing.

28.04.2025 08:07 👍 2 🔁 1 💬 1 📌 0

While examining the assembly results we were initially extremely happy with the very large number of giant and occasionally circular contigs. But we quickly realised that many of the circular contigs did not make any sense.

28.04.2025 08:07 👍 0 🔁 0 💬 1 📌 0

Assemblers play a significant role in turning individual reads into long genomic segments, and have tremendous implications for downstream work. Last year we were very excited to apply some of the new assemblers to our PacBio datasets from marine samples.

28.04.2025 08:07 👍 0 🔁 0 💬 1 📌 0

With technologies such as PacBio and ONT, genome-resolved metagenomics is experiencing its second coming. Complete and circular genomes from all domains of life, as well as viruses and plasmids, all WITHOUT binning, seem right around the corner. That is, if we can actually assemble them.

28.04.2025 08:07 👍 5 🔁 0 💬 1 📌 0