
Roland Faure

@rfaure

Sequence bioinformatician: algorithms, methods. Postdoc at Institut Pasteur in Rayan Chikhi's lab

66 Followers
128 Following
17 Posts
Joined 01.12.2024

Latest posts by Roland Faure @rfaure

📌 Submissions (#soumissions) for @jobim2026.bsky.social are open until 15/03/26.

Submission of original work, long papers (+ PCI), platform and service activities, posters and demos

More info at: jobim2026.sfbi.fr

#JOBIM2026 #bioinfo #Strasbourg

02.02.2026 10:17 👍 3 🔁 2 💬 0 📌 1
GitHub - ebiggers/libdeflate: Heavily optimized library for DEFLATE/zlib/gzip compression and decompression

🗜️⚡ If you use gzip/gunzip a lot in your pipelines, switch to the faster "libdeflate" versions instead! They use modern CPU capabilities to achieve a 2-3x speedup.

libdeflate is available on conda, and "libdeflate-gzip" and "libdeflate-gunzip" are drop-in replacements. #unix

github.com/ebiggers/lib...

20.01.2026 01:37 👍 71 🔁 23 💬 1 📌 0
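A minimal Python sketch of using these tools from a pipeline script. It assumes the conda package puts `libdeflate-gzip` and `libdeflate-gunzip` on `PATH` and that, as drop-in replacements, they accept gzip's `-c` (write to stdout) flag. The helper names `pick_tool`, `gzip_file` and `gunzip_file` and the fallback to plain `gzip`/`gunzip` are illustrative choices, not part of libdeflate.

```python
import shutil
import subprocess
from typing import Callable, Optional


def pick_tool(
    preferred: str,
    fallback: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return `preferred` if it is found on PATH, otherwise `fallback`."""
    return preferred if which(preferred) is not None else fallback


def gzip_file(src: str, dst: str) -> None:
    """Compress `src` into `dst`, preferring libdeflate-gzip over gzip."""
    tool = pick_tool("libdeflate-gzip", "gzip")
    with open(dst, "wb") as out:
        # -c: write the compressed stream to stdout and leave `src` in place
        subprocess.run([tool, "-c", src], stdout=out, check=True)


def gunzip_file(src: str, dst: str) -> None:
    """Decompress `src` into `dst`, preferring libdeflate-gunzip over gunzip."""
    tool = pick_tool("libdeflate-gunzip", "gunzip")
    with open(dst, "wb") as out:
        subprocess.run([tool, "-c", src], stdout=out, check=True)
```

For example, `gzip_file("reads.fastq", "reads.fastq.gz")` (hypothetical file names) uses libdeflate when it is installed and silently falls back to gzip otherwise, so the same script runs on machines with or without the conda package.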

"...based on a common wavefront design that can be adapted to support a variety of dynamic programming algorithms: local, global, and semi-global alignment of genomic and protein sequences with a variety of commonly used scoring schemes," from @martinsteinegger.bsky.social and co.

20.12.2025 11:05 👍 14 🔁 6 💬 0 📌 0
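In GPU dynamic programming, "wavefront" often means filling the matrix one anti-diagonal at a time: every cell on an anti-diagonal depends only on the two previous anti-diagonals, so all its cells can be computed in parallel. Assuming that is the sense meant in the quote (it does not say, and this is not the paper's design), here is a Python sketch that fills a linear-gap alignment matrix in that order and switches between global and local scoring with one flag. The scoring defaults are arbitrary.

```python
from typing import List


def align_score(a: str, b: str, local: bool = False,
                match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Alignment score of a vs b with linear gaps, filled anti-diagonal by anti-diagonal."""
    n, m = len(a), len(b)
    H: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # global alignment: leading gaps are penalized
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best = 0
    # anti-diagonal d holds the cells (i, j) with i + j == d
    for d in range(2, n + m + 1):
        # cells on diagonal d only read diagonals d-1 and d-2, so the
        # iterations of this inner loop are independent (one GPU thread each)
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            v = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                v = max(v, 0)  # a local alignment may restart anywhere
                best = max(best, v)
            H[i][j] = v
    return best if local else H[n][m]
```

For instance, `align_score("ACGT", "AGT")` is 2 (three matches, one gap), and `align_score("TTACGT", "ACGTGG", local=True)` is 4. A semi-global variant would only change the boundary initialisation and where the final maximum is read; the traversal order stays the same.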

Inverted colored de Bruijn Graph for practical kmer sets storage https://www.biorxiv.org/content/10.64898/2025.12.08.692073v1

11.12.2025 00:46 👍 8 🔁 2 💬 0 📌 1
DSB 2026 Venice - February 18-19: Workshop Data Structures in Bioinformatics

The 12th edition of the 2-day workshop “Data Structures in Bioinformatics” (DSB) will take place in Venice (Italy) on February 18-19, 2026: dsb-meeting.github.io/DSB2026/

10.12.2025 14:29 👍 10 🔁 9 💬 1 📌 1

1/9 Just out:

k-mer indexes are the backbone of fast search in genomic data, but many degrade under small k, subsampling, or high diversity.

With Ondřej Sladký and @pavelvesely.bsky.social we asked: can we build one that works efficiently for any k-mer set?

05.12.2025 17:42 👍 27 🔁 13 💬 1 📌 1
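For readers new to the topic, the simplest k-mer index is a hash table from each canonical k-mer to the references that contain it. The Python sketch below shows only this textbook baseline, not the index from the paper; the names `build_index` and `query` are made up for illustration, and only uppercase ACGT input is handled.

```python
from __future__ import annotations

from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement, so both strands match."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def build_index(refs: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each canonical k-mer to the names of the references containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in refs.items():
        for i in range(len(seq) - k + 1):
            index[canonical(seq[i:i + k])].add(name)
    return dict(index)


def query(index: dict[str, set[str]], read: str, k: int) -> dict[str, int]:
    """Count, for each reference, how many of the read's k-mers it shares."""
    hits: dict[str, int] = defaultdict(int)
    for i in range(len(read) - k + 1):
        for name in index.get(canonical(read[i:i + k]), ()):
            hits[name] += 1
    return dict(hits)
```

For example, after `idx = build_index({"r1": "ACGTAC", "r2": "TTTTTT"}, 3)`, the call `query(idx, "CGTA", 3)` returns `{"r1": 2}`.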

Sorry about the first figure, it had a background problem. Here it is:

04.12.2025 14:07 👍 0 🔁 0 💬 0 📌 0
GitHub - RolandFaure/SNooPy: metagenomic SNP caller

Coming up with a name was pretty hard: we had a lot of good candidates. We settled on SNooPy, a reference both to SNPs and to the fact that the tool is 100% Python. We thought about metaSNooPy, but that went too far 😅. The GitHub repo: github.com/RolandFaure/SNooPy

04.12.2025 13:18 👍 1 🔁 0 💬 2 📌 0

Tests show that:
1/ SNooPy has the best recall
2/ Using genomic long-read SNP callers does not (always) work well: most tools have very low recall, but DeepVariant performs much better than the other tested methods
3/ The recall of all tools is still far from 100%

04.12.2025 13:18 👍 0 🔁 0 💬 1 📌 0

We propose a new statistical framework. To distinguish artefacts from SNPs, the idea is to look at several loci simultaneously: artefacts occur on random reads, while SNPs occur systematically on the reads that come from the same strain.

04.12.2025 13:18 👍 2 🔁 0 💬 1 📌 0
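One way to turn this intuition into a test, assumed here purely for illustration (the post does not give SNooPy's actual statistics): among the reads covering two candidate loci, ask whether the alternative bases co-occur on the same reads more often than chance would allow, with a one-sided Fisher exact test. Linked SNPs from one strain give a small p-value; independent sequencing errors do not. The function name and input encoding are hypothetical.

```python
from math import comb
from typing import Optional, Sequence


def cooccurrence_pvalue(locus1: Sequence[Optional[bool]],
                        locus2: Sequence[Optional[bool]]) -> float:
    """One-sided Fisher exact test for alt/alt co-occurrence on the same reads.

    locus1[r] is True if read r carries the alternative base at locus 1,
    False if it carries the reference base, None if it does not cover it.
    """
    # keep only reads that cover both loci
    pairs = [(x, y) for x, y in zip(locus1, locus2) if x is not None and y is not None]
    n = len(pairs)
    both_alt = sum(1 for x, y in pairs if x and y)
    alt1 = sum(1 for x, _ in pairs if x)
    alt2 = sum(1 for _, y in pairs if y)
    # with fixed margins and independent loci, the alt/alt count is hypergeometric:
    # P(X = t) = C(alt1, t) * C(n - alt1, alt2 - t) / C(n, alt2)
    total = comb(n, alt2)
    tail = sum(comb(alt1, t) * comb(n - alt1, alt2 - t)
               for t in range(both_alt, min(alt1, alt2) + 1))
    return tail / total
```

With ten reads where the same five carry the alternative base at both loci, the p-value is 1/252; if the alternative bases sit on disjoint reads, it is 1.0. A real caller would combine many such pairs and correct for multiple testing; this sketch handles a single pair.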

Existing long-read SNP callers (DeepVariant, longshot...) have been developed for diploid genomes. Deep-learning methods are trained on [human] genomic data. Statistical methods rely on assumptions that do not hold for metagenomics.

04.12.2025 13:18 👍 2 🔁 0 💬 1 📌 0

Preprint out! Check out our new long-read metagenomic SNP caller, SNooPy 😀. Work with Chris Quince. Thread 🧡
👉 www.biorxiv.org/content/10.6...

04.12.2025 13:18 👍 13 🔁 8 💬 1 📌 0

Preprint Alert!
We present new strategies to accelerate large-scale document comparison using MinHash-like sketches.

A thread:

01.12.2025 14:57 👍 12 🔁 8 💬 1 📌 0
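The post does not detail the new strategies, but the MinHash idea they build on fits in a few lines: hash every element of a set with many seeded hash functions, keep each minimum, and the fraction of matching minima estimates the Jaccard similarity. This is a textbook Python sketch, assuming documents are compared as sets of word 3-grams and using BLAKE2b with a seed prefix as the hash family (both arbitrary choices).

```python
from __future__ import annotations

import hashlib


def shingles(text: str, w: int = 3) -> set[str]:
    """Set of word w-grams of a document."""
    words = text.split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}


def _seeded_hash(item: str, seed: int) -> int:
    """64-bit hash; different seeds act as different hash functions."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(items: set[str], num_hashes: int = 128) -> list[int]:
    """Signature: the minimum hash of the (non-empty) set under each seed."""
    return [min(_seeded_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where the two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Each document is sketched once, e.g. `minhash(shingles(doc))`, after which comparing two documents costs `num_hashes` integer comparisons regardless of their length.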

🧡6/6
Since MSR sketches are sequences, they are super easy to use. I think they could be useful for many other problems, e.g. SNP calling, pangenome graphs, indexing, etc.

03.10.2025 14:51 👍 1 🔁 0 💬 0 📌 0

🧡5/6
The sketching makes assembly extremely fast: a gut metagenome sample with 138 Gbp of sequencing data was assembled in less than 2h with 10 GB of RAM on 8 threads ⚡. And thanks to MSRs, *highly similar strains are not collapsed*

03.10.2025 14:51 👍 1 🔁 1 💬 1 📌 0

🧡4/6
Two key properties that make MSR sketches really cool:
👉 They are alignable sequences: you can just feed them into existing assemblers
👉 MSR sketches can *keep all the SNPs*, i.e. two highly similar sequences are (almost) always reduced to different sketches -> useful to separate similar strains

03.10.2025 14:51 👍 2 🔁 0 💬 1 📌 0

🧡3/6
MSRs were defined by @lblassel.bsky.social @rayanchikhi.bsky.social and @pashadag.bsky.social in pmc.ncbi.nlm.nih.gov/articles/PMC....
Take a sequence and a value of k, stream all k-mers through a function that outputs either a base or the empty character, and you've got your sketch

03.10.2025 14:51 👍 1 🔁 0 💬 1 📌 0
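A minimal Python sketch of this definition. The post does not say which function Alice uses, so `hpc_like` is a hypothetical order-2 example: it emits the second base of a 2-mer only when it differs from the first, i.e. homopolymer compression without the leading base (homopolymer compression being the classic reduction that MSRs generalize).

```python
from typing import Callable


def msr_sketch(seq: str, k: int, f: Callable[[str], str]) -> str:
    """Stream every k-mer of `seq` through `f`, which returns a single base or
    "" (the empty character), and concatenate the outputs into the sketch."""
    return "".join(f(seq[i:i + k]) for i in range(len(seq) - k + 1))


def hpc_like(kmer: str) -> str:
    """Hypothetical order-2 function: keep the second base only if it differs
    from the first, so runs of identical bases collapse."""
    return kmer[1] if kmer[0] != kmer[1] else ""
```

Here `msr_sketch("AAACCG", 2, hpc_like)` gives `"CG"`, while `"AAAGCG"`, one substitution away, gives `"GCG"`: the SNP survives in the sketch, which is the kind of property the thread highlights.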

🧡2/6
Conceptually, the assembler follows the same lines as metaMDBG:
1. sketching the reads
2. running the assembly procedure on the sketches
3. converting back to base space to obtain the final assembly
The main difference is the sketching scheme: we introduce *Mapping-friendly Sequence Reduction (MSR) sketching*

03.10.2025 14:51 👍 1 🔁 0 💬 1 📌 0
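The three steps can be illustrated with a toy Python sketch. It is not Alice's or metaMDBG's code: the reduction function, the exact suffix/prefix overlap standing in for step 2, and all names are hypothetical. What it shows is that if each sketch character remembers which k-mer produced it, any interval of the sketch can be mapped back to the stretch of the original read it came from (step 3).

```python
from typing import Callable, List, Tuple


def sketch_with_origins(seq: str, k: int,
                        f: Callable[[str], str]) -> Tuple[str, List[int]]:
    """Step 1: reduce `seq` with `f` (one base or "" per k-mer) and record the
    start position of the k-mer behind every sketch character."""
    chars: List[str] = []
    origins: List[int] = []
    for i in range(len(seq) - k + 1):
        c = f(seq[i:i + k])
        if c:
            chars.append(c)
            origins.append(i)
    return "".join(chars), origins


def suffix_prefix_overlap(a: str, b: str) -> int:
    """Step 2 stand-in: length of the longest suffix of `a` equal to a prefix of `b`."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def to_base_space(seq: str, origins: List[int], k: int, start: int, end: int) -> str:
    """Step 3: bases of `seq` covered by the k-mers behind sketch[start:end] (end > start)."""
    return seq[origins[start]:origins[end - 1] + k]
```

With the homopolymer-style function `f = lambda km: km[1] if km[0] != km[1] else ""`, `sketch_with_origins("AAACCGGT", 2, f)` returns `("CGT", [2, 4, 6])`, and the sketch interval `[0, 2)` maps back to `"ACCG"` in base space.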
Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching
We introduce Mapping-friendly Sequence Reduction (MSR) sketches, a sketching method for high-fidelity (HiFi) long reads, and Alice, an assembler that operates directly on these sketches. MSR produces ...

Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧡1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 GitHub github.com/rolandfaure/...

03.10.2025 14:51 👍 25 🔁 21 💬 2 📌 0

🌎👩‍🔬 For 15+ years, biology has accumulated petabytes (millions of gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...

03.09.2025 08:39 👍 218 🔁 118 💬 3 📌 16

Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N

07.09.2025 23:34 👍 114 🔁 80 💬 5 📌 5

Congrats! Nice results 🎉

16.05.2025 14:08 👍 0 🔁 0 💬 1 📌 0

I am happy to share our new preprint introducing MADRe - a pipeline for Metagenomic Assembly-Driven Database Reduction, enabling accurate and computationally efficient strain-level metagenomic classification.

🔗 https://www.biorxiv.org/content/10.1101/2025.05.12.653324v1
1/9

16.05.2025 08:36 👍 15 🔁 8 💬 2 📌 0

Starting #RECOMBseq with @rayanchikhi.bsky.social's keynote, here stressing our responsibility as scientists to enable access to a common good: genomic data

24.04.2025 00:42 👍 30 🔁 10 💬 1 📌 1

Side note: you could, speaking purely theoretically, also fit every microbe onto an SD card, which is within the weight limit for a carrier pigeon. For some distances, it would be faster than the internet for transmitting sequence libraries
7/

09.04.2025 21:10 👍 45 🔁 11 💬 4 📌 2

So glad this is finally out. The method has been instrumental in allowing us to compress the AllTheBacteria data: ~2 million bacterial genomes shrink from 3 terabytes (gzipped) to 100 GB using phylogenetic compression. Great work by @brinda.eu

09.04.2025 22:27 👍 126 🔁 51 💬 4 📌 1
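A quick check of what these round figures imply, reading "3Terabytes" and "100Gb" as 3 TB and 100 GB in decimal units (an assumption): phylogenetic compression gains roughly another 30x on top of gzip, or about 50 kB per genome.

```latex
\frac{3\,\text{TB}}{100\,\text{GB}} = \frac{3000\,\text{GB}}{100\,\text{GB}} = 30\times,
\qquad
\frac{100 \times 10^{9}\,\text{bytes}}{2 \times 10^{6}\,\text{genomes}} = 5 \times 10^{4}\,\text{bytes} \approx 50\,\text{kB per genome}
```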
GitHub - rrwick/condaenvlist: a simple tool for listing conda environments with descriptions

Do you (like me) create a bunch of conda environments, then later forget what they're for, when they were last updated, or which tools are in them?

If so, you might like this little project: github.com/rrwick/conda...

27.03.2025 04:34 👍 78 🔁 40 💬 1 📌 1

So glad to have participated in #DSB2025, what a great workshop! For some mysterious reason, it was my first time attending, after 3 years of sequence research. Thanks to all participants & organizers 😃

07.03.2025 19:41 👍 2 🔁 0 💬 1 📌 0

Ragnar has made some incredible optimizations to the computation of minimizers, can't wait to see how these improvements will benefit bioinfo tools!

13.12.2024 15:50 👍 4 🔁 2 💬 0 📌 0
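The post does not describe Ragnar's optimizations, so for reference here is a plain (w, k)-minimizer computation in Python: in every window of w consecutive k-mers, select the k-mer with the smallest order value, using a monotone deque so each k-mer is pushed and popped at most once. The BLAKE2b-based order, leftmost tie-breaking and the absence of canonical k-mers are assumptions of this sketch.

```python
import hashlib
from collections import deque
from typing import Callable, Deque, List, Tuple


def blake2b_order(kmer: str) -> int:
    """Deterministic pseudo-random order on k-mers (Python's hash() is salted per run)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq: str, k: int, w: int,
               order: Callable[[str], int] = blake2b_order) -> List[Tuple[int, str]]:
    """(position, k-mer) of each distinct minimizer over all windows of w k-mers."""
    window: Deque[Tuple[int, int]] = deque()  # (order value, position), values non-decreasing
    result: List[Tuple[int, str]] = []
    for i in range(len(seq) - k + 1):
        value = order(seq[i:i + k])
        # older k-mers with a larger value can never be a window minimum again
        while window and window[-1][0] > value:
            window.pop()
        window.append((value, i))
        # drop the front k-mer once it has left the window [i - w + 1, i]
        while window[0][1] <= i - w:
            window.popleft()
        if i >= w - 1:  # the first full window ends at k-mer w - 1
            pos = window[0][1]
            if not result or result[-1][0] != pos:
                result.append((pos, seq[pos:pos + k]))
    return result
```

For example, with lexicographic order, `minimizers("CATGA", 2, 2, order=lambda km: km)` returns `[(1, "AT"), (3, "GA")]`.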

Really cool work!

13.12.2024 14:42 👍 1 🔁 0 💬 0 📌 0