Slacken (metagenomic profiler) and Discount (k-mer counter) are now available in Bioconda. Both are Spark-based and designed for extreme scalability. jnpsolutions.io/2026/03/05/s...
How would you design a *multithreaded*, *concurrent* & *dynamic* hash table if you are focused specifically on common k-mer workloads, where streaming query & insertion are common? Jamshed, Prashant and I explore this in kache-hash, a cache-friendly k-mer hash table!
www.biorxiv.org/content/10.6...
Every reputable expert I know considers mRNA vaccine technology to be one of the most revolutionary advances in medicine in our lifetimes. Its inventors won the Nobel Prize in 2023. Shutting it down now is pointless self-harm to humanity.
Explaining in simple terms the two main results achieved by our #metagenomic classifier #Slacken - scaling independently of RAM, and sample-tailored libraries. jnpsolutions.io/2025/07/03/s...
Slacken is available on GitHub (github.com/JNP-Solution...) and reference libraries are available on S3 thanks to AWS Open Data sponsorship.
Feedback very welcome. I'd be happy to answer any questions or assist people in getting started. We want Slacken to be as accessible as possible.
2) We show that dynamically tailoring a genomic reference library to the samples being classified greatly increases the fraction of species and strain level classifications (making them more specific) as well as improving Bracken quantification.
1) We introduce a new implementation of the Kraken 2 method on Apache Spark, which has comparable cost-performance when classifying multiple samples.
Excited to announce our paper "Precise and scalable metagenomic profiling with sample-tailored minimizer libraries". academic.oup.com/nargab/artic...
#metagenomics #kraken2 #slacken
Particularly with a focus on making the software accessible for people with no Spark experience.
Is there any preferred solution for packaging and shipping software based on Apache #Spark, other than Docker images? I found a site called spark-packages.org but that doesn't look like it's been updated for a long time. #bigdata #jvm
I can imagine that people who are entering software development now might get the false impression that there's only accidental complexity and AI is our only hope to temper it. But you only get to understand simplicity by developing your own taste for it (by fighting complexity for long enough).
One challenge I think young people are facing is that you have to wade through so much accidental complexity before you start seeing the light. It was only in my mid-30s that I think I understood how to value simplicity and elegance. Before that, I often couldn't see the forest for the trees.
“Why I stopped using AI code editors”
The article is spot on. I've gained my intuition & expertise, aka good taste in software engineering, by suffering through learning and taking care of the nitty-gritty while thinking of abstraction and reuse.
lucianonooijen.com/blog/why-i-s...
CDC datasets have been saved. But you can still help by seeding.
Number theorists: please get in touch. xenaproject.wordpress.com/2025/01/20/t...
This resolves an inherent conflict between scalability and precision in Kraken 2.
Our new #metagenomics paper. What if the Kraken 2 library was specifically built for the samples being classified, every time? We show that this improves precision significantly while also being surprisingly cheap, using a new clone of Kraken 2 based on Apache Spark. www.biorxiv.org/content/10.1...
Precise and scalable metagenomic profiling with sample-tailored minimizer libraries https://www.biorxiv.org/content/10.1101/2024.12.22.629657v1