Matt McGuffie

@mcg.bio

mcg.bio
bioinformatics @Twist
formerly: bioinformatics lead at plasmidsaurus, PhD at University of Texas at Austin
bioinformatics, bacteriology, phages, engineered plasmids, synthetic biology, insects

188 Followers, 696 Following, 4 Posts, Joined 23.11.2024

Latest posts by Matt McGuffie @mcg.bio

Adult male (small) and female (large) of Armillifer sp. Looks like a big and small crinkle cut french fry, also with paired hooks at the front. 

Both photos from this paper:
https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0000320

Adult female Linguatula serrata. Looks like a wrinkly tube, a bit wider at the front and tapering to a tail. There are some hooklike bits at the front.

🚨 Don't eat uncooked snake! Your eyes & lungs can be parasitized by pentastomids, which are a CRUSTACEAN. Adults lack so many features that they weren't believed to be arthropods at all until their DNA was sequenced (and sperm morphology, but that was not widely accepted). Yet, they are!
#Crustmas 🧪

21.12.2023 16:38 👍 360 🔁 82 💬 37 📌 36

mim: A lightweight auxiliary index to enable fast, parallel, gzipped FASTQ parsing The FASTQ file format is the lingua franca of primary data distribution and processing across most of bioinformatics. Over time, the compression, storage, transmission, and decompression of gzip compr...

Woot! The mim preprint is live now. www.biorxiv.org/content/10.1...

Happy Thanksgiving!

Cc @curiouscoding.nl

27.11.2025 17:23 👍 25 🔁 9 💬 1 📌 1

Intracellular competition shapes plasmid population dynamics From populations of multicellular organisms to selfish genetic elements, conflicts between levels of biological organization are central to evolution. Plasmids are extrachromosomal, self-replicating g...

Hot off the press! Our latest paper led by @fernpizza.bsky.social, understanding how plasmids evolve inside cells. These small, self-replicating DNA circles live inside bacteria and carry antibiotic resistance genes, but also compete with one another to replicate. 1/
www.science.org/doi/10.1126/...

20.11.2025 21:42 👍 437 🔁 199 💬 11 📌 18
Average nucleotide identity: the backbone of modern ecological genomics - Nature Reviews Genetics In this Journal Club, Luis Orellana recalls a 2005 publication by Konstantinidis and Tiedje that introduced average nucleotide identity as a sequence-based metric to determine the relatedness between ...

The average nucleotide identity (ANI) underpins how we map microbial diversity, compare species, and connect genomes to ecology.
I wrote a short piece reflecting on the discovery and significance of this metric (and really enjoyed digging into the context and story behind it!) #microsky 🧬

30.10.2025 21:11 👍 58 🔁 22 💬 1 📌 2

Unannotated translation products are widespread in model E. coli Genomes contain orders of magnitude more open reading frames (ORFs) than known protein coding genes, and recent work suggests there may be unannotated proteins present in even the best studied organisms. To address this gap, we used a high throughput reverse genetic toolkit to construct precise C-terminal fusions of a reporter (and control) to >120,000 ORFs in model E. coli. We found hundreds of unannotated significant hits, and individually detected >50 novel polypeptides by western blot, including ORFs within tRNA loci. Many ORFs overlap annotated genes in the sense orientation, and we found these are likely chimeric polypeptides produced by ribosomal frameshifting. Using degron based knockdowns, we identified unannotated proteins that have putative fitness effects, and we found a novel small protein that displays phenotypes consistent with a role in the mRNA degradosome. The observation of a range of unannotated translation products should lead to better annotation and understanding of the bacterial domain of life and motivates the continued exploration of genomes broadly.

Unannotated translation products are widespread in model E. coli | bioRxiv https://www.biorxiv.org/content/10.1101/2025.09.25.678689v1?rss=1

27.09.2025 04:14 👍 18 🔁 12 💬 0 📌 2

C. elegans is a real animal and we set out to understand how it comes to have its distinctive biogeography. Its ancestral center of diversity is in the higher elevation forests of Hawaii. Its closest relatives are spread across east Asia. Did they travel from Asia? [Preprint 🧡]

24.09.2025 20:33 👍 168 🔁 79 💬 5 📌 8

Heads up: ignore samtools dot org, similarly minimap2 dot com and likely others. It's owned by a known phishing site and while the binaries they offer look valid currently (but note they may be serving us different binaries to others), that could change.

Ie: it's not us (Samtools team)! Be warned

15.09.2025 08:40 👍 146 🔁 127 💬 2 📌 5

tgv 0.1.0 release: github.com/zeqianli/tgv
- Rich CIGAR and base visualization
- Allele frequency visualization
- VCF and BED file support
- Mouse dragging and hovering
- Filter alignment

Now 90% of what I need from IGV can be done in the terminal.

Some interesting behind-the-scenes:

07.09.2025 23:47 👍 11 🔁 7 💬 1 📌 0

tskit_arg_visualizer: interactive plotting of ancestral recombination graphs Summary: Ancestral recombination graphs (ARGs) are a complete representation of the genetic relationships between recombining lineages and are of central importance in population genetics. Recent brea...

Excited to share our new preprint for the tskit_arg_visualizer Python package! ARGs can sometimes feel like a black box, so @yanwong.bsky.social and I have been developing a method to programmatically draw these graphs.

🔗 arxiv.org/abs/2508.03958

1/6

19.08.2025 14:12 👍 63 🔁 35 💬 2 📌 2

Oatk: a de novo assembly tool for complex plant organelle genomes. #DeNovoAssembly #OrganelleGenomes #Bioinformatics #GenomeBiology
genomebiology.biomedcentral.com/articles/10....

11.08.2025 10:30 👍 5 🔁 4 💬 0 📌 0

A diverse and distinct microbiome inside living trees - Nature Microbiome analyses of living trees show that a single tree can host approximately one trillion bacteria, with microbial communities distinctly partitioned between heartwood and sapwood and with minim...

#NatMicroPicks

Hidden microbial world in trees🌳

Living wood hosts trillions of bacteria, making trees complex ecosystems with major roles in forest health and function.

#PlantMicro #MicroSky

www.nature.com/articles/s41...

08.08.2025 14:28 👍 74 🔁 28 💬 0 📌 4

StrainR2 accurately deconvolutes strain-level abundances in synthetic microbial communities. #Metagenomics #StrainLevelAbundance #Bioinformatics
academic.oup.com/bioinformati...

09.08.2025 18:05 👍 2 🔁 2 💬 1 📌 0

Protein language models reveal evolutionary constraints on synonymous codon choice Evolution has shaped the genetic code, with subtle pressures leading to preferences for some synonymous codons over others. Codons are translated at different speeds by the ribosome, imposing constrai...

Protein language models reveal evolutionary constraints on synonymous codon choice
#rnasky #microsky "cotranslational localization and translational accuracy, more than cotranslational protein folding, are major drivers of selective pressure on codon choice" in yeast here 💫
doi.org/10.1101/2025...

09.08.2025 18:31 👍 3 🔁 1 💬 0 📌 0
Announcing taxburst, an update of the Krona software for taxonomy exploration Announcing taxburst for metagenome taxonomy!

taxburst v0.3.0 is now released - this is an update of the Krona visualization system for microbiome/metagenome taxonomy analyses. Enjoy!

08.08.2025 14:19 👍 26 🔁 17 💬 0 📌 0

Phyling: phylogenetic inference from annotated genomes Phyling is a fast, scalable, and user-friendly tool supporting phylogenomic reconstruction of species phylogenies directly from protein-encoded genomic data. It identifies orthologous genes by searchi...

This looks like an amazing tool
www.biorxiv.org/content/10.1...

06.08.2025 20:50 👍 44 🔁 19 💬 3 📌 0

Scaling down protein language modeling with MSA Pairformer Recent efforts in protein language modeling have focused on scaling single-sequence models and their training data, requiring vast compute resources that limit accessibility. Although models that use ...

Excited to share work with
Zhidian Zhang, @milot.bsky.social, @martinsteinegger.bsky.social, and @sokrypton.org
biorxiv.org/content/10.1...
TLDR: We introduce MSA Pairformer, a 111M parameter protein language model that challenges the scaling paradigm in self-supervised protein language modeling🧡

05.08.2025 06:29 👍 97 🔁 43 💬 1 📌 1

GitHub - lh3/longdust: Identify long STRs, VNTRs, satellite DNA and other low-complexity regions in a genome Identify long STRs, VNTRs, satellite DNA and other low-complexity regions in a genome - lh3/longdust

Fun new tool from Heng Li. Thinking maybe I can use this to help find plasmid replication gene correlated repeat regions - though he specifically mentions it's not for tandem repeat regions. Hmm. 🖥️🧬

github.com/lh3/longdust

01.08.2025 11:34 👍 7 🔁 2 💬 0 📌 0

Synteny-aware functional annotation of bacteriophage genomes with Phynteny Accurate genome annotation is fundamental to decoding viral diversity and understanding bacteriophage biology; yet, the majority of bacteriophage genes remain functionally uncharacterised. Bacteriopha...

🚨 New preprint 🚨

My phage annotation tool, Phynteny, finally has a preprint and a brand new version powered by a cool AI transformer architecture and protein language models! #phagesky

www.biorxiv.org/content/10.1...

30.07.2025 06:00 👍 85 🔁 42 💬 2 📌 1

A bit late to joining the Bluesky party, but it's great to see all the amazing scientists who are on this platform! Looking forward to connecting with all of you here (on twitter as @niranjantw ... so keeping the handle consistent).

28.04.2025 01:13 👍 5 🔁 2 💬 2 📌 0

AFESM: a metagenomic guide through the protein structure universe! We clustered 821M structures (AFDB&ESMatlas) into 5.12M groups; revealing biome-specific groups, only 1 new fold even after AlphaFold2 re-prediction & many novel domain combos. 🧡
🌐 afesm.foldseek.com
📄 www.biorxiv.org/content/10.1...

27.04.2025 00:13 👍 141 🔁 71 💬 4 📌 4
A pack of stickers with the logo of our new tool, K-mer Fast Counter a.k.a. KFC

I'll be presenting our work on hyper-k-mers at #RECOMB today at 10:40 KST!

You can get a sneak peek at the slides here: igor.martayan.org/slides-recom...

Come say hi if you'd like to chat, or just get one of these cute stickers!

26.04.2025 22:23 👍 18 🔁 4 💬 1 📌 0

Assemblies of long-read metagenomes suffer from diverse errors https://www.biorxiv.org/content/10.1101/2025.04.22.649783v1

25.04.2025 00:46 👍 23 🔁 11 💬 0 📌 1

Genomic divergence across the tree of life | PNAS Nucleotide sequence data are being harnessed to identify species, even in cases in which organisms themselves are neither in hand nor witnessed. Bu...

How #genome-wide sequence divergence maps to species status www.pnas.org/doi/10.1073/... #biodiversity #genomics

06.04.2025 14:05 👍 60 🔁 33 💬 0 📌 3

Uncalled4: a toolkit for nanopore signal alignment, analysis and visualization of DNA and RNA modifications.

www.nature.com/articles/s41...

28.03.2025 17:23 👍 47 🔁 26 💬 1 📌 1

Telomeric transposons are pervasive in linear bacterial genomes Eukaryotes have linear DNA, and their telomeres are hotspots for transposons, which in some cases took over telomere maintenance. We identified several families of independently evolved telomeric tran...

wow, telomeric transposons in bacteria with linear chromosomes! (of course this was first figured out in flies, inc by Bob Levis, who i was happy to see a few days ago at the fly meeting). 🪰

www.science.org/doi/10.1126/...

www.sciencedirect.com/science/arti...

www.sciencedirect.com/science/arti...

27.03.2025 20:55 👍 62 🔁 36 💬 0 📌 1

Genetics, ecology and evolution of phage satellites Nature Reviews Microbiology, Published online: 27 March 2025; doi:10.1038/s41579-025-01156-z In this Review, Penadés et al. explore the genetics, potential origins and life cycle of phage satellites, and they discuss the impact of these elements on the evolution of other mobile genetic elements and their host bacteria.

New online! Genetics, ecology and evolution of phage satellites

27.03.2025 11:26 👍 42 🔁 28 💬 0 📌 2

GitHub - rrwick/condaenvlist: a simple tool for listing conda environments with descriptions a simple tool for listing conda environments with descriptions - rrwick/condaenvlist

Do you (like me) create a bunch of conda environments, then later forget what they're for, when they were last updated, or which tools are in them?

If so, you might like this little project: github.com/rrwick/conda...

27.03.2025 04:34 👍 78 🔁 40 💬 1 📌 1

GitHub - yangao07/longcallD: A local-haplotagging-based small and structural variant caller A local-haplotagging-based small and structural variant caller - yangao07/longcallD

longcallD is a new variant caller for genomic long reads. It jointly calls phased small and structural variants. Single binary, one command line for the whole process. Comparable accuracy to mainstream callers. Great work by Yan Gao. github.com/yangao07/lon...

24.03.2025 16:53 👍 105 🔁 49 💬 3 📌 3
Bakta database This data repository contains the mandatory DB for Bakta. It is available in two versions: the default (db.tar.gz or) and a lightweight alternative (db-light.tar.gz). Bakta is a tool for the rapid & s...

🦠🧬🖥️ New Bakta DB v6.0 released!

After a year, it was time for a Bakta database update - and it's a huge one:
- IPS: 330.9M
- PSC: 135.3M
- PSCC: 37M

doi.org/10.5281/zeno...

👇 1/6

06.03.2025 08:59 👍 12 🔁 9 💬 1 📌 0

Interested in bacterial genomes?

Hundreds of thousands, even millions?

All annotated, taxonomically classified, integrated with metadata.

Easily searchable, viewable, downloadable, in sync with #AllTheBacteria.

Then BakRep is for you! Poster P-CM-102 @vaam-microbes.bsky.social #VAAM25

25.03.2025 09:45 👍 18 🔁 9 💬 2 📌 0