Some Open Problems in Probability that are Relevant to Applied Statistics (my talk this Wed noon at the Columbia statistics department student seminar)
statmodeling.stat.columbia.edu/2026/02/10/m...
New paper on the problem of "missing regulation" (limited overlap between GWAS signals and eQTLs) from Shamil Sunyaev's lab. Led by Noah Connally.
Our work on the generalizability of polygenic scores (PGS) from the @arbelharpak.bsky.social Lab is now officially out!
We examine the accuracy of PGS predictions at the individual level. We make 3 observations that expose gaps in our understanding of PGS “portability.”
rdcu.be/e0LAr
(1/27)
Insightful paper on the importance of phenotypic scale when testing for interactions involving genetic variants (specifically, GxE effects). From Iain Mathieson's and Andy Dahl's labs, and led by Manuela Costantino.
Registration for the 2026 NY Area Population Genetics meeting is now open, at events.simonsfoundation.org/e0mEoL?rt=8k.... Registration is free but required; if you are submitting an abstract, note that the deadline is *January 30th*.
Happy to highlight an essay I wrote together with @marcdemanuel.bsky.social,
@natanaels.bsky.social and Anastasia Stolyarova, trying to think through what sets the mutation rate of a cell type in an animal species: www.biorxiv.org/content/10.6... 1/n
GWAS has been an incredible discovery tool for human genetics: it regularly identifies *causal* links from 1000s of SNPs to any given trait. But mechanistic interpretation is usually difficult.
Our latest work on causal models for this is out yesterday:
www.nature.com/articles/s41...
A short 🧵:
Delighted that our paper about the distribution of genomic spans of clades/edges in genealogies (ARGs), and using this for detecting inversions and other SVs (and other phenomena that cause local disruption of recombination) is out in MBE academic.oup.com/mbe/article/... (1/n)
SuSiE 2.0: improved methods and implementations for genetic fine-mapping and phenotype prediction https://www.biorxiv.org/content/10.1101/2025.11.25.690514v1
🚨 New preprint from the lab!
We’re excited to share “Improving population-scale disease prediction through multi-omics integration” by Ng et al. www.medrxiv.org/content/10.1...
...for interactions involving the HLA region in collaboration with @y-luo.bsky.social.
Thanks for reading!
...might allow for detecting many more of these effects.
I'd like to thank Sile Hu for his help and Simon Myers for his supervision.
I'm also very grateful to @mollyprz.bsky.social for generous financial support in the final stages of the project.
In ongoing work, we are testing...
...are partly mediated through modulating the effects of other SNPs.
Another takeaway is that we find more interactions for molecular phenotypes than for more complex and polygenic phenotypes (probably due to greater statistical power to detect them), and so novel proteomics datasets...
...functional relationships between genes.
Moreover, many phenotypes (more than half of those we analysed) show interactions, and in fact some well-known hits from standard GWASs (at FTO for obesity or TCF7L2 for diabetes, for example) have effects on disease-relevant phenotypes that...
...the Wnt signalling pathway (itself important in diabetes aetiology) which points to the potential relevance of this interaction in the architecture of this disease.
Our results show that, even though interactions explain very little phenotypic variance, they can be useful by pointing to...
...to partition PGSs and test the same 144 hits for interactions with partitioned scores. We identify 12 interactions, including one between the strongest T2D-associated SNP found to date (at TCF7L2) and the KDM2A TF for HbA1c levels. KDM2A has been found to interact physically with TCF7L2 within...
...and IL33 for eosinophil levels, which could reflect a functional interaction between these genes recently implicated in eosinophilic asthma.
We then look for interactions that are more precise than SNP-by-PGS but broader than SNP-by-SNP: we use data on transcription factor binding motifs...
This plot displays graphically the results of a regression model of a phenotype (serum alkaline phosphatase levels) on five genetic variants at the PIGC, ABO, TREH, FUT6 and FUT2 genes. For each variant, both its main effect (minor allele count/dosage) and its square (to account for possible dominance/recessiveness) are included, as well as an interaction term with every other variant. Five genome-wide significant interactions between variants are detected, of which two are novel.
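A minimal sketch of the kind of regression this plot describes: each variant contributes a dosage term and a squared term, plus an interaction term with every other variant. The data here are simulated and every name (`build_design`, `dosages`, the effect sizes) is a hypothetical placeholder; the actual analysis presumably also adjusts for covariates, which are omitted.

```python
import itertools
import numpy as np

def build_design(dosages):
    """Return (X, names): intercept, dosages, squared dosages, pairwise products."""
    n, m = dosages.shape
    cols, names = [np.ones(n)], ["intercept"]
    for j in range(m):                      # main (additive) effects
        cols.append(dosages[:, j])
        names.append(f"g{j}")
    for j in range(m):                      # squared terms for dominance/recessiveness
        cols.append(dosages[:, j] ** 2)
        names.append(f"g{j}^2")
    for j, k in itertools.combinations(range(m), 2):  # every pair of variants
        cols.append(dosages[:, j] * dosages[:, k])
        names.append(f"g{j}*g{k}")
    return np.column_stack(cols), names

# Simulated example (assumption): five variants, one true interaction of 0.15.
rng = np.random.default_rng(0)
n = 20_000
dosages = rng.binomial(2, 0.3, size=(n, 5)).astype(float)
y = (0.2 * dosages[:, 0] + 0.1 * dosages[:, 1]
     + 0.15 * dosages[:, 0] * dosages[:, 1] + rng.normal(size=n))

X, names = build_design(dosages)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
coef = dict(zip(names, beta))
print(round(coef["g0*g1"], 3))                # estimate of the simulated interaction
```

With five variants the design has 1 + 5 + 5 + 10 = 21 columns, the last ten being the pairwise interaction terms.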
...for SNP-by-SNP interactions but within a much smaller search space, and allows us to find 38 pairs (of which 32 are novel to our knowledge).
Our results recover and extend a known network involving ABO, FUT2 and TREH for alkaline phosphatase. Another highlight is an interaction between ALOX15...
Scatter plot showing the effect size of an IL33 SNP on eosinophil count (y-axis) varying according to a polygenic score (PGS) for that phenotype (x-axis). The average effect of this SNP in five quintiles of the PGS is plotted, showing that its effect on the trait increases with this score.
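A rough sketch of how a plot like this could be produced: split individuals into PGS quintiles and estimate the SNP effect separately within each. The data are simulated, and the function name, effect sizes and choice to adjust for the PGS within each bin are illustrative assumptions, not the thread's actual pipeline.

```python
import numpy as np

def snp_effect_by_pgs_bin(y, snp, pgs, n_bins=5):
    """Estimate the SNP effect on y within each quantile bin of the PGS."""
    # Interior quantile cut points (4 edges for quintiles).
    edges = np.quantile(pgs, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(pgs, edges)          # bin index 0 .. n_bins - 1
    effects = []
    for b in range(n_bins):
        mask = bins == b
        # Regress y on intercept, SNP dosage and PGS within this bin.
        X = np.column_stack([np.ones(mask.sum()), snp[mask], pgs[mask]])
        beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        effects.append(beta[1])             # coefficient on the SNP dosage
    return np.array(effects)

# Simulated example (assumption): the SNP effect grows linearly with the PGS.
rng = np.random.default_rng(1)
n = 50_000
pgs = rng.normal(size=n)
snp = rng.binomial(2, 0.3, size=n).astype(float)
y = pgs + (0.1 + 0.1 * pgs) * snp + rng.normal(size=n)

effects = snp_effect_by_pgs_bin(y, snp, pgs)
print(np.round(effects, 3))                 # one estimate per PGS quintile
```

In this simulation the per-quintile estimates rise from the lowest to the highest PGS bin, which is the pattern the alt text describes.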
...the effect of the PGS on the trait; or the effect of a SNP varies depending on polygenic background. Our signals include well-known disease risk variants at APOE, FTO and TCF7L2.
We then take these 144 associations and look for pairwise interactions genome-wide. This is a classic search...
A matrix is shown in which the rows correspond to different phenotypes within the 'blood cell count' category in the UK Biobank, and the columns to different genetic variants (and associated genes) for which an interaction with a polygenic score was found. The cells of the matrix are coloured to indicate which phenotype–variant pairs show a significant interaction with the polygenic score for that phenotype.
We develop a method to test for interactions between SNPs and polygenic scores (PGSs) and apply it to 97 quantitative phenotypes in the @ukbiobank.bsky.social, identifying 144 associations for 52 different traits.
These can be interpreted in two equivalent ways: the genotype at a locus alters...
...a linear model of genotype > phenotype.
Interactions can help with understanding biological mechanisms by identifying different parts of the genome whose statistical effects on a phenotype are interdependent, and which are therefore likely to also interact functionally within a pathway.
GWASs have been hugely successful in finding genetic associations but understanding the function of associated loci remains a great challenge.
We address this question from the angle of genetic interactions (epistasis): statistical interaction terms between genetic variants in...
Excited to share a preprint of my PhD project looking at interactions between SNPs and polygenic scores in the UK Biobank!
A thread... 🧵
www.medrxiv.org/content/10.1...
New study in #GENETICS from @anaignatieva.bsky.social and @linoafferreira.bsky.social shows how ancestral recombination graphs can help detect "phantom" genetic interaction signals that arise due to genealogy and not because of epistasis. buff.ly/TQARoDp
Our paper about how ancestral recombination graphs can be used to detect "phantom" genetic interaction signals (that arise due to the genealogy, rather than "real" epistasis) is out in Genetics! Nice thread here by @linoafferreira.bsky.social
academic.oup.com/genetics/adv...
Thank you!
We hope this approach will enable others to search for (and perhaps find!) epistatic effects in cis, and through this to learn more about the genetic basis of complex phenotypes.
Thanks for reading! (end 🧵)
In contrast, our method only requires publicly available WGS data for samples of similar ancestral background (we use 1KGP) whose information is efficiently encoded in the form of an ARG. This makes it applicable in settings where WGS data is not available (including for non-human species).
it was very difficult to rigorously test for such effects.
WGS data or dense imputation panels allowed for checking whether any neighbouring variant accounted for a putative interaction but only if this data was available for the same sample in which the epistasis testing was done.
evidence *against* the existence of a problematic variant.
This allows us to quantify the observed evidence either for or against a potential interaction being real.
Epistasis between variants in cis could be common (or at least less rare than that between variants farther apart) but until now...