The Europe PMC and @opentargets.org collab, Lit-OTAR, powers drug discovery by mining biomedical literature at scale.
It has found ~48.5M unique links, enriches key databases, and is updated daily to fuel therapeutic R&D.
Read more 👉 europepmc.org/article/MED/...
#AI4Science #BioNLP #DrugDiscovery
Still, normalisation was strong overall 💪
Lit-OTAR even helped identify new disease synonyms, like 'T2D', enhance data processing, and improve analyses with FAERS 🎯
But disease names remain tricky: more variation = more missed matches, highlighting areas for future work!
#AI4Science #BioNLP
Now on to entity normalisation
We mapped entities:
🧬 Genes → Ensembl
🦠 Diseases → EFO
💊 Drugs → ChEMBL
Over 220M disease mentions were tagged, and ~76.6% were successfully normalised. But that’s only 7.6% of unique terms... the long tail of rare or variant terms is real 😅
#AI4Science #BioNLP
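The gap between mention-level and unique-term coverage can be illustrated with a minimal sketch. The mentions, the dictionary, and its placeholder identifiers below are made-up toy data, and the exact-match lookup is an assumption rather than the Lit-OTAR normalisation method:

```python
from collections import Counter


def normalisation_coverage(mentions, dictionary):
    """Return (mention-level coverage, unique-term coverage) for a lookup."""
    counts = Counter(m.lower() for m in mentions)
    matched_terms = [term for term in counts if term in dictionary]
    # Share of all tagged mentions that map to an identifier.
    mention_cov = sum(counts[t] for t in matched_terms) / sum(counts.values())
    # Share of distinct surface forms that map to an identifier.
    term_cov = len(matched_terms) / len(counts)
    return mention_cov, term_cov


# Toy data: two frequent terms plus a long tail of rare variants.
mentions = (
    ["diabetes"] * 6
    + ["asthma"] * 2
    + ["T2D", "sugar diabetes", "type-2 diabetic state", "asthmatic wheeze"]
)
# Hypothetical dictionary; the identifiers are placeholders, not real EFO IDs.
dictionary = {"diabetes": "PLACEHOLDER_ID_1", "asthma": "PLACEHOLDER_ID_2"}

mention_cov, term_cov = normalisation_coverage(mentions, dictionary)
print(f"mentions normalised: {mention_cov:.0%}")      # 67%
print(f"unique terms normalised: {term_cov:.0%}")     # 33%
```

A few very frequent terms carry most mentions, so mention-level coverage can be high while most distinct spellings stay unmatched.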
What about other models?
SpaCy was faster but slightly less precise. Still a solid choice for lightweight applications.
The old dictionary method? High recall in some spots, but much lower precision.
QEB8L struck the best balance, with high overlap to our gold standard.
#AI4Science #BioNLP
Let’s talk results!
First up: Entity Recognition
BioBERT topped the charts for precision (0.90–0.93) 🔬
But it’s computationally heavy…
So we optimised Bioformer-8L into QEB8L = 10× faster, 77MB model size, and still scoring 0.85–0.94 precision and 0.88–0.89 F1 🙌
#AI4Science #BioNLP #TextMining
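For readers less familiar with the metrics quoted above, here is a minimal sketch of how precision, recall, and F1 are computed for entity recognition. It assumes exact-match scoring of (start, end, label) spans, which may differ from the evaluation used in the paper. The spans and the labels GP and DS are made-up toy data:

```python
def precision_recall_f1(gold, predicted):
    """Score predicted entity spans against gold spans by exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations as (start offset, end offset, label).
gold = {(0, 6, "GP"), (20, 35, "DS"), (50, 58, "GP"), (70, 80, "DS"), (90, 95, "GP")}
predicted = {(0, 6, "GP"), (20, 35, "DS"), (50, 58, "GP"), (60, 66, "DS")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.75 recall=0.60 f1=0.67
```

A model that tags cautiously tends to score higher on precision than recall, which is why F1 is reported alongside precision.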
But there’s a trade-off.
This approach can miss associations that span sentences, for example through coreference or inferred context, which can make the results less comprehensive.
On the plus side, it is flexible, scalable, and customisable by you!
#AI4Science #BioNLP #TextMining
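The trade-off above can be seen in a small sketch. It assumes "this approach" means pairing entities that occur in the same sentence, which these posts imply but do not spell out. The sentence splitter, the substring matching, and the example text are simplifications invented for illustration, not the Lit-OTAR pipeline:

```python
import re
from itertools import product


def sentence_cooccurrences(text, genes, diseases):
    """Collect (gene, disease) pairs that appear within the same sentence."""
    pairs = set()
    # Naive splitter: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        found_genes = [g for g in genes if g.lower() in lowered]
        found_diseases = [d for d in diseases if d.lower() in lowered]
        pairs.update(product(found_genes, found_diseases))
    return pairs


# Made-up text: the second gene is tied to the disease only via "the disease".
text = (
    "TCF7L2 variants are linked to type 2 diabetes. "
    "PPARG was also examined. It showed a similar effect on the disease."
)
pairs = sentence_cooccurrences(text, ["TCF7L2", "PPARG"], ["type 2 diabetes"])
print(pairs)  # {('TCF7L2', 'type 2 diabetes')}
```

The PPARG link is missed because the disease is only referred to indirectly in a later sentence, which is the kind of cross-sentence association the post describes.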
For entity recognition, we trained models using a combined dataset from Europe PMC and CHEMDNER to detect:
🧬 genes/proteins
🦠 diseases
💊 chemicals/drugs
🧫 organisms
We tested BioBERT, Bioformer & a custom SpaCy model
#AI4Science #BioNLP
Can we use #AI to revolutionise drug discovery?
Meet Lit-OTAR, a deep learning framework built by Europe PMC and @opentargets.org
It’s identified >48.5 million unique associations, accelerating the drug discovery process!
Read the thread for a paper summary👇
#AI4Science #BioNLP #DrugDiscovery
Time for #BioNLP! @javiersanzcruza.bsky.social kicks off the workshop with our work on accelerating cross-encoders for biomedical entity linking #acl2025
Had fun talking about biomedical information extraction with @javiersanzcruza.bsky.social today. Check out the materials from our #ISMBECCB2025 virtual tutorial. It's a mini intro course on #BioNLP! ai4biomed.org/ismb2025tuto...
🚨 New paper out! 🚨
We built a co-citation-based transformer model that supercharges scientific document embeddings — optimized for the complexity of biomedical literature.
📖: www.nature.com/articles/s41...
#AI4Science #BioNLP #Transformers #SemanticSearch #VectorDatabases #ML4Biomed