#BioNLP — Bluesky Posts

3 months ago

Lit-OTAR framework for extracting biological evidences from literature. Free full text in Europe PMC

The Europe PMC and @opentargets.org collab, Lit-OTAR, powers drug discovery by mining biomedical literature at scale.

It has found ~48.5M unique links, enriches key databases, and is updated daily to fuel therapeutic R&D.

Read more 👉 europepmc.org/article/MED/...
#AI4Science #BioNLP #DrugDiscovery

Europe PMC

@europepmc.org

3 months ago

Still, normalisation was strong overall 💪

Lit-OTAR even helped identify new disease synonyms, like 'T2D', enhance data processing, and improve analyses with FAERS 🎯

But disease names remain tricky: more variation = more missed matches, highlighting areas for future work!

#AI4Science #BioNLP

Europe PMC

@europepmc.org

3 months ago

Now onto entity normalisation

We mapped entities:
🧬 Genes → Ensembl
🦠 Diseases → EFO
💊 Drugs → ChEMBL
Over 220M disease mentions were tagged, and ~76.6% were successfully normalised. But that’s only 7.6% of unique terms... the long tail of rare or variant terms is real 😅
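The gap between the mention-level rate and the unique-term rate can be illustrated with a small sketch. Everything here is hypothetical toy data: the lexicon, its placeholder identifiers, and the `normalisation_coverage` helper are assumptions for illustration, not the Lit-OTAR pipeline. It only shows how a few frequent terms can dominate mention counts while many rare variants stay unmapped.

```python
from collections import Counter

def normalisation_coverage(mentions, lexicon):
    """Return (mention_level_rate, unique_term_rate) for tagged mentions.

    A term counts as normalised if its lower-cased form is a key in `lexicon`.
    """
    counts = Counter(m.lower() for m in mentions)
    normalised_terms = {term for term in counts if term in lexicon}
    total_mentions = sum(counts.values())
    normalised_mentions = sum(counts[term] for term in normalised_terms)
    return normalised_mentions / total_mentions, len(normalised_terms) / len(counts)

# Hypothetical lexicon with placeholder identifiers (not real EFO IDs).
lexicon = {"type 2 diabetes": "EFO:PLACEHOLDER_1", "asthma": "EFO:PLACEHOLDER_2"}

# Two frequent, mappable terms plus two rare variants that the lexicon lacks.
mentions = (
    ["type 2 diabetes"] * 6
    + ["asthma"] * 2
    + ["T2DM-like syndrome", "adult-onset sugar disease"]
)

mention_rate, term_rate = normalisation_coverage(mentions, lexicon)
print(f"mentions normalised: {mention_rate:.0%}, unique terms normalised: {term_rate:.0%}")
```

In this toy case most mentions are normalised even though only half of the distinct terms are, which is the same long-tail effect at a much smaller scale.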

#AI4Science #BioNLP

Europe PMC

@europepmc.org

3 months ago

What about other models?

spaCy was faster but slightly less precise. It's still a solid choice for lightweight applications.

The old dictionary method? High recall in some spots, but much lower precision.

QEB8L struck the best balance, with high overlap with our gold standard.
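For readers less familiar with the precision and recall trade-off mentioned here, a minimal sketch follows. It assumes exact-match scoring of (start, end, type) spans against a gold standard; the posts do not say which matching scheme the paper used, and the spans and the `prf` helper are made up for illustration.

```python
def prf(predicted, gold):
    """Precision, recall and F1 for two sets of (start, end, type) spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical gold-standard annotations for one document.
gold = {(0, 5, "GENE"), (10, 18, "DISEASE"), (25, 32, "DRUG"), (40, 47, "DISEASE")}

# A dictionary-style tagger: finds everything in gold, plus four false positives.
dictionary_pred = gold | {(50, 53, "GENE"), (60, 66, "DISEASE"),
                          (70, 74, "DRUG"), (80, 85, "GENE")}

# A model-style tagger: misses one gold span but adds no false positives.
model_pred = {(0, 5, "GENE"), (10, 18, "DISEASE"), (25, 32, "DRUG")}

for name, pred in [("dictionary", dictionary_pred), ("model", model_pred)]:
    p, r, f = prf(pred, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

The toy dictionary tagger reaches full recall at half the precision, while the toy model gives up a little recall for much higher precision and a better F1.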

#AI4Science #BioNLP

Europe PMC

@europepmc.org

3 months ago

Let’s talk results!

First up: Entity Recognition
BioBERT topped the charts for precision (0.90–0.93) 🔬
But it’s computationally heavy…

So we optimised Bioformer-8L into QEB8L: 10× faster, 77MB model size, and still scoring 0.85–0.94 precision and 0.88–0.89 F1 🙌
#AI4Science #BioNLP #TextMining

Europe PMC

@europepmc.org

3 months ago

But there’s a trade-off.

This approach can miss associations that span sentences, for example those that depend on coreference or inferred context, which can reduce comprehensiveness.
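The limitation can be sketched in a few lines, assuming the approach pairs entities that co-occur within a single sentence (the posts in this feed do not spell out the method). The naive regex sentence splitter, the substring matching, and the `sentence_cooccurrences` helper are simplifications invented for this example.

```python
import re
from itertools import product

def sentence_cooccurrences(text, targets, diseases):
    """Pair every target with every disease named in the same sentence."""
    pairs = set()
    # Naive splitter: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        found_targets = [t for t in targets if t.lower() in lowered]
        found_diseases = [d for d in diseases if d.lower() in lowered]
        pairs.update(product(found_targets, found_diseases))
    return pairs

text = (
    "BRCA1 mutations raise breast cancer risk. "
    "TP53 is frequently mutated. "
    "It is also linked to lung cancer."
)
targets = ["BRCA1", "TP53"]
diseases = ["breast cancer", "lung cancer"]

pairs = sentence_cooccurrences(text, targets, diseases)
print(pairs)
```

Only the BRCA1 and breast cancer pair is found. The link between TP53 and lung cancer is lost because "It" in the last sentence is never resolved back to TP53.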

But this approach is flexible, scalable, and customisable by you!

#AI4Science #BioNLP #TextMining

Europe PMC

@europepmc.org

3 months ago

For entity recognition, we trained models using a combined dataset from Europe PMC and CHEMDNER to detect:

🧬 genes/proteins
🦠 diseases
💊 chemicals/drugs
🧫 organisms
We tested three models: BioBERT, Bioformer, and a custom spaCy model
#AI4Science #BioNLP

Europe PMC

@europepmc.org

3 months ago

Can we use #AI to revolutionise drug discovery?

Meet Lit-OTAR, a deep learning framework built by Europe PMC and @opentargets.org

It’s identified >48.5 million unique associations, accelerating the drug discovery process!

Read the thread for a paper summary👇

#AI4Science #BioNLP #DrugDiscovery

Jake Lever

@jakelever.bsky.social

7 months ago

Time for #BioNLP! @javiersanzcruza.bsky.social kicks off the workshop with our work on accelerating cross-encoders for biomedical entity linking #acl2025

Jake Lever

@jakelever.bsky.social

8 months ago

Biomedical text mining for knowledge extraction @ ISMB 2025
A hands-on tutorial introducing biomedical natural language processing, including named entity recognition, relation extraction, and the power and limits of large language models.

Had fun talking about biomedical information extraction with @javiersanzcruza.bsky.social today. Check out the materials from our #ISMBECCB2025 virtual tutorial. It's a mini intro course on #BioNLP! ai4biomed.org/ismb2025tuto...

@gleghornlab.bsky.social

10 months ago

Contrastive learning and mixture of experts enables precise vector embeddings in biological databases - Scientific Reports

🚨 New paper out! 🚨
We built a co-citation-based transformer model that supercharges scientific document embeddings — optimized for the complexity of biomedical literature.

📖: www.nature.com/articles/s41...

#AI4Science #BioNLP #Transformers #SemanticSearch #VectorDatabases #ML4Biomed
