Glasgow IR Group (@irglasgow)

🎄 PyTerrier Advent 25/25: To wrap up our advent series, we'd like to thank the contributors shown below, and the many others who support the PyTerrier ecosystem! #WorldChangersTogether

25.12.2025 07:56 👍 1 🔁 1 💬 0 📌 0

🎄 PyTerrier Advent 24/25: Removing low-quality docs can boost search quality and cut indexing costs. Our SIGIR’24 paper QT5 trains a T5 model to filter passages at indexing time—easy to integrate, and works with dense, PISA, or SPLADE indexes too.

24.12.2025 09:23 👍 4 🔁 3 💬 1 📌 0

🎄PyTerrier Advent 23/25: You’ve done retrieval, but the results seem too homogeneous. Use a diversification reranker. Shown below is the implicit MMR diversification approach, applied on top of BM25 or dense retrieval, but even an explicit approach like xQuAD (cf. Rodrygo Santos) is easy to write.

23.12.2025 11:54 👍 3 🔁 3 💬 1 📌 0
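
For readers who have not met MMR before, here is a minimal pure-Python sketch of the greedy selection it performs. It does not use PyTerrier's own diversification component; the function names, the `lam` trade-off value and the toy vectors are all illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(candidates, lam=0.5):
    """Greedy MMR over (docno, relevance, vector) tuples.

    lam trades relevance against redundancy; 0.5 is an arbitrary choice here.
    """
    remaining = list(candidates)
    selected = []
    while remaining:
        def mmr_score(cand):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(cand[2], s[2]) for s in selected), default=0.0
            )
            return lam * cand[1] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [docno for docno, _, _ in selected]


docs = [
    ("d1", 0.9, [1.0, 0.0]),
    ("d2", 0.8, [1.0, 0.0]),  # same direction as d1, so redundant
    ("d3", 0.5, [0.0, 1.0]),  # less relevant but different
]
order = mmr_rerank(docs, lam=0.5)
```

With these toy inputs the redundant d2 drops below the less relevant but novel d3.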

🎄 PyTerrier Advent 22/25: A more complex pipeline—knowledge-graph–enhanced RAG from our EMNLP 2024 paper TRACE. We build a KG over retrieved docs, then use a transformer to reason over triples for better QA. This pipeline instantiation uses a cache (see 20th advent) on LLM-based KG extraction.

22.12.2025 12:25 👍 4 🔁 4 💬 1 📌 0

🎄PyTerrier Advent 21/25: Bounded recall blues got you down? You can use Adaptive Retrieval techniques, like GAR, LADR, and LAFF, to efficiently surface missing relevant documents.

21.12.2025 11:25 👍 2 🔁 2 💬 1 📌 0

🎄PyTerrier Advent 20/25: You can think of every PyTerrier transformer as a function, mapping from one dataframe type to another. This makes them easily cacheable, courtesy of pyterrier_caching. We have cache objects for retrievers, rerankers, or even indexing-time transformations (e.g. Doc2Query)

20.12.2025 10:54 👍 2 🔁 2 💬 1 📌 0
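
The "transformer as a function" idea is what makes caching straightforward: the same input always gives the same output, so results can be stored by input key. The sketch below illustrates this with a plain in-memory dictionary. It is not the pyterrier_caching API; `CachedScorer` and `overlap_scorer` are hypothetical names used only for illustration.

```python
class CachedScorer:
    """Wrap a (query, text) -> score function and memoise its results."""

    def __init__(self, scorer):
        self.scorer = scorer
        self.cache = {}
        self.misses = 0  # counts how often the wrapped scorer really ran

    def __call__(self, query, text):
        key = (query, text)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.scorer(query, text)
        return self.cache[key]


def overlap_scorer(query, text):
    """Toy stand-in for an expensive reranker: count shared terms."""
    return len(set(query.lower().split()) & set(text.lower().split()))


cached = CachedScorer(overlap_scorer)
first = cached("terrier search", "Terrier is a search engine")
second = cached("terrier search", "Terrier is a search engine")
```

The second call is answered from the cache, so the wrapped scorer runs only once.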

🎄 PyTerrier Advent 19/25: PyTerrier-RAG brings agentic RAG to your workflows with support for SOTA methods like Search-R1 and R1-Searcher, to combine retrievers and reasoning. You could even swap BM25 out for a dense or LSR retriever.

19.12.2025 10:24 👍 0 🔁 1 💬 1 📌 0

🎄 PyTerrier Advent 18/25: In RAG, the reader runs the LLM—but your pipeline shouldn’t depend on the LLM stack.

PyTerrier-RAG separates Reader from Backend, letting you swap vLLM ↔ HF with one line while keeping the same pipeline (and even share a Backend with other stages).

18.12.2025 14:23 👍 0 🔁 1 💬 1 📌 0
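
The Reader/Backend split can be pictured as a small interface: the reader builds the prompt, and the backend only turns a prompt into text. The sketch below is a design illustration under that assumption, not PyTerrier-RAG's actual classes. The two toy backends are hypothetical stand-ins for real LLM stacks such as vLLM or HF.

```python
from typing import List, Protocol


class Backend(Protocol):
    """Anything that can turn a prompt into generated text."""

    def generate(self, prompt: str) -> str:
        ...


class UppercaseBackend:
    """Toy backend: echoes the first prompt line in upper case."""

    def generate(self, prompt: str) -> str:
        return prompt.splitlines()[0].upper()


class CountingBackend:
    """Toy backend: reports how many passages followed the question."""

    def generate(self, prompt: str) -> str:
        return str(len(prompt.splitlines()) - 1)


class Reader:
    """Builds the prompt; knows nothing about which LLM stack runs it."""

    def __init__(self, backend: Backend):
        self.backend = backend

    def answer(self, question: str, passages: List[str]) -> str:
        prompt = "\n".join([question] + passages)
        return self.backend.generate(prompt)


passages = ["Glasgow is in Scotland.", "Terrier is a search engine."]
# Swapping the backend is a one-line change; the Reader code is untouched.
answer_a = Reader(UppercaseBackend()).answer("where is glasgow?", passages)
answer_b = Reader(CountingBackend()).answer("where is glasgow?", passages)
```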

🎄 PyTerrier Advent 17/25: FlexIndex simplifies dense retrieval by supporting FAISS, Voyager, FlatNav & more. It auto-builds data structures, reuses vector stores to cut storage costs, and offers one familiar API for many retrievers.

17.12.2025 14:30 👍 0 🔁 1 💬 1 📌 0

🎄PyTerrier Advent 16/25: Speaking of Learned Sparse Retrieval, PyTerrier has bindings to two backend search engines that provide blazing-fast retrieval over sparse vectors: PISA and BMP.

You can see that we really work to keep the look and feel uniform across implementations.

16.12.2025 09:32 👍 2 🔁 1 💬 1 📌 0

🎄PyTerrier Advent 15/25: A very well-known learned sparse method is SPLADE. Our pyt_splade plugin makes it easy to use SPLADE by formulating Terrier indexing & retrieving pipelines that are composed with a SPLADE encoder, adding extra columns (e.g. query_toks).

Try it 👉 github.com/cmacdonald/p...

15.12.2025 10:51 👍 1 🔁 2 💬 1 📌 0

🎄 PyTerrier Advent 14/25: So we’ve seen sparse and dense retrieval in PyTerrier. Some folk recommend hybrid retrieval – e.g. reciprocal rank fusion (RRF) of sparse and dense results. We have an easy pipeline component that combines two sets of results by RRF (w/ a pretty schematic by Sean MacAvaney)

14.12.2025 18:31 👍 0 🔁 1 💬 1 📌 0
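
As a reminder of what RRF computes, each document's fused score is the sum of 1 / (k + rank) over the rankings in which it appears. The sketch below is a standalone illustration, not the PyTerrier component itself. The name `rrf_fuse` is hypothetical, and k=60 is the commonly used constant rather than a confirmed default of that component.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of docnos with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, docno in enumerate(ranking, start=1):
            scores[docno] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by docno for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


sparse = ["d1", "d2", "d3"]  # e.g. a BM25 ranking
dense = ["d3", "d1", "d4"]   # e.g. a dense ranking
fused = rrf_fuse([sparse, dense])
fused_order = [docno for docno, _ in fused]
```

Documents ranked well by both retrievers (d1, d3) rise above those found by only one.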

🎄 PyTerrier Advent 10/25: Dense retrieval often improves with pseudo-relevance feedback (Rocchio-style).

In PyTerrier_DR it’s easy: encode the query, retrieve docs, apply a transformer that mixes the doc vectors with the query vector, and then re-retrieve.
pyterrier.readthedocs.io/en/latest/ex...

10.12.2025 10:17 👍 0 🔁 1 💬 1 📌 0
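
The "mix doc vectors with the query vector" step can be sketched as a Rocchio-style update: the new query is a weighted sum of the original query vector and the centroid of the feedback document vectors. This is a plain-Python illustration, not the PyTerrier_DR implementation. The `alpha` and `beta` weights are assumed values, and no negative-feedback term is included.

```python
def rocchio_update(query_vec, doc_vecs, alpha=1.0, beta=0.5):
    """Return alpha * query + beta * centroid(feedback doc vectors)."""
    if not doc_vecs:
        return list(query_vec)  # nothing to mix in
    dim = len(query_vec)
    centroid = [sum(d[i] for d in doc_vecs) / len(doc_vecs) for i in range(dim)]
    return [alpha * q + beta * c for q, c in zip(query_vec, centroid)]


query = [1.0, 0.0]
feedback_docs = [[0.0, 1.0], [1.0, 1.0]]  # vectors of the top retrieved docs
new_query = rocchio_update(query, feedback_docs)
```

The updated vector would then be used for the second retrieval pass.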

🎄 PyTerrier Advent 13/25: Doc2query expands docs with generated queries, but can hallucinate. Our ECIR’23 paper Doc2query-- (aka "minus minus") filters generated queries using a cross-encoder before indexing.
PyTerrier pipeline: generate→score→filter→index.
📄https://arxiv.org/pdf/2301.03266

13.12.2025 11:18 👍 3 🔁 3 💬 1 📌 0
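
The generate→score→filter→index flow can be sketched with plain functions. In the sketch below the generator and scorer are toy stand-ins (a real setup would use a Doc2Query model and a cross-encoder), the fixed `threshold` is an assumption, and the final indexing step is left out. None of these names come from the pyterrier_doc2query plugin.

```python
def expand_document(text, generate, score, threshold):
    """Append only those generated queries whose score passes the threshold."""
    queries = generate(text)
    kept = [q for q in queries if score(q, text) >= threshold]
    return text + " " + " ".join(kept) if kept else text


def toy_generate(text):
    """Stand-in generator: one on-topic query, one hallucinated query."""
    return ["what is pyterrier", "how tall is ben nevis"]


def toy_score(query, text):
    """Stand-in scorer: fraction of query terms that occur in the text."""
    q_terms = set(query.split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms)


doc = "PyTerrier is a Python framework for information retrieval"
expanded = expand_document(doc, toy_generate, toy_score, threshold=0.5)
```

Only the on-topic query survives the filter and would be indexed with the document.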

🎄 PyTerrier Advent 12/25: Beyond dense retrieval, learned sparse methods like Doc2Query expand docs with predicted queries before indexing. Our pyterrier_doc2query plugin makes this easy for any corpus—perfectly intuitive as PyTerrier’s pipelines can be applied at indexing time too!

12.12.2025 10:18 👍 0 🔁 1 💬 1 📌 0

🎄PyTerrier Advent 11/25: Want to use an external search service with PyTerrier? No problemo! It has integrations with APIs for Semantic Scholar, ChatNoir (thanks to Jan Heinrich Merker!), Pinecone, and others!

11.12.2025 09:53 👍 1 🔁 1 💬 1 📌 0
