Glasgow IR Group

@irglasgow

Glasgow Information Retrieval Group at the University of Glasgow

560 Followers · 61 Following · 41 Posts · Joined 17.11.2024

Latest posts by Glasgow IR Group @irglasgow

Post image

🎄 PyTerrier Advent 25/25: To wrap up our advent series, we'd like to thank the contributors shown below, and the many others who support the PyTerrier ecosystem! #WorldChangersTogether

25.12.2025 07:56 👍 1 🔁 1 💬 0 📌 0
Post image

🎄 PyTerrier Advent 24/25: Removing low-quality docs can boost search quality and cut indexing costs. Our SIGIR’24 paper QT5 trains a T5 model to filter passages at indexing time. It is easy to integrate, and works with dense, PISA, or SPLADE indexes too.

24.12.2025 09:23 👍 4 🔁 3 💬 1 📌 0
Post image Post image

🎄 PyTerrier Advent 23/25: You’ve done retrieval, but the results seem too homogeneous. Use a diversification reranker. Shown below is the implicit MMR diversification approach, instantiated on BM25 or dense retrieval, but even an explicit approach like xQuAD (cf. Rodrygo Santos) is easy to write.

23.12.2025 11:54 👍 3 🔁 3 💬 1 📌 0
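
For readers who have not met MMR before, here is a minimal pure-Python sketch of the greedy selection it performs. It is not the PyTerrier component pictured in the post: the function names `cosine` and `mmr_rerank`, the `lam` trade-off parameter, and the toy vectors are all assumptions made for illustration, and relevance scores are assumed to be on a scale comparable to cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr_rerank(results, doc_vecs, lam=0.5, k=10):
    """Greedy Maximal Marginal Relevance.

    results:  list of (docno, relevance) pairs from a first-stage retriever
    doc_vecs: dict mapping docno -> document vector (hypothetical toy input)
    lam:      1.0 means pure relevance, 0.0 means pure novelty
    """
    remaining = dict(results)
    selected = []
    while remaining and len(selected) < k:
        best, best_score = None, None
        for docno, rel in remaining.items():
            # Penalise similarity to the closest document already selected
            max_sim = max((cosine(doc_vecs[docno], doc_vecs[s]) for s in selected),
                          default=0.0)
            score = lam * rel - (1 - lam) * max_sim
            if best_score is None or score > best_score:
                best, best_score = docno, score
        selected.append(best)
        del remaining[best]
    return selected

# Toy data: d1 and d2 are near-duplicates, d3 covers a different aspect
results = [("d1", 1.0), ("d2", 0.95), ("d3", 0.6)]
vecs = {"d1": [1.0, 0.0], "d2": [1.0, 0.05], "d3": [0.0, 1.0]}
print(mmr_rerank(results, vecs, lam=0.5, k=3))
```

With `lam=0.5` the near-duplicate d2 drops below d3; with `lam=1.0` the original relevance order is kept.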
Post image

🎄 PyTerrier Advent 22/25: A more complex pipeline: knowledge-graph-enhanced RAG from our EMNLP 2024 paper TRACE. We build a KG over retrieved docs, then use a transformer to reason over triples for better QA. This pipeline instantiation uses a cache (see Advent 20) on LLM-based KG extraction.

22.12.2025 12:25 👍 4 🔁 4 💬 1 📌 0
Post image

🎄 PyTerrier Advent 21/25: Bounded recall blues got you down? You can use Adaptive Retrieval techniques, like GAR, LADR, and LAFF, to efficiently surface missing relevant documents.

21.12.2025 11:25 👍 2 🔁 2 💬 1 📌 0
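
To make the "surface missing documents" idea concrete, here is a simplified pure-Python sketch of the alternating strategy behind graph-based adaptive re-ranking: score a batch from the first-stage pool, then a batch of corpus-graph neighbours of what has been scored so far, within a fixed scoring budget. This is an illustration under assumptions, not the PyTerrier adaptive retrieval API: `gar_rerank`, the `neighbours` dict, and the toy scores are hypothetical.

```python
import heapq

def gar_rerank(initial, neighbours, score_fn, budget=4, batch_size=2):
    """Alternate between the first-stage pool and a graph frontier.

    initial:    docnos in first-stage rank order
    neighbours: dict docno -> list of neighbouring docnos (a toy corpus graph)
    score_fn:   expensive scorer (e.g. a cross-encoder); here any callable
    budget:     maximum number of documents to score
    """
    pool = list(initial)
    frontier = []          # heap of (-score of the source doc, neighbour docno)
    scored = {}
    use_frontier = False
    while len(scored) < budget and (pool or frontier):
        take = min(batch_size, budget - len(scored))
        batch = []
        if use_frontier and frontier:
            while frontier and len(batch) < take:
                _, docno = heapq.heappop(frontier)
                if docno not in scored and docno not in batch:
                    batch.append(docno)
        else:
            while pool and len(batch) < take:
                docno = pool.pop(0)
                if docno not in scored:
                    batch.append(docno)
        for docno in batch:
            scored[docno] = score_fn(docno)
        # Neighbours of high-scoring documents are explored first
        for docno in batch:
            for n in neighbours.get(docno, []):
                if n not in scored:
                    heapq.heappush(frontier, (-scored[docno], n))
        use_frontier = not use_frontier
    return sorted(scored.items(), key=lambda kv: -kv[1])

# Toy example: "x" is relevant but missing from the first-stage results
TOY_SCORES = {"a": 0.9, "b": 0.2, "c": 0.1, "d": 0.1, "x": 0.95, "y": 0.05}
TOY_GRAPH = {"a": ["x"], "b": ["y"]}
ranking = gar_rerank(["a", "b", "c", "d"], TOY_GRAPH, TOY_SCORES.get)
print([docno for docno, _ in ranking])
```

A plain reranker with the same budget of four would only ever score a, b, c and d; the graph step spends half the budget on neighbours and so reaches x.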
Post image

🎄 PyTerrier Advent 20/25: You can think of every PyTerrier transformer as a function, mapping from one dataframe type to another. This makes them easily cacheable, courtesy of pyterrier_caching. We have cache objects for retrievers, rerankers, or even indexing-time transformations (e.g. Doc2Query).

20.12.2025 10:54 👍 2 🔁 2 💬 1 📌 0
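
The "transformer as a function" view is what makes caching straightforward: if the output depends only on the input, it can be memoised. The sketch below shows the idea in plain Python with lists of dicts standing in for dataframes. It does not use pyterrier_caching; `CachedRetriever`, `toy_search`, and the tiny corpus are hypothetical names introduced for illustration.

```python
TOY_DOCS = {
    "d1": "terrier dog breed",
    "d2": "information retrieval with terrier",
    "d3": "cat breed",
}

def toy_search(query):
    """Stand-in retriever: scores documents by the number of shared terms."""
    terms = set(query.split())
    hits = []
    for docno, text in TOY_DOCS.items():
        overlap = len(terms & set(text.split()))
        if overlap:
            hits.append((docno, float(overlap)))
    return sorted(hits, key=lambda h: -h[1])

class CachedRetriever:
    """Memoises a query -> [(docno, score)] function, keyed on the query text."""

    def __init__(self, search_fn):
        self.search_fn = search_fn
        self.cache = {}
        self.misses = 0

    def __call__(self, queries):
        rows = []
        for q in queries:
            if q["query"] not in self.cache:
                self.misses += 1
                self.cache[q["query"]] = self.search_fn(q["query"])
            # Re-attach the qid so cached results work for any query id
            for docno, score in self.cache[q["query"]]:
                rows.append({"qid": q["qid"], "docno": docno, "score": score})
        return rows

cached = CachedRetriever(toy_search)
first = cached([{"qid": "1", "query": "terrier breed"}])
second = cached([{"qid": "2", "query": "terrier breed"}])
print(cached.misses, [r["docno"] for r in second])
```

The second call is served from the cache, so the underlying search function runs only once.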
Post image

🎄 PyTerrier Advent 19/25: PyTerrier-RAG brings agentic RAG to your workflows with support for SOTA methods like Search-R1 and R1-Searcher, to combine retrievers and reasoning. You could even swap BM25 out for a dense or LSR retriever.

19.12.2025 10:24 👍 0 🔁 1 💬 1 📌 0
Post image Post image

🎄 PyTerrier Advent 18/25: In RAG, the reader runs the LLM, but your pipeline shouldn’t depend on the LLM stack.

PyTerrier-RAG separates Reader from Backend, letting you swap vLLM ↔ HF with one line while keeping the same pipeline (and even share a Backend with other stages).

18.12.2025 14:23 👍 0 🔁 1 💬 1 📌 0
Post image

🎄 PyTerrier Advent 17/25: FlexIndex simplifies dense retrieval by supporting FAISS, Voyager, FlatNav & more. It auto-builds data structures, reuses vector stores to cut storage costs, and offers one familiar API for many retrievers.

17.12.2025 14:30 👍 0 🔁 1 💬 1 📌 0
Post image

🎄 PyTerrier Advent 16/25: Speaking of Learned Sparse Retrieval, PyTerrier has bindings to two backend search engines that provide blazing-fast retrieval over sparse vectors: PISA and BMP.

You can see that we really work to keep the look-and-feel uniform between implementations.

16.12.2025 09:32 👍 2 🔁 1 💬 1 📌 0
Post image Post image Post image

🎄 PyTerrier Advent 15/25: A very well-known learned sparse method is SPLADE. Our pyt_splade plugin makes it easy to use SPLADE by formulating Terrier indexing & retrieving pipelines that are composed with a SPLADE encoder, adding extra columns (e.g. query_toks).

Try it 👉 github.com/cmacdonald/p...

15.12.2025 10:51 👍 1 🔁 2 💬 1 📌 0
Post image

🎄 PyTerrier Advent 14/25: So we’ve seen sparse and dense retrieval in PyTerrier. Some folk recommend hybrid retrieval, e.g. reciprocal rank fusion (RRF) of sparse and dense results. We have an easy pipeline component that combines two sets of results by RRF (w/ a pretty schematic by Sean MacAvaney).

14.12.2025 18:31 👍 0 🔁 1 💬 1 📌 0
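
As a reminder of what RRF computes, here is a small standalone sketch: each document receives 1 / (k + rank) from every ranking it appears in, and the contributions are summed. This is plain Python, not the PyTerrier fusion component; the function name `rrf` and the toy rankings are assumptions for illustration, and k=60 is the commonly used constant.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Reciprocal rank fusion over several ranked lists of docnos.

    rankings: list of lists, each ordered from best to worst
    k:        smoothing constant; larger values flatten rank differences
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, docno in enumerate(ranking, start=1):
            fused[docno] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by docno for determinism
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))

sparse_run = ["d1", "d2", "d3"]   # e.g. from BM25
dense_run = ["d3", "d1", "d4"]    # e.g. from a dense retriever
fused_run = rrf([sparse_run, dense_run])
print([docno for docno, _ in fused_run])
```

Documents found by both retrievers (d1, d3) rise above those found by only one, without any need to normalise the underlying scores.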
Post image

🎄 PyTerrier Advent 10/25: Dense retrieval often improves with pseudo-relevance feedback (Rocchio-style).

In PyTerrier_DR it’s easy: encode the query, retrieve docs, use a transformer to mix doc vectors w/ the query vector, and then re-retrieve.
pyterrier.readthedocs.io/en/latest/ex...

10.12.2025 10:17 👍 0 🔁 1 💬 1 📌 0
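
The four steps in this post (encode, retrieve, mix, re-retrieve) can be sketched without any library. The code below is a toy illustration, not the PyTerrier_DR implementation: the function names, the `alpha`/`beta` weights, and the brute-force dot-product search over an in-memory dict are all assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vec, index, k):
    """Brute-force dot-product search over a dict docno -> vector."""
    ranked = sorted(index, key=lambda d: -dot(query_vec, index[d]))
    return ranked[:k]

def rocchio_query(query_vec, fb_vecs, alpha=1.0, beta=0.5):
    """Mix the query vector with the centroid of the feedback document vectors."""
    if not fb_vecs:
        return list(query_vec)
    centroid = [sum(v[i] for v in fb_vecs) / len(fb_vecs)
                for i in range(len(query_vec))]
    return [alpha * q + beta * c for q, c in zip(query_vec, centroid)]

def prf_retrieve(query_vec, index, k=3, fb_docs=2):
    """First pass, build the feedback query, then retrieve again."""
    feedback = retrieve(query_vec, index, fb_docs)
    new_query = rocchio_query(query_vec, [index[d] for d in feedback])
    return new_query, retrieve(new_query, index, k)

toy_index = {"d1": [1.0, 0.0], "d2": [0.8, 0.6], "d3": [0.0, 1.0]}
new_q, reranked = prf_retrieve([1.0, 0.0], toy_index)
print(new_q, reranked)
```

The feedback query moves towards the centroid of the top-ranked documents, so the second pass can pick up documents that are close to them but were further from the original query.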
Post image

🎄 PyTerrier Advent 13/25: Doc2query expands docs with generated queries, but can hallucinate. Our ECIR’23 paper Doc2query-- (aka "minus minus") filters generated queries using a cross-encoder before indexing.
PyTerrier pipeline: generate→score→filter→index.
📄 https://arxiv.org/pdf/2301.03266

13.12.2025 11:18 👍 3 🔁 3 💬 1 📌 0
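
The generate→score→filter stages can be illustrated with stand-ins for the two models. In the sketch below, `toy_generate` replaces the query generator and `toy_score` (term overlap) replaces the cross-encoder; both are hypothetical, as are `expand_and_filter`, the `querygen` field name, and the fixed threshold. The output would then be handed to an indexer.

```python
def toy_generate(text):
    """Stand-in for a query generation model: one grounded and one hallucinated query."""
    return ["what is pyterrier", "who won the world cup"]

def toy_score(query, text):
    """Stand-in for a cross-encoder: fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def expand_and_filter(docs, generate, score, threshold=0.5):
    """generate -> score -> filter; returns documents ready for indexing."""
    expanded = []
    for doc in docs:
        candidates = generate(doc["text"])
        # Keep only generated queries the scorer judges relevant to the document
        kept = [q for q in candidates if score(q, doc["text"]) >= threshold]
        expanded.append({
            "docno": doc["docno"],
            "text": doc["text"],
            "querygen": "\n".join(kept),
        })
    return expanded

corpus = [{"docno": "d1",
           "text": "pyterrier is a python framework for retrieval experiments"}]
ready = expand_and_filter(corpus, toy_generate, toy_score)
print(ready[0]["querygen"])
```

The hallucinated query shares no terms with the document and is dropped before indexing, so it cannot pollute the index.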
Post image

🎄 PyTerrier Advent 12/25: Beyond dense retrieval, learned sparse methods like Doc2Query expand docs with predicted queries before indexing. Our pyterrier_doc2query plugin makes this easy for any corpus, which is perfectly intuitive as PyTerrier’s pipelines can be applied at indexing time too!

12.12.2025 10:18 👍 0 🔁 1 💬 1 📌 0
Post image

🎄 PyTerrier Advent 11/25: Want to use external search services with PyTerrier? No problemo! It has integrations with APIs for Semantic Scholar, ChatNoir (thanks to Jan Heinrich Merker!), Pinecone, and others!

11.12.2025 09:53 👍 1 🔁 1 💬 1 📌 0