Evaluating the Search Agent in a Parallel World
Integrating web search tools has significantly extended the capability of LLMs to address open-world, real-time, and long-tail problems. However, evaluating these Search Agents presents formidable cha...
Evaluates search agents in a controlled "parallel world" isolated from a model's parametric memory, using atomic facts as ground truth to expose bottlenecks in query formulation and evidence coverage.
Paper: arxiv.org/abs/2603.04751
06.03.2026 04:02
Still Fresh? Evaluating Temporal Drift in Retrieval Benchmarks
Information retrieval (IR) benchmarks typically follow the Cranfield paradigm, relying on static and predefined corpora. However, temporal changes in technical corpora, such as API deprecations and co...
Investigates how temporal corpus drift affects IR benchmarks, finding that retrieval benchmarks re-judged with evolving corpora remain reliable.
Paper: arxiv.org/abs/2603.04532
Code: github.com/fresh-stack/...
06.03.2026 04:00
Scaling Laws for Reranking in Information Retrieval
Scaling laws have been observed across a wide range of tasks, such as natural language generation and dense retrieval, where performance follows predictable patterns as model size, data, and compute g...
Presents a study of scaling laws for rerankers, showing that NDCG follows predictable power laws across model size, data, and compute.
Paper: arxiv.org/abs/2603.04816
Code: github.com/rahulseethar...
06.03.2026 03:58
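The summary's power-law claim can be illustrated with a toy fit. Everything below — the parameter counts, the NDCG values, and the saturating form 1 − NDCG ≈ a·N^(−b) — is invented for illustration and is not taken from the paper:

```python
import numpy as np

# Hypothetical (model_params, NDCG@10) pairs; not data from the paper.
params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
ndcg = np.array([0.38, 0.42, 0.46, 0.50, 0.54])

# Model the remaining gap 1 - NDCG as a power law a * N^(-b):
# log(1 - NDCG) = log a - b * log N, so an ordinary least-squares
# line fit in log-log space recovers the exponent b.
x = np.log(params)
y = np.log(1.0 - ndcg)
slope, log_a = np.polyfit(x, y, 1)
b = -slope  # slope is -b in the power-law form

def predict(n):
    """Extrapolate NDCG at n parameters under the fitted power law."""
    return 1.0 - np.exp(log_a) * n ** (-b)

print(f"exponent b = {b:.3f}, predicted NDCG at 3e9 params = {predict(3e9):.3f}")
```

The same log-log fit applies to the data or compute axis; only the x variable changes.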
Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). To enhance trust, natural language claims from diverse sources, including human-written text,...
Introduces a retrieval-free fact-checking method that exploits interactions between LLMs' internal layer representations to verify claim factuality.
Paper: arxiv.org/abs/2603.05471
Hugging Face: huggingface.co/collections/...
06.03.2026 03:56
SE-Search: Self-Evolving Search Agent via Memory and Dense Reward
Retrieval augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents further ...
Tencent introduces a self-evolving search agent that improves RAG-based question answering through memory purification, atomic query generation, and dense reinforcement learning rewards.
Paper: arxiv.org/abs/2603.03293
05.03.2026 05:28
PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
Long-term memory is essential for large language model (LLM) agents operating in complex environments, yet existing memory designs are either task-specific and non-transferable, or task-agnostic but l...
Presents a plug-and-play memory module for LLM agents that structures episodic experience into a knowledge-centric graph, enabling efficient retrieval across diverse tasks.
Paper: arxiv.org/abs/2603.03296
Code: github.com/TIMAN-group/...
05.03.2026 05:27
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between co...
Proposes a lightweight proxy model that handles long-term memory retrieval for LLMs, trained via RL with a task-outcome-oriented reward.
Paper: arxiv.org/abs/2603.03379
Code: github.com/plageon/MemS...
05.03.2026 05:25
Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG
Retrieval-augmented generation (RAG) is a common way to ground language models in external documents and up-to-date information. Classical retrieval systems relied on lexical methods such as BM25, whi...
Shows that performance gaps between BM25 and modern multimodal retrievers on multilingual and visually rich benchmarks are largely driven by OCR quality and text preprocessing.
Paper: arxiv.org/abs/2603.04238
05.03.2026 05:19
AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems. Unlike human users who issue and refine queries without documenting their intermediate thought processes, De...
Jointly embeds an AI agent's reasoning traces alongside its queries and presents a data synthesis method to train retrievers specifically for Deep Research agents.
Paper: arxiv.org/abs/2603.04384
Code: texttron.github.io/AgentIR/
05.03.2026 05:18
SOLAR: SVD-Optimized Lifelong Attention for Recommendation
Attention mechanism remains the defining operator in Transformers since it provides expressive global credit assignment, yet its $O(N^2 d)$ time and memory cost in sequence length $N$ makes long-conte...
Kuaishou introduces a lossless low-rank attention mechanism that reduces complexity from O(N²d) to O(Ndr) while preserving the softmax, enabling lifelong sequence modeling over ten-thousand-length histories.
Paper: arxiv.org/abs/2603.02561
04.03.2026 03:58
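The O(N²d) → O(Ndr) reduction can be sketched generically. This is not SOLAR's lossless construction — just a plain truncated-SVD compression of a key/value cache down to r pseudo-items, with all shapes and data synthetic, to show where the Ndr cost comes from:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, r = 4096, 64, 32  # history length, head dim, target rank

Q = rng.standard_normal((8, d))   # a few current queries
K = rng.standard_normal((N, d))   # lifelong key cache
V = rng.standard_normal((N, d))   # lifelong value cache

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# Full attention: N scores per query -> O(N^2 d) over N queries.
full = softmax(Q @ K.T / np.sqrt(d)) @ V

# Low-rank sketch: project the N cached keys/values onto the top-r
# left singular directions of K, then attend over r pseudo-items
# instead of N. The projections cost O(Ndr) (plus the SVD itself);
# each subsequent query attends in O(rd).
U, s, Vt = np.linalg.svd(K, full_matrices=False)
P = U[:, :r]                  # (N, r) basis for the rank-r subspace
K_r, V_r = P.T @ K, P.T @ V   # (r, d) compressed caches
approx = softmax(Q @ K_r.T / np.sqrt(d)) @ V_r

print("max abs deviation from full attention:", np.abs(full - approx).max())
```

On random Gaussian data this naive compression is lossy; the printed deviation just shows that the two paths produce same-shaped outputs at very different cost, not that they agree.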
AlphaFree: Recommendation Free from Users, IDs, and GNNs
Can we design effective recommender systems free from users, IDs, and GNNs? Recommender systems are central to personalized content delivery across domains, with top-K item recommendation being a fund...
Proposes a lightweight recommendation method that eliminates user embeddings, item IDs, and GNNs by using language representations and contrastive learning.
Paper: arxiv.org/abs/2603.02653
Code: github.com/minseojeonn/...
04.03.2026 03:57
APAO: Adaptive Prefix-Aware Optimization for Generative Recommendation
Generative recommendation has recently emerged as a promising paradigm in sequential recommendation. It formulates the task as an autoregressive generation process, predicting discrete tokens of the n...
Addresses training-inference inconsistency in generative recommenders with prefix-level optimization that aligns training with beam search decoding.
Paper: arxiv.org/abs/2603.02730
Code: github.com/yuyq18/APAO
04.03.2026 03:55
Model Editing for New Document Integration in Generative Information Retrieval
Generative retrieval (GR) reformulates the Information Retrieval (IR) task as the generation of document identifiers (docIDs). Despite its promise, existing GR models exhibit poor generalization to ne...
Introduces a model editing method that efficiently adapts generative retrieval models to new documents via hybrid-label adaptive training.
Paper: arxiv.org/abs/2603.02773
Code: github.com/zhangzhen-re...
04.03.2026 03:53
Reproducing and Comparing Distillation Techniques for Cross-Encoders
Recent advances in Information Retrieval have established transformer-based cross-encoders as a keystone in IR. Recent studies have focused on knowledge distillation and showed that, with the right st...
Provides a benchmark of cross-encoder training strategies across 9 encoder backbones, finding that pairwise and listwise objectives consistently outperform pointwise ones.
Paper: arxiv.org/abs/2603.03010
Code: github.com/xpmir/cross-...
04.03.2026 03:49
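The pointwise/pairwise/listwise distinction the summary refers to can be written out on toy scores. The scores and the specific loss forms (binary cross-entropy, a RankNet-style pairwise term, softmax over the candidate list) are illustrative stand-ins, not the paper's exact setup:

```python
import numpy as np

# Hypothetical cross-encoder scores for one query: one relevant
# document and three sampled negatives.
s_pos = 2.1
s_negs = np.array([1.7, 0.3, -0.5])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Pointwise: score each document against an absolute label, independently.
pointwise = -np.log(sigmoid(s_pos)) - np.sum(np.log(1.0 - sigmoid(s_negs)))

# Pairwise (RankNet-style): only the margin between the positive and
# each negative matters, which is closer to how rankings are evaluated.
pairwise = -np.sum(np.log(sigmoid(s_pos - s_negs)))

# Listwise (softmax over the list): the positive must win against the
# whole candidate set at once, mirroring a top-k ranking objective.
scores = np.concatenate(([s_pos], s_negs))
listwise = -np.log(np.exp(s_pos) / np.exp(scores).sum())

print(f"pointwise={pointwise:.3f} pairwise={pairwise:.3f} listwise={listwise:.3f}")
```

Note how the pairwise and listwise losses depend only on score differences, so a constant shift of all scores leaves them unchanged, while the pointwise loss is sensitive to absolute calibration.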
DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
Deep-research agents are capable of executing multi-step web exploration, targeted retrieval, and sophisticated question answering. Despite their powerful capabilities, deep-research agents face two c...
Baidu presents a 9K-question benchmark across three difficulty levels for evaluating deep-research agents, along with an open-source RL training framework.
Paper: arxiv.org/abs/2603.01152
Code: github.com/Applied-Mach...
03.03.2026 07:34
GAM-RAG: Gain-Adaptive Memory for Evolving Retrieval in Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) grounds large language models with external evidence, but many implementations rely on pre-built indices that remain static after construction. Related queries the...
Introduces a training-free RAG framework that accumulates retrieval experience from recurring queries using gain-adaptive memory updates.
Paper: arxiv.org/abs/2603.01783
Code: anonymous.4open.science/r/GAM_RAG-2EF6
03.03.2026 07:31
ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents
Effective memory management is essential for large language model (LLM) agents handling long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve inf...
Alibaba transforms dialogue history into a causal and semantic knowledge graph, enabling LLM agents to reason over past interactions.
Paper: arxiv.org/abs/2603.00026
Code: github.com/nju-websoft/...
03.03.2026 07:27
MuonRec: Shifting the Optimizer Paradigm Beyond Adam in Scalable Generative Recommendation
Recommender systems (RecSys) are increasingly emphasizing scaling, leveraging larger architectures and more interaction data to improve personalization. Yet, despite the optimizer's pivotal role in tr...
Introduces the Muon optimizer to recommender system training, reducing converged training steps by 32% while improving ranking quality.
Paper: arxiv.org/abs/2603.00416
Code: anonymous.4open.science/r/MuonRec-E447
03.03.2026 07:24