Evaluating the Search Agent in a Parallel World
Integrating web search tools has significantly extended the capability of LLMs to address open-world, real-time, and long-tail problems. However, evaluating these Search Agents presents formidable cha...
Evaluates search agents in a controlled "parallel world" isolated from a model's parametric memory, using atomic facts as ground truth to expose bottlenecks in query formulation and evidence coverage.
Paper: arxiv.org/abs/2603.04751
06.03.2026 04:02
Still Fresh? Evaluating Temporal Drift in Retrieval Benchmarks
Information retrieval (IR) benchmarks typically follow the Cranfield paradigm, relying on static and predefined corpora. However, temporal changes in technical corpora, such as API deprecations and co...
Investigates how temporal corpus drift affects IR benchmarks, finding that retrieval benchmarks re-judged with evolving corpora remain reliable.
Paper: arxiv.org/abs/2603.04532
Code: github.com/fresh-stack/...
06.03.2026 04:00
Scaling Laws for Reranking in Information Retrieval
Scaling laws have been observed across a wide range of tasks, such as natural language generation and dense retrieval, where performance follows predictable patterns as model size, data, and compute g...
Presents a study of scaling laws for rerankers, showing that NDCG follows predictable power laws across model size, data, and compute.
Paper: arxiv.org/abs/2603.04816
Code: github.com/rahulseethar...
06.03.2026 03:58
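The summary's power-law claim can be illustrated with a toy fit. Everything below — the parameter counts, the NDCG values, and the saturating form 1 − NDCG ≈ a·N^(−b) — is invented for illustration and is not taken from the paper:

```python
import numpy as np

# Hypothetical (model_params, NDCG@10) pairs; not data from the paper.
params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
ndcg = np.array([0.38, 0.42, 0.46, 0.50, 0.54])

# Model the remaining gap 1 - NDCG as a power law a * N^(-b):
# log(1 - NDCG) = log a - b * log N, so an ordinary least-squares
# line fit in log-log space recovers the exponent b.
x = np.log(params)
y = np.log(1.0 - ndcg)
slope, log_a = np.polyfit(x, y, 1)
b = -slope  # slope is -b in the power-law form

def predict(n):
    """Extrapolate NDCG at n parameters under the fitted power law."""
    return 1.0 - np.exp(log_a) * n ** (-b)

print(f"exponent b = {b:.3f}, predicted NDCG at 3e9 params = {predict(3e9):.3f}")
```

The same log-log fit applies to the data or compute axis; only the x variable changes.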
Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). To enhance trust, natural language claims from diverse sources, including human-written text,...
Introduces a retrieval-free fact-checking method that exploits interactions between LLMs' internal layer representations to verify claim factuality.
Paper: arxiv.org/abs/2603.05471
Hugging Face: huggingface.co/collections/...
06.03.2026 03:56
SE-Search: Self-Evolving Search Agent via Memory and Dense Reward
Retrieval augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents further ...
Tencent introduces a self-evolving search agent that improves RAG-based question answering through memory purification, atomic query generation, and dense reinforcement learning rewards.
Paper: arxiv.org/abs/2603.03293
05.03.2026 05:28
PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
Long-term memory is essential for large language model (LLM) agents operating in complex environments, yet existing memory designs are either task-specific and non-transferable, or task-agnostic but l...
Presents a plug-and-play memory module for LLM agents that structures episodic experience into a knowledge-centric graph, enabling efficient retrieval across diverse tasks.
Paper: arxiv.org/abs/2603.03296
Code: github.com/TIMAN-group/...
05.03.2026 05:27
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between co...
Proposes a lightweight proxy model that handles long-term memory retrieval for LLMs, trained via RL with a task-outcome-oriented reward.
Paper: arxiv.org/abs/2603.03379
Code: github.com/plageon/MemS...
05.03.2026 05:25
Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG
Retrieval-augmented generation (RAG) is a common way to ground language models in external documents and up-to-date information. Classical retrieval systems relied on lexical methods such as BM25, whi...
Shows that performance gaps between BM25 and modern multimodal retrievers on multilingual and visually rich benchmarks are largely driven by OCR quality and text preprocessing.
Paper: arxiv.org/abs/2603.04238
05.03.2026 05:19
AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems. Unlike human users who issue and refine queries without documenting their intermediate thought processes, De...
Jointly embeds an AI agent's reasoning traces alongside its queries and presents a data synthesis method to train retrievers specifically for Deep Research agents.
Paper: arxiv.org/abs/2603.04384
Code: texttron.github.io/AgentIR/
05.03.2026 05:18
SOLAR: SVD-Optimized Lifelong Attention for Recommendation
Attention mechanism remains the defining operator in Transformers since it provides expressive global credit assignment, yet its $O(N^2 d)$ time and memory cost in sequence length $N$ makes long-conte...
Kuaishou introduces a lossless low-rank attention mechanism that reduces complexity from O(N²d) to O(Ndr) while preserving the softmax, enabling lifelong sequence modeling over ten-thousand-length histories.
Paper: arxiv.org/abs/2603.02561
04.03.2026 03:58
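The O(N²d) → O(Ndr) reduction can be sketched generically. This is not SOLAR's lossless construction — just a plain truncated-SVD compression of a key/value cache down to r pseudo-items, with all shapes and data synthetic, to show where the Ndr cost comes from:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, r = 4096, 64, 32  # history length, head dim, target rank

Q = rng.standard_normal((8, d))   # a few current queries
K = rng.standard_normal((N, d))   # lifelong key cache
V = rng.standard_normal((N, d))   # lifelong value cache

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# Full attention: N scores per query -> O(N^2 d) over N queries.
full = softmax(Q @ K.T / np.sqrt(d)) @ V

# Low-rank sketch: project the N cached keys/values onto the top-r
# left singular directions of K, then attend over r pseudo-items
# instead of N. The projections cost O(Ndr) (plus the SVD itself);
# each subsequent query attends in O(rd).
U, s, Vt = np.linalg.svd(K, full_matrices=False)
P = U[:, :r]                  # (N, r) basis for the rank-r subspace
K_r, V_r = P.T @ K, P.T @ V   # (r, d) compressed caches
approx = softmax(Q @ K_r.T / np.sqrt(d)) @ V_r

print("max abs deviation from full attention:", np.abs(full - approx).max())
```

On random Gaussian data this naive compression is lossy; the printed deviation just shows that the two paths produce same-shaped outputs at very different cost, not that they agree.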
AlphaFree: Recommendation Free from Users, IDs, and GNNs
Can we design effective recommender systems free from users, IDs, and GNNs? Recommender systems are central to personalized content delivery across domains, with top-K item recommendation being a fund...
Proposes a lightweight recommendation method that eliminates user embeddings, item IDs, and GNNs by using language representations and contrastive learning.
Paper: arxiv.org/abs/2603.02653
Code: github.com/minseojeonn/...
04.03.2026 03:57
APAO: Adaptive Prefix-Aware Optimization for Generative Recommendation
Generative recommendation has recently emerged as a promising paradigm in sequential recommendation. It formulates the task as an autoregressive generation process, predicting discrete tokens of the n...
Addresses training-inference inconsistency in generative recommenders with prefix-level optimization that aligns training with beam search decoding.
Paper: arxiv.org/abs/2603.02730
Code: github.com/yuyq18/APAO
04.03.2026 03:55
Model Editing for New Document Integration in Generative Information Retrieval
Generative retrieval (GR) reformulates the Information Retrieval (IR) task as the generation of document identifiers (docIDs). Despite its promise, existing GR models exhibit poor generalization to ne...
Introduces a model editing method that efficiently adapts generative retrieval models to new documents via hybrid-label adaptive training.
Paper: arxiv.org/abs/2603.02773
Code: github.com/zhangzhen-re...
04.03.2026 03:53
Reproducing and Comparing Distillation Techniques for Cross-Encoders
Recent advances in Information Retrieval have established transformer-based cross-encoders as a keystone in IR. Recent studies have focused on knowledge distillation and showed that, with the right st...
Provides a benchmark of cross-encoder training strategies across 9 encoder backbones, finding that pairwise and listwise objectives consistently outperform pointwise ones.
Paper: arxiv.org/abs/2603.03010
Code: github.com/xpmir/cross-...
04.03.2026 03:49
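The pointwise/pairwise/listwise distinction the summary refers to can be written out on toy scores. The scores and the specific loss forms (binary cross-entropy, a RankNet-style pairwise term, softmax over the candidate list) are illustrative stand-ins, not the paper's exact setup:

```python
import numpy as np

# Hypothetical cross-encoder scores for one query: one relevant
# document and three sampled negatives.
s_pos = 2.1
s_negs = np.array([1.7, 0.3, -0.5])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Pointwise: score each document against an absolute label, independently.
pointwise = -np.log(sigmoid(s_pos)) - np.sum(np.log(1.0 - sigmoid(s_negs)))

# Pairwise (RankNet-style): only the margin between the positive and
# each negative matters, which is closer to how rankings are evaluated.
pairwise = -np.sum(np.log(sigmoid(s_pos - s_negs)))

# Listwise (softmax over the list): the positive must win against the
# whole candidate set at once, mirroring a top-k ranking objective.
scores = np.concatenate(([s_pos], s_negs))
listwise = -np.log(np.exp(s_pos) / np.exp(scores).sum())

print(f"pointwise={pointwise:.3f} pairwise={pairwise:.3f} listwise={listwise:.3f}")
```

Note how the pairwise and listwise losses depend only on score differences, so a constant shift of all scores leaves them unchanged, while the pointwise loss is sensitive to absolute calibration.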
DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent
Deep-research agents are capable of executing multi-step web exploration, targeted retrieval, and sophisticated question answering. Despite their powerful capabilities, deep-research agents face two c...
Baidu presents a 9K-question benchmark across three difficulty levels for evaluating deep-research agents, along with an open-source RL training framework.
Paper: arxiv.org/abs/2603.01152
Code: github.com/Applied-Mach...
03.03.2026 07:34
GAM-RAG: Gain-Adaptive Memory for Evolving Retrieval in Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) grounds large language models with external evidence, but many implementations rely on pre-built indices that remain static after construction. Related queries the...
Introduces a training-free RAG framework that accumulates retrieval experience from recurring queries using gain-adaptive memory updates.
Paper: arxiv.org/abs/2603.01783
Code: anonymous.4open.science/r/GAM_RAG-2EF6
03.03.2026 07:31
ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents
Effective memory management is essential for large language model (LLM) agents handling long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve inf...
Alibaba transforms dialogue history into a causal and semantic knowledge graph, enabling LLM agents to reason over past interactions.
Paper: arxiv.org/abs/2603.00026
Code: github.com/nju-websoft/...
03.03.2026 07:27
MuonRec: Shifting the Optimizer Paradigm Beyond Adam in Scalable Generative Recommendation
Recommender systems (RecSys) are increasingly emphasizing scaling, leveraging larger architectures and more interaction data to improve personalization. Yet, despite the optimizer's pivotal role in tr...
Introduces the Muon optimizer to recommender system training, reducing converged training steps by 32% while improving ranking quality.
Paper: arxiv.org/abs/2603.00416
Code: anonymous.4open.science/r/MuonRec-E447
03.03.2026 07:24