AI · 4 min read · April 28, 2026
Efficient Rationale Retrieval via Student-Teacher Distillation
Rabtriever reduces the computational cost of LLM-based document ranking by distilling cross-encoder knowledge into independent query-document encoders.
Source: arxiv/cs.LG · Teng Chen, Sheng Xu, Feixiang Guo, Xiaoyu Wang, Qingqing Gu, Hongyan Li, Luo Ji · open original ↗
Rabtriever distills expensive cross-encoder rerankers into efficient dual-encoder retrievers using JEPA, cutting complexity from quadratic to linear.
- Traditional rationale-based retrieval requires cross-encoding every query-document pair, creating high computational overhead.
- Rabtriever trains a generative reranker as the teacher, then distills its contextual knowledge into a student dual-encoder.
- A JEPA framework inserts a lightweight predictor between frozen LLM layers to project query embeddings into a teacher-aligned space (first sketch after this list).
- An auxiliary reverse-KL loss on the logits improves on-policy sampling efficiency during distillation (second sketch after this list).
- Reduces document-length complexity from quadratic to linear while maintaining comparable relevance judgments.
- Tested on rationale tasks (empathetic conversation, robotic manipulation) and standard benchmarks (MS MARCO, BEIR).
- The student model generalizes across diverse retrieval domains with minor accuracy loss versus the teacher.
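The predictor step can be made concrete. Below is a minimal sketch assuming a PyTorch setup; `JEPAPredictor`, its two-layer MLP shape, and the MSE objective are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JEPAPredictor(nn.Module):
    """Lightweight predictor inserted between frozen LLM layers.

    Maps the student's independently computed query embedding into the
    space of the teacher's cross-encoder representations, so the frozen
    backbone itself never needs gradient updates.
    """
    def __init__(self, dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, query_emb: torch.Tensor) -> torch.Tensor:
        return self.net(query_emb)

def jepa_loss(predictor: JEPAPredictor,
              query_emb: torch.Tensor,      # (B, D) from the student encoder
              teacher_emb: torch.Tensor):   # (B, D) from the cross-encoder teacher
    # Pull the projected query embedding toward the teacher's (detached)
    # contextual embedding for the same query-document pair.
    return F.mse_loss(predictor(query_emb), teacher_emb.detach())
```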
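The auxiliary logit-level term is a reverse KL divergence, which takes the expectation under the student's own distribution rather than the teacher's. A short sketch, assuming student and teacher share a vocabulary; the combining weight `alpha` is an assumption:

```python
import torch.nn.functional as F

def reverse_kl(student_logits, teacher_logits):
    """Reverse KL D_KL(student || teacher) over the vocabulary axis.

    Unlike forward KL, the expectation is taken under the student's own
    distribution (mode-seeking), which is what makes it a natural fit
    for on-policy sampling during distillation.
    """
    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_t = F.log_softmax(teacher_logits.detach(), dim=-1)
    return (log_p_s.exp() * (log_p_s - log_p_t)).sum(dim=-1).mean()

# Hypothetical combined objective (alpha is an assumed hyperparameter):
# total = jepa_loss(predictor, q_emb, t_emb) + alpha * reverse_kl(s_logits, t_logits)
```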
Frequently asked
- Why is the dual-encoder student cheaper than the cross-encoder teacher? Cross-encoders process each query-document pair jointly, creating quadratic complexity in document length. Rabtriever encodes queries and documents independently (dual-encoder), reducing complexity to linear. The student learns to approximate the teacher's cross-encoder reasoning without the computational overhead of joint encoding (sketched below).
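To make the complexity contrast concrete, here is an illustrative retrieval loop; `encode_query` and `encode_doc` are hypothetical stand-ins for the student's independent encoders, not the paper's API:

```python
import torch

def dual_encoder_search(encode_query, encode_doc, query, docs, k=10):
    # Documents are encoded once, independently of any query; these
    # embeddings can be precomputed and cached offline.
    doc_embs = torch.stack([encode_doc(d) for d in docs])  # (N, D)
    q_emb = encode_query(query)                            # (D,)
    # Per query: one encoder pass plus a dot product per document,
    # instead of a full joint transformer pass per (query, doc) pair.
    scores = doc_embs @ q_emb                              # (N,)
    top = torch.topk(scores, k=min(k, len(docs)))
    return [(docs[i], scores[i].item()) for i in top.indices]
```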