Cross-encoder
A reranking model that scores query-document pairs jointly rather than encoding them independently. Used as an optional second-pass step in the memory gateway to improve precision after an initial BM25 and k-NN retrieval pass returns a candidate set.
A cross-encoder takes a query and a candidate document as a single input and scores their relevance jointly. This lets the model attend to interactions between the two that a bi-encoder, which embeds the query and the document independently, cannot capture. In the memory gateway's retrieval pipeline, reranking is a two-stage process: a fast first pass using BM25 and k-NN retrieval produces a candidate set (typically the top 50-100 results), then the cross-encoder re-scores each candidate against the query and reorders the list. The result is higher precision among the top few results, at the cost of additional latency, because the cross-encoder must run once per candidate. The memory gateway makes cross-encoder reranking configurable: it adds value when retrieval precision matters more than speed, and it can be disabled when working memory is under budget pressure and must be populated quickly.
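The second-pass step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the memory gateway's actual implementation. `rerank`, `RerankConfig`, and its `enabled`, `candidate_k`, and `top_n` fields are hypothetical names. `toy_pair_scorer` is a word-overlap stand-in for a trained cross-encoder, included only so the example runs without a model. The sketch shows the shape of the step: each first-pass candidate is scored as a (query, document) pair, the list is reordered by that score, and turning reranking off falls back to the first-pass order.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A pair scorer sees the query and the document together, as a cross-encoder does.
PairScorer = Callable[[str, str], float]


@dataclass
class RerankConfig:
    enabled: bool = True    # hypothetical flag: skip reranking under budget pressure
    candidate_k: int = 100  # hypothetical: how many first-pass candidates to re-score
    top_n: int = 10         # hypothetical: how many results to return


def toy_pair_scorer(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder (assumption): counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def rerank(query: str, candidates: Sequence[str], scorer: PairScorer,
           config: RerankConfig) -> List[str]:
    """Rerank candidates already ordered by the BM25/k-NN first pass."""
    pool = list(candidates[: config.candidate_k])
    if not config.enabled:
        # Reranking disabled: keep the cheap first-pass order.
        return pool[: config.top_n]
    # One scorer call per (query, document) pair: this is the added latency.
    scores = [scorer(query, doc) for doc in pool]
    # sorted() stays stable with reverse=True, so ties keep first-pass order.
    order = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return [pool[i] for i in order][: config.top_n]


if __name__ == "__main__":
    first_pass = ["weekly notes", "latency budget for the gateway",
                  "gateway config", "budget review"]
    query = "gateway latency budget"
    print(rerank(query, first_pass, toy_pair_scorer, RerankConfig(top_n=3)))
    print(rerank(query, first_pass, toy_pair_scorer,
                 RerankConfig(enabled=False, top_n=3)))
```

With reranking enabled, the candidate sharing the most words with the query moves to the top. With it disabled, the first three first-pass results come back unchanged, which is the fast path the entry describes for budget-constrained working memory.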