Learned sparse retrieval

Learned sparse retrieval, also called sparse neural search, is an approach to text search that represents queries and documents as sparse vectors. It borrows techniques from both lexical bag-of-words and vector embedding algorithms, and is claimed to perform better than either alone. The best-known sparse neural search systems are SPLADE and its successor SPLADE v2. Others include DeepCT, uniCOIL, EPIC, DeepImpact, TILDE and TILDEv2, Sparta, SPLADE-max, and DistilSPLADE-max.
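As a rough illustration of the idea, the sketch below scores documents against a query when both are sparse vectors, stored here as dictionaries mapping terms to weights. It assumes a dot-product score computed through an inverted index, which is a common setup but not necessarily what any particular system named above does. The term weights, document identifiers, and function names are hypothetical; in a real system the weights would come from a trained model.

```python
from collections import defaultdict

# Hypothetical term weights, invented for illustration only.
# A real system would obtain them from a trained model.
DOCS = {
    "d1": {"sparse": 1.4, "retrieval": 1.1, "neural": 0.6},
    "d2": {"dense": 1.3, "embedding": 1.2, "retrieval": 0.4},
    "d3": {"cooking": 1.5, "pasta": 1.2},
}


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query_vec, k=10):
    """Return the top-k (doc_id, score) pairs by sparse dot product."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        # Only documents that share a term with the query are visited.
        for doc_id, d_weight in index.get(term, ()):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


INDEX = build_inverted_index(DOCS)
QUERY = {"sparse": 1.0, "retrieval": 0.8, "neural": 0.3}  # hypothetical weights
RESULTS = search(INDEX, QUERY)

for doc_id, score in RESULTS:
    print(doc_id, round(score, 2))
```

Because only the posting lists of the query's terms are traversed, documents with no overlapping terms (here "d3") are never scored, which is the same access pattern a lexical bag-of-words index uses.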

Some implementations of SPLADE have latency similar to that of Okapi BM25 lexical search while matching the result quality of state-of-the-art neural rankers on in-domain data.

The official SPLADE model weights and training code are released under a Creative Commons NonCommercial license. However, independent implementations of SPLADE++ (a variant of the SPLADE models) are available under permissive licenses.

SPRINT is a toolkit for evaluating neural sparse retrieval systems.