Infinispan kNN Vector Search

With Infinispan 15.0.0.Dev06, we have started to expose vector search capabilities using Infinispan’s indexed queries. Using the newly introduced kNN predicate, it is possible to find and order results by the k nearest neighbors of a given vector.

Mapping the embeddings

The new @Vector indexing annotation marks a field as an embedding. An embedding is a vector representation of a piece of data, produced by a given model.

The vector dimension is mandatory and must be defined at mapping time. Other options that can be specified during mapping are:

  • the similarity (distance) function

  • the beam width

  • the maximum number of connections.

Bear in mind that these values affect the performance of the approximation algorithm that is used to compute the kNN search.
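To make the idea of a similarity (distance) function more concrete, here is a minimal plain-Java sketch of three measures that are commonly used in vector search: squared Euclidean (L2) distance, inner product and cosine similarity. It is only an illustration of the math. The class and method names are hypothetical, they are not part of the Infinispan API, and which functions the @Vector annotation actually accepts should be checked in the Infinispan documentation.

```java
// Illustrative sketch only: these are NOT Infinispan APIs.
// The class and method names are hypothetical.
final class SimilaritySketch {

    // Squared Euclidean (L2) distance: a smaller value means the vectors are closer.
    static double squaredL2(float[] a, float[] b) {
        requireSameDimension(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Inner (dot) product: a larger value means the vectors are more similar.
    static double innerProduct(float[] a, float[] b) {
        requireSameDimension(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: the inner product divided by the product of the two norms.
    static double cosine(float[] a, float[] b) {
        double norms = Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return innerProduct(a, b) / norms;
    }

    // All three measures only make sense for vectors of the same dimension.
    private static void requireSameDimension(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
    }
}
```

Note that a distance is minimized while a similarity is maximized, so the choice of function also determines what "nearest" means for a given embedding model.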

We support byte[] embeddings. Here is an example of mapping:

@Vector(dimension = 3)
public byte[] getByteVector() {
   return byteVector;
}

That corresponds to the Proto schema:

/**
 * @Vector(dimension=3)
 */
optional bytes byteVector = 2;

We also support float[] embeddings. Here is an example of mapping:

@Vector(dimension = 3)
public float[] getFloatVector() {
   return floatVector;
}

That corresponds to the Proto schema:

/**
 * @Vector(dimension=3)
 */
repeated float floatVector = 3;

Searching the embeddings

The following query shows how to perform a kNN search with a supplied vector. The vector follows the <-> operator, and the number after ~ is k, the number of nearest neighbors to return:

from Item i where i.byteVector <-> [7,7,7]~3
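Conceptually, a kNN predicate asks for the k stored vectors that are closest to the supplied one, ordered from nearest to farthest. The plain-Java sketch below shows that idea with an exact, brute-force scan over an in-memory map. It is not how Infinispan executes the query (the indexed search relies on an approximation algorithm, as noted above), it assumes squared L2 as the distance, and the class name, method names and sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: an exact, brute-force kNN over an in-memory map.
// This is NOT Infinispan code; all names here are hypothetical.
final class BruteForceKnn {

    // Squared Euclidean distance between two byte vectors of equal dimension.
    static long squaredL2(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            int diff = a[i] - b[i];
            sum += (long) diff * diff;
        }
        return sum;
    }

    // Returns the keys of the k vectors closest to the query, nearest first.
    static List<String> nearest(Map<String, byte[]> items, byte[] query, int k) {
        List<Map.Entry<String, byte[]>> entries = new ArrayList<>(items.entrySet());
        // Order every candidate by its distance to the query vector.
        entries.sort(Comparator.comparingLong(
                (Map.Entry<String, byte[]> e) -> squaredL2(e.getValue(), query)));
        // Keep only the first k keys.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

A scan like this costs one distance computation per stored vector, which is the cost that approximate index structures are designed to avoid on large data sets.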

The query can be parameterized in several ways:

// Bind each vector component to its own named parameter
Query<Item> query = cache.query("from org.infinispan.query.model.Item i where i.byteVector <-> [:a,:b,:c]~3");
query.setParameter("a", 0);
query.setParameter("b", 2);
query.setParameter("c", 3);
List<Item> hits = query.list();
assertThat(hits).extracting("code").containsExactly("c2", "c1", "c3"); // the order matters

Or you can pass the entire vector as a single parameter:

query = cache.query("from org.infinispan.query.model.Item i where i.floatVector <-> [:a]~:b");
query.setParameter("a", new float[]{7.1f, 7.0f, 3.1f});
query.setParameter("b", 3);
hits = query.list();
assertThat(hits).extracting("code").containsExactly("c5", "c6", "c4");

If the cache is distributed, the query is executed as a broadcast query: it is sent to the nodes that hold shards of the relevant indexes, and their results are aggregated. As with any other query, the results include all the metadata of the corresponding entities, so the returned items can easily be related to the application domain.
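The aggregation step of a broadcast kNN query can be pictured as a top-k merge: each node contributes its own nearest hits, and the caller keeps the k best overall. The sketch below is a minimal plain-Java illustration of that idea under stated assumptions. It does not reproduce Infinispan's internal implementation, it assumes that a smaller distance is better, and the Hit and TopKMerge types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: merging per-node kNN results into a global top k.
// This is NOT Infinispan's implementation; all names here are hypothetical.
final class TopKMerge {

    // A single hit: the entry key and its distance from the query vector.
    static final class Hit {
        final String key;
        final double distance;

        Hit(String key, double distance) {
            this.key = key;
            this.distance = distance;
        }
    }

    // Combines the local results of every node and keeps the k closest hits.
    static List<Hit> merge(List<List<Hit>> perNodeResults, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> nodeHits : perNodeResults) {
            all.addAll(nodeHits);
        }
        // Nearest first, assuming a smaller distance means a better match.
        all.sort(Comparator.comparingDouble((Hit h) -> h.distance));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

For the merged result to be correct, every node has to return at least its own k best hits, because all of the global nearest neighbors might live on a single node.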

Get it, Use it, Ask us!

We’re hard at work on new features, improvements and fixes, so watch this space for more announcements!

Please download and test the latest release.

The source code is hosted on GitHub. If you need to report a bug or request a new feature, first look for a similar one in our JIRA issue tracker. If you don’t find any, create a new issue.

If you have questions, are experiencing a bug, or want advice on using Infinispan, you can use GitHub discussions. We will do our best to answer you as soon as we can.

The Infinispan community uses Zulip for real-time communications. Join us on the Infinispan chat using either a web browser or a dedicated application.

Fabio Massimo Ercoli

Member of the Infinispan @core team at Red Hat. Accountable for indexing, query, serialization, transcoding, metrics and tracing for the hybrid cloud. Former Hibernate @core team, working on NoSQL & searching. Former Red Hat consultant, working on data-intensive applications and solutions. An open source enthusiast and continuous learner.