Infinispan 15 indexing & query news

June 10, 2024 Tags: search vector score knn indexing embeddings

By Fabio Massimo Ercoli

A short while back we released Infinispan 15, which delivered many improvements to the query API. This blog post is an in-depth dive into some of them:

REST queries with projections: more projection types are now supported by the REST API.
Query cache API: regular and continuous queries can be created directly from the cache API.
Filter elements for kNN queries: it is now possible to restrict the set of entities on which a kNN vector search is applied.
Index by keys - query by keys: indexes can be defined on the keys as well as on the values, so that queries can also target key fields.

REST queries with projections

Recently we introduced the score and version projections, which complement the entity and field projections that were already available.

The same projections are now available using the REST query API.

The following is an example of an entity, a query, and its result:

@Proto
@Indexed(index = "play")
public record Game(
   @Keyword(projectable = true, sortable = true)
   String name,
   @Text @ProtoField(2)
   String description) {
}

select g, g.description, version(g), score(g) from Game g where g.description : 'bla3'

{
    "hit_count": 1,
    "hit_count_exact": true,
    "hits": [
        {
            "hit": {
                "version()": 7,
                "description": "bla bla3",
                "*": {
                    "_type": "Game",
                    "name": "bla3",
                    "description": "bla bla3"
                },
                "score()": 0.90565
            }
        }
    ]
}

In this case, we request the version projection with the projection function version(g) and receive the corresponding result in the version() attribute of the returned hit. The score() projection is produced in the same way. The special attribute * holds the entity projection result, in which the special field _type indicates the type of the entity.
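For readers who want to try the REST query outside the Java client, here is a minimal sketch that builds the request URI for the REST v2 search action (GET /rest/v2/caches/{cacheName}?action=search&query=...). The host, the default port 11222 and the cache name "games" are assumptions for illustration; the sketch only encodes the query and does not send the request.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RestSearchUri {

    // Builds the URI for the REST v2 search action on a cache:
    // GET {baseUrl}/rest/v2/caches/{cacheName}?action=search&query={query}
    static URI searchUri(String baseUrl, String cacheName, String query) {
        // URLEncoder applies form encoding, which turns spaces into '+';
        // replace them with %20 so the value is a plain percent-encoded string.
        // A literal '+' in the query is already encoded as %2B, so it is unaffected.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return URI.create(baseUrl + "/rest/v2/caches/" + cacheName
                + "?action=search&query=" + encoded);
    }

    public static void main(String[] args) {
        String query = "select g, g.description, version(g), score(g) "
                + "from Game g where g.description : 'bla3'";
        // Hypothetical server address and cache name
        URI uri = searchUri("http://localhost:11222", "games", query);
        System.out.println(uri);
    }
}
```

The resulting URI can then be passed to any HTTP client, such as java.net.http.HttpClient, together with whatever credentials your server requires.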

Query cache API

With Infinispan 15, both embedded and remote caches can be queried with the same API method, cache#query. Accessing the query APIs through the search factory is no longer required and is now deprecated. Here is an example:

Query<Person> query = myCache.query("FROM space.Person WHERE name = 'user1' AND age > 20");

From this point on, the usual API is used to configure and run the query. For instance:

query.startOffset(10); // skip the first 10 hits
query.maxResults(10);  // return at most 10 hits
QueryResult<Person> result = query.execute();
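The paging parameters act as a window over the ordered hits: with startOffset(10) and maxResults(10), the query returns the 11th to the 20th hit. The following in-memory sketch only illustrates that window; the PagingSketch class and its page helper are hypothetical and do not call Infinispan.

```java
import java.util.List;
import java.util.stream.IntStream;

public class PagingSketch {

    // Illustrates startOffset/maxResults on an already ordered list of hits:
    // skip the first startOffset hits, then keep at most maxResults of them.
    static <T> List<T> page(List<T> hits, int startOffset, int maxResults) {
        int from = Math.min(startOffset, hits.size());
        // Written this way to avoid int overflow when maxResults is very large
        int to = from + Math.min(maxResults, hits.size() - from);
        return hits.subList(from, to);
    }

    public static void main(String[] args) {
        // 25 hypothetical hits, numbered 0 to 24
        List<Integer> hits = IntStream.range(0, 25).boxed().toList();
        System.out.println(page(hits, 10, 10)); // hits 10 to 19 (the 11th to 20th)
        System.out.println(page(hits, 20, 10)); // a partial last page: 20 to 24
        System.out.println(page(hits, 30, 10)); // past the end: []
    }
}
```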

Similarly, a continuous query instance can be obtained (for both remote and embedded caches) with the method cache#continuousQuery:

ContinuousQuery<Integer, Person> continuousQuery = myCache.continuousQuery();

From it, as usual, we can register a continuous query listener:

continuousQuery.addContinuousQueryListener(query, new ContinuousQueryListener<>() {

   @Override
   public void resultJoining(Integer key, Object value) {
      // an entry starts matching the query (created, or updated to match)
   }

   @Override
   public void resultUpdated(Integer key, Object value) {
      // an entry that already matched was updated and still matches
   }

   @Override
   public void resultLeaving(Integer key) {
      // an entry stops matching the query (removed, or updated to no longer match)
   }
});
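Which callback fires depends on whether an entry matched the query before and after a change. The sketch below models that rule in plain Java; the classify helper, the Event enum and the Person record are hypothetical names for illustration, not part of the Infinispan API, and null stands for an absent entry.

```java
import java.util.Optional;
import java.util.function.Predicate;

public class ContinuousQueryEvents {

    // Hypothetical names for the three listener callbacks
    enum Event { JOINING, UPDATED, LEAVING }

    record Person(String name, int age) {}

    // Maps a change of one cache entry to the callback a continuous query
    // listener would receive. null means the entry is absent (not yet
    // created, or removed). Returns empty when the entry never matched.
    static <V> Optional<Event> classify(V oldValue, V newValue, Predicate<V> query) {
        boolean matchedBefore = oldValue != null && query.test(oldValue);
        boolean matchesNow = newValue != null && query.test(newValue);
        if (!matchedBefore && matchesNow) return Optional.of(Event.JOINING);
        if (matchedBefore && matchesNow) return Optional.of(Event.UPDATED);
        if (matchedBefore) return Optional.of(Event.LEAVING);
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Same condition as the query above: name = 'user1' AND age > 20
        Predicate<Person> query = p -> p.name().equals("user1") && p.age() > 20;
        Person young = new Person("user1", 18);
        Person adult = new Person("user1", 30);
        System.out.println(classify(null, adult, query));  // Optional[JOINING]
        System.out.println(classify(young, adult, query)); // Optional[JOINING]
        System.out.println(classify(adult, new Person("user1", 31), query)); // Optional[UPDATED]
        System.out.println(classify(adult, null, query));  // Optional[LEAVING]
        System.out.println(classify(null, young, query));  // Optional.empty
    }
}
```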

Filter elements for kNN queries

kNN queries can now be restricted to a subset of the entities by filtering the population on which the search is applied. A kNN filter can be defined with any predicate (boolean expressions included) provided by the Infinispan query language.

For instance, let’s consider the following entity:

@Proto
@Indexed
public record Item(
   @Keyword
   String code,
   @Vector(dimension = 3)
   float[] floatVector,
   @Text
   String description) {
}

Suppose we want to limit the vector search to the records whose description contains the word cat. We can do it like this:

Query<Object[]> query = remoteCache.query(
      "select score(i), i from Item i where i.floatVector <-> [:a]~:k filtering i.description : 'cat'");
query.setParameter("a", new float[]{7.0f, 7.0f, 7.0f});
query.setParameter("k", 3);

List<Object[]> hits = query.list();

This example shows a combination of full-text search and vector search.
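To show what the filter changes, here is a brute-force sketch of a filtered kNN search in plain Java: the filter first selects the population, and only then are the k nearest vectors kept. Infinispan itself answers such queries from its vector index rather than by scanning every entry. The Hit record, the knn helper, the substring check standing in for the full-text match, and the score formula 1 / (1 + squared L2 distance), which mirrors Lucene's Euclidean similarity, are all assumptions of this sketch.

```java
import java.util.List;
import java.util.function.Predicate;

public class FilteredKnnSketch {

    record Item(String code, float[] floatVector, String description) {}

    record Hit(float score, Item item) {}

    // Squared Euclidean (L2) distance between two vectors of the same dimension
    static float squaredL2(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Exact kNN over the items accepted by the filter: the filter defines the
    // population first, then the k best-scoring items are kept.
    static List<Hit> knn(List<Item> items, float[] query, int k, Predicate<Item> filter) {
        return items.stream()
                .filter(filter)
                // Assumed score: 1 / (1 + squared L2 distance); closer means higher
                .map(i -> new Hit(1f / (1f + squaredL2(i.floatVector(), query)), i))
                .sorted((a, b) -> Float.compare(b.score(), a.score()))
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("c1", new float[]{7f, 7f, 7f}, "a black cat"),
                new Item("d1", new float[]{7f, 7f, 6f}, "a brown dog"),
                new Item("c2", new float[]{1f, 1f, 1f}, "the cat sleeps"),
                new Item("c3", new float[]{6f, 7f, 7f}, "cat food"));
        // Substring check standing in for the full-text predicate description : 'cat'
        Predicate<Item> mentionsCat = i -> i.description().contains("cat");
        for (Hit hit : knn(items, new float[]{7f, 7f, 7f}, 3, mentionsCat)) {
            System.out.println(hit.item().code() + " " + hit.score());
        }
    }
}
```

With this data, d1 is closer to the query vector than c2, but the filter excludes it, so c2 takes the third slot. A boolean composite filter, like the one in the next example, would correspond to combining predicates, for instance mentionsCat.or(i -> i.code().equals("w739")).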

Boolean composite predicates are also supported. In the following example, we limit the search to the items that either have the term cat in their description or have the code w739.

Query<Object[]> query = remoteCache.query(
      "select score(i), i from Item i where i.floatVector <-> [:a]~:k filtering (i.description : 'cat' or i.code : 'w739')");
query.setParameter("a", new float[]{7.0f, 7.0f, 7.0f});
query.setParameter("k", 3);

List<Object[]> hits = query.list();

Index by keys - query by keys

When keys are complex, for example when they are entities themselves, it is now possible to define indexes on the keys as well.

Once this is done, we can run queries that target fields of both keys and values, in both projections and selections.

As an example, let’s consider a cache with keys of type PlaceKey and values of type Place. A possible mapping that enables indexing by key is the following:

@Proto
@Indexed
public record PlaceKey(
   @Basic(projectable = true, sortable = true)
   Integer row,
   @Basic(projectable = true, sortable = true)
   Integer column) {
}

@Proto
@Indexed(keyEntity = "model.PlaceKey")
public record Place(
   @Basic
   String code,
   @Text
   String description) {

   @ProtoSchema(includeClasses = {Place.class, PlaceKey.class}, schemaPackageName = "model")
   public interface PlaceSchema extends GeneratedSchema {
      PlaceSchema INSTANCE = new PlaceSchemaImpl();
   }
}

Notice that the type of the key must be declared in the main entity definition, using the keyEntity attribute of the @Indexed annotation.

After that, we can search for all the cache entries whose key has the field column equal to 77 and whose value contains the term cat in the field description, projecting the field row of the key and the field code of the value, with a query such as the following:

RemoteCache<PlaceKey, Place> cache = remoteCacheManager.getCache();
Query<Object[]> query = cache.query("select p.key.row, p.code from model.Place p where p.key.column = 77 and p.description : 'cat'");
List<Object[]> list = query.list();
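For intuition about what this query selects, the following plain-Java sketch evaluates the same conditions over an in-memory map of keys to values and projects key.row and code into one Object[] per hit, as the query does. It is not how Infinispan executes the query: the records mirror the mapping above without annotations, and matchesTerm is a rough, hypothetical stand-in for the analyzed full-text match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

public class KeyValueQuerySketch {

    // Plain records mirroring the mapped key and value types (no annotations)
    record PlaceKey(Integer row, Integer column) {}

    record Place(String code, String description) {}

    // Rough stand-in for a full-text term match: lower-case the text, split it
    // on non-letter characters and look for the whole term among the tokens.
    static boolean matchesTerm(String text, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .anyMatch(wanted::equals);
    }

    // In-memory equivalent of:
    //   select p.key.row, p.code from model.Place p
    //   where p.key.column = 77 and p.description : 'cat'
    static List<Object[]> query(Map<PlaceKey, Place> cache) {
        return cache.entrySet().stream()
                .filter(e -> Objects.equals(e.getKey().column(), 77))
                .filter(e -> matchesTerm(e.getValue().description(), "cat"))
                .map(e -> new Object[]{e.getKey().row(), e.getValue().code()})
                .toList();
    }

    public static void main(String[] args) {
        Map<PlaceKey, Place> cache = Map.of(
                new PlaceKey(1, 77), new Place("A", "A cat on the roof"),
                new PlaceKey(2, 77), new Place("B", "A dog in the garden"),
                new PlaceKey(3, 5), new Place("C", "A cat in the kitchen"),
                new PlaceKey(4, 77), new Place("D", "Category index"));
        // Only key (1, 77) qualifies: prints "row=1 code=A"
        for (Object[] hit : query(cache)) {
            System.out.println("row=" + hit[0] + " code=" + hit[1]);
        }
    }
}
```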

Get it, Use it, Ask us!

We’re hard at work on new features, improvements and fixes, so watch this space for more announcements!

Please download and test the latest release.

The source code is hosted on GitHub. If you need to report a bug or request a new feature, look for a similar one on our GitHub issues tracker. If you don’t find any, create a new issue.

If you have questions, are experiencing a bug or want advice on using Infinispan, you can use GitHub discussions. We will do our best to answer you as soon as we can.

The Infinispan community uses Zulip for real-time communications. Join us using either a web-browser or a dedicated application on the Infinispan chat.

Fabio Massimo Ercoli

Member of the Infinispan @core team at Red Hat. Accountable for indexing, query, serialization, transcoding, metrics and tracing for the hybrid cloud. Former Hibernate @core team, working on NoSql & searching. Former Red Hat consultant, working on intense data applications and solutions. An open source enthusiast and continuous learner.