Infinispan 15 indexing & query news
A short while back we released Infinispan 15 which delivered many improvements to the query API. This blog is an in-depth dive into some of these:
-
Rest queries with projections: more projection types are supported using the REST API.
-
Query cache API: regular and continuous queries can be defined directly from the cache API.
-
Filter elements for kNN queries: it is now possible to filter out the set of entities on which to apply a kNN-vector search.
-
Index by keys - query by keys: indexes can be defined also on the keys, not only on the values, so that we can query also the keys values.
Rest queries with projections
Recently we introduced score and version projections, which are added to the already available entity and field projections.
The same projections are now available using the REST query API.
Following an example of entity, query and result:
@Proto
@Indexed(index = "play")
public record Game(
@Keyword(projectable = true, sortable = true)
String name,
@Text @ProtoField(2)
String description) {
}
select g, g.description, version(g), score(g) from Game g where g.description : 'bla3'
{
"hit_count": 1,
"hit_count_exact": true,
"hits": [
{
"hit": {
"version()": 7,
"description": "bla bla3",
"*": {
"_type": "Game",
"name": "bla3",
"description": "bla bla3"
},
"score()": 0.90565
}
}
]
}
In this case we have requested the version projection using the projection function version(g)
and receiving the corresponding result in the version()
attribute of the returned hit
.
Similarly, a score()
projection is produced.
The special attribute *
corresponds to the entity projection result, in which the special field _type
corresponds to the type of the entity.
Query cache API
With Infinispan 15, both embedded and remote caches can be queried with the same API method cache#query
.
The use of the search factory to access the query APIs is no longer required and it is now deprecated.
Here is an example:
Query<Person> query = myCache.query("FROM space.Person WHERE name = 'user1' AND age > 20");
From this point forward we will use the usual API to configure and run the query. For instance:
query.startOffset(10);
query.maxResults(10);
QueryResult<Person> = query.execute();
Similarly, it is possible to get a continuous query instance (both for remote and embedded cache),
using the method cache#continuousQuery
:
ContinuousQuery<Integer, Person> continuousQuery = myCache.continuousQuery();
From which as usual it will be possible to define the continuous callback:
continuousQuery.addContinuousQueryListener(query, new ContinuousQueryListener<>() {
@Override
public void resultJoining(Integer key, Object value) {
// handle entity creations
}
@Override
public void resultUpdated(Integer key, Object value) {
// handle entry updates
}
@Override
public void resultLeaving(Integer key) {
// handle entry leavings
}
});
Filter elements for kNN queries
kNN queries can be run filtering the population on which to apply the search. A kNN filter is defined using any kind of predicate (included boolean expressions) provided by the Infinispan query language.
For instance, let’s consider the following entity:
@Proto
@Indexed
public record Item(
@Keyword
String code,
@Vector(dimension = 3)
float[] floatVector,
@Text
String description) {
}
Suppose that we want to limit the vector search only to record with the word cat
in the description. We can do like this:
Query<Object[]> query = remoteCache.query(
"select score(i), i from Item i where i.floatVector <-> [:a]~:k filtering i.description : 'cat'");
query.setParameter("a", new float[]{7.0f, 7.0f, 7.0f});
query.setParameter("k", 3);
List<Object[]> hits = query.list();
This is example shows a combination of full text search and vector search.
Boolean composite predicates are also supported.
In the following we will limit the search only to the items having the term cat
in their description and having code w739
.
Query<Object[]> query = remoteCache.query(
"select score(i), i from Item i where i.floatVector <-> [:a]~:k filtering (i.description : 'cat' or i.code : 'w739')");
query.setParameter("a", new float[]{7.0f, 7.0f, 7.0f});
query.setParameter("k", 3);
List<Object[]> hits = query.list();
Index by keys - query by keys
In case of complex keys, e.g., keys that are entities themselves, it is now possible to define indexes on the keys as well.
Once this is done, we will be able to run queries targeting fields from both keys and values, on both projections and selections.
As an example let’s consider a cache having keys of the type PlaceKey
and values of type Place
.
A possible indexing mapping to enable the index by the keys is the following:
@Proto
@Indexed
public record PlaceKey(
@Basic(projectable = true, sortable = true)
Integer row,
@Basic(projectable = true, sortable = true)
Integer column) {
}
@Proto
@Indexed(keyEntity = "model.PlaceKey")
public record Place(
@Basic
String code,
@Text
String description) {
@ProtoSchema(includeClasses = {Place.class, PlaceKey.class}, schemaPackageName = "model")
public interface PlaceSchema extends GeneratedSchema {
PlaceSchema INSTANCE = new PlaceSchemaImpl();
}
}
Notice that the type of the key must be declared in the main entity definition using the keyEntity
attribute of
the @Indexing
annotation.
After that it is possible to search for all the cache entries having the field column
in their keys equals to 77
and
containing the term cat
in the field description
of their values, projecting the field row
of the keys and the field code
of the value, using for instance the following query:
RemoteCache<PlaceKey, Place> cache = remoteCacheManager.getCache();
Query<Object[]> query = cache.query("select p.key.row, p.code from model.Place p where p.key.column = 77 and p.description : 'cat'");
List<Object[]> list = query.list();
Get it, Use it, Ask us!
We’re hard at work on new features, improvements and fixes, so watch this space for more announcements!Please, download and test the latest release.
The source code is hosted on GitHub. If you need to report a bug or request a new feature, look for a similar one on our JIRA issues tracker. If you don’t find any, create a new issue.
If you have questions, are experiencing a bug or want advice on using Infinispan, you can use GitHub discussions. We will do our best to answer you as soon as we can.
The Infinispan community uses Zulip for real-time communications. Join us using either a web-browser or a dedicated application on the Infinispan chat.
Fabio Massimo Ercoli
Member of the Infinispan @core team at Red Hat. Accountable for indexing, query, serialization, transcoding, metrics and tracing for the hybrid cloud. Former Hibernate @core team, working on NoSql & searching. Former Red Hat consultant, working on intense data applications and solutions. An open source enthusiast and continuous learner.