Use the Ickle query language with Infinispan caches to efficiently and quickly gain real-time insights into your data. Learn how to configure indexing and perform queries on remote and embedded caches.
1. Indexing Infinispan caches
Infinispan can create indexes of values in your caches to improve query performance, providing faster results than non-indexed queries. Indexing also lets you use full-text search capabilities in your queries.
Infinispan uses Apache Lucene technology to index values in caches.
1.1. Configuring Infinispan to index caches
Enable indexing in your cache configuration and specify which entities Infinispan should include when creating indexes.
You should always configure Infinispan to index caches when using queries. Indexing provides a significant performance boost to your queries, allowing you to get faster insights into your data.
-
Enable indexing in your cache configuration.
<distributed-cache>
  <indexing>
    <!-- Indexing configuration goes here. -->
  </indexing>
</distributed-cache>
Adding an
indexing
element to your configuration enables indexing without the need to include the enabled=true
attribute. For remote caches, adding this element also implicitly configures encoding as ProtoStream.
-
Specify the entities to index with the
indexed-entity
element.
<distributed-cache>
  <indexing>
    <indexed-entities>
      <indexed-entity>...</indexed-entity>
    </indexed-entities>
  </indexing>
</distributed-cache>
Protobuf messages
-
Specify the message declared in the schema as the value of the
indexed-entity
element, for example:
<distributed-cache>
  <indexing>
    <indexed-entities>
      <indexed-entity>book_sample.Book</indexed-entity>
    </indexed-entities>
  </indexing>
</distributed-cache>
This configuration indexes the
Book
message in a schema with the book_sample
package name.
package book_sample;

/* @Indexed */
message Book {
   /* @Text(projectable = true) */
   optional string title = 1;
   /* @Text(projectable = true) */
   optional string description = 2;
   // no native Date type available in Protobuf
   optional int32 publicationYear = 3;
   repeated Author authors = 4;
}

message Author {
   optional string name = 1;
   optional string surname = 2;
}
Java objects
-
Specify the fully qualified name (FQN) of each class that includes the
@Indexed
annotation.
<distributed-cache>
<indexing>
<indexed-entities>
<indexed-entity>book_sample.Book</indexed-entity>
</indexed-entities>
</indexing>
</distributed-cache>
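import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.IndexStorage;

// Enable indexing, store the index on the file system, and register the indexed class.
ConfigurationBuilder config = new ConfigurationBuilder();
config.indexing().enable()
      .storage(IndexStorage.FILESYSTEM).path("/some/folder")
      .addIndexedEntity(Book.class);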
1.1.1. Index configuration
Infinispan configuration controls how indexes are stored and constructed.
Index storage
You can configure how Infinispan stores indexes:
-
On the host file system, which is the default and persists indexes between restarts.
-
In JVM heap memory, which means that indexes do not survive restarts.
You should store indexes in JVM heap memory only for small datasets.
<distributed-cache>
<indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
<!-- Indexing configuration goes here. -->
</indexing>
</distributed-cache>
<distributed-cache>
<indexing storage="local-heap">
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
Index path
Specifies a filesystem path for the index when storage is 'filesystem'. The value can be a relative or absolute path. Relative paths are created relative to the configured global persistent location, or to the current working directory when global state is disabled.
By default, the cache name is used as a relative path for index path.
When setting a custom value, ensure that there are no conflicts between caches using the same indexed entities.
Index startup mode
When Infinispan starts caches, it can perform operations to ensure that the index is consistent with the data in the cache. By default, Infinispan:
-
Automatically clears (purges) the index or reindexes the cache.
-
If data is volatile and the index is persistent, Infinispan clears (purges) the index when it starts.
-
If data is persistent and the index is volatile, Infinispan reindexes the cache when it starts.
The purge operation is performed synchronously because it is usually very fast, so the cache becomes available only when the purge completes. The reindex operation is performed asynchronously because it can take longer to complete, depending on the size of the cache. An indexed query that runs during a reindex might return partial results. You can check whether a reindex is in progress through the query statistics.
You can also manually configure Infinispan to:
-
Purge the index when the cache starts.
-
Reindex the cache when it starts.
-
Perform no indexing operation when the cache starts.
If a manual configuration can lead to inconsistencies, Infinispan logs a warning message when the cache starts.
<distributed-cache>
<indexing storage="filesystem" startup-mode="purge">
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
<distributed-cache>
<indexing storage="local-heap" startup-mode="reindex">
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
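The default startup behavior described above amounts to a small decision rule. The following plain-Java sketch models it for illustration only: the `StartupModeSketch` class, its `Action` enum, and the fallback to `NONE` for the combinations the text does not mention are assumptions, not Infinispan API.

```java
// Simplified model of the default index startup behavior.
// All names are hypothetical; this is not Infinispan API.
final class StartupModeSketch {
    enum Action { PURGE, REINDEX, NONE }

    static Action defaultAction(boolean dataPersistent, boolean indexPersistent) {
        if (!dataPersistent && indexPersistent) {
            // Volatile data with a persistent index: the index may hold stale entries.
            return Action.PURGE;
        }
        if (dataPersistent && !indexPersistent) {
            // Persistent data with a volatile index: the index must be rebuilt.
            return Action.REINDEX;
        }
        // Assumption: the remaining combinations need no startup operation.
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(defaultAction(false, true));
        System.out.println(defaultAction(true, false));
    }
}
```

Setting `startup-mode` explicitly overrides whatever this rule would pick.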
Indexing mode
indexing-mode
controls how cache operations are propagated to the indexes.
auto
-
Infinispan immediately applies any changes to the cache to the indexes. This is the default mode.
manual
-
Infinispan updates indexes only when the reindex operation is explicitly invoked. Configure
manual
mode, for example, when you want to perform batch updates to the indexes.
Set the indexing-mode
to manual
:
<distributed-cache>
<indexing indexing-mode="manual">
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
Use Java Entities
If the cache is ProtoStream-encoded and the indexes are initialized from an Infinispan Server instance, the indexed entities must be indexed Protobuf messages defined in a Protobuf schema. You can change this behavior so that the indexes are instead defined from the indexed Java entities that are locally accessible to the server JVM. This is useful when you want to run embedded queries from a server task on a Protobuf-encoded cache.
<distributed-cache>
<indexing use-java-embedded-entities="true">
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
Index reader
The index reader is an internal component that provides access to the indexes to perform queries. As the index content changes, Infinispan needs to refresh the reader so that search results are up to date. You can configure the refresh interval for the index reader. By default Infinispan reads the index before each query if the index changed since the last refresh.
<distributed-cache>
<indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
<!-- Sets an interval of one second for the index reader. -->
<index-reader refresh-interval="1s"/>
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
Index writer
The index writer is an internal component that constructs an index composed of one or more segments (sub-indexes) that can be merged over time to improve performance. Fewer segments usually means less overhead during a query because index reader operations need to take into account all segments.
Infinispan uses Apache Lucene internally and indexes entries in two tiers: memory and storage. New entries go to the memory index first and then, when a flush happens, to the configured index storage. Periodic commit operations occur that create segments from the previously flushed data and make all the index changes permanent.
<distributed-cache>
<indexing storage="filesystem" path="${java.io.tmpdir}/baseDir">
<index-writer commit-interval="2s"
low-level-trace="false"
max-buffered-entries="32"
queue-count="1"
queue-size="10000"
ram-buffer-size="400"
thread-pool-size="2">
<index-merge calibrate-by-deletes="true"
factor="3"
max-entries="2000"
min-size="10"
max-size="20"/>
</index-writer>
<!-- Additional indexing configuration goes here. -->
</indexing>
</distributed-cache>
Attribute | Description |
---|---|
commit-interval |
Amount of time, in milliseconds, after which index changes that are buffered in memory are flushed to the index storage and a commit is performed. Because this operation is costly, small values should be avoided. The default is 1000 ms (1 second). |
max-buffered-entries |
Maximum number of entries that can be buffered in-memory before they are flushed to the index storage. Large values result in faster indexing but use more memory. When used in combination with the ram-buffer-size attribute, a flush occurs for whichever event happens first. |
ram-buffer-size |
Maximum amount of memory that can be used for buffering added entries and deletions before they are flushed to the index storage. Large values result in faster indexing but use more memory. For faster indexing performance you should set this attribute instead of max-buffered-entries. When used in combination with the max-buffered-entries attribute, a flush occurs for whichever event happens first. |
thread-pool-size |
This configuration is ignored since Infinispan 15.0. The indexing engine now uses the Infinispan thread pools. |
queue-count |
Default 4. Number of internal queues to use for each indexed type. Each queue holds a batch of modifications that is applied to the index and queues are processed in parallel. Increasing the number of queues will lead to an increase of indexing throughput, but only if the bottleneck is CPU. |
queue-size |
Default 4000. Maximum number of elements each queue can hold. Increasing the queue-size value increases the amount of memory that is used during indexing operations. Setting a value that is too small can block indexing operations. |
low-level-trace |
Enables low-level trace information for indexing operations. Enabling this attribute substantially degrades performance. You should use this low-level tracing only as a last resort for troubleshooting. |
To configure how Infinispan merges index segments, you use the index-merge
sub-element.
Attribute | Description |
---|---|
max-entries |
Maximum number of entries that an index segment can have before merging. Segments with more than this number of entries are not merged. Smaller values perform better on frequently changing indexes, larger values provide better search performance if the index does not change often. |
factor |
Number of segments that are merged at once. With smaller values, merging happens more often, which uses more resources, but the total number of segments will be lower on average, increasing search performance. Larger values (greater than 10) are best for heavy writing scenarios. |
min-size |
Minimum target size of segments, in MB, for background merges. Segments smaller than this size are merged more aggressively. Setting a value that is too large might result in expensive merge operations, even though they are less frequent. |
max-size |
Maximum size of segments, in MB, for background merges. Segments larger than this size are never merged in the background. Setting this to a lower value helps reduce memory requirements and avoids some merging operations at the cost of optimal search speed. This attribute is ignored when forcefully merging an index and max-forced-size applies instead. |
max-forced-size |
Maximum size of segments, in MB, for forced merges and overrides the max-size attribute. |
calibrate-by-deletes |
Whether the number of deleted entries in an index should be taken into account when counting the entries in the segment. Setting false will lead to more frequent merges caused by max-entries, but will more aggressively merge segments with many deleted documents, improving query performance. |
Index sharding
When you have a large amount of data, you can configure Infinispan to split index data into multiple indexes called shards. Enabling data distribution among shards improves performance. By default, sharding is disabled.
Use the shards
attribute to configure the number of indexes.
The number of shards must be greater than 1.
<distributed-cache>
<indexing>
<index-sharding shards="6" />
</indexing>
</distributed-cache>
1.2. Infinispan native indexing annotations
When you enable indexing in caches, you configure Infinispan to create indexes. You also need to provide Infinispan with a structured representation of the entities in your caches so it can actually index them.
Overview of the Infinispan indexing annotations
- @Indexed
-
Indicates entities, or Protobuf message types, that Infinispan indexes.
To indicate the fields that Infinispan indexes use the indexing annotations. You can use these annotations the same way for both embedded and remote queries.
- @Basic
-
Supports any type of field. Use the
@Basic
annotation for numbers and short strings that don’t require any transformation or processing.
- @Decimal
-
Use this annotation for fields that represent decimal values.
- @Keyword
-
Use this annotation for fields that are strings and intended for exact matching. Keyword fields are not analyzed or tokenized during indexing.
- @Text
-
Use this annotation for fields that contain textual data and are intended for full-text search capabilities. You can use the analyzer to process the text and to generate individual tokens.
- @Vector
-
Use this annotation to mark vector fields representing embeddings, on which kNN predicates can be defined.
- @Embedded
-
Use this annotation to mark a field as an embedded object within the parent entity. The NESTED structure preserves the original object relationship structure, while the FLATTENED structure makes the leaf fields multivalued fields of the parent entity. The default structure used by @Embedded is NESTED.
Each of the annotations supports a set of attributes that you can use to further describe how the entity is indexed.
Annotation | Supported attributes |
---|---|
@Basic |
searchable, sortable, projectable, aggregable, indexNullAs |
@Decimal |
searchable, sortable, projectable, aggregable, indexNullAs, decimalScale |
@Keyword |
searchable, sortable, projectable, aggregable, indexNullAs, normalizer, norms |
@Text |
searchable, projectable, norms, analyzer, searchAnalyzer, termVector |
@Vector |
searchable, projectable, dimension, similarity, beamWidth, maxConnections |
Using Infinispan annotations
You can provide Infinispan with indexing annotations in two ways:
-
Annotate your Java classes or fields directly using the Infinispan annotations.
You then generate or update your Protobuf schema, .proto files, before uploading them to Infinispan Server.
-
Annotate Protobuf schema directly with @Indexed and @Basic, @Keyword or @Text.
You then upload your Protobuf schema to Infinispan Server.
For example, the following schema uses the @Text annotation:
/**
 * @Text(projectable = true)
 */
required string street = 1;
1.3. Rebuilding indexes
Rebuilding an index reconstructs it from the data stored in the cache. You should rebuild indexes when you change things like the definitions of indexed types or analyzers. Likewise, you can rebuild indexes after you delete them for whatever reason.
Rebuilding indexes can take a long time to complete because the process takes place for all data in the grid. While the rebuild operation is in progress, queries might also return fewer results.
Rebuild indexes in one of the following ways:
-
Call the
reindexCache()
method to programmatically rebuild an index from a Hot Rod Java client:
remoteCacheManager.administration().reindexCache("MyCache");
For remote caches you can also rebuild indexes from Infinispan Console.
-
Call the
indexer.run()
method to rebuild indexes for embedded caches as follows:
Indexer indexer = Search.getIndexer(cache);
CompletionStage<Void> future = indexer.run();
-
Check the status of reindexing operation with the
reindexing
attribute of the index statistics.
1.4. Updating index schema
The update index schema operation lets you add schema changes with a minimal downtime. Instead of removing previously indexed data and recreating the index schema, Infinispan adds new fields to the existing schema. Updating index schema is much faster than rebuilding the index but you can update schema only when your changes do not affect fields that were already indexed.
You can update the index schema only when your changes do not affect previously indexed fields. When you change index field definitions or delete fields, you must rebuild the index.
-
Update index schema for a given cache:
-
Call the
updateIndexSchema()
method to programmatically update the index schema from a Hot Rod Java client:
remoteCacheManager.administration().updateIndexSchema("MyCache");
For remote caches, you can update index schema from the Infinispan Console or using the REST API.
1.5. Non-indexed queries
Infinispan recommends indexing caches for the best performance for queries. However you can query caches that are non-indexed.
-
For embedded caches, you can perform non-indexed queries on Plain Old Java Objects (POJOs).
-
For remote caches, you must use ProtoStream encoding with the
application/x-protostream
media type to perform non-indexed queries.
2. Creating Ickle queries
Infinispan provides an Ickle query language that lets you create relational and full-text queries.
2.1. Ickle queries
To use the API, call the cache .query()
method and provide the query string.
For instance:
// Remote Query using protobuf
Query<Transaction> remoteQuery = remoteCache.query("from sample_bank_account.Transaction where amount > 20");
// Embedded Query using Java Objects
Query<Book> q = cache.query("from org.infinispan.sample.Book where price > 20");
// Execute the query
QueryResult<Book> queryResult = q.execute();
A query always targets a single entity type and is evaluated over the contents of a single cache. Running a query over multiple caches or creating queries that target several entity types (joins) is not supported.
Executing the query and fetching the results is as simple as invoking the execute()
method of the Query
object. Once
executed, calling execute()
on the same instance will re-execute the query.
2.1.1. Pagination
You can limit the number of returned results by using the Query.maxResults(int maxResults)
. This can be used in
conjunction with Query.startOffset(long startOffset)
to achieve pagination of the result set.
// sorted by year and match all books that have "clustering" in their title
// and return the third page of 10 results
Query<Book> query = cache.query("FROM org.infinispan.sample.Book WHERE title like '%clustering%' ORDER BY year").startOffset(20).maxResults(10);
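If you don’t explicitly set the maxResults value for query instances, Infinispan limits the number of results that the query returns to 100. You can change the default limit by setting the query.default-max-results cache property.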
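The pagination example above asks for the third page of 10 results by passing 20 to `startOffset`. A minimal sketch of that arithmetic follows; the `PaginationSketch` helper is hypothetical and only computes the values you would pass to `Query.startOffset` and `Query.maxResults`.

```java
// Hypothetical helper: converts a one-based page number into a zero-based start offset.
final class PaginationSketch {
    static long startOffset(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be positive");
        }
        // Skip all results that belong to the preceding pages.
        return (long) (pageNumber - 1) * pageSize;
    }

    public static void main(String[] args) {
        // Third page of 10 results, as in the query above.
        System.out.println(startOffset(3, 10));
    }
}
```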
2.1.2. Number of hits
The QueryResult
object includes the .hitCount()
method, which returns a hit count value that represents the total number of results from a query, regardless of any pagination parameter.
Additionally, QueryResult
object contains a boolean value returned by the .isExact()
method which indicates whether the hit count number is exact or a lower bound.
The hit count is only available for indexed queries for performance reasons.
Hit count accuracy
You can limit the required accuracy of hit counts by setting the hit-count-accuracy
attribute.
When dealing with large data sets, precise hit counts can impact performance.
Setting a limit on the hit count accuracy lets you achieve faster query responses while ensuring that the provided hit counts remain sufficiently accurate for your application’s needs.
The default accuracy of the hit-count-accuracy
attribute is limited to 10000
.
This means that for any query, Infinispan provides exact hit count up to maximum of 10,000.
If the effective hit count is higher than 10,000, Infinispan returns a lower bound estimate of the count.
You can change the default limit by setting the query.hit-count-accuracy
cache property.
Alternatively, it can be set on each query instance.
When the actual hit count exceeds the limit set by the hit-count-accuracy
, the .isExact()
method or the hit_count_exact
JSON field will be false
, indicating that the returned hit count is an estimation.
Setting this value to Integer.MAX_VALUE
would return accurate results for any query, but this can severely impact query performance.
For optimal performance set the property value slightly above the expected hit count. If you do not require accurate hit counts, set it to a low value.
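The relationship between the accuracy limit, the reported hit count, and the exactness flag can be modelled in a few lines. This is a simplified sketch, not Infinispan code: the `HitCountSketch` class is hypothetical, and reporting the limit itself as the lower bound is an assumption, since the text only states that the value is a lower bound.

```java
// Simplified model of hit-count-accuracy semantics. All names are hypothetical.
final class HitCountSketch {
    static final class HitCount {
        final int value;
        final boolean exact;

        HitCount(int value, boolean exact) {
            this.value = value;
            this.exact = exact;
        }
    }

    static HitCount report(int actualHits, int accuracyLimit) {
        if (actualHits <= accuracyLimit) {
            // Within the limit: the count is exact.
            return new HitCount(actualHits, true);
        }
        // Above the limit: only a lower bound is reported (assumed here to be the limit).
        return new HitCount(accuracyLimit, false);
    }

    public static void main(String[] args) {
        HitCount small = report(42, 10_000);
        HitCount large = report(25_000, 10_000);
        System.out.println(small.value + " exact=" + small.exact);
        System.out.println(large.value + " exact=" + large.exact);
    }
}
```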
2.1.3. Iteration
The Query
object has the .iterator()
method to obtain the results lazily. It returns an instance of CloseableIterator
that must be closed after usage.
The iteration support for remote queries is currently limited, because it first fetches all entries to the client before iterating.
2.1.4. Named query parameters
Instead of building a new Query object for every execution it is possible to include named parameters in the query which
can be substituted with actual values before execution. This allows a query to be defined once and be efficiently
executed many times. Parameters can only be used on the right-hand side of an operator and are defined when the query is
created by supplying an object produced by the org.infinispan.query.dsl.Expression.param(String paramName)
method to
the operator instead of the usual constant value. Once the parameters have been defined they can be set by invoking either
Query.setParameter(parameterName, value)
or Query.setParameters(parameterMap)
as shown in the examples below.
// Defining a query to search for various authors and publication years
Query<Object[]> query = cache.query("SELECT title FROM org.infinispan.sample.Book WHERE author = :authorName AND publicationYear = :publicationYear");
// Set actual parameter values
query.setParameter("authorName", "Doe");
query.setParameter("publicationYear", 2010);
// Execute the query; each Object[] holds the projected title
List<Object[]> found = query.execute().list();
Alternatively, you can supply a map of actual parameter values to set multiple parameters at once:
Map<String, Object> parameterMap = new HashMap<>();
parameterMap.put("authorName", "Doe");
parameterMap.put("publicationYear", 2010);
query.setParameters(parameterMap);
A significant portion of the query parsing, validation and execution planning effort is performed during the first execution of a query with parameters. This effort is not repeated during subsequent executions, leading to better performance compared to a similar query using constant values instead of query parameters.
2.1.5. Query execution
The Query
API provides two methods for executing Ickle queries on a cache:
-
Query.execute() runs a SELECT statement and returns a result.
-
Query.executeStatement() runs a DELETE statement and modifies data.
You should always invoke executeStatement() to modify data and invoke execute() to get the result of queries.
2.2. Ickle query language syntax
The Ickle query language is a subset of the JPQL query language, with some extensions for full-text search.
The parser syntax has some notable rules:
-
Whitespace is not significant.
-
Wildcards are not supported in field names.
-
A field name or path must always be specified, as there is no default field.
-
&& and || are accepted instead of AND or OR in both full-text and JPA predicates.
-
! may be used instead of NOT.
-
A missing boolean operator is interpreted as OR.
-
String terms must be enclosed with either single or double quotes.
-
Fuzziness and boosting are not accepted in arbitrary order; fuzziness always comes first.
-
!= is accepted instead of <>.
-
Boosting cannot be applied to the >, >=, <, and <= operators. Ranges may be used to achieve the same result.
2.2.1. Filtering operators
Ickle supports many filtering operators that can be used for both indexed and non-indexed fields.
Operator | Description | Example |
---|---|---|
in |
Checks that the left operand is equal to one of the elements from the Collection of values given as argument. |
|
like |
Checks that the left argument (which is expected to be a String) matches a wildcard pattern that follows the JPA rules. |
|
= |
Checks that the left argument is an exact match of the given value. |
|
!= |
Checks that the left argument is different from the given value. |
|
> |
Checks that the left argument is greater than the given value. |
|
>= |
Checks that the left argument is greater than or equal to the given value. |
|
< |
Checks that the left argument is less than the given value. |
|
<= |
Checks that the left argument is less than or equal to the given value. |
|
between |
Checks that the left argument is between the given range limits. |
|
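To make the wildcard rules of the `like` operator concrete, the following sketch translates a JPA-style pattern (`%` matches any sequence of characters, `_` matches exactly one) into a regular expression and evaluates it in memory. The `LikeSketch` class is hypothetical and only illustrates the matching semantics; it does not show how Infinispan evaluates the operator internally.

```java
import java.util.regex.Pattern;

// Hypothetical in-memory model of JPA LIKE wildcard matching.
final class LikeSketch {
    static Pattern likeToRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            switch (c) {
                case '%':
                    regex.append(".*"); // any sequence of characters, including none
                    break;
                case '_':
                    regex.append('.');  // exactly one character
                    break;
                default:
                    regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean like(String value, String likePattern) {
        return value != null && likeToRegex(likePattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("Infinispan Data Grid Guide", "%Data Grid%"));
        System.out.println(like("Data", "D_ta"));
    }
}
```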
2.2.2. Boolean conditions
The following example combines multiple attribute conditions with the logical conjunction (AND) and disjunction (OR) operators to create a more complex condition. The well-known precedence rule for boolean operators applies, so the order in which the operators appear does not change how they bind: here AND still has higher priority than OR, even though OR appears first.
# match all books that have "Data Grid" in their title
# or have an author named "Manik" and their description contains "clustering"
FROM org.infinispan.sample.Book WHERE title LIKE '%Data Grid%' OR author.name = 'Manik' AND description like '%clustering%'
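Java's `&&` and `||` follow the same precedence rule, so the WHERE clause above can be mirrored with plain booleans to see how it binds. The `PrecedenceSketch` class and its parameter names are illustrative only.

```java
// Mirrors: title LIKE '%Data Grid%' OR author.name = 'Manik' AND description LIKE '%clustering%'
final class PrecedenceSketch {
    static boolean matches(boolean titleHasDataGrid, boolean authorIsManik, boolean descriptionHasClustering) {
        // && binds tighter than ||, exactly like AND and OR in the query.
        return titleHasDataGrid || authorIsManik && descriptionHasClustering;
    }

    public static void main(String[] args) {
        // A matching title is enough, even if the other two conditions fail.
        System.out.println(matches(true, false, false));
        // Without a matching title, both remaining conditions are required.
        System.out.println(matches(false, true, false));
    }
}
```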
Boolean negation has highest precedence among logical operators and applies only to the next simple attribute condition.
# match all books whose title is not "Data Grid" and that are authored by "Manik"
FROM org.infinispan.sample.Book WHERE title != 'Data Grid' AND author.name = 'Manik'
2.2.3. Nested conditions
Changing the precedence of logical operators is achieved with parentheses:
# match all books that have an author named "Manik" and their title contains
# "Data Grid" or their description contains "clustering"
FROM org.infinispan.sample.Book WHERE author.name = 'Manik' AND ( title like '%Data Grid%' OR description like '%clustering%')
2.2.4. Projections with SELECT statements
In some use cases returning the whole domain object is overkill if only a small subset of the attributes are actually
used by the application, especially if the domain entity has embedded entities. The query language allows you to specify
a subset of attributes (or attribute paths) to return - the projection. If projections are used then the QueryResult.list()
will not return the whole domain entity but will return a List
of Object[]
, each slot in the array corresponding to
a projected attribute.
# match all books that have "Data Grid" in their title or description
# and return only their title and publication year
SELECT title, publicationYear FROM org.infinispan.sample.Book WHERE title like '%Data Grid%' OR description like '%Data Grid%'
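The shape of a projection result can be illustrated with an in-memory analog of the query above. This sketch uses a hypothetical `Book` class and plain collections; it only shows how each hit becomes an `Object[]` with one slot per projected attribute, and it is not how Infinispan executes the query.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory analog of: SELECT title, publicationYear FROM Book WHERE title/description contain "Data Grid"
final class ProjectionSketch {
    static final class Book {
        final String title;
        final String description;
        final int publicationYear;

        Book(String title, String description, int publicationYear) {
            this.title = title;
            this.description = description;
            this.publicationYear = publicationYear;
        }
    }

    static List<Object[]> project(List<Book> books) {
        List<Object[]> rows = new ArrayList<>();
        for (Book b : books) {
            if (b.title.contains("Data Grid") || b.description.contains("Data Grid")) {
                // Slot 0 holds the title, slot 1 the publication year.
                rows.add(new Object[] { b.title, b.publicationYear });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>();
        books.add(new Book("Data Grid Basics", "An introduction", 2012));
        books.add(new Book("Cooking", "Recipes", 2000));
        for (Object[] row : project(books)) {
            System.out.println(row[0] + " (" + row[1] + ")");
        }
    }
}
```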
Project cache entry version
It is possible to project the cache entry version, using the version
projection function.
# return the title, publication year and the cache entry version
SELECT b.title, b.publicationYear, version(b) FROM org.infinispan.sample.Book b WHERE b.title like '%Data Grid%'
Project cache entry value
It is possible to project the cache entry value together with other projections.
It can be used for instance to project the cache entry value together with the cache entry version
in the same Object[]
returned hit.
# return the cache entry value and the cache entry version
SELECT b, version(b) FROM org.infinispan.sample.Book b WHERE b.title like '%Data Grid%'
Project the score
If the query is indexed, it is possible to project the score obtained by each match together with other projections.
It can be used for instance to project the cache entry value together with the score
in the same Object[]
returned hit.
# return the cache entry value and the score of the match
SELECT b, score(b) FROM org.infinispan.sample.Book b WHERE b.title like '%Data Grid%'
Sorting
Ordering the results based on one or more attributes or attribute paths is done with the ORDER BY
clause. If multiple sorting criteria
are specified, then the order will dictate their precedence.
# match all books that have "Data Grid" in their title
# and return them sorted by publication year (descending) and then by title (ascending)
FROM org.infinispan.sample.Book WHERE title like '%Data Grid%' ORDER BY publicationYear DESC, title ASC
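The precedence of multiple sorting criteria works like a chained comparator: the first criterion decides, and later criteria only break ties. The sketch below reproduces `ORDER BY publicationYear DESC, title ASC` in memory; the `Book` class is hypothetical and the code is an analogy, not the query engine's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory analog of: ORDER BY publicationYear DESC, title ASC
final class SortSketch {
    static final class Book {
        final String title;
        final int publicationYear;

        Book(String title, int publicationYear) {
            this.title = title;
            this.publicationYear = publicationYear;
        }
    }

    static List<Book> sort(List<Book> books) {
        List<Book> sorted = new ArrayList<>(books);
        sorted.sort(Comparator.comparingInt((Book b) -> b.publicationYear)
                .reversed()                      // publicationYear DESC
                .thenComparing(b -> b.title));   // title ASC breaks ties
        return sorted;
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>();
        books.add(new Book("Beta", 2010));
        books.add(new Book("Alpha", 2010));
        books.add(new Book("Gamma", 2015));
        for (Book b : sort(books)) {
            System.out.println(b.publicationYear + " " + b.title);
        }
    }
}
```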
2.2.5. Grouping and aggregation
Infinispan can group query results according to a set of grouping fields and construct aggregations of the results from each group by applying an aggregation function to the set of values that fall into each group. Grouping and aggregation can only be applied to projection queries (queries with one or more fields in the SELECT clause).
The supported aggregations are: avg
, sum
, count
, max
, and min
.
The set of grouping fields is specified with the GROUP BY
clause and the order used for defining grouping fields is
not relevant. All fields selected in the projection must either be grouping fields
or else they must be aggregated using one of the aggregation functions described below. A projection field can be
aggregated and used for grouping at the same time. A query that selects only grouping fields but no aggregation fields
is legal.
Example: Grouping Books by author and counting them.
SELECT author, COUNT(title) FROM org.infinispan.sample.Book WHERE title LIKE '%engine%' GROUP BY author
A projection query in which all selected fields have an aggregation function applied and no fields are used for grouping is allowed. In this case the aggregations will be computed globally as if there was a single global group.
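As an in-memory analogy for the grouping example above, the following sketch filters books by title, groups them by author, and counts the titles in each group using Java streams. The `Book` class is hypothetical and `author` is simplified to a string; the code illustrates the semantics only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory analog of:
// SELECT author, COUNT(title) FROM Book WHERE title LIKE '%engine%' GROUP BY author
final class GroupCountSketch {
    static final class Book {
        final String author;
        final String title;

        Book(String author, String title) {
            this.author = author;
            this.title = title;
        }
    }

    static Map<String, Long> countByAuthor(List<Book> books) {
        return books.stream()
                // WHERE clause: null titles never match LIKE.
                .filter(b -> b.title != null && b.title.contains("engine"))
                // GROUP BY author, COUNT(title)
                .collect(Collectors.groupingBy(b -> b.author, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>();
        books.add(new Book("Doe", "search engine basics"));
        books.add(new Book("Doe", "engine tuning"));
        books.add(new Book("Roe", "game engine"));
        System.out.println(countByAuthor(books));
    }
}
```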
Aggregations
You can apply the following aggregation functions to a field:
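Aggregation function | Description |
---|---|
avg |
Computes the average of a set of numbers. Accepted values are primitive numbers and instances of java.lang.Number. The result is represented as java.lang.Double. If there are no non-null values the result is null instead. |
count |
Counts the number of non-null rows and returns a java.lang.Long. If there are no non-null values the result is 0 instead. |
max |
Returns the greatest value found. Accepted values must be instances of java.lang.Comparable. If there are no non-null values the result is null instead. |
min |
Returns the smallest value found. Accepted values must be instances of java.lang.Comparable. If there are no non-null values the result is null instead. |
sum |
Computes the sum of a set of Numbers. If there are no non-null values the result is null instead. The following table indicates the return type based on the specified field. |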
Field Type | Return Type |
---|---|
Integral (other than BigInteger) |
Long |
Float or Double |
Double |
BigInteger |
BigInteger |
BigDecimal |
BigDecimal |
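The return-type table above can be read as a widening rule. The sketch below applies it to an in-memory list: integral values sum to `Long`, floating-point values to `Double`, and `BigInteger` or `BigDecimal` keep their own type. The `SumSketch` class is hypothetical; it assumes all values of a field share one type and that the result is null when there are no non-null values.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the return types listed in the table above.
final class SumSketch {
    static Number sum(List<? extends Number> values) {
        Number total = null; // assumption: null when there are no non-null values
        for (Number v : values) {
            if (v == null) {
                continue; // null values are skipped
            }
            total = (total == null) ? widen(v) : add(total, v);
        }
        return total;
    }

    private static Number widen(Number v) {
        if (v instanceof BigDecimal || v instanceof BigInteger) {
            return v;
        }
        if (v instanceof Float || v instanceof Double) {
            return v.doubleValue();
        }
        return v.longValue(); // Byte, Short, Integer, Long
    }

    // Assumes all values have the same type, as the values of one field do.
    private static Number add(Number total, Number v) {
        if (total instanceof BigDecimal) {
            return ((BigDecimal) total).add((BigDecimal) v);
        }
        if (total instanceof BigInteger) {
            return ((BigInteger) total).add((BigInteger) v);
        }
        if (total instanceof Double) {
            return total.doubleValue() + v.doubleValue();
        }
        return total.longValue() + v.longValue();
    }

    public static void main(String[] args) {
        Number result = sum(Arrays.asList(1, 2, 3));
        System.out.println(result + " " + result.getClass().getSimpleName());
    }
}
```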
Evaluation of queries with grouping and aggregation
Aggregation queries can include filtering conditions, like usual queries. Filtering can be performed in two stages: before
and after the grouping operation. All filter conditions defined before invoking the groupBy()
method will be applied
before the grouping operation is performed, directly to the cache entries (not to the final projection). These filter
conditions can reference any fields of the queried entity type, and are meant to restrict the data set that is going to
be the input for the grouping stage. All filter conditions defined after invoking the groupBy()
method will be applied to
the projection that results from the projection and grouping operation. These filter conditions can either reference any
of the groupBy()
fields or aggregated fields. Referencing aggregated fields that are not specified in the select clause
is allowed; however, referencing non-aggregated and non-grouping fields is forbidden. Filtering in this phase
reduces the number of groups based on their properties. Sorting can also be specified, as in usual queries. The
ordering operation is performed after the grouping operation and can reference any of the groupBy()
fields or aggregated
fields.
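The two filtering stages described above can be sketched with an in-memory analogy: the first filter restricts the entries that enter the grouping stage, and the second filter drops whole groups based on an aggregated value. The `TwoStageFilterSketch` class, its `Book` type, and the method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory model of filtering before and after grouping.
final class TwoStageFilterSketch {
    static final class Book {
        final String author;
        final int publicationYear;

        Book(String author, int publicationYear) {
            this.author = author;
            this.publicationYear = publicationYear;
        }
    }

    static Map<String, Long> authorsWithAtLeast(List<Book> books, int sinceYear, long minBooks) {
        // Stage 1: filter individual entries, then group and count.
        Map<String, Long> counts = books.stream()
                .filter(b -> b.publicationYear >= sinceYear)
                .collect(Collectors.groupingBy(b -> b.author, Collectors.counting()));
        // Stage 2: filter the groups by their aggregated value.
        Map<String, Long> result = new TreeMap<>();
        counts.forEach((author, count) -> {
            if (count >= minBooks) {
                result.put(author, count);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>();
        books.add(new Book("Doe", 2010));
        books.add(new Book("Doe", 2012));
        books.add(new Book("Doe", 1999));
        books.add(new Book("Roe", 2011));
        System.out.println(authorsWithAtLeast(books, 2005, 2));
    }
}
```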
2.2.6. DELETE statements
You can delete entities from Infinispan caches with the following syntax:
DELETE FROM <entityName> [WHERE condition]
-
Reference only single entities with <entityName>. DELETE queries cannot use joins.
-
WHERE conditions are optional.
DELETE queries cannot use any of the following:
-
Projections with SELECT statements
-
Grouping and aggregation
-
ORDER BY clauses
Invoke the Query.executeStatement() method to execute DELETE statements.
2.3. Full-text queries
You can perform full-text searches with the Ickle query language.
2.3.1. Fuzzy queries
To execute a fuzzy query add ~
along with an integer, representing the distance from the term used, after the term.
For instance:
FROM sample_bank_account.Transaction WHERE description : 'cofee'~2
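To illustrate what "distance from the term" means, the sketch below computes the plain Levenshtein edit distance between two terms and checks it against a maximum. This is an approximation for illustration: the `FuzzySketch` class is hypothetical, and the distance measure used by the underlying Lucene engine may differ in details such as how transpositions are counted.

```java
// Hypothetical illustration of fuzzy matching by edit distance.
final class FuzzySketch {
    // Classic dynamic-programming Levenshtein distance using two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static boolean fuzzyMatch(String queryTerm, String indexedTerm, int maxDistance) {
        return editDistance(queryTerm, indexedTerm) <= maxDistance;
    }

    public static void main(String[] args) {
        // 'cofee'~2 from the query above, compared with the indexed term "coffee".
        System.out.println(editDistance("cofee", "coffee"));
        System.out.println(fuzzyMatch("cofee", "coffee", 2));
    }
}
```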
2.3.2. Range queries
To execute a range query, define the boundaries within a pair of square brackets, as seen in the following example:
FROM sample_bank_account.Transaction WHERE amount : [20 to 50]
2.3.3. Phrase queries
A group of words can be searched by surrounding them in quotation marks, as seen in the following example:
FROM sample_bank_account.Transaction WHERE description : 'bus fare'
2.3.4. Proximity queries
To execute a proximity query, finding two terms within a specific distance, add a ~
along with the distance after the phrase.
For instance, the following example will find the words canceling and fee provided they are not more than 3 words apart:
FROM sample_bank_account.Transaction WHERE description : 'canceling fee'~3
2.3.5. Wildcard queries
To search for "text" or "test", use the ?
single-character wildcard search:
FROM sample_bank_account.Transaction where description : 'te?t'
To search for "test", "tests", or "tester", use the *
multi-character wildcard search:
FROM sample_bank_account.Transaction where description : 'test*'
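The meaning of the two wildcards can be sketched by translating a wildcard term into a `java.util.regex` pattern. This only illustrates the matching semantics; the `WildcardSketch` class is a hypothetical helper and is unrelated to how the index evaluates wildcard queries internally.

```java
import java.util.regex.Pattern;

final class WildcardSketch {

    // '?' stands for exactly one character, '*' for any number of characters (including none).
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                // Quote every other character so that it is matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String wildcard, String term) {
        return toPattern(wildcard).matcher(term).matches();
    }
}
```

With this helper, `te?t` matches both `text` and `test`, while `test*` matches `test`, `tests`, and `tester` but not `text`.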
2.3.6. Regular expression queries
Regular expression queries can be performed by specifying a pattern between a pair of /
characters. Ickle uses Lucene’s regular expression syntax, so to search for the words moat
or boat
the following could be used:
FROM sample_library.Book where title : /[mb]oat/
2.3.7. Boosting queries
Terms can be boosted by adding a ^
followed by a boost factor after the term to increase their relevance in a given query. The higher the boost factor, the more relevant the term is. For instance, to search for titles containing beer or wine with a higher relevance on beer, by a factor of 3, the following could be used:
FROM sample_library.Book WHERE title : beer^3 OR wine
2.4. Vector search queries
You can perform vector kNN searches with the Ickle query language using the special operator <->
to define predicates.
This is an example of a kNN query:
from play.Item i where i.myVector <-> [7,7,7]~3
This query finds the 3
items whose myVector
field is nearest to the vector [7,7,7]
.
Notice that in order to use this kind of search the entity, in our example play.Item
, has to be @Indexed
and
the field, in our example myVector
, should be annotated with @Vector
.
We support two kinds of vector field types:
-
byte
/Byte
(to work with byte vectors) -
float
/Float
(to work with float vectors)
You can define several vector fields on the same entity, but a query can contain only one vector predicate.
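Conceptually, a kNN predicate asks for the k stored vectors that are closest to the query vector. The brute-force sketch below shows that idea in plain Java, assuming Euclidean distance; the `BruteForceKnn` class is hypothetical, and an indexed search is typically approximate rather than an exhaustive scan like this one.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class BruteForceKnn {

    // Squared Euclidean distance; the square root is not needed to rank neighbours.
    static double squaredDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the k vectors nearest to the query vector, closest first.
    static List<float[]> nearest(List<float[]> vectors, float[] query, int k) {
        return vectors.stream()
                .sorted(Comparator.comparingDouble((float[] v) -> squaredDistance(v, query)))
                .limit(k)
                .collect(Collectors.toList());
    }
}
```

Calling `nearest(vectors, new float[]{7, 7, 7}, 3)` plays the role of the `<-> [7,7,7]~3` predicate in this simplified model.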
2.4.1. Vector search parameters
Both the k-value and the vector can be passed as query parameters.
The k-value scalar can be expressed with the usual placeholder :k
in the Ickle text.
For the vector we can use either a placeholder for each term of the vector:
Query<Item> query = cache.query("from play.Item i where i.floatVector <-> [:a,:b,:c]~:k");
query.setParameter("a", 1);
query.setParameter("b", 4.3);
query.setParameter("c", 3.3);
query.setParameter("k", 4);
Or a placeholder can be used for the entire vector:
Query<Item> query = cache.query("from play.Item i where i.floatVector <-> [:a]~:k");
query.setParameter("a", new float[]{7.1f, 7.0f, 3.1f});
query.setParameter("k", 3);
2.4.2. Score projection with vector search
It is also common to return the score of the computation by using the score projection. With a vector search, the query looks like the following:
Query<Object[]> query = cache.query("select i, score(i) from play.Item i where i.floatVector <-> [:a]~:k");
query.setParameter("a", new float[]{7.1f, 7.0f, 3.1f});
query.setParameter("k", 3);
List<Object[]> resultList = query.list();
In this case the first element of each array will contain the entity, and the second element will contain the score of the matching.
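Working with raw `Object[]` rows can be error-prone, so it may help to convert them into a typed pair. The following sketch assumes the projection order `select i, score(i)` shown above and treats the score as a generic `Number`, because the exact numeric type is an assumption here; `ScoredResults` and `Scored` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class ScoredResults {

    // Hypothetical holder for one row of a "select i, score(i)" projection.
    record Scored<T>(T entity, float score) {}

    static <T> List<Scored<T>> unpack(List<Object[]> rows, Class<T> entityType) {
        List<Scored<T>> result = new ArrayList<>();
        for (Object[] row : rows) {
            // First element: the entity. Second element: the score of the match.
            T entity = entityType.cast(row[0]);
            float score = ((Number) row[1]).floatValue();
            result.add(new Scored<>(entity, score));
        }
        return result;
    }
}
```

With the query above, `ScoredResults.unpack(resultList, Item.class)` would yield one `Scored<Item>` per row, assuming `Item` is the Java class mapped to `play.Item`.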
2.4.3. Filtering entities
Instead of applying the kNN search to the entire population of entities of a given type, you can limit the search set by applying classic predicates (match, full-text search, range, …) to the kNN queries, defining what we call a filtering clause.
A filtering clause can contain any kind of predicate, with the sole exception of kNN predicates.
For instance, the following query:
Query<Object[]> query = remoteCache.query(
"select score(i), i from Item i where i.floatVector <-> [:a]~:k filtering (i.buggy : 'cat' or i.text : 'code')");
query.setParameter("a", new float[]{7, 7, 7});
query.setParameter("k", 3);
returns the 3 items nearest to the point [7,7,7]
, selecting only among the items that have a text containing the term cat
or code
.
Filtering clauses are a way to apply classic indexed search to vector search.
2.4.4. Vector field attributes
It is always required to specify the dimension of the vector field.
The other mapping attributes are optional, since Infinispan will have a default for each of them. You can configure them, for instance, to tune the desired accuracy / performance.
Similarity
Different VectorSimilarity
algorithms are supported
Value | Distance | Score | Note |
---|---|---|---|
| \(d(x,y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2 } \) | \(s = \frac{1}{1+d^2}\) | This is the Infinispan default |
| \(d(x,y) = \sum_{i=1}^{n} x_i \cdot y_i \) | \(s = \frac{1}{1+d}\) | To use this similarity efficiently, both index and search vectors must be normalized |
| \(d(x,y) = \sum_{i=1}^{n} x_i \cdot y_i \) | \(s = \begin{cases} \frac{1}{1-d} & \text{if } d < 0\\ d+1 & \text{otherwise} \end{cases}\) | This similarity does not require vector normalization |
| \(d(x,y) = \frac{1 - \sum_{i=1}^{n} x_i \cdot y_i }{ \sqrt{ \sum_{i=1}^{n} x_i^2 } \sqrt{ \sum_{i=1}^{n} y_i^2 }} \) | \(s = \frac{1}{1+d}\) | This similarity cannot be of |
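As a worked example of two of these formulas, the sketch below computes the score for the Euclidean row (the default) and for the piecewise row that does not require normalization. It simply transcribes the formulas from the table into plain Java; the `SimilarityScores` class and its method names are hypothetical and are not part of the Infinispan API.

```java
final class SimilarityScores {

    // Sum of squared differences, that is d^2 for the Euclidean distance d.
    static double squaredL2(float[] x, float[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Sum of x_i * y_i.
    static double dot(float[] x, float[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += (double) x[i] * y[i];
        }
        return sum;
    }

    // Euclidean row: s = 1 / (1 + d^2).
    static double l2Score(float[] x, float[] y) {
        return 1.0 / (1.0 + squaredL2(x, y));
    }

    // Piecewise row: s = 1 / (1 - d) if d < 0, otherwise d + 1, where d is the dot product.
    static double maxInnerProductScore(float[] x, float[] y) {
        double d = dot(x, y);
        return d < 0 ? 1.0 / (1.0 - d) : d + 1.0;
    }
}
```

For identical vectors the Euclidean distance is 0, so the score reaches its maximum of 1; the score decreases toward 0 as the vectors move apart.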
2.5. Spatial queries
It is possible to define spatial fields in the index domain that can be queried using spatial predicates.
A spatial field denotes a spatial point that is represented by a pair of geographical coordinates: the latitude and the longitude.
When entities are added to an indexed cache, and their type is configured to be indexed, the mapped spatial fields will be included in the indexes and available for querying.
Spatial predicates are not supported by non-indexed queries.
2.5.1. Spatial fields mapping
Spatial mapping: @GeoPoint
This option uses a type-level indexing annotation @GeoPoint
for each spatial field to be defined.
One or more @GeoPoint annotations can be added to the indexed entity.
The only mandatory attribute is the fieldName
attribute and it is used to denote the field name in the index domain.
If more than one @GeoPoint
is defined on the same entity, their names must be different.
For each of them, the same entity must declare two double-typed fields:
-
one annotated with
@Latitude
, that will store the latitude of the given point -
one annotated with
@Longitude
, that will store the longitude of the given point
The @Latitude
and @Longitude
annotations must have a fieldName
attribute
equal to the fieldName
attribute of the corresponding @GeoPoint
annotation.
Here is an example of an entity defining two points: departure
and arrival
.
@Proto (1)
@Indexed
@GeoPoint(fieldName = "departure", projectable = true, sortable = true) (2)
@GeoPoint(fieldName = "arrival", projectable = true, sortable = true) (3)
public record TrainRoute(
@Keyword(normalizer = "lowercase") String name,
@Latitude(fieldName = "departure") Double departureLat, (4)
@Longitude(fieldName = "departure") Double departureLon, (5)
@Latitude(fieldName = "arrival") Double arrivalLat, (6)
@Longitude(fieldName = "arrival") Double arrivalLon (7)
) {
}
1 | The @Proto annotation, indicating that the entity is supposed to be used for remote queries |
2 | A departure point. Optionally sortable and projectable. |
3 | An arrival point. Optionally sortable and projectable. |
4 | The departure 's latitude. |
5 | The departure 's longitude. |
6 | The arrival 's latitude. |
7 | The arrival 's longitude. |
A single point can also be defined. Here is an example:
@Proto
@Indexed
@GeoPoint(fieldName = "location", projectable = true, sortable = true) (1)
public record Restaurant(
@Keyword(normalizer = "lowercase", projectable = true, sortable = true) String name,
@Text String description,
@Text String address,
@Latitude(fieldName = "location") Double latitude, (2)
@Longitude(fieldName = "location") Double longitude, (3)
@Basic Float score
) {
@ProtoSchema( (4)
includeClasses = { Restaurant.class, TrainRoute.class }, (5)
schemaFileName = "geo.proto",
schemaPackageName = "geo",
syntax = ProtoSyntax.PROTO3
)
public interface RestaurantSchema extends GeneratedSchema {
RestaurantSchema INSTANCE = new RestaurantSchemaImpl();
}
}
1 | A location point. Optionally sortable and projectable. |
2 | The location 's latitude. |
3 | The location 's longitude. |
4 | Generate a protobuf schema for the specified entities |
Spatial mapping: @GeoField
Alternatively, the special field types *.LatLng
can be used to define spatial fields.
The spatial fields (or the corresponding properties) must also be annotated with @GeoField
.
Here is an example of embedded query mapping using @GeoField
annotations on LatLng
fields:
import org.infinispan.api.annotations.indexing.model.LatLng; (1)
@Indexed
public record Hiking(@Keyword String name, @GeoField LatLng start, @GeoField LatLng end) { (2)
}
1 | For embedded queries, use org.infinispan.api.annotations.indexing.model.LatLng as type for any spatial field. |
2 | Annotate the spatial fields with @GeoField . |
Here is an example of remote query mapping using @GeoField
annotations on LatLng
fields:
import org.infinispan.commons.api.query.geo.LatLng; (1)
@Proto
@Indexed
public record ProtoHiking(@Keyword String name, @GeoField LatLng start, @GeoField LatLng end) { (2)
@ProtoSchema(
dependsOn = LatLng.LatLngSchema.class, (3)
includeClasses = ProtoHiking.class,
schemaFileName = "hiking.proto",
schemaPackageName = "geo",
syntax = ProtoSyntax.PROTO3
)
public interface ProtoHikingSchema extends GeneratedSchema {
ProtoHikingSchema INSTANCE = new ProtoHikingSchemaImpl();
}
}
1 | For remote queries, use org.infinispan.commons.api.query.geo.LatLng as type for any spatial field. |
2 | Annotate the spatial fields with @GeoField . |
3 | The schema must be generated with the explicit dependency on LatLng.LatLngSchema.class . |
The corresponding proto schema for remote queries will be:
syntax = "proto3";
package geo;
import "latlng.proto";
/**
* @Indexed
*/
message ProtoHiking {
/**
* @Keyword
*/
string name = 1;
/**
* @GeoField
*/
google.type.LatLng start = 2; (1)
/**
* @GeoField
*/
google.type.LatLng end = 3; (1)
}
1 | The Google standard type google.type.LatLng will be used to store spatial fields in the data domain. |
2.5.2. Spatial predicates
Spatial predicate: circle
This predicate selects entities having points within a given distance from a center point.
Query<Restaurant> query = cache.query("from geo.Restaurant r where r.location within circle(41.91, 12.46, :distance)"); (1)
query.setParameter("distance", 100); (2)
List<Restaurant> list = query.list();
1 | We select all the restaurants within 100 meters from a given center, in this case 41.91, 12.46. |
2 | Parameters can be extracted for all the values passed to the within circle predicate. |
By default, distances are expressed in meters. You can change the unit, for instance to kilometers:
Query<Restaurant> query = cache.query("from geo.Restaurant r where r.location within circle(41.91, 12.46, :distance km)"); (1)
query.setParameter("distance", 0.1); (2)
List<Restaurant> list = query.list();
1 | We select all the restaurants within 0.1 Km from a given center, in this case 41.91, 12.46. |
2 | Parameters can be extracted for all the values passed to the within circle predicate. |
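The idea behind the circle predicate can be sketched with the haversine formula, which estimates the great-circle distance between two points. This is an approximation under stated assumptions: a spherical Earth with a mean radius of 6,371 km, and a hypothetical `CircleSketch` class. The distance computed by the index may differ slightly.

```java
final class CircleSketch {

    // Mean Earth radius in meters; an approximation used only for this sketch.
    static final double EARTH_RADIUS_METERS = 6_371_000.0;

    // Great-circle distance between two (latitude, longitude) points, in meters.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    // True when the point lies within radiusMeters of the center.
    static boolean withinCircle(double lat, double lon,
                                double centerLat, double centerLon, double radiusMeters) {
        return haversineMeters(lat, lon, centerLat, centerLon) <= radiusMeters;
    }
}
```

In this model, a point one hundredth of a degree of latitude north of the center (41.92, 12.46) is roughly 1.1 km away, so it falls outside a 100 meter circle.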
Spatial predicate: box
This predicate selects entities having points contained in a given rectangle (or box).
The within box
predicate takes four arguments:
-
The latitude of the top left box point.
-
The longitude of the top left box point.
-
The latitude of the bottom right box point.
-
The longitude of the bottom right box point.
Query<Restaurant> query = cache.query("from geo.Restaurant r where r.location within box(41.91, 12.45, 41.90, 12.46)"); (1)
List<Restaurant> list = query.list();
1 | We select all the restaurants contained in the given box. |
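The argument order of the box predicate can be sketched as a simple bounds check. The `BoxSketch` class is hypothetical, and the check deliberately ignores boxes that cross the antimeridian.

```java
final class BoxSketch {

    // Arguments follow the order described above: top-left corner first, then bottom-right corner.
    static boolean withinBox(double lat, double lon,
                             double topLeftLat, double topLeftLon,
                             double bottomRightLat, double bottomRightLon) {
        return lat <= topLeftLat && lat >= bottomRightLat
                && lon >= topLeftLon && lon <= bottomRightLon;
    }
}
```

For the box used in the query above, `withinBox(41.905, 12.455, 41.91, 12.45, 41.90, 12.46)` is true, because the point lies between both pairs of bounds.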
Spatial predicate: polygon
This predicate selects entities having points within an arbitrary polygon.
The within polygon
predicate takes a variable number of arguments, each of which is a geo point expressed as a latitude and longitude pair.
Each point is enclosed in parentheses.
Query<Restaurant> query = cache.query("from geo.Restaurant r where r.location within polygon((41.91, 12.45), (41.91, 12.46), (41.90, 12.46), (41.90, 12.46))"); (1)
List<Restaurant> list = query.list();
1 | We select all the restaurants contained in the polygon identified by the provided vertex points. |
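A point-in-polygon test can be sketched with the classic ray-casting algorithm. The version below treats latitude and longitude as planar coordinates, which is a simplifying assumption that is only reasonable for small areas; the `PolygonSketch` class is hypothetical and is not the algorithm used by the index.

```java
final class PolygonSketch {

    // Each vertex is {latitude, longitude}. The polygon is closed implicitly
    // by connecting the last vertex back to the first one.
    static boolean contains(double[][] polygon, double lat, double lon) {
        boolean inside = false;
        for (int i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
            double latI = polygon[i][0], lonI = polygon[i][1];
            double latJ = polygon[j][0], lonJ = polygon[j][1];
            // Does the edge (j, i) straddle the point's longitude, and does it
            // cross the ray cast from the point toward increasing latitude?
            boolean crosses = (lonI > lon) != (lonJ > lon)
                    && lat < (latJ - latI) * (lon - lonI) / (lonJ - lonI) + latI;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }
}
```

An odd number of crossings means the point is inside. For a hypothetical square with corners (41.91, 12.45), (41.91, 12.46), (41.90, 12.46), and (41.90, 12.45), the point (41.905, 12.455) is reported as inside.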
2.5.3. Spatial sorting
Use any spatial field to sort the query result. In particular the result can be sorted according to the distance from a given point (the query point), by default in ascending order, to any spatial point owned by the entity.
In this case the spatial field must be sortable, by setting the attribute |
Here is an example of usage:
Query<Restaurant> query = cache.query("from geo.Restaurant r order by distance(r.location, 41.91, 12.46)"); (1)
List<Restaurant> list = query.list();
1 | Restaurants are ordered according to their distances from the given query point (41.91, 12.46). |
2.5.4. Spatial projections
Project the distance between a given point (query point) and an entity’s spatial field by using the distance
function in the select
clause.
In this case the spatial field must be projectable, setting the |
Here is an example of usage:
Query<Object[]> projectQuery = remoteCache.query("select r.name, distance(r.location, 41.91, 12.46) from geo.Restaurant r");
List<Object[]> projectList = projectQuery.list();
1 | Pair the restaurant names with their respective distances from the query point (41.91, 12.46). |
For spatial projections, you can also change the default unit of measure (meters). Here is an example:
Query<Object[]> projectQuery = remoteCache.query("select r.name, distance(r.location, 41.91, 12.46, yd) from geo.Restaurant r");
List<Object[]> projectList = projectQuery.list();
1 | Pair the restaurant names with their respective distances (using yards as unit measure) from the query point (41.91, 12.46). |
3. Querying remote caches
You can index and query remote caches on Infinispan Server.
3.1. Querying caches from Hot Rod Java clients
Infinispan lets you programmatically query remote caches from Java clients through the Hot Rod endpoint.
This procedure explains how to index and query a remote cache that stores Book
instances.
-
Add the ProtoStream processor to your
pom.xml
.
Infinispan provides this processor for the @ProtoField
annotations so you can generate Protobuf schemas and perform queries.
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>...</version>
<configuration>
<annotationProcessorPaths>
<annotationProcessorPath>
<groupId>org.infinispan.protostream</groupId>
<artifactId>protostream-processor</artifactId>
<version>...</version>
</annotationProcessorPath>
</annotationProcessorPaths>
</configuration>
</plugin>
</plugins>
</build>
-
Add indexing annotations to your class, as in the following example:
Book.java
import org.infinispan.api.annotations.indexing.Basic;
import org.infinispan.api.annotations.indexing.Indexed;
import org.infinispan.api.annotations.indexing.Text;
import org.infinispan.protostream.annotations.ProtoFactory;
import org.infinispan.protostream.annotations.ProtoField;
@Indexed
public class Book {
    @Text
    @ProtoField(number = 1)
    final String title;
    @Text
    @ProtoField(number = 2)
    final String description;
    @Basic
    @ProtoField(number = 3, defaultValue = "0")
    final int publicationYear;
    @ProtoFactory
    Book(String title, String description, int publicationYear) {
        this.title = title;
        this.description = description;
        this.publicationYear = publicationYear;
    }
    // public Getter methods omitted for brevity
}
-
Implement the
SerializationContextInitializer
interface in a new class and then add the @ProtoSchema
annotation.
-
Reference the class that includes the
@ProtoField
annotations with the includeClasses
parameter.
-
Define a name for the Protobuf schema that you generate and filesystem path with the
schemaFileName
and schemaFilePath
parameters.
-
Specify the package name for the Protobuf schema with the
schemaPackageName
parameter.
RemoteQueryInitializer.java
import org.infinispan.protostream.SerializationContextInitializer;
import org.infinispan.protostream.annotations.ProtoSchema;
@ProtoSchema(
    includeClasses = { Book.class },
    schemaFileName = "book.proto",
    schemaFilePath = "proto/",
    schemaPackageName = "book_sample")
public interface RemoteQueryInitializer extends SerializationContextInitializer {
}
-
-
Compile your project.
The code examples in this procedure generate a
proto/book.proto
schema and a RemoteQueryInitializerImpl.java
implementation for the annotated Book
class.
Create a remote cache that configures Infinispan to index your entities.
For example, the following remote cache indexes the Book
entity in the book.proto
schema that you generated in the previous step:
<replicated-cache name="books">
<indexing>
<indexed-entities>
<indexed-entity>book_sample.Book</indexed-entity>
</indexed-entities>
</indexing>
</replicated-cache>
The following RemoteQuery
class does the following:
-
Registers the
RemoteQueryInitializerImpl
serialization context with a Hot Rod Java client. -
Registers the Protobuf schema,
book.proto
, with Infinispan Server. -
Adds two
Book
instances to the remote cache. -
Performs a full-text query that matches books by keywords in the title.
package org.infinispan;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
import org.infinispan.commons.api.query.Query;
import org.infinispan.query.remote.client.ProtobufMetadataManagerConstants;
public class RemoteQuery {
public static void main(String[] args) throws Exception {
ConfigurationBuilder clientBuilder = new ConfigurationBuilder();
// RemoteQueryInitializerImpl is generated
clientBuilder.addServer().host("127.0.0.1").port(11222)
.security().authentication().username("user").password("user")
.addContextInitializers(new RemoteQueryInitializerImpl());
RemoteCacheManager remoteCacheManager = new RemoteCacheManager(clientBuilder.build());
// Grab the generated Protobuf schema and register it with the server.
Path proto = Paths.get(RemoteQuery.class.getClassLoader()
.getResource("proto/book.proto").toURI());
String protoBufCacheName = ProtobufMetadataManagerConstants.PROTOBUF_METADATA_CACHE_NAME;
remoteCacheManager.getCache(protoBufCacheName).put("book.proto", Files.readString(proto));
// Obtain the 'books' remote cache
RemoteCache<Object, Object> remoteCache = remoteCacheManager.getCache("books");
// Add some Books
Book book1 = new Book("Infinispan in Action", "Learn Infinispan with using it", 2015);
Book book2 = new Book("Cloud-Native Applications with Java and Quarkus", "Build robust and reliable cloud applications", 2019);
remoteCache.put(1, book1);
remoteCache.put(2, book2);
// Execute a full-text query
Query<Book> query = remoteCache.query("FROM book_sample.Book WHERE title:'java'");
List<Book> list = query.execute().list(); // Voila! We have our book back from the cache!
}
}
3.2. Querying ProtoStream common types
Perform Ickle queries on caches that store data as ProtoStream common types such as BigInteger
and BigDecimal
.
-
Add indexing annotations to your class, as in the following example:
@Indexed
public class CalculusIndexed {
    @Basic
    @ProtoField(value = 1)
    public BigInteger getPurchases() {
        return purchases;
    }
    @Decimal // the scale is 2 by default
    @ProtoField(value = 2)
    public BigDecimal getProspect() {
        return prospect;
    }
}
-
Set the
dependsOn
attribute to CommonTypes.class
to indicate that the generated Protobuf schema can reference and use CommonTypes
types such as BigInteger
and BigDecimal
:
@ProtoSchema(includeClasses = CalculusIndexed.class, dependsOn = CommonTypes.class,
    schemaFilePath = "/protostream", schemaFileName = "calculus-indexed.proto",
    schemaPackageName = "lab.indexed")
public interface CalculusIndexedSchema extends GeneratedSchema {
}
-
Perform queries:
Query<CalculusIndexed> query = cache.query("from lab.indexed.CalculusIndexed c where c.purchases > 9");
QueryResult<CalculusIndexed> result = query.execute();
// play with the result
query = cache.query("from lab.indexed.CalculusIndexed c where c.prospect = 2.2");
result = query.execute();
// play with the result
3.3. Querying caches from Infinispan Console and CLI
Infinispan Console and the Infinispan Command Line Interface (CLI) let you query indexed and non-indexed remote caches. You can also use any HTTP client to index and query caches via the REST API.
This procedure explains how to index and query a remote cache that stores Person
instances.
-
Have at least one running Infinispan Server instance.
-
Have Infinispan credentials with create permissions.
-
Add indexing annotations to your Protobuf schema, as in the following example:
package org.infinispan.example;
/* @Indexed */
message Person {
    /* @Basic */
    optional int32 id = 1;
    /* @Keyword(projectable = true) */
    required string name = 2;
    /* @Keyword(projectable = true) */
    required string surname = 3;
    /* @Basic(projectable = true, sortable = true) */
    optional int32 age = 6;
}
From the Infinispan CLI, use the
schema
command with the --upload=
argument as follows:
schema --upload=person.proto person.proto
-
Create a cache named people that uses ProtoStream encoding and configures Infinispan to index entities declared in your Protobuf schema.
The following cache indexes the
Person
entity from the previous step:
<distributed-cache name="people">
  <encoding media-type="application/x-protostream"/>
  <indexing>
    <indexed-entities>
      <indexed-entity>org.infinispan.example.Person</indexed-entity>
    </indexed-entities>
  </indexing>
</distributed-cache>
From the CLI, use the
create cache
command with the --file=
argument as follows:
create cache --file=people.xml people
-
Add entries to the cache.
To query a remote cache, it needs to contain some data. For this example procedure, create entries that use the following JSON values:
PersonOne
{
  "_type":"org.infinispan.example.Person",
  "id":1,
  "name":"Person",
  "surname":"One",
  "age":44
}
PersonTwo
{
  "_type":"org.infinispan.example.Person",
  "id":2,
  "name":"Person",
  "surname":"Two",
  "age":27
}
PersonThree
{
  "_type":"org.infinispan.example.Person",
  "id":3,
  "name":"Person",
  "surname":"Three",
  "age":35
}
From the CLI, use the
put
command with the --file=
argument to add each entry, as follows:
put --encoding=application/json --file=personone.json personone
From Infinispan Console, you must select Custom Type for the Value content type field when you add values in JSON format with custom types.
-
Query your remote cache.
From the CLI, use the
query
command from the context of the remote cache.
query "from org.infinispan.example.Person p WHERE p.name='Person' ORDER BY p.age ASC"
The query returns all entries with a name that matches
Person
, sorted by age in ascending order.
3.4. Using analyzers with remote caches
Analyzers convert input data into terms that you can index and query.
You specify analyzer definitions with the @Text
annotation in your Java classes or directly in Protobuf schema.
-
Annotate the property with the
@Text
annotation to indicate that its value is analyzed. -
Use the
analyzer
attribute to specify the desired analyzer that you want to use for indexing and searching.
/* @Indexed */
message TestEntity {
/* @Keyword(projectable = true) */
optional string id = 1;
/* @Text(projectable = true, analyzer = "simple") */
optional string name = 2;
}
@Text(projectable = true, analyzer = "whitespace")
@ProtoField(value = 1)
private String id;
@Text(projectable = true, analyzer = "simple")
@ProtoField(value = 2)
private String description;
3.4.1. Default analyzer definitions
Infinispan provides a set of default analyzer definitions.
Definition | Description |
---|---|
| Splits text fields into tokens, treating whitespace and punctuation as delimiters. |
| Tokenizes input streams by delimiting at non-letters and then converting all letters to lowercase characters. Whitespace and non-letters are discarded. |
| Splits text streams on whitespace and returns sequences of non-whitespace characters as tokens. |
| Treats entire text fields as single tokens. |
| Stems English words using the Snowball Porter filter. |
| Generates n-gram tokens that are 3 grams in size by default. |
| Splits text fields into larger size tokens than the |
| Converts all the letters of the text to lowercase characters, the text is not tokenized (normalizer). |
These analyzer definitions are based on Apache Lucene. For more information about tokenizers, filters, and CharFilters, see the Apache Lucene documentation.
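To make the difference between tokenization strategies concrete, the following sketch imitates three of the behaviours described in the table with plain Java string operations: splitting at non-letters with lowercasing, splitting on whitespace, and keeping the whole field as a single token. It is a rough approximation for illustration only; the `TokenizationSketch` class is hypothetical, and the real analyzers are implemented by Apache Lucene.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class TokenizationSketch {

    // Delimits at non-letters and lowercases each token.
    static List<String> splitAtNonLetters(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Splits on whitespace only; punctuation and case are preserved.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Treats the entire field value as one token.
    static List<String> singleToken(String text) {
        return List.of(text);
    }
}
```

For the input `Hello, World-Wide Web!`, the first method yields `hello`, `world`, `wide`, `web`, while the second yields `Hello,`, `World-Wide`, `Web!`. The choice of analyzer therefore determines which query terms can match a field.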
3.4.2. Creating custom analyzer definitions
Create custom analyzer definitions and add them to your Infinispan Server installations.
-
Stop Infinispan Server if it is running.
Infinispan Server loads classes at startup only.
-
Implement the
ProgrammaticSearchMappingProvider
API. -
Package your implementation in a JAR with the fully qualified class name (FQN) of the implementation in the following file:
META-INF/services/org.infinispan.query.spi.ProgrammaticSearchMappingProvider
-
Copy your JAR file to the
server/lib
directory of your Infinispan Server installation. -
Start Infinispan Server.
ProgrammaticSearchMappingProvider
example
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.standard.StandardFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.cfg.SearchMapping;
import org.infinispan.Cache;
import org.infinispan.query.spi.ProgrammaticSearchMappingProvider;
public final class MyAnalyzerProvider implements ProgrammaticSearchMappingProvider {
@Override
public void defineMappings(Cache cache, SearchMapping searchMapping) {
searchMapping
.analyzerDef("standard-with-stop", StandardTokenizerFactory.class)
.filter(StandardFilterFactory.class)
.filter(LowerCaseFilterFactory.class)
.filter(StopFilterFactory.class);
}
}
3.5. Queries by keys
You can define the key of a cache entry as an Indexed
type to index the key fields as well as the value fields, allowing the keys to be used in Ickle queries.
To define an Indexed
key, specify the fully qualified name of the Protocol Buffers message type to use as the key type in the keyEntity
attribute of the @Indexed
annotation.
This feature is available only with indexed remote queries.
keyEntity
of an indexed entity
import org.infinispan.api.annotations.indexing.Basic;
import org.infinispan.api.annotations.indexing.Indexed;
import org.infinispan.api.annotations.indexing.Text;
import org.infinispan.protostream.GeneratedSchema;
import org.infinispan.protostream.annotations.ProtoFactory;
import org.infinispan.protostream.annotations.ProtoField;
import org.infinispan.protostream.annotations.ProtoSchema;
@Indexed(keyEntity = "model.StructureKey")
public class Structure {
private final String code;
private final String description;
private final Integer value;
@ProtoFactory
public Structure(String code, String description, Integer value) {
this.code = code;
this.description = description;
this.value = value;
}
@ProtoField(1)
@Basic
public String getCode() {
return code;
}
@ProtoField(2)
@Text
public String getDescription() {
return description;
}
@ProtoField(3)
@Basic
public Integer getValue() {
return value;
}
@ProtoSchema(includeClasses = { Structure.class, StructureKey.class }, schemaPackageName = "model")
public interface StructureSchema extends GeneratedSchema {
StructureSchema INSTANCE = new StructureSchemaImpl();
}
}
import org.infinispan.api.annotations.indexing.Basic;
import org.infinispan.api.annotations.indexing.Indexed;
import org.infinispan.api.annotations.indexing.Keyword;
import org.infinispan.protostream.annotations.ProtoFactory;
import org.infinispan.protostream.annotations.ProtoField;
@Indexed
public class StructureKey {
private String zone;
private Integer row;
private Integer column;
@ProtoFactory
public StructureKey(String zone, Integer row, Integer column) {
this.zone = zone;
this.row = row;
this.column = column;
}
@Keyword(projectable = true, sortable = true)
@ProtoField(1)
public String getZone() {
return zone;
}
@Basic(projectable = true, sortable = true)
@ProtoField(2)
public Integer getRow() {
return row;
}
@Basic(projectable = true, sortable = true)
@ProtoField(3)
public Integer getColumn() {
return column;
}
}
3.5.1. Key property name
By default, the key fields will be targeted using the property named key
.
select s.key.column from model.Structure s where s.key.zone = 'z7'
If the value already has a property named key
, the definition of the key entity could create a naming conflict
with the properties.
For this reason, and more generally,
you can change the name used as the prefix for the key properties by setting the keyPropertyName
attribute of the @Indexed
annotation.
3.6. Remote queries from server tasks
This feature is marked as experimental.
Indexes for remote caches encoded with Protobuf can be used to run queries from server tasks, even though the server tasks run embedded in the server JVM.
Here is an example:
package org.infinispan.server.functional.extensions;
import java.util.Map;
import org.infinispan.commons.api.query.Query;
import org.infinispan.commons.api.query.QueryResult;
import org.infinispan.protostream.sampledomain.User;
import org.infinispan.tasks.ServerTask;
import org.infinispan.tasks.TaskContext;
import org.infinispan.tasks.query.RemoteQueryAccess;
public class RemoteQueryAccessTask implements ServerTask<Integer> {
private static final ThreadLocal<TaskContext> taskContext = new ThreadLocal<>();
private static final String QUERY = "FROM pro.User WHERE name = :name order by id";
private static final String QUERY_PROJ = "select id, name, surname " + QUERY;
@Override
public void setTaskContext(TaskContext ctx) {
taskContext.set(ctx);
}
@Override
public Integer call() {
TaskContext ctx = taskContext.get();
String name = (String) ctx.getParameters().get().get("name");
RemoteQueryAccess remoteQueryAccess = ctx.getRemoteQueryAccess().get();
Map<String, Object> params = Map.of("name", name);
Query<User> query = remoteQueryAccess.query(QUERY);
query.setParameters(params);
QueryResult<User> result1 = query.execute();
Query<Object[]> queryProj = remoteQueryAccess.query(QUERY_PROJ);
queryProj.setParameters(params);
QueryResult<Object[]> result2 = queryProj.execute();
return result1.count().value() + result2.count().value();
}
@Override
public String getName() {
return "remote-query-access-task";
}
}
The RemoteQueryAccess
can be obtained from the TaskContext
.
It allows you to run remote queries
by using the executeQuery
method, which takes the following parameters:
-
queryString
: the Ickle query to execute, as if it were executed from a client
-
namedParametersMap
: the parameters to pass to the query
-
offset
and maxResults
: for pagination
-
hitCountAccuracy
: the limit up to which the hit count is exact
-
local
: whether the query should report results considering only the local index shard
The feature is marked as experimental. Several query APIs are not available in this setting. In particular, it is not possible to execute a statement, to verify whether a query has projections defined, to use the iterator or the entity iterator, to set timeouts, or to force the score to be computed.
4. Querying embedded caches
Use embedded queries when you add Infinispan as a library to custom applications.
Protobuf mapping is not required with embedded queries. Indexing and querying are both done on top of Java objects.
4.1. Querying embedded caches
This section explains how to query an embedded cache using an example cache named "books" that stores indexed Book
instances.
In this example, each Book
instance defines which properties are indexed and specifies some advanced indexing options with Infinispan indexing annotations as follows:
package org.infinispan.sample;
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;
import org.infinispan.api.annotations.indexing.*;
// Annotate values with @Indexed to add them to indexes
// Annotate each field according to how you want to index it
@Indexed
public class Book {
@Keyword
String title;
@Text
String description;
@Keyword
String isbn;
@Basic
LocalDate publicationDate;
@Embedded
Set<Author> authors = new HashSet<Author>();
}
package org.infinispan.sample;
import org.infinispan.api.annotations.indexing.Text;
public class Author {
@Text
String name;
@Text
String surname;
}
-
Configure Infinispan to index the "books" cache and specify org.infinispan.sample.Book as the entity to index.
<distributed-cache name="books">
  <indexing path="${user.home}/index">
    <indexed-entities>
      <indexed-entity>org.infinispan.sample.Book</indexed-entity>
    </indexed-entities>
  </indexing>
</distributed-cache>
-
Obtain the cache.
import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;
EmbeddedCacheManager manager = new DefaultCacheManager("infinispan.xml");
Cache<String, Book> cache = manager.getCache("books");
-
Perform queries for fields in the Book instances that are stored in the Infinispan cache, as in the following example:
// Create an Ickle query that performs a full-text search using the ':' operator on the 'title' and 'authors.name' fields
// You can perform full-text search only on indexed caches
Query<Book> fullTextQuery = cache.query("FROM org.infinispan.sample.Book b WHERE b.title:'infinispan' AND b.authors.name:'sanne'");
// Use the '=' operator to query fields in caches that are indexed or not
// Non full-text operators apply only to fields that are not analyzed
Query<Book> exactMatchQuery = cache.query("FROM org.infinispan.sample.Book b WHERE b.isbn = '12345678' AND b.authors.name : 'sanne'");
// You can use full-text and non-full text operators in the same query
Query<Book> query = cache.query("FROM org.infinispan.sample.Book b where b.authors.name : 'Stephen' and b.description : (+'dark' -'tower')");
// Get the results
List<Book> found = query.execute().list();
4.2. Entity mapping annotations
Add annotations to your Java classes to map your entities to indexes.
Infinispan uses the Hibernate Search API to define fine-grained configuration for indexing at the entity level. This configuration includes which fields are annotated, which analyzers should be used, how to map nested objects, and so on.
The following sections provide information that applies to entity mapping annotations for use with Infinispan.
For complete details about these annotations, refer to the Hibernate Search manual.
@DocumentId
Unlike in Hibernate Search, using @DocumentId to mark a field as an identifier does not apply to Infinispan values; in Infinispan the identifier for all @Indexed objects is the key used to store the value. You can still customize how the key is indexed using a combination of @Transformable, custom types, and custom FieldBridge implementations.
@Transformable keys
The key for each value needs to be indexed as well, and the key instance must be transformed into a String. Infinispan includes some default transformation routines to encode common primitives, but to use a custom key you must provide an implementation of org.infinispan.query.Transformer.
You can annotate your key class with org.infinispan.query.Transformable and your custom transformer implementation will be picked up automatically:
@Transformable(transformer = CustomTransformer.class)
public class CustomKey {
...
}
public class CustomTransformer implements Transformer {
@Override
public Object fromString(String s) {
...
return new CustomKey(...);
}
@Override
public String toString(Object customType) {
CustomKey ck = (CustomKey) customType;
return ...
}
}
Use the key-transformers XML element in both embedded and server configuration:
<replicated-cache name="test">
<indexing>
<key-transformers>
<key-transformer key="com.mycompany.CustomKey"
transformer="com.mycompany.CustomTransformer"/>
</key-transformers>
</indexing>
</replicated-cache>
Alternatively, use the Java configuration API (embedded mode):
ConfigurationBuilder builder = ...
builder.indexing().enable()
.addKeyTransformer(CustomKey.class, CustomTransformer.class);
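The fromString and toString bodies are elided in the CustomTransformer example earlier in this section. As a rough sketch of what a round-trip implementation might look like, the following self-contained code uses a hypothetical two-field CustomKey and a local Transformer interface that only mirrors the two methods shown above. It is a stand-in for illustration, not the org.infinispan.query.Transformer type itself, and the "region:id" encoding is an assumption.

```java
import java.util.Objects;

public class KeyTransformerSketch {

    /** Hypothetical stand-in that mirrors the two methods of the transformer shown above. */
    public interface Transformer {
        Object fromString(String s);
        String toString(Object customType);
    }

    /** Hypothetical composite key; the fields are invented for this example. */
    public static final class CustomKey {
        final String region;
        final int id;

        public CustomKey(String region, int id) {
            this.region = region;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CustomKey)) return false;
            CustomKey other = (CustomKey) o;
            return id == other.id && region.equals(other.region);
        }

        @Override
        public int hashCode() {
            return Objects.hash(region, id);
        }
    }

    /** Encodes the key as "region:id" and decodes it by splitting on the last ':'. */
    public static final class CustomTransformer implements Transformer {
        @Override
        public Object fromString(String s) {
            int sep = s.lastIndexOf(':'); // the region itself may contain ':'
            return new CustomKey(s.substring(0, sep), Integer.parseInt(s.substring(sep + 1)));
        }

        @Override
        public String toString(Object customType) {
            CustomKey ck = (CustomKey) customType;
            return ck.region + ":" + ck.id;
        }
    }

    public static void main(String[] args) {
        Transformer transformer = new CustomTransformer();
        CustomKey key = new CustomKey("eu:west", 42);
        String encoded = transformer.toString(key);
        System.out.println(encoded);                                   // eu:west:42
        System.out.println(key.equals(transformer.fromString(encoded))); // true
    }
}
```

Whatever encoding you choose, the important property is that fromString(toString(key)) yields a key equal to the original, which is why the sketch implements equals and hashCode on the key class.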
5. Creating continuous queries
Applications can register listeners to receive continual updates about cache entries that match query filters.
5.1. Continuous queries
Continuous queries provide applications with real-time notifications about data in Infinispan caches that are filtered by queries. When entries match the query, Infinispan sends the updated data to any listeners, which provides a stream of events so that applications do not have to execute the query repeatedly.
Continuous queries can notify applications about incoming matches, for values that have joined the set; updated matches, for matching values that were modified and continue to match; and outgoing matches, for values that have left the set.
For example, continuous queries can notify applications about all:
-
Persons with an age between 18 and 25, assuming the Person entity has an age property and is updated by the user application.
-
Transactions for dollar amounts larger than $2000.
-
Laps in which an F1 racer's lap time was less than 1:45.00, assuming the cache contains Lap entries and that laps are entered during the race.
Continuous queries can use all query capabilities except for grouping, aggregation, and sorting operations.
How continuous queries work
Continuous queries notify client listeners with the following events:
-
Join: A cache entry matches the query.
-
Update: A cache entry that matches the query is updated and still matches the query.
-
Leave: A cache entry no longer matches the query.
When a client registers a continuous query listener it immediately receives Join events for any entries that match the query.
Client listeners receive subsequent events each time a cache operation modifies entries that match the query.
Infinispan determines when to send Join, Update, or Leave events to client listeners as follows:
-
If neither the old value nor the new value matches the query, Infinispan does not send an event.
-
If the old value does not match the query but the new value does, Infinispan sends a Join event.
-
If both the old and new values match the query, Infinispan sends an Update event.
-
If the old value matches the query but the new value does not, Infinispan sends a Leave event.
-
If the old value matches the query and the entry is then deleted or expires, Infinispan sends a Leave event.
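The rules above amount to a small decision table over two booleans. The following sketch expresses that table in plain Java; the class, enum, and method names are hypothetical and this is an illustration of the documented behavior, not Infinispan's internal implementation.

```java
import java.util.Optional;

public class ContinuousQueryEventSketch {

    /** The three notifications described above. */
    public enum Event { JOIN, UPDATE, LEAVE }

    /**
     * Illustrative decision table (an assumption-based sketch).
     * oldMatches: the previous value matched the query (false if there was no previous value).
     * newMatches: the new value matches the query (false if the entry was deleted or expired).
     */
    public static Optional<Event> decide(boolean oldMatches, boolean newMatches) {
        if (oldMatches) {
            return Optional.of(newMatches ? Event.UPDATE : Event.LEAVE);
        }
        return newMatches ? Optional.of(Event.JOIN) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical query: p.age < 21
        System.out.println(decide(false, true));  // Person aged 18 is stored: Optional[JOIN]
        System.out.println(decide(true, true));   // age changes from 18 to 19: Optional[UPDATE]
        System.out.println(decide(true, false));  // age changes to 21, or entry removed: Optional[LEAVE]
        System.out.println(decide(false, false)); // Person aged 30 is stored: Optional.empty
    }
}
```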
5.1.1. Continuous queries and Infinispan performance
Continuous queries provide a constant stream of updates to applications, which can generate a significant number of events.
Infinispan temporarily allocates memory for each event it generates, which can result in memory pressure and potentially lead to OutOfMemoryError exceptions, especially for remote caches.
For this reason, you should carefully design your continuous queries to avoid any performance impact.
Infinispan strongly recommends that you limit the scope of your continuous queries to the smallest amount of information that you need. To achieve this, you can use projections and predicates. For example, the following statement provides results about only a subset of fields that match the criteria rather than the entire entry:
SELECT field1, field2 FROM Entity WHERE x AND y
It is also important to ensure that each ContinuousQueryListener you create can quickly process all received events without blocking threads.
To achieve this, you should avoid any cache operations that generate events unnecessarily.
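One possible way to keep listener callbacks short is to hand each event to a dedicated worker thread and return immediately. The sketch below illustrates that idea with plain JDK classes. The Listener interface is a hypothetical stand-in that only mirrors the ContinuousQueryListener callback names, and the offloading approach is an assumption for illustration rather than a pattern prescribed by Infinispan.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonBlockingListenerSketch {

    /** Hypothetical stand-in that mirrors the ContinuousQueryListener callbacks. */
    public interface Listener<K, V> {
        void resultJoining(K key, V value);
        void resultUpdated(K key, V value);
        void resultLeaving(K key);
    }

    /** Each callback only enqueues a task; a single worker thread preserves event order. */
    public static final class OffloadingListener<K, V> implements Listener<K, V> {
        private final ExecutorService worker;
        private final Map<K, V> matches;

        public OffloadingListener(ExecutorService worker, Map<K, V> matches) {
            this.worker = worker;
            this.matches = matches;
        }

        @Override
        public void resultJoining(K key, V value) {
            worker.execute(() -> matches.put(key, value));
        }

        @Override
        public void resultUpdated(K key, V value) {
            worker.execute(() -> matches.put(key, value));
        }

        @Override
        public void resultLeaving(K key) {
            worker.execute(() -> matches.remove(key));
        }
    }

    /** Stops accepting tasks and waits for queued events to be processed. */
    public static void shutdownAndWait(ExecutorService worker) {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Map<Integer, String> matches = new ConcurrentHashMap<>();
        Listener<Integer, String> listener = new OffloadingListener<>(worker, matches);

        listener.resultJoining(1, "Alice");
        listener.resultJoining(2, "Bob");
        listener.resultLeaving(1);

        shutdownAndWait(worker);
        System.out.println(matches); // {2=Bob}
    }
}
```

A single-threaded executor is used so that events are applied in the order they arrive. Note that the executor's queue is unbounded in this sketch, so a slow worker still accumulates memory; narrowing the query remains the primary safeguard.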
5.2. Creating continuous queries
You can create continuous queries for remote and embedded caches.
-
Create a Query object.
-
Obtain the ContinuousQuery object of your cache by calling the appropriate method:
  -
  Remote caches: org.infinispan.client.hotrod.Search.getContinuousQuery(RemoteCache<K, V> cache)
  -
  Embedded caches: org.infinispan.query.Search.getContinuousQuery(Cache<K, V> cache)
-
Register the query and a ContinuousQueryListener object as follows:
continuousQuery.addContinuousQueryListener(query, listener);
-
When you no longer need the continuous query, remove the listener as follows:
continuousQuery.removeContinuousQueryListener(listener);
Continuous query example
The following code example demonstrates a simple continuous query with an embedded cache.
In this example, the listener receives notifications when any Person instances under the age of 21 are added to the cache.
Those Person instances are also added to the "matches" map.
When the entries are removed from the cache or their age becomes greater than or equal to 21, they are removed from the "matches" map.
import org.infinispan.query.api.continuous.ContinuousQuery;
import org.infinispan.query.api.continuous.ContinuousQueryListener;
import org.infinispan.query.Search;
import org.infinispan.query.dsl.QueryFactory;
import org.infinispan.query.dsl.Query;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
[...]
// We have a cache of Person objects.
Cache<Integer, Person> cache = ...
// Create a ContinuousQuery instance on the cache.
ContinuousQuery<Integer, Person> continuousQuery = Search.getContinuousQuery(cache);
// Define a query.
// In this example, we search for Person instances under 21 years of age.
Query query = cache.query("FROM Person p WHERE p.age < 21");
final Map<Integer, Person> matches = new ConcurrentHashMap<Integer, Person>();
// Define the ContinuousQueryListener.
ContinuousQueryListener<Integer, Person> listener = new ContinuousQueryListener<Integer, Person>() {
@Override
public void resultJoining(Integer key, Person value) {
matches.put(key, value);
}
@Override
public void resultUpdated(Integer key, Person value) {
// We do not process this event.
}
@Override
public void resultLeaving(Integer key) {
matches.remove(key);
}
};
// Add the listener and the query.
continuousQuery.addContinuousQueryListener(query, listener);
[...]
// Remove the listener to stop receiving notifications.
continuousQuery.removeContinuousQueryListener(listener);
6. Monitoring and tuning Infinispan queries
Infinispan exposes statistics for queries and provides attributes that you can adjust to improve query performance.
6.1. Getting query statistics
Collect statistics to gather information about performance of your indexes and queries, including information such as the types of indexes, average time for queries to complete and the number of possible failures on indexing operations.
Do one of the following:
-
Invoke the getSearchStatistics() or getClusteredSearchStatistics() methods for embedded caches.
-
Use GET requests to obtain statistics for remote caches from the REST API.
// Statistics for the local cluster member
SearchStatistics statistics = Search.getSearchStatistics(cache);
// Consolidated statistics for the whole cluster
CompletionStage<SearchStatisticsSnapshot> clusteredStatistics = Search.getClusteredSearchStatistics(cache);
GET /rest/v2/caches/{cacheName}/search/stats
6.2. Tuning query performance
Use the following guidelines to help you improve the performance of indexing operations and queries.
Queries against partially indexed caches return slower results. For instance, if some fields in a schema are not annotated then the resulting index does not include those fields.
Start tuning query performance by checking the time it takes for each type of query to run. If your queries seem to be slow, you should make sure that queries are using the indexes for caches and that all entities and field mappings are indexed.
Indexing can degrade write throughput for Infinispan clusters.
The commit-interval attribute defines the interval, in milliseconds, at which index changes buffered in memory are flushed to the index storage and a commit is performed.
This operation is costly so you should avoid configuring an interval that is too small. The default is 1000 ms (1 second).
The refresh-interval attribute defines the interval, in milliseconds, at which the index reader is refreshed.
The default value is 0, which returns data in queries as soon as it is written to a cache.
A value greater than 0 results in some stale query results but substantially increases throughput, especially in write-heavy scenarios.
If you do not need data returned in queries as soon as it is written, you should adjust the refresh interval to improve query performance.