Friday, 09 February 2018

RESTful queries coming to Infinispan 9.2

One of the interesting features in the upcoming Infinispan 9.2 release is the possibility to execute queries over the REST endpoint, enabling users to take advantage of the easy-to-use and expressiveness of the Ickle query language, that combines a subset of JP-QL with full-text features. You can learn more info about Ickle in a previous post.

Besides exposing query over REST, Infinispan 9.2 also adds support for mapping between JSON and Protobuf formats, allowing an efficient storage in binary format while exposing queries, reading and writing content as JSON documents.

To illustrate those new capabilities, this post will walk you through a sample app from scratch!

Sample app

Running the server

We start by running the Infinispan Server 9.2.0.CR2 (the latest release candidate):

This will get you a fresh instance of Infinispan running, with login and password 'user' and the REST port 8080 mapped to localhost. TIP: if you run more than one container, they’ll form a cluster automatically.

Creating an indexed cache

Next step is to create an indexed cache called 'pokemon'. We make use of the CLI  (Command Line Interface) to create this cache. In the future, with ISPN-8529, we’ll also be able to create cache with arbitrary configuration using REST, but for now we execute a CLI recipe:

Creating the schema

In order to be able to query, we need to define a protobuf schema for our data. The schema follows the Protobuf 2 format (Protobuf 3 support is coming) and allows for extensions to define indexing properties (analyzers, storage, etc).

Here’s how it looks like:

The protobuf schema can contain some comments on top of fields and messages with "annotations" to control indexing. Hibernate Search users will recognize some of those pseudo annotations we are using here: they resemble closely their counterpart.

Registering the schema

Once we have our schema, we can easily register it via REST:

Populating the cache

We’re now ready to put some data in the cache. As mentioned earlier, ingesting can be done by sending JSON documents directly. Once Infinispan receives those documents, it will convert them to protobuf, index and store them.

In order to match a particular inbound document to an entity in the schema, Infinispan uses a special meta field called _type that must be provided in the document. Here’s an example of a JSON document that conforms to our schema:

Writing the document is easy:

we can retrieve content by key as JSON:

Querying

The new query endpoint can be called with an "action" parameter named "search", after the cache name. The simplest query, which returns all data can be done with:

If you do not want to return all the fields, use a Select clause:

Pagination can be controlled with the offset, max_results URL parameters:

Grouping is also possible:

Example of a query result:

Results:

Conclusion

 

Infinispan 9.2 makes it easier to quickly ingest and query datasets using the ubiquitous JSON format, without sacrificing type safety and storage size.

By storing Protobuf, this will also enable other clients like the Hot Rod C#/C++ clients to query, read and write data simultaneously with REST clients.

The full source code for the demo, along with instructions on how to populate the whole dataset can be found at Github.

Finally, please try out this new feature in your own dataset and let us know how it goes!

Posted by Gustavo on 2018-02-09
Tags: rest query Protobuf JSON query

Friday, 07 April 2017

In Memory Data Grid Patterns Demos from Devoxx France!

Devoxx France 2017 was a blast!! Emmanuel and I would like to thank all attendees to our in-memory data grids patterns talk. The room was full and we thoroughly enjoyed the experience!

During the talk we presented a couple of small demos that showcased some in-memory data grid use cases. The demos are located here, but I thought it’d be useful to provide some step-by-step here so that you can get them running as quickly as possible.

Before we start with any of the demos, it’s necessary to run some set up steps:

  1. Check out git repository:

  2. Download Infinispan Server 9.0.0.Final and at the same level as the git repository.

  3. Go into the datagrid-patterns directory, start the servers and wait until they’ve started:

    cd datagrid-patterns     ./run-servers.sh

  4. Install Anaconda for Python 3, this is required to run Jupyter notebook for plotting.

  5. Install Maven 3.

Once the set up is complete, it’s time to start with the individual demos.

Both demos shown below work with the same application domain: rail transport systems. In this domain, we differentiate between physical stations, trains, station boards which are located in stations, and finally stops, which are individual entries in station boards.

Analytics Demo

The first demo is focused on how you can use Infinispan for doing offline analytics. In particular, this demo tries to answer the following question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

To answer this question, Infinispan data grid will be loaded with 3 weeks worth of data from station boards. Once the data is loaded, we will execute a remote server task which will use Infinispan Distributed Java Streams to calculate the two pieces of information required to answer the question: per hour, how many trains are going through the system, and out of those, how many are delayed.

An important aspect to bear in mind about this server tasks is that it will only be executed in one of the nodes in the cluster. It does not matter which one. In turn, this node will will ship the lambdas required to do the computation to each of the nodes so that they can executed against their local data. The other nodes will reply with the results and the node where the server task was invoked will aggregate the results.

Then, these results are sent back to the client, which in turn, stores the results as JSON in an intermediate cache. Once the results are in place, we will use a Jupyter notebook to read those results and plot the result.

Let’s see these steps in action:

  1. First, we need to install the server tasks in the running servers above:

    cd datagrid-patterns/analytics

    mvn clean install package -am -pl analytics-server

    mvn wildfly:deploy -pl analytics-server

    

  2. Open the datagrid-pattern repo with your favourite IDE and run delays.java.stream.InjectApp class located in analytics/analytics-server project. This command will inject the data into the cache. On my environment, it takes between 1 and 2 minutes.

  3. With the data loaded, we need to run the remote task that will calculate the total number of trains per hour and how many of those are delayed. To do that, execute delays.java.stream.AnalyticsApp class located in analytics/analytics-server project from your IDE.

  4. You can verify that the results have been calculating by going to the following address:

  5. With the results in place, it’s time to start the Jupyter notebook:

    cd datagrid-patterns/analytics/analytics-jupyter

    ~/anaconda/bin/jupyter notebook

  6. Once the notebook opens, click open live-demo.ipynb notebook and execute each of the cells in order. You should end up seeing a plot like this:

image

So, the answer to the question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

is 2am! That’s because last connecting trains of the day wait for each other to avoid leaving passengers stranded.

Real Time Demo

The second demo that we presented uses the same application domain as above, but this time we’re trying to use our data grid as a way of storing the station board state of each station at a given point in time. So, the idea is to use Infinispan as an in memory data grids for working with real time data.

So, what can we do with this type of data? In our demo, we will create a centralised dashboard of delayed trains around the country. To do that, we will take advantage of Infinispan’s Continuous Query functionality which allows us to find those station boards which contain stops that are delayed, and as new delayed trains appeared these will be pushed to our dashboard.

To run this demo, keep the same servers running as above and do the following:

  1. Run delays.query.continuous.FxApp application located in real-time project inside the datagrid-patterns demo. This app will inject some live station board data and will launch a JavaFX dashboard that shows delayed trains as they appear. It should look something like this:

image

Conclusion

This has been a summary of the demos that we run in our talk at Devoxx France with the intention of getting you running these demos as quickly as possible. The repository contains more detailed information of these demos. If there’s anything unclear or any of the instructions above are not working, please let us know!

Thanks to Emmanuel Bernard for partnering with me for this Devoxx France talk and for the continuous feedback while developing the demos. Thanks as well to Tristan Tarrant for the input in the demos and many thanks to all Devoxx France attendees who attended our talk :)

A very special thanks to Alexandre Masselot whose "Swiss Transport in Real Time: Tribulations in the Big Data Stack" talk at Soft-Shake 2016 was the inspiration for these demos. @Alex, thanks a lot for sharing the demos and data with me and the rest of the community!!

In a just a few weeks I’ll be at Great Indian Developer Summit presenting these demos and much more! Stay tuned :)

Cheers,

Galder

Posted by Galder Zamarreño on 2017-04-07
Tags: conference devoxx demo streams query

Thursday, 08 December 2016

Meet Ickle!

As you’ve already learned from an earlier post this week, Infinispan 9 is on its final approach to landing and is bringing a new query language. Hurray! But wait, was there something wrong with the old one(s)? Not wrong really …​  I’ll explain.

Infinispan is a data grid of several query languages. Historically, it has offered search support early in its existence by integrating with Hibernate Search which provides a powerful Java-based DSL enabling you to build Lucene queries and run them on top of your Java domain model living in the data grid. Usage of this integration is confined to embedded mode, but that still succeeds in making Java users happy.

While the Hibernate Search combination is neat and very appealing to Java users it completely leaves non-JVM languages accessing Infinispan via remote protocols out in the cold.

Enter Remote Query. Infinispan 6.0 starts to address the need of searching the grid remotely via Hot Rod. The internals are still built on top of Lucene and Hibernate Search bedrock but these technologies are now hidden behind a new query API, the QueryBuilder, an internal DSL resembling JPA criteria query. The QueryBuilder has implementations for both embedded mode and Hot Rod. This new API provides all relational operators you can think of, but no full-text search initially, we planned to add that later.

Creating a new internal DSL was fun. However, having a long term strategy for evolving it while keeping complete backward compatibility and also doing so uniformly across implementations in multiple languages proved to be a difficult challenge. So while we were contemplating adding new full-text operators to this DSL we decided on making a long leap forward and adopt a more flexible alternative by having our own string based query language instead, another DSL really, albeit an external one this time.

So after the long ado, let me introduce Ickle, Infinispan’s new query language, conspicuously resembling JP-QL.

Ickle:

  • is a light and small subset of JP-QL, hence the lovely name

  • queries Java classes and supports Protocol Buffers too

  • queries can target a single entity type

  • queries can filter on properties of embedded objects too, including collections

  • supports projections, aggregations, sorting, named parameters

  • supports indexed and non-indexed execution

  • supports complex boolean expressions

  • does not support computations in expressions (eg. user.age > sqrt(user.shoeSize + 3) is not allowed but user.age >= 18 is fine)

  • does not support joins

  • but, navigations along embedded entities are implicit joins and are allowed

  • joining on embedded collections is allowed

  • other join types not supported

  • subqueries are not supported

  • besides the normal relational operators it offers full-text operators, similar to Lucene’s  query parser

  • is now supported across various Infinispan APIs, wherever a Query produced by the QueryBuilder is accepted (even for continuous queries or in event filters for listeners!)

That is to say we squeezed JP-QL to the bare minimum and added full-text predicates that closely follow the syntax of Lucene’s query parser.

If you are familiar with JPA/JP-QL then the following example will speak for itself:

select accountId, sum(amount) from com.acme.Transaction
    where amount < 20.0
    group by accountId
    having sum(amount) > 1000.0
    order by accountId

The same query can be written using the QueryBuilder:

Query query = queryFactory.from(Transaction.class)
.select(Expression.property("accountId"), Expression.sum("amount"))
.having("amount").lt(20.0)
.groupBy("accountId")
.having(Expression.sum("amount")).gt(1000.0)
.orderBy("accountId").build();

Both examples look nice but I hope you will agree the first one is better.

Ickle supports several new predicates for full-text matching that the QueryBuilder is missing. These predicates use the : operator that you are probably familiar from Lucene’s own query language.  This example demonstrates a simple full-text term query:

select transactionId, amount, description from com.acme.Transaction
where amount > 10 and description : "coffee"

As you can see, relational predicates and full-text predicates can be combined with boolean operators at will.

The only important thing to remark here is relational predicates are applicable to non-analyzed fields while full-text predicates can be applied to analyzed field only. How does indexing work, what is analysis and how do I turn it on/off for my fields? That’s the topic of a future post, so please be patient or start readinghttps://docs.jboss.org/hibernate/search/5.6/reference/en-US/html_single/#_analysis[ here].

Besides term queries we support several more:

  • Term                     description : "coffee"

  • Fuzzy                    description : "cofee"~2

  • Range                    amount : [40 to 90}`

  • Phrase                   description : "hello world"

  • Proximity                description : "canceling fee"~3

  • Wildcard                 description : "te?t"

  • Regexp                  description : /[mb]oat/

  • Boosting                 description : "beer"^3 and description :"books"

You can read all about them starting from here.

But is Ickle really new? Not really. The name is new, the full-text features are new, but a JP-QL-ish query string was always internally present in the Query objects produced by the QueryBuilder since the beginning of Remote Query. That language was never exposed and specified until now. It evolved significantly over time and now it is ready for you to use it. The QueryBuilder / criteria-like API is still there as a convenience but it might go out of favor over time. It will be limited to non-full-text functionality only. As Ickle grows we’ll probably not be able to include some of the additions in the QueryBuilder in a backward compatible manner. If growing will cause too much pain we might consider deprecating it in favor of Ickle or if there is serious demand for it we might continue to evolve the QueryBuilder in a non compatible manner.

Being a string based query language, Ickle is very convenient for our REST endpoint, the CLI, and the administration console allowing you to quickly inspect the contents of the grid. You’ll be able to use it there pretty soon. We’ll also continue to expand Ickle with more advanced full-text features like spatial queries and faceting, but that’s a subject for another major version. Until then, why not grab the current 9.0 Beta1 and test drive the new query language yourself? We’d love to hear your feedback on the forum, on our issue tracker or on IRC on the #infinispan channel on Freenode.

Happy coding!

Posted by Unknown on 2016-12-08
Tags: JP-QL Hibernate-Search jpa lucene full-text indexing language query DSL

Monday, 05 December 2016

Infinispan 9.0.0.Beta1 "Ruppaner"

image

It took us quite a bit to get here, but we’re finally ready to announce Infinispan 9.0.0.Beta1, which comes loaded with a ton of goodies.

Performance improvements

  • JGroups 4

  • A new algorithm for non-transactional writes (aka the Triangle) which reduces the number of RPCs required when performing writes 

  • A new faster internal marshaller which produced smaller payloads. 

  • A new asynchronous interceptor core

Off-Heap support

  • Avoid the size of the data in the caches affecting your GC times

CaffeineMap-based bounded data container

  • Superior performance

  • More reliable eviction

Ickle, Infinispan’s new query language

  • A limited yet powerful subset of JPQL

  • Supports full-text predicates

The Server Admin console now supports both Standalone and Domain modes

Pluggable marshallers for Kryo and ProtoStuff

The LevelDB cache store has been replaced with the better-maintained and faster RocksDB 

Spring Session support

Upgraded Spring to 4.3.4.RELEASE

We will be blogging about the above in detail over the coming weeks, including benchmarks and tutorials.

The following improvements were also present in our previous Alpha releases:

Graceful clustered shutdown / restart with persistent state

Support for streaming values over Hot Rod, useful when you are dealing with very large entries

Cloud and Containers

  • Out-of-the box support for Kubernetes discovery

Cache store improvements

  • The JDBC cache store now use transactions and upserts. Also the internal connection pool is now based on HikariCP

Also, our documentation has received a big overhaul and we believe it is vastly superior than before.

There will be one more Beta including further performance improvements as well as additional features, so stay tuned.

Infinispan 9 is codenamed "Ruppaner" in honor of the Konstanz brewery, since many of the improvements of this release have been brewed on the shores of the Bodensee !

Prost!

Posted by Tristan Tarrant on 2016-12-05
Tags: beta release marshalling off-heap performance query

Tuesday, 19 November 2013

Infinispan 6.0.0.Final is out!

Dear Infinispan community,

We’re pleased to announce the final release of Infinispan 6.0 "Infinium". As announced, this is the first Infinispan stable version to be released under the terms of Apache License v2.0.

This release brings some highly demanded features besides many stability enhancements and bug fixes:

  • Support for remote query. It is now possible for the HotRod clients to query an Infinispan grid using a new expressive query DSL. This querying functionality is built on top of Apache Lucene and Google Protobuf and lays the foundation for storing information and querying an Infinispan server in a language neutral manner. The Java HotRod client has already been enhanced to support this, the soon-to-be announced C++ HotRod client will also contain this functionality (initially for write/read, then full blown querying).

  • C HotRod client.  Allows C applications to read and write information from an Infinispan server. This is a fully fledged HotRod client that is topology (level 2) and consistent hash aware (level 3) and will be released in the following days. Some features (such as Remote Query and SSL support) will be developed during the next iteration so that it maintains feature parity with its Java counterpart.

  • Better persistence integration. We’ve revisited the entire cache loader API and we’re quite pleased with the result: the new Persistence API brought by Infinispan 6.0 supports parallel iteration of the stored entries, reduces the overall serialization overhead and also is aligned with the JSR-107 specification, which makes implementations more portable.

  • A more efficient FileCacheStore implementation. This file store is built with efficiency in mind: it outperforms the existing file store with up to 2 levels of magnitude. This comes at a cost though, as keys need to be kept  in memory. Thanks to Karsten Blees for contributing this!

  • Support for heterogeneous clusters. Up to this release, every member of the cluster owns an equal share of the cluster’s data. This doesn’t work well if one machine is more powerful than the other cluster participants. This functionality allows specifying the amount of data, compared with the average, held by a particular machine.

  • A new set of usage and performance statistics developed within the scope of the CloudTM projecthttps://issues.jboss.org/browse/ISPN-3234[].

  • JCache (JSR-107) implementation upgrade. First released in Infinispan 5.3.0, the standard caching support is now upgraded to version 1.0.0-PFD.

For a complete list of features included in this release please refer to the release notes.

The user documentation for this release has been revamped and migrated to the new website - we think it looks much better and hope you’ll like it too!

This release has spread over a period of 5 months: a sustained effort from the core development team, QE team and our growing community - a BIG thanks to everybody involved! Please visit our downloads section to find the latest release. Also if you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Cheers,

Adrian

Posted by Unknown on 2013-11-19
Tags: hotrod persistence jsr 107 jcache Protobuf remote query query

Monday, 09 September 2013

Infinispan 6.0.0.Alpha4 out with new CacheLoader/CacheWriter API!

Infinispan 6.0.0.Alpha4 is now with a few very important changes, particularly around cache stores. We’ve completely revamped the cache store/loader API to align it a bit better with JSR-107 (old CacheStore has become CacheWriter) and to simplify creation of new implementations. The new CacheLoader and CacheWriter should help implementors focus on the important operations and reduce the coding time. We’ve also created AdvancedCacheLoader and AdvancedCacheWriter in order to separate for bulk operations or purging for those implementations that wish optionally implement them. Expect a blog post from Mircea in the next few days providing many more details on this topic.

This new Infinispan version comes with other important goodies:

  • Rolling upgrades of a Infinsipan REST cluster

  • Support for Cache-Control headers for REST operations

  • Remote querying server modules and Hot Rod client update

  • REST and LevelDB stores added to Infinispan Server

  • KeyFilters can now be applied to Cache listeners

  • Allow Cache listener events to be invoked only on the primary data owner

For a complete list of features and fixes included in this release please refer to the release notes. Visit our downloads section to find the latest release and if you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Cheers,

Galder

Posted by Galder Zamarreño on 2013-09-09
Tags: release leveldb listeners alpha rest cache store query

Wednesday, 23 September 2009

Infinispan Query breaks into 4.0.0.CR1

Hello all,

Querying is an important feature for Infinispan, so we’ve decided to include a technology preview of this for 4.0.0.CR1 and 4.0.0.GA, even though it is only really scheduled for Infinispan 4.1.0.

Browse to this wiki page to see how the new API works for querying, along with usage examples.#

Origins#

Some of the API has come from JBoss Cache Searchable but has been enhanced and runs slicker. A lot more work is being done under the hood so it makes it easier for users. For example, the API method on the QueryFactory.getBasicQuery() just needs two Strings and builds a basic Lucene Query instance, as opposed to forcing the user to create a Lucene query manually. This is still possible however, should a user want to create a more complex query.

The indexing for Lucene is now done through interceptors as opposed to listeners, and hence more tightly integrated into Infinispan’s core.

You can also choose how indexes are maintained. If indexes are shared (perhaps stored on a network mounted drive), then you only want nodes to index changes made locally. On the other hand, if each node maintains its own indexes (either in-memory on on a local filesystem) then you want each node to index changes made, regardless of where the changes are made. This behaviour is controlled by a system property - -Dinfinispan.query.indexLocalOnly=true. However, this is system property temporary and will be replaced with a proper configuration property once the feature is out of technology preview.

What’s coming up? Future releases of Hibernate Search and Infinispan will have improvements that will change the way that querying works. The QueryHelper class - as documented in the wiki - is temporary so that will eventually be removed, as you will not need to provide the class definitions of the types you wish to index upfront. We will be able to detect this on the fly (see HSEARCH-397)

There will be a better system for running distributed queries. And the system properties will disappear in favour of proper configuration attributes.

And also, GSoC student Lukasz Moren’s work involving an Infinispan-based Lucene Directory implementation will allow indexes to be shared cluster-wide by using Infinispan itself to distribute these indexes. All very clever stuff.

Thanks for reading!

Navin.

Posted by Unknown on 2009-09-23
Tags: jboss cache lucene hibernate hibernate search index query

News

Tags

JUGs alpha as7 asymmetric clusters asynchronous beta c++ cdi chat clustering community conference configuration console data grids data-as-a-service database devoxx distributed executors docker event functional grouping and aggregation hotrod infinispan java 8 jboss cache jcache jclouds jcp jdg jpa judcon kubernetes listeners meetup minor release off-heap openshift performance presentations product protostream radargun radegast recruit release release 8.2 9.0 final release candidate remote query replication queue rest query security spring streams transactions vert.x workshop 8.1.0 API DSL Hibernate-Search Ickle Infinispan Query JP-QL JSON JUGs JavaOne LGPL License NoSQL Open Source Protobuf SCM administration affinity algorithms alpha amazon annotations announcement archetype archetypes as5 as7 asl2 asynchronous atomic maps atomic objects availability aws beer benchmark benchmarks berkeleydb beta beta release blogger book breizh camp buddy replication bugfix c# c++ c3p0 cache benchmark framework cache store cache stores cachestore cassandra cdi cep certification cli cloud storage clustered cache configuration clustered counters clustered locks codemotion codename colocation command line interface community comparison compose concurrency conference conferences configuration console counter cpp-client cpu creative cross site replication csharp custom commands daas data container data entry data grids data structures data-as-a-service deadlock detection demo deployment dev-preview devnation devoxx distributed executors distributed queries distribution docker documentation domain mode dotnet-client dzone refcard ec2 ehcache embedded query equivalence event eviction example externalizers failover faq final fine grained flags flink full-text functional future garbage collection geecon getAll gigaspaces git github gke google graalvm greach conf gsoc hackergarten hadoop hbase health hibernate hibernate ogm hibernate search hot rod hotrod hql http/2 ide index indexing india infinispan infinispan 8 infoq internationalization interoperability interview introduction iteration javascript jboss as 5 jboss asylum jboss cache jbossworld jbug jcache jclouds jcp jdbc jdg jgroups jopr jpa js-client jsr 107 jsr 347 jta judcon kafka kubernetes lambda language leveldb license listeners loader local mode lock striping locking logging lucene mac management map reduce marshalling maven memcached memory migration minikube minishift minor release modules mongodb monitoring multi-tenancy nashorn native near caching netty node.js nodejs nosqlunit off-heap openshift operator oracle osgi overhead paas paid support partition handling partitioning performance persistence podcast presentations protostream public speaking push api putAll python quarkus query quick start radargun radegast react reactive red hat redis rehashing releaase release release candidate remote remote events remote query replication rest rest query roadmap rocksdb ruby s3 scattered cache scripting second level cache provider security segmented server shell site snowcamp spark split brain spring spring boot spring-session stable standards state transfer statistics storage store store by reference store by value streams substratevm synchronization syntax highlighting testing tomcat transactions uneven load user groups user guide vagrant versioning vert.x video videos virtual nodes vote voxxed voxxed days milano wallpaper websocket websockets wildfly workshop xsd xsite yarn zulip

back to top