Iterate all the entries in the cache

Dear all, with the release of 7.0.0.Alpha4 it was mentioned that we now support Distributed Entry Iterator which allows for iteration over all entries in the cache.  Iterating over all the entries in the cache has always been an highly demanded community feature. Existing methods (entrySet, keySet, size) were not a good fit because of potential OOM and were causing a lot of user annoyance. Voila a nice distributed solution :-)  ISPN-4222

Public Interface Additions

The added public API changes are as follows:

=== AdvancedCache

This returns an EntryIterable that can be used directly as an Iterable over the contents or also to pass a converter to convert the resulting value that is returned to another value or even type itself.

=== EntryIterable

EntryIterable also implements AutoCloseable and as such should be closed after iteration or if an exception case occurs.  Thus the Java 7 try with resources syntax should be used.

Note that EntryIterable has a method that allows you to also provide an optional Converter to change the values to another type if desired. This conversion is done on the remote nodes and is preferable to be used when the values can be reduced in size to reduce overall payload size.

An example of how to perform the iteration with any cache type.

General Algorithm

Essentially when the iterator is generated it will start an iteration process on the local node to retrieve all values local to that node (including from loader) and also a remote thread that will do the same thing on nodes one at at time. As values are retrieved they are made available to the iterator for processing. The chunkSize configuration for the State Transfer configuration will limit how many values are available to be waiting to be iterated on at a time (loader, local and remotely retrieved values count towards this). This is important to limit how many values are stored in memory when both using a loader and in distributed caches to help prevent an OOM condition from occurring.

The provided KeyValueFilter is used on the various nodes to limit what entries are returned to the iterator and are sent to the remote node(s) when using a Distributed cache to limit how many results are returned. A converter is similar to the KeyValueFilter but it is ran on any entry that passes the filter to possibly converter the value to another such as a projection view if desired. Both the KeyValueFilter and Converter must be serializable for proper operation!

The operation is also aware of rehash events occurring, since this could alter which node owns what entry. This is handled automatically by the iterator by tracking what segments have moved and requesting them from the other node if needed.

Local, Replicated and Invalidation Cache Optimizations

These caches have some additional optimizations from above in the following

  1. The KeyValueFilter and Converter do not need to be Serializable

  2. KeyValueFilter optimization is only relevant when using a loader

  3. Converter optimization is minimial, the main benefit being it allows code to be the same between cache types


This is just to talk about some various cases that users should be aware of.

Transactional Behaviour

When using the entry iterator in a transactional context, all of the values are retrieved outside of the current transaction if there is one, and no transaction is started if there isn’t one.

This is done due to the behaviour of Repeatable Read isolation level.  If not then then all of the retrieved values would have to be stored locally in the current context for that transaction, which would most likely cause an OOM condition in many cases.

Removal using Iterator

Since the iteration process does not take part of transactions, the remove operation of the iterator is not supported as well.  If desired the user should just invoke the remove method from the Cache itself to do this.

Consistency Guarantees

This iterator only guarantees consistency in regards to each value independently. That is it will show a view of each value that existed during the period of when the iteration began and when it completed. Thus it is entirely possible to see a subset of values if say a transaction was committed at the same time as iteration. This would require additional isolation level changes outside of the scope of the iterator to implement this, such as adding Serializable isolation level.

Return type change

Before ISPN 7 is released, it is still needed to change the return type from Map.Entry to instead be CacheEntry as users may need the Metadata stored with the entry as well. This will come in ISPN-4326

Try it out

Let us know if and how you guys plan on using this and any feedback would be appreciated!

Update Oct 31, 2014

As of Infinispan 7.0.0.Final the Entry Iterator now properly supports transactional data and thus will show the most up to date value if there is a pending change (however read values are not brought into the context to prevent OOM errors).

The remove operation on the iterator is fully supported and will perform the operation in the current transactional context if there is one.  Caution though as you must use the iterator in the thread it was retrieved from for it to work properly.



JUGs alpha as7 asymmetric clusters asynchronous beta c++ cdi chat clustering community conference configuration console data grids data-as-a-service database devoxx distributed executors docker event functional grouping and aggregation hotrod infinispan java 8 jboss cache jcache jclouds jcp jdg jpa judcon kubernetes listeners meetup minor release off-heap openshift performance presentations product protostream radargun radegast recruit release release 8.2 9.0 final release candidate remote query replication queue rest query security spring streams transactions vert.x workshop 8.1.0 API DSL Hibernate-Search Ickle Infinispan Query JP-QL JSON JUGs JavaOne LGPL License NoSQL Open Source Protobuf SCM administration affinity algorithms alpha amazon anchored keys annotations announcement archetype archetypes as5 as7 asl2 asynchronous atomic maps atomic objects availability aws beer benchmark benchmarks berkeleydb beta beta release blogger book breizh camp buddy replication bugfix c# c++ c3p0 cache benchmark framework cache store cache stores cachestore cassandra cdi cep certification cli cloud storage clustered cache configuration clustered counters clustered locks codemotion codename colocation command line interface community comparison compose concurrency conference conferences configuration console counter cpp-client cpu creative cross site replication csharp custom commands daas data container data entry data grids data structures data-as-a-service deadlock detection demo deployment dev-preview development devnation devoxx distributed executors distributed queries distribution docker documentation domain mode dotnet-client dzone refcard ec2 ehcache embedded embedded query equivalence event eviction example externalizers failover faq final fine grained flags flink full-text functional future garbage collection geecon getAll gigaspaces git github gke google graalvm greach conf gsoc hackergarten hadoop hbase health hibernate hibernate ogm hibernate search hot rod hotrod hql http/2 ide index indexing india infinispan infinispan 8 infoq internationalization interoperability interview introduction iteration javascript jboss as 5 jboss asylum jboss cache jbossworld jbug jcache jclouds jcp jdbc jdg jgroups jopr jpa js-client jsr 107 jsr 347 jta judcon kafka kubernetes lambda language learning leveldb license listeners loader local mode lock striping locking logging lucene mac management map reduce marshalling maven memcached memory migration minikube minishift minor release modules mongodb monitoring multi-tenancy nashorn native near caching netty node.js nodejs non-blocking nosqlunit off-heap openshift operator oracle osgi overhead paas paid support partition handling partitioning performance persistence podcast presentation presentations protostream public speaking push api putAll python quarkus query quick start radargun radegast react reactive red hat redis rehashing releaase release release candidate remote remote events remote query replication rest rest query roadmap rocksdb ruby s3 scattered cache scripting second level cache provider security segmented server shell site snowcamp spark split brain spring spring boot spring-session stable standards state transfer statistics storage store store by reference store by value streams substratevm synchronization syntax highlighting tdc testing tomcat transactions tutorial uneven load user groups user guide vagrant versioning vert.x video videos virtual nodes vote voxxed voxxed days milano wallpaper websocket websockets wildfly workshop xsd xsite yarn zulip
Posted by Unknown on 2014-05-27
back to top