Why doesn't Map.size return the size of the entire cluster?

Many people may have been surprised the first time they used Map.size method on a distributed Infinispan cluster.  As was later deduced only the local node size is returned.

Infinispan had taken this approach to limit the chance that instead of getting the full cluster size you would receive an OutOfMemoryError.  This seems fair to return the local answer only but you secretly always wanted the entire cluster size.

For the Infinispan 7.0.0.Final release forget what you know when using the Map interface with Infinispan.

==

Enter Distributed Entry Iterator

We already announced this feature a while back at http://blog.infinispan.org/2014/05/iterate-all-entries-in-cache.html.  You can check it out for more details but it is essentially a memory efficient way of retrieving all the entries in the cache by iterating over them.

This has opened the way to implementing the various bulk methods on the Map interface that we could never do efficiently in the past (ie. Map.size, Map.keySet, Map.entrySet & Map.values).

Map size

Okay I admit, size could have been done more efficiently before, but the answer would have contained a very high margin of error.  Now size will give you a size value with consistency semantics just slightly less than ConcurrentHashMap does, but for the entire cluster.  Warning should be given that size may be slower for larger clusters or ones with a lot of data in a stored cache loader.

The size method behavior can be controlled by using a supplied Flag such as SKIP_CACHE_LOAD to not count any configured cache loaders or CACHE_MODE_LOCAL if you want the local count only.  These flags are not exclusive and can be both passed if desired as well.

==

Map Collection Views

In the past the Map.values, Map.keySet & Map.entrySet methods were only ever in memory copies of the local data at the time they were invoked, similar to Map.size.

Now these collection views will be cluster wide and an additionally will show updated contents when the cache is updated and your writes to the collection will be reflected in the Cache itself!  The only operations you can’t do on these collections are adding values to either the keySet or values collections as they aren’t key/value pairs.

If your cache was configured with a Flag such as SKIP_CACHE_LOAD or CACHE_MODE_LOCAL it will also be reflected in the collection view for both reads and writes.

Some caution is advised when using toArray, retainAll, or size methods as they will require full iteration to complete.

KeySet Optimization

The key set collection also has an optimization so it will never pull down the values so it has a lower network and serialization/deserialization overhead (unlike entrySet and values).

Transactionality

All of the aforementioned methods still support transactions in a way that you would expect.  There is one guarantee we don’t provide and that is when using REPEATABLE_READ isolation.  We will not store entries read from an iterator in the transactional context as this could very easily run your local node out of memory with a large enough data set.

For reference methods that use an iterator internally are toArray, retainAll, isEmpty & size on the various collections as well as contains and containsAll on the values collection.

Other API Changes

These changes have also loosened some restrictions on other methods as well.

Map.isEmpty

This method before was only used locally as it used to calculate the size to determine if it was empty.  This method will now use the entry iterator and returns as soon as it finds that even a single value exists.  This is an important change as the old implementation would have to query any configured Cache Loader’s complete size before returning.

Map.containsValue

We never supported this method before (not even locally).  This method will now use the iterator though and if it finds the value at any point point will return immediately so it doesn’t have to iterate over the entire contents unless they don’t exist.  However if you really want to do this operation often you should really use Indexing to make this faster.

Code Examples

I could put an example here, but I think some could take it as insult.  You have already seen 100’s of examples as to how to use the Map interface and now in Infinispan you can use those in the exact same way and they will work just how you would expect them to.

News

Tags

JUGs alpha as7 asymmetric clusters asynchronous beta c++ cdi chat clustering community conference configuration console data grids data-as-a-service database devoxx distributed executors docker event functional grouping and aggregation hotrod infinispan java 8 jboss cache jcache jclouds jcp jdg jpa judcon kubernetes listeners meetup minor release off-heap openshift performance presentations product protostream radargun radegast recruit release release 8.2 9.0 final release candidate remote query replication queue rest query security spring streams transactions vert.x workshop 8.1.0 API DSL Hibernate-Search Ickle Infinispan Query JP-QL JSON JUGs JavaOne LGPL License NoSQL Open Source Protobuf SCM administration affinity algorithms alpha amazon anchored keys annotations announcement archetype archetypes as5 as7 asl2 asynchronous atomic maps atomic objects availability aws beer benchmark benchmarks berkeleydb beta beta release blogger book breizh camp buddy replication bugfix c# c++ c3p0 cache benchmark framework cache store cache stores cachestore cassandra cdi cep certification cli cloud storage clustered cache configuration clustered counters clustered locks codemotion codename colocation command line interface community comparison compose concurrency conference conferences configuration console counter cpp-client cpu creative cross site replication csharp custom commands daas data container data entry data grids data structures data-as-a-service deadlock detection demo deployment dev-preview development devnation devoxx distributed executors distributed queries distribution docker documentation domain mode dotnet-client dzone refcard ec2 ehcache embedded embedded query equivalence event eviction example externalizers failover faq final fine grained flags flink full-text functional future garbage collection geecon getAll gigaspaces git github gke google graalvm greach conf gsoc hackergarten hadoop hbase health hibernate hibernate ogm hibernate search hot rod hotrod hql http/2 ide index indexing india infinispan infinispan 8 infoq internationalization interoperability interview introduction iteration javascript jboss as 5 jboss asylum jboss cache jbossworld jbug jcache jclouds jcp jdbc jdg jgroups jopr jpa js-client jsr 107 jsr 347 jta judcon kafka kubernetes lambda language learning leveldb license listeners loader local mode lock striping locking logging lucene mac management map reduce marshalling maven memcached memory migration minikube minishift minor release modules mongodb monitoring multi-tenancy nashorn native near caching netty node.js nodejs non-blocking nosqlunit off-heap openshift operator oracle osgi overhead paas paid support partition handling partitioning performance persistence podcast presentation presentations protostream public speaking push api putAll python quarkus query quick start radargun radegast react reactive red hat redis rehashing releaase release release candidate remote remote events remote query replication rest rest query roadmap rocksdb ruby s3 scattered cache scripting second level cache provider security segmented server shell site snowcamp spark split brain spring spring boot spring-session stable standards state transfer statistics storage store store by reference store by value streams substratevm synchronization syntax highlighting tdc testing tomcat transactions tutorial uneven load user groups user guide vagrant versioning vert.x video videos virtual nodes vote voxxed voxxed days milano wallpaper websocket websockets wildfly workshop xsd xsite yarn zulip
Posted by Unknown on 2014-11-04
Tags:
back to top