Faster file cache store (no extra dependencies!) in 6.0.0.Alpha1

As announced yesterday by Adrian, the brand new Infinispan 6.0.0.Alpha1 release contains a new file-based cache store which needs no extra dependencies. This is essentially a replacement of the existing FileCacheStore which didn’t perform as expected, and caused major issues due to the number of files it created.

The new cache store, contributed by a Karsten Blees (who also contributed an improved asynchronous cache store), is called SingleFileCacheStore and it keeps all data in a single file. The way it looks up data is by keeping an in-memory index of keys and the positions of their values in this file. This design outperforms the existing FileCacheStore and even LevelDB based JNI cache store.

The classic case for a file based cache store is when you want to have a cache with a cache store available locally which stores data that has overflowed from memory, having exceeded size and/or time restrictions. We ran some performance tests to verify how fast different cache store implementations could deal with reading and writing overflowed data, and these are the results we got (in Ks):

  • FileCacheStore: 0.75k reads/s, 0.285k writes/s

  • LevelDB-JNI impl: 46k reads/s, 15.2k writes/s

  • SingleFileCacheStore: 458k reads/s, 137k writes/s

The difference is quite astonishing but as already hinted, this performance increase comes at a cost. Having to maintain an index of keys and positions in the file in memory has a cost in terms of extra memory required, and potential impact on GC. That’s why the SingleFileCacheStore is not recommended for use cases where the keys are too big.

In order to help tame this memory consumption issues, the size of the cache store can be optionally limited, providing a maximum number of entries to store in it. However, setting this parameter will only work in use cases where Infinispan is used as a cache. When used as a cache, data not present in Infinispan can be recomputed or re-retrieved from the authoritative data store and stored in Infinispan cache. The reason for this limitation is because once the maximum number of entries is reached, older data in the cache store is removed, so if Infinispan was used as an authoritative data store, it would lead to data loss which is not good.

Existing FileCacheStore users might wonder: what is it gonna happen to the existing FileCacheStore? We’re not 100% sure yet what we’re going to do with it, but we’re looking into some ways to migrate data from the FileCacheStore to the SingleFileCacheStore. Some interesting ideas have already been submitted which we’ll investigate in next Infinispan 6.0 pre-releases.

So, if you’re a FileCacheStore user, give the new SingleFileCacheStore a go and let us know how it goes! Switching from one to the other is easy :)

Cheers,

Galder

News

Tags

JUGs alpha as7 asymmetric clusters asynchronous beta c++ cdi chat clustering community conference configuration console data grids data-as-a-service database devoxx distributed executors docker event functional grouping and aggregation hotrod infinispan java 8 jboss cache jcache jclouds jcp jdg jpa judcon kubernetes listeners meetup minor release off-heap openshift performance presentations product protostream radargun radegast recruit release release 8.2 9.0 final release candidate remote query replication queue rest query security spring streams transactions vert.x workshop 8.1.0 API DSL Hibernate-Search Ickle Infinispan Query JP-QL JSON JUGs JavaOne LGPL License NoSQL Open Source Protobuf SCM administration affinity algorithms alpha amazon anchored keys annotations announcement archetype archetypes as5 as7 asl2 asynchronous atomic maps atomic objects availability aws beer benchmark benchmarks berkeleydb beta beta release blogger book breizh camp buddy replication bugfix c# c++ c3p0 cache benchmark framework cache store cache stores cachestore cassandra cdi cep certification cli cloud storage clustered cache configuration clustered counters clustered locks codemotion codename colocation command line interface community comparison compose concurrency conference conferences configuration console counter cpp-client cpu creative cross site replication csharp custom commands daas data container data entry data grids data structures data-as-a-service deadlock detection demo deployment dev-preview development devnation devoxx distributed executors distributed queries distribution docker documentation domain mode dotnet-client dzone refcard ec2 ehcache embedded embedded query equivalence event eviction example externalizers failover faq final fine grained flags flink full-text functional future garbage collection geecon getAll gigaspaces git github gke google graalvm greach conf gsoc hackergarten hadoop hbase health hibernate hibernate ogm hibernate search hot rod hotrod hql http/2 ide index indexing india infinispan infinispan 8 infoq internationalization interoperability interview introduction iteration javascript jboss as 5 jboss asylum jboss cache jbossworld jbug jcache jclouds jcp jdbc jdg jgroups jopr jpa js-client jsr 107 jsr 347 jta judcon kafka kubernetes lambda language learning leveldb license listeners loader local mode lock striping locking logging lucene mac management map reduce marshalling maven memcached memory migration minikube minishift minor release modules mongodb monitoring multi-tenancy nashorn native near caching netty node.js nodejs non-blocking nosqlunit off-heap openshift operator oracle osgi overhead paas paid support partition handling partitioning performance persistence podcast presentation presentations protostream public speaking push api putAll python quarkus query quick start radargun radegast react reactive red hat redis rehashing releaase release release candidate remote remote events remote query replication rest rest query roadmap rocksdb ruby s3 scattered cache scripting second level cache provider security segmented server shell site snowcamp spark split brain spring spring boot spring-session stable standards state transfer statistics storage store store by reference store by value streams substratevm synchronization syntax highlighting tdc testing tomcat transactions tutorial uneven load user groups user guide vagrant versioning vert.x video videos virtual nodes vote voxxed voxxed days milano wallpaper websocket websockets wildfly workshop xsd xsite yarn zulip
Posted by Galder Zamarreño on 2013-07-18
Tags: persistence alpha performance
back to top