Data Container Changes Part 2

Before the end of the year I wrote a blog post detailing some of the more recent changes that Infinispan introduced with the in memory data container.  As was mentioned in the previous post we would be detailing some other new changes. If you poked around in our new schema after Beta 1 you may have spoiled the surprise for yourself.

With the upcoming 9.0 Beta 2, I am excited to announce that Infinispan will have support for entries being stored off heap, as in outside of the JVM heap. This has some interesting benefits and drawbacks, but we hope you can agree the benefits in many cases far outweigh the drawbacks. But before we get into that lets first see how you can configure your cache to utilize off heap.

New Configuration

The off heap configuration is another option under the new memory element that was discussed in the previous post. It is used in the same way that either OBJECT or BINARY is used.  You can use either COUNT or MEMORY eviciton types, the example below shows the latter.

XML

DECLARATIVE

As you can see the configuration is almost identical to the other types of storage. The only real difference is the new address pointer argument, which will be explained below.

Requirements

Our off heap implementation supports all existing features of Infinispan. There are some limitations and drawbacks of using the feature. This section will describe these in further detail.

Serialization

Off Heap runs in essentially BINARY mode, which requires entries to be serialized into their byte[] forms. Thus all keys and entries must be Serializable or have provided Infinispan Externalizers.

Size

Currently a key and a value must be able to be stored in a byte[]. Therefore a key or value in serialized form cannot be more than just over 2 Gigabytes.  This could be enhanced possibly at a later point, if the need arose.  I hope you aren’t transferring this over your network though!

Implementation Details 

Our off heap implementation uses the Java Unsafe to allocate memory outside of the Java heap. This data is stored as a bucket of linked list pointers, just like a standard Java HashMap. When an entry is added the key’s serialized byte[] is hashed and an appropriate offset is found in the bucket. Then the entry is added to the bucket as the first element or if an entry(ies) is present it is added to the rear of the linked list.

All of this data is protected by an array of ReadWriteLock instances.  The number of address pointers is evenly divisible by the number of lock instances.  The number of lock instances is how many cores your machines doubled and rounded to the nearest power of two.  Thus each lock protects an equivalent amount of address spaces.  This provides for good lock granularity and reads will not block each other but unfortunately writes will wait and block all reads.

If you are using a bounded off heap container either by count or memory this will create a backing LRU doubly linked list to keep track of which elements were accessed most recently and removes the least recently accessed element when there are too many present in the cache.

 

Memory Overhead

As with all cache implementations there is overhead required to store these entries. We have a fixed and variable overhead which scales with the amount of entries. I will detail these and briefly mention what they are used for.

Fixed overhead

As was mentioned there is a new address count parameter when configuring off heap. This value is used to determine how many linked list pointers are available. Normally you want to have more node pointers than you have entries in the cache, since then chances are you have one element in each linked list.  This is very similar to the int argument constructor for HashMap.  It is also rounded up to the nearest power of two.  The big difference being that this off heap implementation will not resize.  Thus your read/write times will be slower if you have a lot of collisions. The overhead of a pointer is 8 bytes, so for approximately one million pointers it will be 8 Megabytes of off heap.

Bounded off heap requires very little fixed memory, just 32 bytes for head/tail pointers and a counter and an additional Java lock object.

Variable overhead

Unfortunately to store your entries we may need to wrap them with some data. Thus for every entry you add to the cache we store an additional 25 bytes for each entry.  This data is used for header information and also our linked list forward pointer.

Bounded off heap requires additional housekeeping for its LRU list nodes.  Thus each entry adds an additional 36 bytes above the number above. It is larger due to requiring a doubly linked list and having to have pointers to and from the entry and eviction node.

Performance

The off heap container was designed with the intent that key lookups are quite fast. In general these should be about the same performance. However local reads and stream operations can be a little slower as there is an additional deserialization phase required.

Summary

We hope you all try out our new off heap feature! Please make sure to contact us if you have any feedback, find any bugs or have any questions!  You can get in contact with us on our forum, issue tracker, or directly on IRC freenode channel Infinispan

News

Tags

JUGs alpha as7 asymmetric clusters asynchronous beta c++ cdi chat clustering community conference configuration console data grids data-as-a-service database devoxx distributed executors docker event functional grouping and aggregation hotrod infinispan java 8 jboss cache jcache jclouds jcp jdg jpa judcon kubernetes listeners meetup minor release off-heap openshift performance presentations product protostream radargun radegast recruit release release 8.2 9.0 final release candidate remote query replication queue rest query security spring streams transactions vert.x workshop 8.1.0 API DSL Hibernate-Search Ickle Infinispan Query JP-QL JSON JUGs JavaOne LGPL License NoSQL Open Source Protobuf SCM administration affinity algorithms alpha amazon anchored keys annotations announcement archetype archetypes as5 as7 asl2 asynchronous atomic maps atomic objects availability aws beer benchmark benchmarks berkeleydb beta beta release blogger book breizh camp buddy replication bugfix c# c++ c3p0 cache benchmark framework cache store cache stores cachestore cassandra cdi cep certification cli cloud storage clustered cache configuration clustered counters clustered locks codemotion codename colocation command line interface community comparison compose concurrency conference conferences configuration console counter cpp-client cpu creative cross site replication csharp custom commands daas data container data entry data grids data structures data-as-a-service deadlock detection demo deployment dev-preview development devnation devoxx distributed executors distributed queries distribution docker documentation domain mode dotnet-client dzone refcard ec2 ehcache embedded embedded query equivalence event eviction example externalizers failover faq final fine grained flags flink full-text functional future garbage collection geecon getAll gigaspaces git github gke google graalvm greach conf gsoc hackergarten hadoop hbase health hibernate hibernate ogm hibernate search hot rod hotrod hql http/2 ide index indexing india infinispan infinispan 8 infoq internationalization interoperability interview introduction iteration javascript jboss as 5 jboss asylum jboss cache jbossworld jbug jcache jclouds jcp jdbc jdg jgroups jopr jpa js-client jsr 107 jsr 347 jta judcon kafka kubernetes lambda language learning leveldb license listeners loader local mode lock striping locking logging lucene mac management map reduce marshalling maven memcached memory migration minikube minishift minor release modules mongodb monitoring multi-tenancy nashorn native near caching netty node.js nodejs non-blocking nosqlunit off-heap openshift operator oracle osgi overhead paas paid support partition handling partitioning performance persistence podcast presentation presentations protostream public speaking push api putAll python quarkus query quick start radargun radegast react reactive red hat redis rehashing releaase release release candidate remote remote events remote query replication rest rest query roadmap rocksdb ruby s3 scattered cache scripting second level cache provider security segmented server shell site snowcamp spark split brain spring spring boot spring-session stable standards state transfer statistics storage store store by reference store by value streams substratevm synchronization syntax highlighting tdc testing tomcat transactions tutorial uneven load user groups user guide vagrant versioning vert.x video videos virtual nodes vote voxxed voxxed days milano wallpaper websocket websockets wildfly workshop xsd xsite yarn zulip
Posted by Unknown on 2017-01-23
Tags:
back to top