Fine-grained replication in Infinispan

[.underline]##

Sometimes we have a large object, possibly with lots of attributes or holding some binary data, and we would like to tell Infinispan to replicate only certain part of the object across the cluster. Typically, we wanna replicate only that part which we’ve just updated. This is where DeltaAware and Delta interfaces come to play. By providing implementations of these interfaces we can define fine-grained replication. When we put some effort into such such an enhancements, we would also like to speed up object marshalling and unmarshalling. Therefore, we’re going to define our own externalizers - to avoid slow default Java serialization.

The following code snippets are gathered in a complete example at https://github.com/mgencur/infinispan-examples/tree/master/partial-state-transfer This project contains a readme file with instructions on how to build and run the example. It is based on clustered-cache quickstart in Infinispan.

Implementing DeltaAware interface

So let’s look at our main object. For the purpose of this exercise, I defined a Bicycle class that consists of many components like frame, fork, rearShock, etc. This object is stored in a cache as a value under certain (not important) key. It might happen in our scenario that we update only certain components of the bike and in such case we want to replicate just those component changes.

Important methods here are (description taken from javadocs):

commit() - Indicates that all deltas collected to date has been extracted (via a                  call to delta()) and can be discarded. Often used as an optimization if                  the delta isn’t really needed, but the cleaning and resetting of                         internal state is desirable.

delta() - Extracts changes made to implementations, in an efficient format that              can easily and cheaply be serialized and deserialized.  This method will              only be called once for each changeset as it is assumed that any              implementation’s internal changelog is wiped and reset after generating              and submitting the delta to the caller.           We also need to define setters and getters for our members. Setter methods are, among other things, responsible for registering changes to the changelog that will be later used to reconstruct the object’s state. The externalizer for this class is only needed when cache stores are used. For the sake of simplicity, I don’t mention it here.

Implementing Delta interface

Actual object that will be replicated across the cluster is the implementation of Delta interface. Let’s look at the class. First, we need a field that will hold the changes - changeLog. Second, we need to define a merge() method. This method must be implemented so that Infinispan knows how to merge an existing object with incoming changes. The parameter of this method represents an object that is already stored in a cache, incoming changes will be applied to this object. We’re using a reflection here to apply the changes to the actual object but it is not necessary. We could easily call setter methods. The advantage of using reflection is that we can set those fields in a loop.

Another piece is a registerComponentChange() method. This is called by an object of the Bicycle class - to record changes to that object. The name of this method is not important.

Defining our own externalizer

Alright, so what remains is the externalizer definition for the Delta implementation. We implement AdvancedExternalizer interface and say that only changeLog object should be marshalled and unmarshalled when transfering data over the wire. A complete (almost) implementation of Delta interface is the following.

Tell Infinispan about the extra externalizer

We also need to configure Infinispan to use our special externalizer to marshall/unmarshall our objects. We can do it e.g. programatically by calling .addAdvancedExternalizer() on the serialization configuration builder.

You can see we’re also configuring transactions here. This is not necessary, though. We just aim to provide a richer example, removing transactional behavior is trully easy.

And here comes the "usage" part. Enclose cache calls by a transaction, retrieve a bicycle object from the cache, do some changes and commit them.

That’s it. What is eventually transferred over the wire is just the changeLog object. The actual bicycle object is reconstructed from incomming updates.

If all of this seem to be too complex to you, I have good news. Infinispan provides one implementation of DeltaAware interface whish is called AtomicHashMap (package org.infinispan.atomic). If this map is used as a value in key/value pairs stored in the cache, only puts/gets/removes performed to this map during a transaction are replicated to other nodes. Classes like Bicycle and BicycleDelta are not need then. Even registering the externalizer for AtomicHashMap is not needed, this is done automatically during registration of internal externalizers. However, one might want a class emulating a real-world object, not just a map. That’s the case when your own implementations of DeltaAware and Delta interfaces are the only way.

News

Tags

JUGs alpha as7 asymmetric clusters asynchronous beta c++ cdi chat clustering community conference configuration console data grids data-as-a-service database devoxx distributed executors docker event functional grouping and aggregation hotrod infinispan java 8 jboss cache jcache jclouds jcp jdg jpa judcon kubernetes listeners meetup minor release off-heap openshift performance presentations product protostream radargun radegast recruit release release 8.2 9.0 final release candidate remote query replication queue rest query security spring streams transactions vert.x workshop 8.1.0 API DSL Hibernate-Search Ickle Infinispan Query JP-QL JSON JUGs JavaOne LGPL License NoSQL Open Source Protobuf SCM administration affinity algorithms alpha amazon anchored keys annotations announcement archetype archetypes as5 as7 asl2 asynchronous atomic maps atomic objects availability aws beer benchmark benchmarks berkeleydb beta beta release blogger book breizh camp buddy replication bugfix c# c++ c3p0 cache benchmark framework cache store cache stores cachestore cassandra cdi cep certification cli cloud storage clustered cache configuration clustered counters clustered locks codemotion codename colocation command line interface community comparison compose concurrency conference conferences configuration console counter cpp-client cpu creative cross site replication csharp custom commands daas data container data entry data grids data structures data-as-a-service deadlock detection demo deployment dev-preview development devnation devoxx distributed executors distributed queries distribution docker documentation domain mode dotnet-client dzone refcard ec2 ehcache embedded embedded query equivalence event eviction example externalizers failover faq final fine grained flags flink full-text functional future garbage collection geecon getAll gigaspaces git github gke google graalvm greach conf gsoc hackergarten hadoop hbase health hibernate hibernate ogm hibernate search hot rod hotrod hql http/2 ide index indexing india infinispan infinispan 8 infoq internationalization interoperability interview introduction iteration javascript jboss as 5 jboss asylum jboss cache jbossworld jbug jcache jclouds jcp jdbc jdg jgroups jopr jpa js-client jsr 107 jsr 347 jta judcon kafka kubernetes lambda language learning leveldb license listeners loader local mode lock striping locking logging lucene mac management map reduce marshalling maven memcached memory migration minikube minishift minor release modules mongodb monitoring multi-tenancy nashorn native near caching netty node.js nodejs non-blocking nosqlunit off-heap openshift operator oracle osgi overhead paas paid support partition handling partitioning performance persistence podcast presentation presentations protostream public speaking push api putAll python quarkus query quick start radargun radegast react reactive red hat redis rehashing releaase release release candidate remote remote events remote query replication rest rest query roadmap rocksdb ruby s3 scattered cache scripting second level cache provider security segmented server shell site snowcamp spark split brain spring spring boot spring-session stable standards state transfer statistics storage store store by reference store by value streams substratevm synchronization syntax highlighting tdc testing tomcat transactions tutorial uneven load user groups user guide vagrant versioning vert.x video videos virtual nodes vote voxxed voxxed days milano wallpaper websocket websockets wildfly workshop xsd xsite yarn zulip
Posted by Martin Genčúr on 2012-06-21
Tags: replication fine grained state transfer
back to top