Friday, 21 August 2009

Distribution instead of Buddy Replication

People have often commented on Buddy Replication (from JBoss Cache) not being available in Infinispan, and have asked how Infinispan’s far superior distribution mode works. I’ve decided to write this article to discuss the main differences from a high level. For deeper technical details, please visit the Infinispan wiki.

Scalability versus high availability These two concepts are often at odds with one another, even though they are commonly lumped together. What is usually good for scalability isn’t always good for high availability, and vice versa. When it comes to clustering servers, high availability often means simply maintaining more copies, so that if nodes fail - and with commodity hardware, this is expected - state is not lost. An extreme case of this is replicated mode, available in both JBoss Cache and Infinispan, where each node is a clone of its neighbour. This provides very high availability, but unfortunately, this does not scale well. Assume you have 2GB per node. Discounting overhead, with replicated mode, you can only address 2GB of space, regardless of how large the cluster is. Even if you had 100 nodes - seemingly 200GB of space! - you’d still only be able to address 2GB since each node maintains a redundant copy. Further, since every node needs a copy, a lot of network traffic is generated as the cluster size grows.

Enter Buddy Replication Buddy Replication (BR) was originally devised as a solution to this scalability problem. BR does not replicate state to every other node in the cluster. Instead, it chooses a fixed number of 'backup' nodes and only replicates to these backups. The number of backups is configurable, but in general it means that the number of backups is fixed. BR improved scalability significantly and showed near-linear scalability with increasing cluster size. This means that as more nodes are added to a cluster, the space available grows linearly as does the available computing power if measured in transactions per second.

But Buddy Replication doesn’t help everybody! BR was specifically designed around the HTTP session caching use-case for the JBoss Application Server, and heavily optimised accordingly. As a result, session affinity is mandated, and applications that do not use session affinity can be prone to a lot of data gravitation and 'thrashing' - data is moved back and forth across a cluster as different nodes attempt to claim 'ownership' of state. Of course this is not a problem with JBoss AS and HTTP session caching - session affinity is recommended, available on most load balancer hardware and/or software, is taken for granted, and is a well-understood and employed paradigm for web-based applications.

So we had to get better Just solving the HTTP session caching use-case wasn’t enough. A well-performing data grid needs to to better, and crucially, session affinity cannot be taken for granted. And this was the primary reason for not porting BR to Infinispan. As such, Infinispan does not and will not support BR as it is too restrictive.

Distribution Distribution is a new cache mode in Infinispan. It is also the default clustered mode - as opposed to replication, which isn’t scalable. Distribution makes use of familiar concepts in data grids, such as consistent hashing, call proxying and local caching of remote lookups. What this leads to is a design that does scale well - fixed number of replicas for each cache entry, just like BR - but no requirement for session affinity.

What about co-locating state? Co-location of state - moving entries about as a single block - was automatic and implicit with BR. Since each node always picked a backup node for all its state, one could visualize all of the state on a given node as a single block. Thus, colocation was trivial and automatic: whatever you put in Node1 will always be together, even if Node1 eventually dies and the state is accessed on Node2. However, this meant that state cannot be evenly balanced across a cluster since the data blocks are very coarse grained. With distribution, colocation is not implicit. In part due to the use of consistent hashing to determine where each cached entry resides, and also in part due to the finer-grained cache structure of Infinispan - key/value pairs instead of a tree-structure - this leads to individual entries as the granularity of state blocks. This means nodes can be far better balanced across a cluster. However, it does mean that certain optimizations which rely on co-location - such as keeping related entries close together - is a little more tricky.

One approach to co-locate state would be to use containers as values. For example, put all entries that should be colocated together into a HashMap. Then store the HashMap in the cache. But that is coarse-grained and ugly as an approach, and will mean that the entire HashMap would need to be locked and serialized as a single atomic unit, which can be expensive if this map is large.

Another approach is to use Infinispan’s AtomicMap API. This powerful API lets you group entries together, so they will always be colocated, locked together, but replication will be much finer-grained, allowing only deltas to the map to be replicated. So that makes replication fast and performant, but it still means everything is locked as a single atomic unit. While this is necessary for certain applications, it isn’t always be desirable.

One more solution is to implement your own ConsistentHash algorithm - perhaps extending DefaultConsistentHash. This implementation would have knowledge of your object model, and hashes related instances such that they are located together in the hash space. By far the most complex mechanism, but if performance and co-location really is a hard requirement then you cannot get better than this approach.

In summary:

Buddy Replication

  • Near-linear scalability

  • Session affinity mandatory

  • Co-location automatic

  • Applicable to a specific set of use cases due to the session affinity requirement

Distribution

  • Near-linear scalability

  • No session affinity needed

  • Co-location requires special treatment, ranging in complexity based on performance and locking requirements. By default, no co-location is provided

  • Applicable to a far wider range of use cases, and hence the default highly scalable clustered mode in Infinispan

Hopefully this article has sufficiently interested you in distribution, and has whetted your appetite for more. I would recommend the Infinispan wiki which has a wealth of information including interactive tutorials and GUI demos, design documents and API documentation. And of course you can’t beat downloading Infinispan and trying it out, or grabbing the source code and looking through the implementation details.

Cheers Manik

Posted by Manik Surtani on 2009-08-21
Tags: buddy replication partitioning distribution

News

Tags

JUGs alpha as7 asymmetric clusters asynchronous beta c++ cdi chat clustering community conference configuration console data grids data-as-a-service database devoxx distributed executors docker event functional grouping and aggregation hotrod infinispan java 8 jboss cache jcache jclouds jcp jdg jpa judcon kubernetes listeners meetup minor release off-heap openshift performance presentations product protostream radargun radegast recruit release release 8.2 9.0 final release candidate remote query replication queue rest query security spring streams transactions vert.x workshop 8.1.0 API DSL Hibernate-Search Ickle Infinispan Query JP-QL JSON JUGs JavaOne LGPL License NoSQL Open Source Protobuf SCM administration affinity algorithms alpha amazon anchored keys annotations announcement archetype archetypes as5 as7 asl2 asynchronous atomic maps atomic objects availability aws beer benchmark benchmarks berkeleydb beta beta release blogger book breizh camp buddy replication bugfix c# c++ c3p0 cache benchmark framework cache store cache stores cachestore cassandra cdi cep certification cli cloud storage clustered cache configuration clustered counters clustered locks codemotion codename colocation command line interface community comparison compose concurrency conference conferences configuration console counter cpp-client cpu creative cross site replication csharp custom commands daas data container data entry data grids data structures data-as-a-service deadlock detection demo deployment dev-preview development devnation devoxx distributed executors distributed queries distribution docker documentation domain mode dotnet-client dzone refcard ec2 ehcache embedded embedded query equivalence event eviction example externalizers failover faq final fine grained flags flink full-text functional future garbage collection geecon getAll gigaspaces git github gke google graalvm greach conf gsoc hackergarten hadoop hbase health hibernate hibernate ogm hibernate search hot rod hotrod hql http/2 ide index indexing india infinispan infinispan 8 infoq internationalization interoperability interview introduction iteration javascript jboss as 5 jboss asylum jboss cache jbossworld jbug jcache jclouds jcp jdbc jdg jgroups jopr jpa js-client jsr 107 jsr 347 jta judcon kafka kubernetes lambda language learning leveldb license listeners loader local mode lock striping locking logging lucene mac management map reduce marshalling maven memcached memory migration minikube minishift minor release modules mongodb monitoring multi-tenancy nashorn native near caching netty node.js nodejs non-blocking nosqlunit off-heap openshift operator oracle osgi overhead paas paid support partition handling partitioning performance persistence podcast presentation presentations protostream public speaking push api putAll python quarkus query quick start radargun radegast react reactive red hat redis rehashing releaase release release candidate remote remote events remote query replication rest rest query roadmap rocksdb ruby s3 scattered cache scripting second level cache provider security segmented server shell site snowcamp spark split brain spring spring boot spring-session stable standards state transfer statistics storage store store by reference store by value streams substratevm synchronization syntax highlighting tdc testing tomcat transactions tutorial uneven load user groups user guide vagrant versioning vert.x video videos virtual nodes vote voxxed voxxed days milano wallpaper websocket websockets wildfly workshop xsd xsite yarn zulip

back to top