Thursday, 13 October 2016

OpenShift and Node Affinity

OpenShift (and Kubernetes) has a great feature called Affinity. In some scenarios, it might be beneficial to configure it along with Infinispan node affinity.

Before we start… this tutorial is heavily based on our previous blog posts about deploying Infinispan on OpenShift and on the OpenShift Scheduler functionality. It is highly recommended to read those articles before continuing with this one.

How does the Affinity feature work… in short?

In order to decide which node a given Pod should run on, OpenShift looks at so-called Predicates and Priority Functions. A node must satisfy the Predicate (its label must match the one configured in the DeploymentConfiguration's Node Selector), and the Priority Functions are then responsible for choosing the best of the matching nodes for running the Pods.

Let’s assume that we have a sample policy (similar to the one provided in the OpenShift manual) that uses site as a Predicate, along with rack and machine as Priority Functions. Now let’s assume we have two nodes:

  • Node 1 - site=EU, rack=R1, machine=VM1

  • Node 2 - site=US, rack=R2, machine=VM2

And two DeploymentConfigurations with Node Selectors (these tell OpenShift on which nodes a given DeploymentConfiguration wishes to run) defined as follows:

  • DeploymentConfiguration 1 - site=EU, rack=R5, machine=VM5

  • DeploymentConfiguration 2 - site=JAP, rack=R5, machine=VM5

With the above example only DeploymentConfiguration 1 will be scheduled (on Node 1), since its site matches the Predicate. The rack and machine parameters are not used in this case, because only one node matches the Predicate.

Note that OpenShift's default configuration uses region (as a Predicate) and zone (as a Priority Function). However, it can be reconfigured very easily.
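To make this more concrete, here is a rough sketch of what such a scheduler policy might look like, loosely following the JSON policy file format used by OpenShift 3.x. The exact predicate and priority entries below are illustrative, not a drop-in default; the real defaults live in your master's scheduler configuration file.

  {
    "kind": "Policy",
    "apiVersion": "v1",
    "predicates": [
      {"name": "MatchNodeSelector"},
      {"name": "Region", "argument": {"serviceAffinity": {"labels": ["region"]}}}
    ],
    "priorities": [
      {"name": "LeastRequestedPriority", "weight": 1},
      {"name": "Zone", "weight": 2, "argument": {"serviceAntiAffinity": {"label": "zone"}}}
    ]
  }

A site/rack/machine policy like the one in the example above could follow the same pattern: a serviceAffinity Predicate on the site label plus serviceAntiAffinity Priority Functions on the rack and machine labels.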


And I need it because…

Some OpenShift deployments might span multiple racks in a data center or even multiple sites. It is important to tell Infinispan where the physical machines are located, which allows it to choose better nodes for backing up your data (in distribution mode).

As a matter of fact, Infinispan's Server Hinting uses exactly these levels: site, rack and machine. The main goal is to avoid backing up data on the same host.


The implementation

The implementation is pretty straightforward, but there are a couple of gotchas.

The first one is that OpenShift uses regions and zones by default, whereas Infinispan uses sites, racks and machines. The good news is that all three hints are optional, so you have two options: reuse the existing region and zone labels (mapping them to site and rack, for example), or adjust the OpenShift scheduler settings. In my example I used the former approach.

The second one is the need to hardcode those parameters into the DeploymentConfiguration. Unfortunately, Node Selectors are not exposed through the Downward API, so there is no other way.

So let’s have a look at our DeploymentConfiguration:

  • Line 26 - the zone (default) is used as the Infinispan rack

  • Line 27 - the region (primary) is used as the Infinispan site

  • Lines 57 - 59 - the Node Selector used for scheduling Pods
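The full DeploymentConfiguration was embedded in the original post, so the line numbers above refer to that listing. As a hypothetical, minimal sketch of the relevant pieces (the image name, the -D property names used to pass the hints to the Infinispan server, and the replica count are assumptions that depend on your setup), it could look roughly like this:

  apiVersion: v1
  kind: DeploymentConfig
  metadata:
    name: infinispan-server
  spec:
    replicas: 3
    template:
      metadata:
        labels:
          app: infinispan-server
      spec:
        containers:
        - name: infinispan-server
          image: jboss/infinispan-server
          # Hardcoded topology hints: the OpenShift zone ("default") is reused
          # as the Infinispan rack and the region ("primary") as the site.
          # The property names are illustrative; use whatever mechanism your
          # server image exposes for Server Hinting.
          args:
          - "-Djboss.rack=default"
          - "-Djboss.site=primary"
        # The same label values have to be repeated here as the Node Selector,
        # because they are not available through the Downward API.
        nodeSelector:
          region: primary
          zone: default

With this in place the Pods are scheduled only on nodes labelled region=primary and zone=default, and Infinispan treats those values as its site and rack when deciding where to place backup copies.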

Conclusion

Combining the OpenShift Affinity Service and Infinispan Server Hinting allows you to optimize data distribution across the cluster. Keep in mind that your configuration might be totally different (the OpenShift Scheduler is highly configurable), but once you understand how it works, you can adjust the hinting strategy to your needs.

Happy Scaling!

Posted by Sebastian Łaskawiec on 2016-10-13
Tags: openshift affinity
