Friday, 21 August 2009

Defining cache configurations via CacheManager in Beta1

Infinispan’s first beta release is just around the corner and in preparation, I’d like to introduce to the Infinispan users an important API change in org.infinispan.manager.CacheManager class that will be part of this beta release.

As a result of the development of the Infinispan second level cache provider for Hibernate, we have discovered that the CacheManager API for definition and retrieval of Configuration instances was a bit limited. So, for this coming release, the following method has been deleted:

void defineCache(String cacheName, Configuration configurationOverride)

And instead, the following two methods have been added:

Configuration defineConfiguration(String cacheName, Configuration configurationOverride);
Configuration defineConfiguration(String cacheName, String templateCacheName,
   Configuration configurationOverride);

The primary driver for this change has been the development of the Infinispan cache provider, where we wanted to enable users to configure or override most commonly modified Infinispan parameters via hibernate configuration file. This would avoid users having to modify different files for the most commonly modified parameters, hence improving usability of the Infinispan cache provider. However, to be able to implement this, we needed CacheManager’s API to be enhanced so that:

  • Existing defined cache configurations could be overriden. This enables use cases like this: Sample Infinispan cache provider configuration will contain a generic cache definition to be used for entities. Via hibernate configuration file, users could redefine the maximum number of entries to be allowed before eviction kicks in for all entities. The code would look something like this:

// Assume that 'cache-provider-configs.xml' contains
// a named cache for entities called 'entity'
CacheManager cacheManager = new DefaultCacheManager(
   "/home/me/infinispan/cache-provider-configs.xml");
Configuration overridingConfiguration = new Configuration();
overridingConfiguration.setEvictionMaxEntries(20000); // max entries to 20.000
// Override existing 'entity' configuration so that eviction max entries are 20.000.
cacheManager.defineConfiguration("entity", overridingConfiguration);
  • Be able to define new cache configurations based on the configuration of a given cache instance, optionally applying some overrides. This enables uses cases like the following: A user wants to define eviction wake up interval for a specific entity which is different to the wake up interval used for the rest of entities.

// Assume that 'cache-provider-configs.xml' contains
// a named cache for entities called 'entity'
CacheManager cacheManager = new DefaultCacheManager(
   "/home/me/infinispan/cache-provider-configs.xml");
Configuration overridingConfiguration = new Configuration();
// set wake up interval to 240 seconds
overridingConfiguration.setEvictionWakeUpInterval(240000L);
// Create a new cache configuration for com.acme.Person entity
// based on 'entity' configuration, overriding the wake up interval to be 240 seconds
cacheManager.defineConfiguration("com.acme.Person", "entity", overridingConfiguration);

Another limitation of the previous API, which we’ve solved with this API change, is that in the past the only way to get a cache’s Configuration required the cache to be started because the only way to get the Configuration instance was from the Cache API. However, with this API change, we can now retrieve a cache’s Configuration instance via the CacheManager API. Example:

// Assume that 'cache-provider-configs.xml' contains
// a named cache for entities called 'entity'
CacheManager cacheManager = new DefaultCacheManager(
   "/home/me/infinispan/cache-provider-configs.xml");
// Pass a brand new Configuration instance without overrides
// and it will return the given cache name's Configuration
Configuration entityConfiguration = cacheManager.defineConfiguration("entity",
new Configuration());

If you would like to provide any feedback to this post, either respond to this blog entry or go to Infinispan’s user forums.

Posted by Galder Zamarreño on 2009-08-21
Tags: configuration

Wednesday, 12 August 2009

Coalesced Asynchronous Cache Store

As we prepare for Infinispan’s beta release, let me introduce to you one of the recent enhancements implemented which improves the way the current asynchronous (or write-behind) cache store works.

Right until now, the asynchronous cache store simply queued modifications, while a set of threads would apply them. However, if the queue contained N put operations on the same key, these threads would apply each and every modification one after the other, which is not very efficient.

Thanks to the excellent feedback from the Infinispan community, we’ve now improved the asynchronous cache store so that it coalesces changes and only applies the latest modification on a key. So, if N put operations on the same key are queued, only the last modification will be applied to the cache store.

Internally, the asynchronous concurrent queueing mechanism used performs in O(1) by keeping an map with the latest values for each key. So, this maps acts like the queue but there’s a not a need for a queue as such, we only care about making sure the latest values are stored hence, order is not important.

Note that the way threads apply these modifications is that they start working as soon as there are any changes available and so to see these changes coalesced, the system needs to be relatively busy or a lot of changes on the same key need to happen in a relatively short period of time. We could have made these threads work periodically, i.e. every X seconds, but by doing that, we would be letting modifications pile up and the time between operations and the cache store updates would go up, hence increasing the chance that the cache store is outdated.

Finally, there’s no configuration modifications required to get the asynchronous cache store to work in the coalesced way, it just works like this out-of-the-box. Example:

<?xml version="1.0" encoding="UTF-8"?>
<infinispan xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:infinispan:config:4.0">
  <namedCache name="persistentCache">
     <loaders passivation="false" shared="false" preload="true">
        <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true" ignoreModifications="false" purgeOnStartup="false">
           <properties>
              <property name="location" value="/tmp"/>
           </properties>
           <async enabled="true" threadPoolSize="10"/>
        </loader>
     </loaders>
  </namedCache>
</infinispan>
Posted by Galder Zamarreño on 2009-08-12
Tags: cache stores asynchronous

Thursday, 30 July 2009

More JUG talks

Adrian Cole and I recently presented at a JUGs in Krakow and Dublin, on Infinispan and JClouds.

Krakow was great - and hot: 39 degree weather! (Don’t we just love global warming?) Anna Kolodziejczyk at Proidea (who also organises JDD) organised the event, which attracted an excited and interactive audience of about 40 people. Dublin, beautiful as always, was a polar opposite in terms of weather: a cloudy, windy and wet day, living up to its reputation. Luan O’Carroll of DUBJUG organised the event, at the plush Odessa Club. Again, an inquisitive audience of about 35 people attended, with a lot of questions on the future of data storage in clouds. Thomas Diesler of JBoss OSGi fame made a surprise guest appearance too.

In general, the talks have been very well received and have provoked thought and discussion. As requested by many, I will soon be recording a podcast of this talk and will make it available on this blog.

Apart from a JUG I hope to organise soon in London, the next time I speak about Infinispan publicly will be at JBoss World in September. Hope to see you there!

Cheers Manik

Posted by Manik Surtani on 2009-07-30
Tags: JUGs

Monday, 27 July 2009

Increase transactional throughput with deadlock detection

Deadlock detection is a new feature in Infinispan. It is about increasing the number of transactions that can be concurrently processed. Let’s start with the problem first (the deadlock) then discuss some design details and performance.

So, the by-the-book deadlock example is the following:

  • Transaction one (T1) performs following operation sequence: (write key_1,write key_2)

  • Transaction two (T2) performs following sequence: (write key_2, write key_1).

Now, if the T1 and T2 happen at the same time and both have executed first operation, then they will wait for each other virtually forever to release owned locks on keys. In the real world, the waiting period is defined by a lock acquisition timeout (LAT) - which defaults to 10 seconds - that allows the system to overcome such scenarios and respond to the user one way (successful) or the other(failure): so after a period of LAT one (or both) transaction will rollback, allowing the other to continue working.

Deadlocks are bad for both system’s throughput and user experience. System throughput is affected because during the deadlock period (which might extend up to LAT) no other thread will be able to update neither key_1 nor key_2. Even worse, access to any other keys that were modified by T1 or T2 will be similarly restricted. User experience is altered by the fact that the call(s) will freeze for the entire deadlock period, and also there’s a chance that both T1 and T2 will rollback by timing out.

As a side note, in the previous example, if the code running the transactions would(and can) enforce any sort of ordering on the keys accessed within the transaction, then the deadlock would be avoided. E.g. if the application code would order the operation based on the lexicographic ordering of keys, both T1 and T2 would execute the following sequence: (write key_1,write key_2), and so no deadlock would result. This is a best practice and should be followed whenever possible. Enough with the theory! The way Infinispan performs deadlock detection is based on an algorithm designed by Jason Greene and Manik Surtani, which is detailed here. The basic idea is to split the LAT in smaller cycles, as it follows:

lock(int lockAcquisitionTimeout) {
while (currentTime < startTime + timeout) {
 if (acquire(smallTimeout)) break;
 testForDeadlock(globalTransaction, key);
}
}

What testForDeadlock(globalTransaction, key) does is check weather there is another transaction that satisfies both conditions:

  1. holds a lock on key and

  2. intends to lock on a key that is currently called by this transaction.

If such a transaction is found then this is a deadlock, and one of the running transactions will be interrupted: the decision of which transaction will interrupt is based on coin toss, a random number that is associated with each transaction. This will ensure that only one transaction will rollback, and the decision is deterministic: nodes and transactions do not need to communicate with each other to determine the outcome.

Deadlock detection in Infinispan works in two flavors: determining deadlocks on transactions that spread over several caches and deadlock detection in transactions running on a single(local) cache.

Let’s see some performance figures as well. A class for benchmarking performance of deadlock detection functionality was created and can be seen here. Test description (from javadoc):

We use a fixed size pool of keys (KEY_POOL_SIZE) on which each transaction operates. A number of threads (THREAD_COUNT) repeatedly starts transactions and tries to acquire locks on a random subset of this pool, by executing put operations on each key. If all locks were successfully acquired then the tx tries to commit: only if it succeeds this tx is counted as successful. The number of elements in this subset is the transaction size (TX_SIZE). The greater transaction size is, the higher chance for deadlock situation to occur. On each thread these transactions are being repeatedly executed (each time on a different, random key set) for a given time interval (BENCHMARK_DURATION). At the end, the number of successful transactions from each thread is cumulated, and this defines throughput (successful tx) per time unit (by default one minute).

Disclaimer: The following figures are for a scenario especially designed to force very high contention. This is not typical, and you shouldn’t expect to see this level of increase in performance for applications with lower contention (which most likely is the case). Please feel free tune the above benchmark class to fit the contention level of your application; sharing your experience would be very useful!

Following diagram shows the performance degradation resulting from running the deadlock detection code by itslef in a scenario where no contention/deadlocks are present. imagehttp://2.bp.blogspot.com/_ISQfVF8ALAQ/Sm2h_re8qKI/AAAAAAAABqA/bsNgEyCkcYw/s1600-h/DLD_replicated.JPG[image]image Some clues on when to enable deadlock detection. A high number of transaction rolling back due to org.infinispan.util.concurrent.TimeoutException is an indicator that this functionality might help. TimeoutException might be caused by other causes as well, but deadlocks will always result in this exception being thrown. Generally, when you have a high contention on a set of keys, deadlock detection may help. But the best way is not to guess the performance improvement but to benchmark and monitor it: you can have access to statistics (e.g. number of deadlocks detected) through JMX, as it is exposed via the DeadlockDetectingLockManager MBean.

Posted by Mircea Markus on 2009-07-27
Tags: transactions benchmarks deadlock detection concurrency

Monday, 20 July 2009

Berlin and Stuttgart say hello to Infinispan

image Last week I finally put together my presentation on cloud computing and Infinispan. To kick things off, I presented it at two JUG events in Germany.

Berlin’s Brandenburg JUG organised an event at the NewThinking Store in Berlin’s trendy Mitte district. Thanks to Tobias Hartwig and Ralph Bergmann for organising the event, which drew an audience of about 35 people. Cloud computing was the focus of the evening, and I started the event with my rather lengthy presentation on cloud computing and specific issues around persisting data in a cloud. The bulk of the presentation focused on Infinispan, what it provides as a data grid platform, and what’s on the roadmap. After a demo and a short break, Infinispan committer Adrian Cole then spoke about JClouds, demonstrating Infinispan’s use of JClouds to back cached state onto Amazon’s S3. You can read more about Adrian’s presentation on his blog.

Two days later, the Stuttgart JUG arranged for me to speak to their JBoss Special Interest Group on Infinispan. Thanks to Tobias Frech and Heiko Rupp for organising this event, which was held in one of Red Hat’s training rooms in Stuttgart. The presentation followed a similar pattern to what was presented in Berlin, to an audience of about 15 people.

In both cases, there was an overwhelming interest in Infinispan as a distributed storage engine. The JPA interface which is on our roadmap generated a lot of interest, as did the query API and to a lesser extent the asynchronous API - which could benefit from a better example in my presentation to demonstrate why this really is a powerful thing.

Overall, it is good to see that folks are interested in and are aware of the challenges involved in data storage on clouds, where traditional database usage is less relevant.

Many people have asked me for downloadable versions of my slides. Rest assured I will put them up - either as PDFs or better still, as a podcast - over the next 2 weeks.

Coming up, I will be in Krakow speaking at their JUG on Thursday the 23rd, and then in Dublin on Tuesday the 29th. Details of these two events are on the Infinispan Talks Calendar. Hope to see you there!

Cheers Manik

Posted by Manik Surtani on 2009-07-20
Tags: JUGs jclouds

News

Tags

JUGs alpha as7 asymmetric clusters asynchronous beta c++ cdi chat clustering community conference configuration console data grids data-as-a-service database devoxx distributed executors docker event functional grouping and aggregation hotrod infinispan java 8 jboss cache jcache jclouds jcp jdg jpa judcon kubernetes listeners meetup minor release off-heap openshift performance presentations product protostream radargun radegast recruit release release 8.2 9.0 final release candidate remote query replication queue rest query security spring streams transactions vert.x workshop 8.1.0 API DSL Hibernate-Search Ickle Infinispan Query JP-QL JSON JUGs JavaOne LGPL License NoSQL Open Source Protobuf SCM administration affinity algorithms alpha amazon annotations announcement archetype archetypes as5 as7 asl2 asynchronous atomic maps atomic objects availability aws beer benchmark benchmarks berkeleydb beta beta release blogger book breizh camp buddy replication bugfix c# c++ c3p0 cache benchmark framework cache store cache stores cachestore cassandra cdi cep certification cli cloud storage clustered cache configuration clustered counters clustered locks codemotion codename colocation command line interface community comparison compose concurrency conference conferences configuration console counter cpp-client cpu creative cross site replication csharp custom commands daas data container data entry data grids data structures data-as-a-service deadlock detection demo deployment dev-preview devnation devoxx distributed executors distributed queries distribution docker documentation domain mode dotnet-client dzone refcard ec2 ehcache embedded query equivalence event eviction example externalizers failover faq final fine grained flags flink full-text functional future garbage collection geecon getAll gigaspaces git github gke google graalvm greach conf gsoc hackergarten hadoop hbase health hibernate hibernate ogm hibernate search hot rod hotrod hql http/2 ide index indexing india infinispan infinispan 8 infoq internationalization interoperability interview introduction iteration javascript jboss as 5 jboss asylum jboss cache jbossworld jbug jcache jclouds jcp jdbc jdg jgroups jopr jpa js-client jsr 107 jsr 347 jta judcon kafka kubernetes lambda language leveldb license listeners loader local mode lock striping locking logging lucene mac management map reduce marshalling maven memcached memory migration minikube minishift minor release modules mongodb monitoring multi-tenancy nashorn native near caching netty node.js nodejs nosqlunit off-heap openshift operator oracle osgi overhead paas paid support partition handling partitioning performance persistence podcast presentations protostream public speaking push api putAll python quarkus query quick start radargun radegast react reactive red hat redis rehashing releaase release release candidate remote remote events remote query replication rest rest query roadmap rocksdb ruby s3 scattered cache scripting second level cache provider security segmented server shell site snowcamp spark split brain spring spring boot spring-session stable standards state transfer statistics storage store store by reference store by value streams substratevm synchronization syntax highlighting testing tomcat transactions uneven load user groups user guide vagrant versioning vert.x video videos virtual nodes vote voxxed voxxed days milano wallpaper websocket websockets wildfly workshop xsd xsite yarn zulip

back to top