Thursday, 18 April 2013
Infinispan team is coming to São Paulo (Brazil) to present on multiple topics around caching, data grids and NoSQL in Brazil’s first ever JBoss Users & Developers Conference (JUDCon). The event is being held over two days, on 19th and 20th of April.
On the 19th, Manik Surtani and Pete Muir will present on how to supercharge web applications using JBoss Data Grid. Expect a very lively presentation from these very seasoned presenters :)
You’ll also be able to see Infinispan in action in Shekhar Gulati’s "Closed PaaS to Open PaaS: Migrate GAE Applications to OpenShift Using CapeDwarf" and Randall Hauch’s "Elastic Consistent NoSQL Data Storage with ModeShape 3" talks, both on the 19th of April, where Shekhar and Randall will demonstrate JBoss projects that use Infinispan heavily.
On the 20th of April, I’ll be speaking about scaling up Hibernate/JPA applications with the Infinispan second-level cache. Even though the Infinispan caching provider was created almost 4 years ago, this is the first time I’m presenting on it. Really looking forward to that.
Finally, I’ll also give the first ever presentation on Infinispan’s JCache (JSR-107) API implementation, which will be mostly a live coding session showing different bits of the JCache API and the extra capabilities JCache users get from the Infinispan implementation.
Tags: conference data grids event judcon hibernate
Friday, 04 January 2013
Happy new year, everyone.
One of my goals for 2013 is to push JSR 347 into action again. To kick start this, I propose a meeting among expert group members - anyone else with an interest in the JSR is welcome to attend as well.
Details are in my post to the mailing list. Please respond to the mail list if you are interested in participating.
Tags: jcp data grids jsr 347 standards
Wednesday, 05 September 2012
Infinispan Arquillian Container is an extension to Arquillian that provides several ways to interact with Infinispan, either with a standalone Infinispan server or just with Infinispan libraries. This extension can also communicate with JBoss Data Grid server via JMX.
It was released as Maven artifacts in the JBoss Maven Repository, located at http://repository.jboss.org/nexus/content/groups/public-jboss/ . More information on how to set up and use the repo can be found at https://community.jboss.org/wiki/MavenGettingStarted-Users
What does this Arquillian extension offer to you? Let me describe all aspects of this extension one by one.
When testing, you might want to automatically start the Infinispan server before the test and stop it afterwards. This can be achieved by configuring infinispan-arquillian-container via Arquillian’s configuration file. The following is a subset of attributes that can be specified and thus passed to the Infinispan server during startup: masterThreads, workerThreads, cacheConfig, jmxPort, … The complete list can be found at bit.ly/R7j4d1 (all private fields).
Note: Examples are not part of the release; only libraries are. To check out the examples provided with the project, clone the project’s repository: https://github.com/mgencur/infinispan-arquillian-container. Examples are located in the respective sub-directory.
The configuration file then looks similar to the following:
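A rough sketch of such an arquillian.xml follows. Only the cacheConfig and jmxPort attribute names are taken from the list above; the group and container qualifiers, port values and the configuration file path are illustrative assumptions, so consult the project’s examples for the authoritative format:

```xml
<arquillian xmlns="http://jboss.org/schema/arquillian">
   <!-- Two standalone Infinispan server instances managed by Arquillian -->
   <group qualifier="infinispan-cluster" default="true">
      <container qualifier="container1">
         <configuration>
            <property name="jmxPort">1091</property>
            <property name="cacheConfig">/path/to/clustered-config.xml</property>
         </configuration>
      </container>
      <container qualifier="container2">
         <configuration>
            <property name="jmxPort">1092</property>
            <property name="cacheConfig">/path/to/clustered-config.xml</property>
         </configuration>
      </container>
   </group>
</arquillian>
```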
Whether these two Infinispan servers are clustered or not depends on the configuration passed to them via the cacheConfig (file path) attribute, or on their default configuration when no configuration file is passed. The configuration in the arquillian.xml file just says: "Start these two instances with whatever configuration is passed to them".
Complete example: bit.ly/RkrpEE
When we tell Arquillian to work with Infinispan server, we can inject RemoteInfinispanServer object into our test. Such an object provides various information about the running Infinispan server. For example, we can retrieve a hostname and HotRod port and use these pieces of information to create a RemoteCacheManager instance. Besides that users are allowed to retrieve information available via JMX from the server like cluster size, number of entries in the cache, number of cache hits and many more.
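The wiring is roughly as follows. The real RemoteInfinispanServer and RemoteCacheManager types live in the Arquillian extension and the HotRod client respectively, so this self-contained sketch stands in minimal toy interfaces just to show the pattern of turning injected endpoint information into a client configuration; everything except the call shape is an assumption:

```java
// Sketch only: these nested interfaces mimic the shape of the extension's API.
// In a real test, the server instance arrives via @InfinispanResource injection.
public class HotRodWiringSketch {
    public interface HotRodEndpoint {
        String getHostName();
        int getPort();
    }

    public interface RemoteInfinispanServer {
        HotRodEndpoint getHotRodEndpoint();
    }

    // Build the "host:port" server list a RemoteCacheManager would be configured with.
    public static String serverList(RemoteInfinispanServer server) {
        HotRodEndpoint ep = server.getHotRodEndpoint();
        return ep.getHostName() + ":" + ep.getPort();
    }

    public static void main(String[] args) {
        RemoteInfinispanServer server = () -> new HotRodEndpoint() {
            public String getHostName() { return "127.0.0.1"; }
            public int getPort() { return 11222; }
        };
        // A real test would now do: new RemoteCacheManager(serverList(server))
        System.out.println(serverList(server)); // prints 127.0.0.1:11222
    }
}
```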
Complete example: http://bit.ly/OaCw8q
Vital dependencies required for the test to run are:
The Infinispan Arquillian extension can work with more than just the standalone Infinispan server; it can also manage a JBoss Data Grid (JDG) server.
This time, the properties in Arquillian’s configuration file are different and correspond to properties of JBoss Application Server 7. The most important property is again the path to the server (jbossHome).
Are you interested in what the test looks like? It looks exactly the same as tests for the standalone Infinispan server; you just have a few more attributes available. The JDG server usually starts all three endpoints (HotRod, Memcached, REST) at the same time, while for the Infinispan server you have to specify which endpoint should be started. Furthermore, the Infinispan server does not have the REST endpoint available out of the box.
As a result, you can call the following methods with JDG in one single test.
server1.getMemcachedEndpoint().getPort();
server1.getRESTEndpoint().getContextPath();
server1.getHotRodEndpoint().getPort();
The difference is, of course, in the dependencies. Instead of a handler for the standalone Infinispan server, one has to use a handler for JBoss AS 7. The dependencies then look like this:
Sometimes we don’t want to use a standalone server. Sometimes we want to test just Infinispan in its basic form - Java libraries. Infinispan has been under development for years and during that time, lots of tests were developed. With tests come utility methods. Infinispan Arquillian Container enables you to leverage these utility methods and call them via an instance of DatagridManager. This instance can be easily injected into a test, no matter which test framework (TestNG, JUnit) you use.
DatagridManager class can be found at http://bit.ly/Q0a7ki
Can you see the advantage? No? Let me point out some useful methods available in the manager.
List<Cache<K, V>> createClusteredCaches(int numMembersInCluster, String cacheName, ConfigurationBuilder builder)
creates a cluster of caches with a certain name and pre-defined configuration
helps to wait until the cluster is up and running
Cache<A, B> cache(int index)
retrieves a cache from a certain node, according to the index
Cache<A, B> cache(int managerIndex, String cacheName)
retrieves a cache with that name
void killMember(int cacheIndex)
kills the cache manager (node) at index cacheIndex
AdvancedCache advancedCache(int i)
retrieves an advanced cache from node i
Transaction tx(int i)
retrieves a transaction from node i
TransactionManager tm(int i)
retrieves a transaction manager from node i
…and much more.
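To give a feel for the test style these helpers enable, here is a deliberately simplified, self-contained imitation. This toy "grid" of map views sharing one backing store is my own illustration, not the real DatagridManager, which starts actual clustered cache managers:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in: every "node" is a view over the same shared store, which is
// enough to illustrate the write-on-one-node, read-on-another test pattern.
public class MiniDatagridManager {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    // Mimics createClusteredCaches(numMembersInCluster, ...)
    public void createClusteredCaches(int numMembers) {
        Map<String, String> sharedStore = new ConcurrentHashMap<>();
        for (int i = 0; i < numMembers; i++) {
            nodes.add(sharedStore); // real Infinispan replicates over the network
        }
    }

    // Mimics cache(int index)
    public Map<String, String> cache(int index) {
        return nodes.get(index);
    }

    // Mimics killMember(int cacheIndex)
    public void killMember(int index) {
        nodes.remove(index);
    }

    public static void main(String[] args) {
        MiniDatagridManager manager = new MiniDatagridManager();
        manager.createClusteredCaches(2);
        manager.cache(0).put("key", "value");            // write on node 0
        System.out.println(manager.cache(1).get("key")); // read on node 1: value
    }
}
```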
The following test can be found among other examples in the GIT repository.
org.infinispan:infinispan-core:jar:5.1.5.FINAL:test (users should replace this version with the one they want to test)
org.infinispan.arquillian.container:infinispan-arquillian-impl:jar:1.0.0.CR1:test
Infinispan Arquillian Container was tested with Infinispan 5.1.5.FINAL and JDG 6.0.0.GA. Nevertheless, it should also work smoothly with other, not too distant versions. I’ll be updating the project to work with newer versions of both Infinispan and JBoss Data Grid.
Tags: testing data grids
Thursday, 14 April 2011
PCWorld has published an article on the recent data grid JSR that I have submitted. As a follow-up to PCWorld’s article, I would like to make a few comments to clarify a few things.
I don’t quite understand what is meant by Red Hat’s approach not being the best solution. Do people take issue with having a standard in the first place? Or is it the standards body used in this particular case (the JCP)? If it is the details of the standard itself, one should keep in mind that this has yet to be defined by an expert group!
It is unfortunate that the "others" mentioned in the article - who feel that Red Hat’s approach is not the best - were not able to provide any details about their objections. I would love to hear these objections and make sure that the JSR addresses them.
The importance of a standard, to remove vendor lock-in, etc., is pretty well understood, so I won’t go into too much detail here. But with that in mind, I find Pandey’s comment regarding a "self-beneficial move" an odd one. A standard makes it easier for people to switch between products (which may explain why no one else has stepped up to the plate to propose such a standard thus far). Proposing a standard makes it easier for end-users to move away from Infinispan. Yes, it may help with awareness of Infinispan, but it also means Red Hat, just like other data grid vendors, will need to work really hard to make sure their products are up to scratch. The only real beneficiary here is the end-user. In fact, I’d like to invite Terracotta to participate in this JSR, as participation can only make it stronger, more relevant and eventually even more useful to end-users.
With regards to JSR-107, I believe Pandey has misunderstood the intention in proposing a data grid JSR. I have proposed extending and building on top of JSR-107 - not throwing it away - and I have expressed this on the JSR-107 expert group mailing list, of which Terracotta’s Greg Luck is a member. In fact, without Pandey actually seeing my data grid proposal blog post - PCWorld’s article was written before I published details of the JSR submission, based on a high-level Red Hat press release - one has to wonder where such strong words come from! :-)
Tags: jcp data grids jsr 107 standards
Thursday, 14 April 2011
Following up on my previous response to Antonio Goncalves' blog post, I have submitted a JSR to the JCP on a data grid standard, titled "Java Data Grids". It has yet to be assigned a number by the JCP, but I thought I’d talk about it a little here anyway.
Here is the description of the JSR that I have submitted:
This specification proposes to provide an API for accessing, storing, and managing data in a distributed data grid. The primary API will build upon and extend the JSR-107 (JCACHE) API. In addition to its genericized Map-like API to access a Cache, JSR-107 defines SPIs for spooling in-memory data to persistent storage, an API for obtaining a named Cache from a CacheManager and an API to register event listeners. Above and beyond JSR-107, this JSR will define characteristics and expectations from eviction, replication and distribution, and transactions (via the JTA specification). Further, it would define an asynchronous, non-blocking API as an alternative to JSR-107’s primary API, as non-blocking access to data becomes a concern when an implementation needs to perform remote calls, as in the case of a data grid. This specification builds upon JSR-107, which is not yet complete. We intend to work with the JSR-107 EG to ensure that their schedule is compatible with the schedule for this JSR. If JSR-107 is unable to complete, we propose merging the last available draft into this specification.
Data grids are gaining prominence and importance in enterprise Java, particularly as cloud-style deployments gain popularity:
Characteristics such as high availability, along with removal of single points of failure become increasingly important, since cloud infrastructure is inherently unreliable and can be re-provisioned with minimal notice; applications deployed on cloud need to be resilient to this.
Further, one of the major benefits of cloud-style deployments is elasticity. The ability to scale out (and back in) quickly and easily. Again, data grids have a role to play here.
Finally, with scalable middleware comes additional stress on the data tier (traditionally an RDBMS), as middleware nodes scale out to cope with load. Data grids - used as a distributed cache - can help with mitigating database bottlenecks.
With one of Java EE 7’s stated goals being "cloud-friendliness", the above are powerful arguments for the inclusion of a distributed data grid standard in Java EE 7.
What about JSR-107? JSR-107 - the temporary caching API proposed in 2001 - certainly has a role to play in Java EE too. Temporary caches are an important part of enterprise middleware, and yet a standard has been sadly missing from the Java EE umbrella specification for far too long. Spring, having identified the need as well, has a temporary caching abstraction in its current development versions. Several other non-Java frameworks define temporary caching APIs too (Ruby on Rails, Django for Python, .NET). There is no denying JSR-107 is necessary, and necessary as a part of Java EE.
But JSR-107 isn’t a data grid. JSR-107 falls short as a standard for data grids, specifically as it doesn’t take into account characteristics of distribution and replication of data, and doesn’t define a contract that implementations would have to adhere to when it comes to moving data around a cluster. Crucial things for a data grid that, if not baked into a specification, will hinder portability and render the standard itself useless and impotent.
Further, with remote capabilities in mind, a data grid should also expose a non-blocking API, since network calls can be a limiting factor. Invoking methods that involve remote calls should be able to be done in an asynchronous fashion. Stuff that is irrelevant to a temporary caching API like JSR-107.
So with all that in mind, I’d love to hear your thoughts on the data grid JSR. In addition to Red Hat, the JSR is currently backed by a major Java EE and data grid vendor which cannot be named at this stage, along with independent JCP members with relevant interest and background.
Tags: jcp data grids jsr 107 standards
Tuesday, 15 February 2011
In response to Antonio Goncalves' blog post on his wish list for Java EE 7 and particularly on his comments around the inactive JSR-107 JCACHE spec, I’d like to spend a few moments jotting down my thoughts on the subject.
To start with, I am on the JSR-107 expert group, representing Red Hat. I have also been in recent discussions with the JCP about the inactive JSR and what can be done about it.
My feeling is that JSR-107 needs to be axed. It’s been inactive for way too long, it is out of date, and the community is pretty jaded about it. We do, however, need a JSR around distributed caches and in-memory data grids. There is definitely a need in the Java EE 7 umbrella specification, particularly with increasing focus on and alignment with cloud. Apps designed to scale would almost certainly need a distributed, in-memory data grid. If Java EE is to be the preferred platform to build Software-as-a-Service offerings, scalability is crucial.
So what should this data grid JSR look like? Well, let’s start with JSR-107. After all, I didn’t think there was anything wrong with JSR-107, just that it was too limiting/simplistic.
What’s in JSR-107? A quick summary:
Primary interface - javax.cache.Cache - extending j.u.c.ConcurrentMap
Adds ability to register, de-register and list event listeners
Defines a CacheLoader interface for loading/storing cached data
Defines an evict(K) method, as well as support for different eviction algorithms
Defines a ServiceLocator approach to loading the appropriate implementation at runtime
Defines a CacheManager interface to construct and retrieve Cache instances
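Sketched as code, the JSR-107 surface summarised above looks roughly like this. The interface names follow the summary; the trivial in-memory implementation is mine, purely for illustration, and is not the specification’s reference implementation:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class Jsr107Sketch {
    // Primary interface: a Cache is essentially a ConcurrentMap plus cache concerns.
    public interface Cache<K, V> extends ConcurrentMap<K, V> {
        void addListener(CacheListener<K, V> listener);
        void removeListener(CacheListener<K, V> listener);
        void evict(K key);
    }

    public interface CacheListener<K, V> {
        void onPut(K key, V value);
    }

    // SPI for spooling in-memory data to persistent storage.
    public interface CacheLoader<K, V> {
        V load(K key);
        void store(K key, V value);
    }

    // Minimal in-memory implementation, for illustration only.
    public static class SimpleCache<K, V> extends ConcurrentHashMap<K, V> implements Cache<K, V> {
        private final List<CacheListener<K, V>> listeners = new CopyOnWriteArrayList<>();
        public void addListener(CacheListener<K, V> l) { listeners.add(l); }
        public void removeListener(CacheListener<K, V> l) { listeners.remove(l); }
        public void evict(K key) { remove(key); }
        @Override public V put(K key, V value) {
            V old = super.put(key, value);
            listeners.forEach(l -> l.onPut(key, value)); // notify registered listeners
            return old;
        }
    }

    // Plays the CacheManager role: constructs and retrieves named Cache instances.
    private static final ConcurrentMap<String, SimpleCache<?, ?>> caches = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public static <K, V> Cache<K, V> getCache(String name) {
        return (Cache<K, V>) caches.computeIfAbsent(name, n -> new SimpleCache<>());
    }

    public static void main(String[] args) {
        Cache<String, String> c = getCache("users");
        c.put("id1", "Manik");
        c.evict("id1");
        System.out.println(c.containsKey("id1")); // false after eviction
    }
}
```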
What JSR-107 does not cover - but should be included in a Data Grid JSR

Over and above what JSR-107 proposed, I believe the following features are crucial to a useful data grid standard:
JTA interoperability. The ability to participate in transactions is necessary, both as an XA resource and as a simple cache to front a RDBMS, via JPA
Define behaviour at certain stages of a tx’s lifecycle, particularly with regards to recovery
Should play nice with JPA’s second level cache SPI
Define and mandate REPLICATION and DISTRIBUTION, as well as SYNCHRONOUS and ASYNCHRONOUS versions of network communications
These could be useful in the JSR, but need more thought and discussion:
An asynchronous, Future-based API (See Infinispan’s Async API)
XML-based config file standardisation (including an XSD)
Standardise programmatic config bean interfaces
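For the asynchronous API in particular, the shape would resemble Infinispan’s Async API: each blocking operation gains a variant that returns a Future instead of blocking the caller. A minimal, stdlib-only sketch of that call shape follows; the class, its executor-backed store and CompletableFuture are my stand-ins, not the actual Infinispan types:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: an async facade over a plain map, showing the non-blocking call shape.
// In a real grid, the executor work would be a network round-trip to remote owners.
public class AsyncCacheSketch<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    public CompletableFuture<V> putAsync(K key, V value) {
        return CompletableFuture.supplyAsync(() -> store.put(key, value), executor);
    }

    public CompletableFuture<V> getAsync(K key) {
        return CompletableFuture.supplyAsync(() -> store.get(key), executor);
    }

    public void shutdown() { executor.shutdown(); }

    public static void main(String[] args) {
        AsyncCacheSketch<String, String> cache = new AsyncCacheSketch<>();
        cache.putAsync("k", "v").join();            // the caller may also choose not to block
        System.out.println(cache.getAsync("k").join()); // prints v
        cache.shutdown();
    }
}
```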
Further interesting thoughts
These additional, NoSQL-like features would also be very interesting, but probably make more sense in a later revision of this JSR - both for the sake of manageability and to allow more community adoption of, and feedback on, such APIs.
I’d like to hear your thoughts and opinions around this - please comment away!
Tags: jcp data grids
Wednesday, 27 October 2010
Last month, at the JavaOne conference in San Francisco, I spoke about data grids: a BOF session on cloud-ready data stores using data grids, and a conference session on measuring performance and benchmarking data grids. But in addition to the official JavaOne talks, I also did two short, 20-minute mini-sessions at the Red Hat booth in the JavaOne pavilion, titled Data-as-a-Service using Infinispan. The good folks at the Red Hat booth even recorded it and put it online on Vimeo, where it is accessible on-demand.
Tags: data grids data-as-a-service daas JavaOne
Friday, 10 July 2009
I will be presenting on Infinispan, data grids and the data fabric of clouds at JBoss World Chicago, in September 2009. I will cover a brief history of Infinispan and the motivations behind the project, and then talk in a more abstract manner about data grids and the special place they occupy in providing data services for clouds.
In addition, I expect to pontificate on my thoughts on clouds and the future of computing in general to anyone who buys me a coffee/beer! :-)
So go on, convince your boss to let you go, and attend my talk, and hopefully I’ll see you there!
Tags: presentations data grids jbossworld
Tuesday, 28 April 2009
Over the past few months we’ve been flying under the radar preparing for the launch of a new, open source, highly scalable distributed data grid platform. We’ve finally got to a stage where we can announce it publicly and I would like to say that Infinispan is now ready to take on the world!
The way we write computer software is changing. The demise of the Quake Rule has made hardware manufacturers cram more cores onto a CPU and more CPUs into a server. To achieve the levels of throughput and resilience that modern applications demand, compute grids are becoming increasingly popular. All this serves to exacerbate existing database bottlenecks; hence the need for a data grid platform.
Massive heap - If you have 100 blade servers, and each node has 2GB of space to dedicate to a replicated cache, you end up with 2 GB of total data. Every server is just a copy. On the other hand, with a distributed grid - assuming you want 1 copy per data item - you get a 100 GB memory backed virtual heap that is efficiently accessible from anywhere in the grid. Session affinity is not required, so you don’t need fancy load balancing policies. Of course you can still use them for further optimisation. If a server fails, the grid simply creates new copies of the lost data, and puts them on other servers. This means that applications looking for ultimate performance are no longer forced to delegate the majority of their data lookups to a large single database server - that massive bottleneck that exists in over 80% of enterprise applications!
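The arithmetic behind that claim is straightforward. Assuming 100 nodes with 2 GB each and one extra copy per data item (two owners in total, an assumption for illustration), this sketch works it out:

```java
// Back-of-envelope capacity comparison between a replicated and a distributed cache.
public class GridCapacity {
    // Replicated: every node holds the full data set, so capacity is one node's share.
    public static long replicatedCapacityGb(int nodes, long gbPerNode) {
        return gbPerNode;
    }

    // Distributed: total memory across the grid, divided by the owners per data item.
    public static long distributedCapacityGb(int nodes, long gbPerNode, int owners) {
        return nodes * gbPerNode / owners;
    }

    public static void main(String[] args) {
        System.out.println(replicatedCapacityGb(100, 2));     // 2 GB total, as in the text
        System.out.println(distributedCapacityGb(100, 2, 2)); // 100 GB virtual heap
    }
}
```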
Extreme scalability - Since data is evenly distributed, there is essentially no major limit to the size of the grid, except group communication on the network - which is minimised to just discovery of new nodes. All data access patterns use peer-to-peer communication where nodes directly speak to each other, which scales very well.
Very fast and lightweight core - The internal data structures of Infinispan are simple, very lightweight and heavily optimised for high concurrency. Early benchmarks have indicated 3 to 5 times less memory usage, and around 50% better CPU performance, than the latest and greatest JBoss Cache release. Unlike other popular, competing commercial software, Infinispan scales when there are many local threads accessing the grid at the same time. Even though non-clustered caching (LOCAL mode) is not its primary goal, Infinispan is still very competitive here.
Not Just for Java (PHP, Python, Ruby, C, etc.) - The roadmap has a plan for a language-independent server module. This will support both the popular memcached protocol - with existing clients for almost every popular programming language - as well as an optimised Infinispan-specific protocol. This means that Infinispan is not just useful to Java. Any major website or application that wants to take advantage of a fast data grid will be able to do so.
Support for Compute Grids - Also on the roadmap is the ability to pass a Runnable around the grid. You will be able to push complex processing towards the server where data is local, and pull back results using a Future. This map/reduce style paradigm is common in applications where a large amount of data is needed to compute relatively small results.
Management is key! - When you start thinking about running a grid on several hundred servers, management is no longer an extra, it becomes a necessity. This is on Infinispan’s roadmap. We aim to provide rich tooling in this area, with many integration opportunities.
Competition is Proprietary - All of the major, viable competitors in the space are not open-source, and are very expensive. Enough said. :-)
What are data grids?
Data grids are, to put it simply, highly concurrent distributed data structures. Data grids typically allow you to address a large amount of memory and store data in a way that it is quick to access. They also tend to feature low latency retrieval, and maintain adequate copies across a network to provide resilience to server failure.
As such, at its core, Infinispan presents a humble data structure. But this is also a highly specialised data structure, tuned to and geared for a great degree of concurrency - especially on multi-CPU/multi-core architectures. Most of the internals are essentially lock- and synchronization-free, favouring state-of-the-art non-blocking algorithms and techniques wherever possible. This translates to a data structure that is extremely quick even when dealing with a large number of concurrent accesses.
Beyond this, Infinispan is also a distributed data structure. It farms data out across a cluster of in-memory containers. It does so with a configurable degree of redundancy and various parameters to tune the performance-versus-resilience trade-off. Local "L1" caches are also maintained for quick reads of frequently accessed data.
Further, Infinispan supports JTA transactions. It also offers eviction strategies to ensure individual nodes do not run out of memory and passivation/overflow to disk. Warm-starts using preloads are also supported.
JBoss Cache and Infinispan
So where does Infinispan stand against the competition? Let’s start with JBoss Cache. It is no surprise that there are many similarities between JBoss Cache and Infinispan, given that they share the same minds! Infinispan is an evolution of JBoss Cache in that it borrows ideas, designs and some code, but for all practical purposes it is a brand new project and a new, much more streamlined codebase.
JBoss Cache has evolved from a basic replicated tree structure to include custom, high performance marshalling (in version 1.4), Buddy Replication (1.4), a new simplified API (2.X), high concurrency MVCC locking (3.0.X) and a new non-blocking state transfer mechanism (3.1.X). These were all incremental steps, but it is time for a quantum leap.
Hence Infinispan. Infinispan is a whole new project - not just JBoss Cache 4.0! - because it is far wider in scope and goals - not to mention target audience. Here are a few points summarising the differences:
JBoss Cache is a clustering library. Infinispan’s goal is to be a data grid platform, complete with management and migration tooling.
JBoss Cache’s focus has been on clustering, using replication. This has allowed it to scale to several 10s (occasionally even over 100) nodes. Infinispan’s goals are far greater - to scale to grids of several 100’s of nodes, eventually exceeding 1000’s of nodes. This is achieved using consistent hash based data distribution.
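Consistent-hash based distribution can be sketched in a few lines: nodes and keys hash to points on a ring, and a key's owners are the next numOwners distinct nodes clockwise from its position. This toy version is my own illustration of the idea, not Infinispan's actual algorithm, and it hints at why adding a node only remaps a fraction of the keys:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy consistent-hash ring: a key's owners are the first numOwners distinct
// nodes at or after the key's position on the ring, wrapping around the end.
public class ConsistentHashSketch {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(hash(node), node);
    }

    public List<String> locateOwners(String key, int numOwners) {
        List<String> owners = new ArrayList<>();
        // Walk clockwise from the key's position...
        for (String node : ring.tailMap(hash(key)).values()) {
            if (owners.size() < numOwners) owners.add(node);
        }
        // ...and wrap around to the start of the ring if more owners are needed.
        for (String node : ring.values()) {
            if (owners.size() < numOwners && !owners.contains(node)) owners.add(node);
        }
        return owners;
    }

    private static int hash(String s) {
        return s.hashCode() & Integer.MAX_VALUE; // keep ring positions non-negative
    }

    public static void main(String[] args) {
        ConsistentHashSketch ch = new ConsistentHashSketch();
        ch.addNode("node1");
        ch.addNode("node2");
        ch.addNode("node3");
        System.out.println(ch.locateOwners("user:42", 2)); // two distinct owner nodes
    }
}
```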
Infinispan’s data structure design is significantly different to that of JBoss Cache. This is to help achieve the target CPU and memory performance. Internally, data is stored in a flat, map-like container rather than a tree. That said, a tree-like compatibility layer - implemented on top of the flat container - is provided to aid migration from JBoss Cache.
JBoss Cache traditionally competed against other frameworks like EHCache and Terracotta. Infinispan, on the other hand, goes head to head against Oracle’s Coherence, Gemfire and Gigaspaces.
I look forward to your feedback!
Tags: data grids announcement