Friday, 02 November 2018

Infinispan triple connector release!

Infinispan Spark, Infinispan Hadoop and Infinispan Kafka have a new fresh release each!

Infinispan Spark 0.9

The native Apache Spark connector now supports Infinispan 9.4.x and Spark 2.3.2, and it exposes Infinispan’s new transcoding capabilities, enabling the InfinispanRDD and InfinispanDStream to operate with multiple data formats. For more details see the documentation.

Infinispan Hadoop 0.3

The connector that allows accessing Infinispan using standard Input/OutputFormat interfaces now offers compatibility with Infinispan 9.4.x and has been certified to run with the Hadoop 3.1.1 runtime. For more details about this connector, see the user manual. Also make sure to check the docker based demos: Infinispan + Yarn and Infinispan + Apache Flink.

Infinispan Kafka 0.3

Last but not least, the Infinispan Kafka connector was upgraded to work with the latest Kafka (2.0.x) and Infinispan releases (9.4.x). Many thanks to Andrea Cosentino for contributing this integration.

Posted by Gustavo on 2018-11-02
Tags: release kafka spark hadoop

Monday, 01 February 2016

'Infinispan 8 Tour' video available

The recording of the talk presented last Wednesday in London is available! Thanks to everyone who joined and to the London JBUG organizers for the awesome new venue!

https://skillsmatter.com/skillscasts/7190-infinispan-8-tour

Cheers, Gustavo

Posted by Gustavo on 2016-02-01
Tags: functional presentations spark hadoop streams video

Monday, 21 September 2015

Introducing the Infinispan Hadoop Connector

The version 0.1 of the Infinispan Hadoop connector has just been made available!

The connector will host several integrations with Hadoop related projects, and in this first release it supports converting Infinispan server into a Hadoop compliant data source, by providing an implementation of InputFormat and OutputFormat.

The InfinispanInputFormat and InfinispanOutputFormat

A Hadoop InputFormat is a specification of how a certain data source can be partitioned and how to read data from each of the partitions. Conversely, OutputFormat is used to write.

Looking closely at the Hadoop’s InputFormat interface, we can see two methods:

List<InputSplit> getSplits(JobContext context); RecordReader<K,V> createRecordReader(InputSplit split,TaskAttemptContext context);

The first method defines essentially a data partitioner, calculating one or more InputSplits that contain information about a certain partition of the data. With possession of a InputSplit, one can use it to obtain a RecordReader to iterate over the data. These two operations allow for parallelization of data processing across multiple nodes, and that’s how Hadoop map reduce achieves a high throughput over large datasets.

In Infinispan terms, each partition is a set of segments on a certain server, and a record reader is a remote iterator over those segments. The default partitioner shipped with the connector will create as many partitions as servers in the cluster, and each partition will contain the segments that are associated with that specific server.

==== Not only map reduce

Although the InfinispanInputFormat and InfinispanOutputformat can be used to run traditional Hadoop map reduce jobs over Infinispan data, it is not coupled to the Hadoop map reduce runtime. It is possible to leverage the connector to integrate Infinispan with other tools that, besides supporting Hadoop I/O interfaces, are able to read and write data more efficiently. One of those tools is Apache Flink, that has a dataflow engine capable of doing batch and stream data processing that supersedes the classic two stage map reduce approach.

==== Apache Flink example

====

Apache Flink supports Hadoop’s InputFormat as a data source to execute batch jobs, so to integrate with Infinispan it’s straightforward:

Please refer to the complete sample that has docker images for both Apache Flink and Infinispan server, and detailed instructions on how to execute and customise job.

Stay tuned

More details about the connector, maven coordinates, configuration options, sources and samples can be found at the project repository

In upcoming versions we expect to have a tighter integration with the Hadoop platform in order to run Infinispan clusters as a YARN application (ISPN-5709), and also support other tools from the ecosystem such as Apache Pig (ISPN-5749)

Posted by Gustavo on 2015-09-21
Tags: yarn hadoop server flink

Friday, 02 November 2018

Infinispan triple connector release!

Infinispan Spark 0.9

Infinispan Hadoop 0.3

Infinispan Kafka 0.3

Monday, 01 February 2016

'Infinispan 8 Tour' video available

Monday, 21 September 2015

Introducing the Infinispan Hadoop Connector

The InfinispanInputFormat and InfinispanOutputFormat

Stay tuned

News

2020-09-08 Infinispan 12.0.0.Dev03

2020-08-31 Non Blocking Saga

2020-08-28 The developer Conference Sao Paulo

2020-07-28 Infinispan Server Tutorial

2020-07-27 Infinispan 12.0.0.Dev01

Tags