In Memory Data Grid Patterns Demos from Devoxx France!

Devoxx France 2017 was a blast!! Emmanuel and I would like to thank all attendees to our in-memory data grids patterns talk. The room was full and we thoroughly enjoyed the experience!

During the talk we presented a couple of small demos that showcased some in-memory data grid use cases. The demos are located here, but I thought it’d be useful to provide some step-by-step here so that you can get them running as quickly as possible.

Before we start with any of the demos, it’s necessary to run some set up steps:

  1. Check out git repository:

  2. Download Infinispan Server 9.0.0.Final and at the same level as the git repository.

  3. Go into the datagrid-patterns directory, start the servers and wait until they’ve started:

    cd datagrid-patterns     ./run-servers.sh

  4. Install Anaconda for Python 3, this is required to run Jupyter notebook for plotting.

  5. Install Maven 3.

Once the set up is complete, it’s time to start with the individual demos.

Both demos shown below work with the same application domain: rail transport systems. In this domain, we differentiate between physical stations, trains, station boards which are located in stations, and finally stops, which are individual entries in station boards.

Analytics Demo

The first demo is focused on how you can use Infinispan for doing offline analytics. In particular, this demo tries to answer the following question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

To answer this question, Infinispan data grid will be loaded with 3 weeks worth of data from station boards. Once the data is loaded, we will execute a remote server task which will use Infinispan Distributed Java Streams to calculate the two pieces of information required to answer the question: per hour, how many trains are going through the system, and out of those, how many are delayed.

An important aspect to bear in mind about this server tasks is that it will only be executed in one of the nodes in the cluster. It does not matter which one. In turn, this node will will ship the lambdas required to do the computation to each of the nodes so that they can executed against their local data. The other nodes will reply with the results and the node where the server task was invoked will aggregate the results.

Then, these results are sent back to the client, which in turn, stores the results as JSON in an intermediate cache. Once the results are in place, we will use a Jupyter notebook to read those results and plot the result.

Let’s see these steps in action:

  1. First, we need to install the server tasks in the running servers above:

    cd datagrid-patterns/analytics

    mvn clean install package -am -pl analytics-server

    mvn wildfly:deploy -pl analytics-server

    

  2. Open the datagrid-pattern repo with your favourite IDE and run delays.java.stream.InjectApp class located in analytics/analytics-server project. This command will inject the data into the cache. On my environment, it takes between 1 and 2 minutes.

  3. With the data loaded, we need to run the remote task that will calculate the total number of trains per hour and how many of those are delayed. To do that, execute delays.java.stream.AnalyticsApp class located in analytics/analytics-server project from your IDE.

  4. You can verify that the results have been calculating by going to the following address:

  5. With the results in place, it’s time to start the Jupyter notebook:

    cd datagrid-patterns/analytics/analytics-jupyter

    ~/anaconda/bin/jupyter notebook

  6. Once the notebook opens, click open live-demo.ipynb notebook and execute each of the cells in order. You should end up seeing a plot like this:

image

So, the answer to the question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

is 2am! That’s because last connecting trains of the day wait for each other to avoid leaving passengers stranded.

Real Time Demo

The second demo that we presented uses the same application domain as above, but this time we’re trying to use our data grid as a way of storing the station board state of each station at a given point in time. So, the idea is to use Infinispan as an in memory data grids for working with real time data.

So, what can we do with this type of data? In our demo, we will create a centralised dashboard of delayed trains around the country. To do that, we will take advantage of Infinispan’s Continuous Query functionality which allows us to find those station boards which contain stops that are delayed, and as new delayed trains appeared these will be pushed to our dashboard.

To run this demo, keep the same servers running as above and do the following:

  1. Run delays.query.continuous.FxApp application located in real-time project inside the datagrid-patterns demo. This app will inject some live station board data and will launch a JavaFX dashboard that shows delayed trains as they appear. It should look something like this:

image

Conclusion

This has been a summary of the demos that we run in our talk at Devoxx France with the intention of getting you running these demos as quickly as possible. The repository contains more detailed information of these demos. If there’s anything unclear or any of the instructions above are not working, please let us know!

Thanks to Emmanuel Bernard for partnering with me for this Devoxx France talk and for the continuous feedback while developing the demos. Thanks as well to Tristan Tarrant for the input in the demos and many thanks to all Devoxx France attendees who attended our talk :)

A very special thanks to Alexandre Masselot whose "Swiss Transport in Real Time: Tribulations in the Big Data Stack" talk at Soft-Shake 2016 was the inspiration for these demos. @Alex, thanks a lot for sharing the demos and data with me and the rest of the community!!

In a just a few weeks I’ll be at Great Indian Developer Summit presenting these demos and much more! Stay tuned :)

Cheers,

Galder

News

Tags

JUGs alpha as7 asymmetric clusters asynchronous beta c++ cdi chat clustering community conference configuration console data grids data-as-a-service database devoxx distributed executors docker event functional grouping and aggregation hotrod infinispan java 8 jboss cache jcache jclouds jcp jdg jpa judcon kubernetes listeners meetup minor release off-heap openshift performance presentations product protostream radargun radegast recruit release release 8.2 9.0 final release candidate remote query replication queue rest query security spring streams transactions vert.x workshop 8.1.0 API DSL Hibernate-Search Ickle Infinispan Query JP-QL JSON JUGs JavaOne LGPL License NoSQL Open Source Protobuf SCM administration affinity algorithms alpha amazon anchored keys annotations announcement archetype archetypes as5 as7 asl2 asynchronous atomic maps atomic objects availability aws beer benchmark benchmarks berkeleydb beta beta release blogger book breizh camp buddy replication bugfix c# c++ c3p0 cache benchmark framework cache store cache stores cachestore cassandra cdi cep certification cli cloud storage clustered cache configuration clustered counters clustered locks codemotion codename colocation command line interface community comparison compose concurrency conference conferences configuration console counter cpp-client cpu creative cross site replication csharp custom commands daas data container data entry data grids data structures data-as-a-service deadlock detection demo deployment dev-preview development devnation devoxx distributed executors distributed queries distribution docker documentation domain mode dotnet-client dzone refcard ec2 ehcache embedded embedded query equivalence event eviction example externalizers failover faq final fine grained flags flink full-text functional future garbage collection geecon getAll gigaspaces git github gke google graalvm greach conf gsoc hackergarten hadoop hbase health hibernate hibernate ogm hibernate search hot rod hotrod hql http/2 ide index indexing india infinispan infinispan 8 infoq internationalization interoperability interview introduction iteration javascript jboss as 5 jboss asylum jboss cache jbossworld jbug jcache jclouds jcp jdbc jdg jgroups jopr jpa js-client jsr 107 jsr 347 jta judcon kafka kubernetes lambda language learning leveldb license listeners loader local mode lock striping locking logging lucene mac management map reduce marshalling maven memcached memory migration minikube minishift minor release modules mongodb monitoring multi-tenancy nashorn native near caching netty node.js nodejs non-blocking nosqlunit off-heap openshift operator oracle osgi overhead paas paid support partition handling partitioning performance persistence podcast presentation presentations protostream public speaking push api putAll python quarkus query quick start radargun radegast react reactive red hat redis rehashing releaase release release candidate remote remote events remote query replication rest rest query roadmap rocksdb ruby s3 scattered cache scripting second level cache provider security segmented server shell site snowcamp spark split brain spring spring boot spring-session stable standards state transfer statistics storage store store by reference store by value streams substratevm synchronization syntax highlighting tdc testing tomcat transactions tutorial uneven load user groups user guide vagrant versioning vert.x video videos virtual nodes vote voxxed voxxed days milano wallpaper websocket websockets wildfly workshop xsd xsite yarn zulip
Posted by Galder Zamarreño on 2017-04-07
Tags: conference devoxx demo streams query
back to top