
Infinispan Java Hot Rod client pool rework

Infinispan 15.1 will be shipping a new default Hot Rod client implementation. This implementation completely overhauls the "pool" implementation and adds many internal code optimizations and reductions.

An overview of the changes

  • Remove ChannelPool implementation, replaced with single pipelined Channel-per-server

  • Reduce per-operation allocation rate

  • Remove unnecessary allocations during response processing

  • New protocol version (HR 4.1) supporting the streaming API in a connection-stateless way

  • Internal refactoring to simplify adding additional commands

  • Multiple fixes for client bloom filters

  • Rework client flags to be more consistent and not tied to thread locals

  • Drop client support for HR protocol versions older than 3.0

  • New hotrod-client-legacy jar for the old client

How fast is it though?

As you can see, this is quite an extensive rework, and I am guessing many of you just want to know "How fast is it?". Let's test it and find out.

Using a JMH benchmark, we found that in each case the new client has better performance.
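The full benchmark is not reproduced here, but a minimal sketch of that kind of harness looks something like this. The cache name, value size, server address and key space are assumptions, not the exact setup behind the numbers below; the "single" versus "high" concurrency cases would be driven by JMH's thread count.

   // Minimal sketch of a Hot Rod put/get JMH benchmark. Names and sizes here
   // are illustrative assumptions, not the exact benchmark used for the table.
   import java.util.concurrent.ThreadLocalRandom;

   import org.infinispan.client.hotrod.RemoteCache;
   import org.infinispan.client.hotrod.RemoteCacheManager;
   import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
   import org.openjdk.jmh.annotations.Benchmark;
   import org.openjdk.jmh.annotations.Scope;
   import org.openjdk.jmh.annotations.Setup;
   import org.openjdk.jmh.annotations.State;
   import org.openjdk.jmh.annotations.TearDown;

   @State(Scope.Benchmark)
   public class HotRodClientBenchmark {
      RemoteCacheManager remoteCacheManager;
      RemoteCache<Integer, byte[]> cache;
      byte[] value = new byte[512];

      @Setup
      public void setup() {
         ConfigurationBuilder builder = new ConfigurationBuilder();
         builder.addServer().host("127.0.0.1").port(11222);
         remoteCacheManager = new RemoteCacheManager(builder.build());
         cache = remoteCacheManager.getCache("benchmark-cache");
      }

      @TearDown
      public void tearDown() {
         remoteCacheManager.stop();
      }

      @Benchmark
      public byte[] putThenGet() {
         Integer key = ThreadLocalRandom.current().nextInt(10_000);
         cache.put(key, value);
         return cache.get(key);
      }
   }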

Clients   Servers   Concurrency   Performance Difference
1         1         single        +11.5%
1         1         high          +7%
1         3         single        +2.5%
3         3         high          +10%

From this table you should expect a performance benefit in the majority of cases, while also reaping the other benefits described below.

What does pipelined Channel mean though?

In the previous client we kept a pool of connections. For each concurrent operation we allocated the required operation objects, submitted the bytes to the server on a single socket, and waited until the bytes were flushed to that socket. During that period another thread could not use the same socket, so it would take another one from the pool.

The new client, however, uses pipelined requests so that multiple requests can be sent on the same socket without flushing immediately. Flushing is only performed after the concurrent requests are sent. This means that if multiple threads all send an operation, those requests can potentially be coalesced into a single packet sent to the server instead of one packet per request.
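Conceptually, the write path looks something like this on a Netty Channel. This is just a sketch of the write-then-flush pattern, not the client's actual internals; the class and method names are made up.

   // Sketch of pipelined writes on a single Netty channel: queue each encoded
   // request with write(), then flush once for the whole batch. Illustrative
   // only, not the client's real code.
   import java.util.List;

   import io.netty.buffer.ByteBuf;
   import io.netty.channel.Channel;

   final class PipelinedWriter {
      private final Channel channel;

      PipelinedWriter(Channel channel) {
         this.channel = channel;
      }

      void sendBatch(List<ByteBuf> encodedRequests) {
         // Queue every pending request without flushing the socket yet...
         for (ByteBuf request : encodedRequests) {
            channel.write(request);
         }
         // ...then flush once, so multiple requests may share a single packet.
         channel.flush();
      }
   }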

This has the possibility of a loss in performance in one very specific case: a single client instance and a single server on hardware with a lot of cores. This is due to the use of an event loop in both the client and the server: operations to a server from the same client are always sent on a dedicated thread, and the server processes responses on a dedicated thread per client as well. As the number of clients and servers is scaled up, though, this concern dissolves very quickly and, depending on your hardware, may not be a concern at all, as the numbers above show.

File Descriptors

What about resource usage during the test? As mentioned above the client now only uses a single connection per server instead of a pool per server.

Using a single server, after everything has been initialized, we can see the server process is using 35 file descriptors.

perf@perf:~$ lsof -p <pid> -i | wc -l
35

While running with the legacy client we see the following (note this is filtering only on the HR port)

perf@perf:~$ lsof -p <pid> -i | grep 11222 | wc -l
45

So in this case we have 45 files opened when running the test, which is more than the server has just sitting idle! This makes sense though, given we have 1 file descriptor for the server's LISTEN socket and 22 each for the client and server sides of the connections (as our test is running 22 concurrent threads).

In comparison, when using the new client we only have 3 file descriptors! (note this is filtering only on the HR port)

perf@perf:~$ lsof -p <pid> -i | grep 11222 | wc -l
3

That is one for the server's LISTEN socket and one each for the client and server sides of the single connection between them.

In this run we also saw a 5.8% performance increase with the new client, which is pretty great!

This should help out all users, as we have had cases in the past where users were running hundreds of clients against dozens of servers, resulting in almost 100K+ connections. With this change in place the number of client connections is capped at the number of clients times the number of servers, completely eliminating the effect of concurrency on the number of client connections. For example, 300 clients talking to 30 servers now open at most 9,000 connections, no matter how many threads each client runs.

Client memory usage

The per-operation allocation rates have been reduced as well, so client applications no longer need to dedicate as many resources to garbage collection.

In the above test the legacy client's allocation rate was around 660 MB/s, whereas the new client's was only 350 MB/s: almost half the allocation rate! As expected, in the test we saw half the number of GC runs with the new client compared to the legacy one.

The biggest reason for this is our simplified internal operations and other miscellaneous per-operation reductions. As a simple illustration, you can see how many fewer objects are required in the constructor for our operations.

Legacy PutOperation

   public PutOperation(Codec codec, ChannelFactory channelFactory,
                       Object key, byte[] keyBytes, byte[] cacheName, AtomicReference<ClientTopology> clientTopology,
                       int flags, Configuration cfg, byte[] value, long lifespan, TimeUnit lifespanTimeUnit,
                       long maxIdle, TimeUnit maxIdleTimeUnit, DataFormat dataFormat, ClientStatistics clientStatistics,
                       TelemetryService telemetryService) {
      super(PUT_REQUEST, PUT_RESPONSE, codec, channelFactory, key, keyBytes, cacheName, clientTopology,
            flags, cfg, value, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit, dataFormat, clientStatistics,
            telemetryService);
   }

New PutOperation

   public PutOperation(InternalRemoteCache<?, ?> cache, byte[] keyBytes, byte[] valueBytes, long lifespan,
                       TimeUnit lifespanTimeUnit, long maxIdle, TimeUnit maxIdleTimeUnit) {
      super(cache, keyBytes, valueBytes, lifespan, lifespanTimeUnit, maxIdle, maxIdleTimeUnit);
   }

New Hot Rod protocol 4.1 and Streaming commands

Some of you may have been using the streaming remote API. Don’t worry, this API has not changed. Instead, the underlying operations needed to be updated. For those of you not familiar with it, this is a stream-based approach to reading and writing byte[] values to the remote cache, allowing the client to keep only a portion of the value in memory at any given time.
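For reference, usage looks roughly like this. This is only a sketch: the cache type, key and buffer size are illustrative, and the API itself is unchanged in 15.1.

   // Sketch of the streaming remote API; the key, cache type and buffer size
   // are illustrative. Only the underlying protocol operations changed.
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.OutputStream;

   import org.infinispan.client.hotrod.RemoteCache;
   import org.infinispan.client.hotrod.StreamingRemoteCache;

   public class StreamingExample {
      public static void copyLargeValue(RemoteCache<String, byte[]> cache,
                                        InputStream source) throws IOException {
         StreamingRemoteCache<String> streaming = cache.streaming();
         // Write the value in chunks without holding it all in client memory.
         try (OutputStream out = streaming.put("large-value")) {
            source.transferTo(out);
         }
         // Read it back the same way, one chunk at a time.
         try (InputStream in = streaming.get("large-value")) {
            byte[] chunk = new byte[8192];
            while (in.read(chunk) != -1) {
               // process the chunk
            }
         }
      }
   }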

The problem is that the underlying operations were implemented in a way that reserved a connection while the read or write operation was performed. This is problematic with the client's new single connection per server approach. Instead, Hot Rod protocol 4.1 implements new "stateless" commands that send/receive chunks of the value bytes as they are read/written, with non-blocking operations underneath. The OutputStream|InputStream instances will still block waiting for the underlying socket to complete its operations, but with the change to the protocol it no longer requires reserving the socket to the server.

Initial performance tests show little to no change in performance, which is well within what we would hope for. Please test it out if you are using it and let us know!

Client Hot Rod flags

Many of you may not be aware, but when you applied a Flag to an operation on a RemoteCache instance, you had to set it for every operation, and if you shared the RemoteCache instance between threads the flags were independent per thread. The embedded Cache instance behaved in a different fashion, retaining the Flag between operations and sharing it between threads using the same instance.

The RemoteCache behavior, besides being error-prone as described above, was also detrimental to performance, as it required additional allocations per operation. As such, in 15.1.0 the Flag instances are now stored in the RemoteCache instance and only need to be set once. If the same flags are applied more than once, the same instance is returned to the user to reduce allocation rates.
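In practice, this means you can obtain the flagged RemoteCache once and reuse it from any thread. A sketch, with an illustrative flag and key:

   // Sketch: obtain a flagged RemoteCache once and reuse it; repeated
   // withFlags(...) calls with the same flags return the same instance.
   import org.infinispan.client.hotrod.Flag;
   import org.infinispan.client.hotrod.RemoteCache;

   public class FlagExample {
      public static void example(RemoteCache<String, String> cache) {
         // A flagged view that can be shared between threads and operations.
         RemoteCache<String, String> returning = cache.withFlags(Flag.FORCE_RETURN_VALUE);
         String previous = returning.put("key", "value");
         System.out.println("Previous value: " + previous);
      }
   }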

Note this change is for both the new client and the legacy client referred to in the next section.

Legacy Client

The new client, due to how it works, cannot support older Hot Rod protocols and as such it does not support anything older than protocol 3.0. The 3.0 protocol was introduced in Infinispan 10.0, which was released over 5 years ago. The protocol definitions can be found here for reference.

Due to this, and the complete overhaul of the internals, we are providing a legacy module that uses the previous client, which supports protocols back to Hot Rod 2.0. It can be used by simply changing the module dependency from hotrod-client to hotrod-client-legacy.

Conclusions

We hope you all get a chance to try out the client changes and see what benefits or issues you find with the new client. If you want to discuss this, please feel free to reach out to us through the channels listed at https://infinispan.org/community/.

Get it, Use it, Ask us!

We’re hard at work on new features, improvements and fixes, so watch this space for more announcements!

Please, download and test the latest release.

The source code is hosted on GitHub. If you need to report a bug or request a new feature, look for a similar one on our GitHub issues tracker. If you don’t find any, create a new issue.

If you have questions, are experiencing a bug or want advice on using Infinispan, you can use GitHub discussions. We will do our best to answer you as soon as we can.

The Infinispan community uses Zulip for real-time communications. Join us using either a web-browser or a dedicated application on the Infinispan chat.

William Burns

Will is a core Infinispan engineer working for Red Hat since 2013. He enjoys streaming data and writing reactive code.