Infinispan for AI
The distributed in-memory backend for your AI applications
Infinispan provides the speed and scale that AI applications demand. Use it as a vector store for RAG, a semantic cache to cut LLM costs, a memory store for conversational AI, or a shared cache for AI agents. With native integrations for Spring AI, LangChain4j, Quarkus, and LangChain Python, Infinispan fits into your AI stack today.
AI Framework Integrations
Infinispan integrates natively with the most popular AI frameworks
Spring AI
Use Infinispan as a VectorStore in your Spring AI applications. Full auto-configuration, metadata filtering, and observability support.
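For a sense of what this looks like in practice, here is a minimal sketch against Spring AI's VectorStore contract. It assumes the Infinispan-backed VectorStore bean is auto-configured as described above; the service and method names are illustrative, and only the Spring AI core API (Document, add, similaritySearch) is taken as given.

    import java.util.List;

    import org.springframework.ai.document.Document;
    import org.springframework.ai.vectorstore.VectorStore;
    import org.springframework.stereotype.Service;

    @Service
    public class RagIngestionService {

        // Infinispan-backed VectorStore, auto-configured per the integration above
        private final VectorStore vectorStore;

        public RagIngestionService(VectorStore vectorStore) {
            this.vectorStore = vectorStore;
        }

        // Index documents; embeddings are computed by the configured EmbeddingModel.
        public void ingest(List<String> texts) {
            vectorStore.add(texts.stream().map(Document::new).toList());
        }

        // Retrieve the most similar documents to ground a RAG prompt.
        public List<Document> retrieve(String query) {
            return vectorStore.similaritySearch(query);
        }
    }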
Quarkus + LangChain4j
CDI-injectable embedding store with Dev Services. Zero-config in dev mode — Quarkus starts Infinispan automatically.
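A sketch of what CDI injection looks like, assuming the extension exposes LangChain4j's EmbeddingStore<TextSegment> as a bean backed by the Dev Services-managed Infinispan; the bean itself is the extension's, while the types and calls below come from LangChain4j's core API.

    import java.util.List;

    import dev.langchain4j.data.embedding.Embedding;
    import dev.langchain4j.data.segment.TextSegment;
    import dev.langchain4j.model.embedding.EmbeddingModel;
    import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
    import dev.langchain4j.store.embedding.EmbeddingStore;
    import jakarta.enterprise.context.ApplicationScoped;
    import jakarta.inject.Inject;

    @ApplicationScoped
    public class MemoryService {

        @Inject
        EmbeddingStore<TextSegment> embeddingStore; // Infinispan-backed bean from the extension

        @Inject
        EmbeddingModel embeddingModel;

        // Embed a piece of text and store it for later retrieval.
        public void remember(String text) {
            TextSegment segment = TextSegment.from(text);
            Embedding embedding = embeddingModel.embed(segment).content();
            embeddingStore.add(embedding, segment);
        }

        // Find the most relevant stored segments for a query.
        public List<TextSegment> recall(String query, int maxResults) {
            Embedding queryEmbedding = embeddingModel.embed(query).content();
            EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                    .queryEmbedding(queryEmbedding)
                    .maxResults(maxResults)
                    .build();
            return embeddingStore.search(request).matches().stream()
                    .map(match -> match.embedded())
                    .toList();
        }
    }

In dev mode there is nothing to configure: Dev Services starts an Infinispan container and wires the store to it automatically.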
Why Infinispan for AI?
In-Memory Speed
Sub-millisecond vector search and cache lookups. Your AI application gets the context it needs without waiting.
Distributed & Elastic
Scale your vector store and caches across a cluster. Add nodes as your data grows — no single-node bottleneck.
Multi-Protocol
Connect via Hot Rod, REST, or the Redis (RESP) protocol. AI agents in any language can share the same cache.
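For example, a Java agent can reach a shared cache through the standard Hot Rod client. A minimal sketch, assuming a local server on the default port 11222 with placeholder credentials and cache name; agents in other languages could read and write the same data over REST or with any Redis client speaking RESP.

    import org.infinispan.client.hotrod.RemoteCache;
    import org.infinispan.client.hotrod.RemoteCacheManager;
    import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

    public class SharedAgentCache {
        public static void main(String[] args) {
            ConfigurationBuilder builder = new ConfigurationBuilder();
            builder.addServer().host("127.0.0.1").port(11222)   // default Hot Rod port
                   .security().authentication()
                   .username("admin").password("secret");       // placeholder credentials

            try (RemoteCacheManager manager = new RemoteCacheManager(builder.build())) {
                // Create the cache on first use so the example is self-contained.
                RemoteCache<String, String> cache = manager.administration()
                        .getOrCreateCache("agent-state", "org.infinispan.DIST_SYNC");
                cache.put("task:42:status", "in-progress");
                System.out.println(cache.get("task:42:status"));
            }
        }
    }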
100% Open Source
No vendor lock-in. No enterprise-only AI features. Everything works in the community edition.
TTL & Expiration
Cached LLM responses and conversation history expire automatically. No manual cleanup needed.
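A sketch of what this enables for an LLM response cache, using the Hot Rod client's per-entry lifespan overload of put; the wrapper class, key scheme, and callLlm placeholder are illustrative, not a prescribed design.

    import java.util.concurrent.TimeUnit;

    import org.infinispan.client.hotrod.RemoteCache;

    public class LlmResponseCache {

        private final RemoteCache<String, String> cache;

        public LlmResponseCache(RemoteCache<String, String> cache) {
            this.cache = cache;
        }

        public String complete(String prompt) {
            String cached = cache.get(promptKey(prompt));
            if (cached != null) {
                return cached; // cache hit: no LLM call, no token cost
            }
            String answer = callLlm(prompt);
            // Entry expires after 6 hours; Infinispan removes it automatically.
            cache.put(promptKey(prompt), answer, 6, TimeUnit.HOURS);
            return answer;
        }

        private String promptKey(String prompt) {
            // Naive exact-match key; a semantic cache would key on embeddings instead.
            return Integer.toHexString(prompt.hashCode());
        }

        private String callLlm(String prompt) {
            return "..."; // call your LLM provider here
        }
    }

The same lifespan argument works for conversation history, so stale sessions age out on their own.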
Cross-Site Replication
Replicate your AI data across data centers for global availability. Included in the open source distribution.