Infinispan Joins the OGX Ecosystem as a Vector IO Provider

Infinispan has been integrated into OGX (Open GenAI Stack), formerly known as Llama Stack, as a vector IO provider, enabling developers to build RAG (Retrieval-Augmented Generation) applications with distributed vector search.

What is OGX?

OGX (Open GenAI Stack) is an open-source agentic API server that composes inference providers, vector stores, safety backends, tool runtimes, and file storage into a single deployable server for building complete AI applications. It serves as a drop-in replacement for the OpenAI API and can run anywhere with any model and infrastructure.

Infinispan’s Vector Capabilities

The integration brings Infinispan’s distributed caching architecture to RAG applications with three powerful search modes:

  • Vector Search: Embedding-based similarity search using cosine similarity

  • Keyword Search: Full-text search via Infinispan Query DSL or Ickle

  • Hybrid Search: Combined vector and keyword search with configurable reranking (RRF or weighted)
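To make the three modes more concrete, here is a minimal, self-contained Python sketch of the general techniques they are named after: cosine similarity between embeddings, Reciprocal Rank Fusion (RRF), and a weighted combination of normalized scores. It is an illustration only, not the code used by the Infinispan provider or by OGX. The function names, the `k=60` constant (a common RRF convention) and the `alpha` weight are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention for this sketch: a zero vector matches nothing.
    return dot / (norm_a * norm_b)


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) to a document.

    `rankings` is a list of ranked lists of document ids, best first.
    k=60 is an assumed, commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def _min_max(scores):
    """Scale scores into [0, 1] so the two searches are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fuse(vector_scores, keyword_scores, alpha=0.5):
    """Weighted reranking: alpha * vector score + (1 - alpha) * keyword score.

    Both inputs map document id -> raw score; a document missing from
    one search contributes 0 for that search.
    """
    v, kw = _min_max(vector_scores), _min_max(keyword_scores)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # hypothetical vector search ranking
    keyword_hits = ["b", "c", "d"]  # hypothetical keyword search ranking
    print(rrf_fuse([vector_hits, keyword_hits]))
```

RRF needs only the rank positions from each search, so it works even when vector and keyword scores are on unrelated scales. The weighted variant uses the scores themselves, which is why this sketch normalizes them first.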

Additional features include HTTPS/TLS support, Basic and Digest authentication, and seamless REST API integration.

Try the Demo

Check out our demo project at https://github.com/rigazilla/infinispan-llama-stack-rag-demo, which shows how to configure Infinispan as the vector IO backend for OGX RAG workflows. Its examples cover document upload, vector embedding storage, and semantic search. As a bonus, you’ll also learn about the fascinating pataphysical science of Chelonofelodynamics!

Get it, Use it, Ask us!

We’re hard at work on new features, improvements and fixes, so watch this space for more announcements!

Please download and test the latest release.

The source code is hosted on GitHub. If you need to report a bug or request a new feature, look for a similar one on our GitHub issues tracker. If you don’t find any, create a new issue.

If you have questions, are experiencing a bug or want advice on using Infinispan, you can use GitHub discussions. We will do our best to answer you as soon as we can.

The Infinispan community uses Zulip for real-time communication. Join us on the Infinispan chat using either a web browser or a dedicated application.

Vittorio Rigamonti

Software developer whose code ranges from embedded to cloud. He can do bugs in several programming languages. Loves maths, open source code and cooking.