Qdrant Cloud Inference

Qdrant Cloud Inference is a managed service that lets developers generate, store, and index text and image embeddings directly within Qdrant Cloud, the company's managed vector search engine. It is aimed at applications that rely on multimodal search, retrieval-augmented generation, and hybrid search. Because model inference runs inside Qdrant Cloud, developers no longer need separate inference infrastructure, manual pipelines, or redundant data transfers, which shortens development cycles and removes unnecessary network hops. According to André Zayarni, CEO and co-founder of Qdrant, "Traditionally, embedding generation and vector search have been handled separately in developer workflows. With Qdrant Cloud Inference, it feels like a single tool: one API call with optimal resources for each component."
The service supports a variety of models, including MiniLM, SPLADE, BM25, Mixedbread Embed-Large, and CLIP for both text and image embeddings. With these, developers can build vector search applications that use dense, sparse, and image embedding models, and combine them for hybrid and multimodal search. According to Qdrant, Cloud Inference is the only managed vector database offering multimodal inference with separate image and text embedding models natively integrated into its cloud.
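To make the idea of hybrid search concrete, here is a minimal sketch of one common way to merge a dense result list with a sparse one: reciprocal rank fusion (RRF). It is an illustration under stated assumptions, not a description of how Qdrant Cloud Inference merges results internally. The function name, the constant k=60, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense model (e.g. MiniLM) and a sparse one (e.g. BM25).
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents found by both retrievers (doc_a and doc_b here) end up ahead of those found by only one, which is the behavior hybrid search is meant to provide.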
Qdrant Cloud Inference includes up to 5 million free tokens per model each month, with unlimited tokens for BM25. This lets teams build and iterate on real AI features from day one. The service is designed to reduce latency, cut network costs, and simplify search by handling dense and sparse embeddings of unstructured documents alongside image embeddings in one environment.
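The free allowance can be worked through as simple arithmetic. The sketch below assumes the full 5 million tokens apply to every metered model and that BM25 is never counted. The model labels and usage figures are hypothetical, and the article does not say how usage beyond the allowance is billed, so the code only reports the overage.

```python
FREE_TOKENS_PER_MODEL = 5_000_000  # monthly free allowance stated in the article
UNMETERED_MODELS = {"bm25"}        # the article says BM25 tokens are unlimited


def tokens_over_free_tier(usage):
    """Return, per model, how many tokens exceed the monthly free allowance."""
    overage = {}
    for model, used in usage.items():
        if model.lower() in UNMETERED_MODELS:
            overage[model] = 0
        else:
            overage[model] = max(0, used - FREE_TOKENS_PER_MODEL)
    return overage


# Hypothetical monthly usage, in tokens, keyed by an illustrative model label.
monthly_usage = {"minilm": 3_200_000, "clip": 6_500_000, "bm25": 40_000_000}
overage = tokens_over_free_tier(monthly_usage)
print(overage)
```

In this example only the hypothetical "clip" usage exceeds the allowance, by 1.5 million tokens.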
Vector databases provide essential memory for AI assistants and agents, allowing them to retain context over extended periods. Qdrant's service reduces the number of API calls developers need to make, which means less cloud and AI model usage and therefore lower costs.
Qdrant is a high-performance, scalable, open-source vector database for building the next generation of AI/ML applications. The company's open-source packages have together surpassed 250 million installs, and Qdrant has been recognized in The Forrester Wave™: Vector Databases, Q3 2024. Qdrant powers real-time Agentic RAG applications at scale in enterprises like Tripadvisor, HubSpot, and Deutsche Telekom.