Vector Database Selection for Production RAG: Pinecone vs pgvector vs Weaviate vs Qdrant
Choosing a vector database for a RAG system is a decision that is hard to reverse after you have indexed 500,000 chunks. This is the comparison framework we use, the benchmarks that matter in practice, and the decision we reached for different project types.

The vector database market expanded dramatically between 2022 and 2025, from a handful of specialized options to over a dozen production-ready choices. For teams building RAG systems, this is both good news and a source of decision paralysis. The framework below is the one we use to select a vector store for production RAG, based on what we have learned from deploying several of these systems.
The axes that matter
Before comparing specific databases, establish what actually matters for your use case. The axes:
- Scale: How many vectors will you index? 10k is trivial. 100k is easy. 10M requires careful thought about index size, query latency, and memory requirements.
- Query latency requirements: A customer-facing chatbot with a 2-second total response budget needs faster retrieval than an async batch processing pipeline where 500ms is fine.
- Filtering requirements: Do you need to filter by metadata fields alongside the vector search? (Almost everyone does.) How many filter combinations do you need to support? Pre-filtering vs. post-filtering performance differs significantly across databases.
- Operational model: Managed cloud service vs. self-hosted. What is the team's operational capacity? For teams without dedicated infra engineers, managed is almost always right.
- Multi-tenancy: Does each of your customers need isolated vector stores, or is shared-store-with-filtering acceptable?
- Cost: Storage cost is proportional to vector count × dimension × bytes per float. Compute cost is proportional to query volume × index size. At 10M vectors with 1,536 dimensions using float32, storage alone is ~61GB before replication.
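The storage arithmetic in the cost bullet can be sketched as a small helper. This is a back-of-the-envelope estimate only: the function names are illustrative, gigabytes are decimal (10^9 bytes), and the replication factor of 3 is an assumption, not a figure from any vendor.

```typescript
// Raw vector storage, before replication, index structures, or metadata.
function estimateVectorStorageBytes(
  vectorCount: number,
  dimensions: number,
  bytesPerComponent: number = 4, // float32
): number {
  return vectorCount * dimensions * bytesPerComponent;
}

// Decimal gigabytes: 1 GB = 10^9 bytes.
function toGigabytes(bytes: number): number {
  return bytes / 1e9;
}

// 10M vectors x 1,536 dimensions x 4 bytes
const rawBytes = estimateVectorStorageBytes(10_000_000, 1536);
const rawGb = toGigabytes(rawBytes); // 61.44

// Assumed replication factor of 3 (three full copies): roughly 184 GB.
const withThreeReplicasGb = rawGb * 3;
```

Switching to a 3,072-dimensional embedding model doubles every number here, which is worth checking before committing to a larger model.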
Pinecone
Pinecone is a fully managed vector database. You do not run infrastructure; you create an index, upsert vectors, and query. It handles replication, scaling, and updates transparently.
Architecture
Pinecone uses a proprietary indexing algorithm (reportedly based on HNSW with additional optimizations). Each index can be either "serverless" (usage-based pricing, no reserved capacity) or "pod-based" (dedicated infrastructure with predictable performance).
Filtering
Pinecone supports metadata filtering alongside vector search using a MongoDB-style filter syntax:
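const results = await pinecone.index('my-index').query({
  vector: queryEmbedding,
  topK: 10,
  filter: {
    tenantId: { $eq: 'tenant-123' },
    // Range operators ($gt, $gte, $lt, $lte) compare numbers, not strings,
    // so the date is stored and filtered as a Unix timestamp in seconds.
    documentDate: { $gte: 1735689600 }, // 2025-01-01T00:00:00Z
    category: { $in: ['clinical-guidelines', 'drug-information'] },
  },
  includeMetadata: true,
});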
Pinecone's filtering is generally efficient — it is done at the index level rather than post-retrieval. However, high-cardinality filters (many unique values in a filter field) can degrade performance on very large indexes.
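One practical detail when building these filters: Pinecone's range operators ($gt, $gte, $lt, $lte) apply to numeric metadata, so dates are usually stored as Unix timestamps rather than ISO strings. A minimal sketch of a filter builder under that assumption follows; the helper names and the MetadataFilter type are hypothetical and not part of the Pinecone SDK.

```typescript
// Hypothetical local type for a MongoDB-style metadata filter.
type MetadataFilter = Record<string, Record<string, string | number | string[]>>;

// Convert a calendar date (YYYY-MM-DD) to Unix seconds at midnight UTC.
function isoDateToEpochSeconds(isoDate: string): number {
  const ms = Date.parse(`${isoDate}T00:00:00Z`);
  if (Number.isNaN(ms)) {
    throw new Error(`Invalid ISO date: ${isoDate}`);
  }
  return Math.floor(ms / 1000);
}

// Assumes documentDate was written as Unix seconds at upsert time.
function buildChunkFilter(
  tenantId: string,
  sinceIsoDate: string,
  categories: string[],
): MetadataFilter {
  return {
    tenantId: { $eq: tenantId },
    documentDate: { $gte: isoDateToEpochSeconds(sinceIsoDate) },
    category: { $in: categories },
  };
}

const chunkFilter = buildChunkFilter('tenant-123', '2025-01-01', [
  'clinical-guidelines',
  'drug-information',
]);
// chunkFilter.documentDate.$gte is 1735689600
```

The resulting object can be passed as the filter field of a query. Keeping the conversion in one place means the upsert path and the query path cannot disagree about units.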
Multi-tenancy
Pinecone supports namespaces within an index, which provide logical isolation between tenants while sharing the same physical index. For strict tenant isolation (regulated industries), a separate index per tenant is safer but more expensive.
Cost at scale
Serverless pricing scales with query volume and storage. At high query volumes (millions per month), pod-based indexes with reserved capacity become more predictable. For our RAG systems in the 50k-500k vector range, Pinecone serverless has been the most cost-effective managed option.
When Pinecone wins:
- Managed, zero-ops vector store is a priority
- Team is small and cannot afford dedicated infra work
- Scale is in the 10k-5M range
- Standard filtering requirements
pgvector (PostgreSQL extension)
pgvector adds vector similarity search to PostgreSQL. If you are already running PostgreSQL, this is an attractive option because it adds vector search without adding a new system to operate.
Index types
pgvector supports two index types:
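- IVFFlat: Inverted file index over flat (uncompressed) vectors. Faster to build and lighter on memory than HNSW, but with a worse speed-recall tradeoff. Recall is tuned with the lists parameter at build time and the probes parameter at query time (ef values apply to HNSW, not IVFFlat). Good for development and smaller indexes (<100k vectors).
- HNSW: Hierarchical Navigable Small World graph. Better query performance and higher recall, slower to build, higher memory usage. Recommended for production at any significant scale.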
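For teams that do start on IVFFlat, the two tuning knobs are lists (set when the index is built) and probes (set per query). The pgvector README suggests starting points of rows / 1000 lists for up to 1M rows, sqrt(rows) above that, and sqrt(lists) probes. The sketch below encodes those starting points; the function name and the rounding are our own choices, and the values should be tuned against measured recall.

```typescript
// Starting-point heuristics for IVFFlat, per the pgvector README.
// Rounding to the nearest integer is an assumption of this sketch.
function suggestIvfflatParams(rowCount: number): { lists: number; probes: number } {
  if (!Number.isInteger(rowCount) || rowCount < 1) {
    throw new Error('rowCount must be a positive integer');
  }
  const lists =
    rowCount <= 1_000_000
      ? Math.max(1, Math.round(rowCount / 1000))
      : Math.round(Math.sqrt(rowCount));
  const probes = Math.max(1, Math.round(Math.sqrt(lists)));
  return { lists, probes };
}

const smallIndex = suggestIvfflatParams(100_000); // { lists: 100, probes: 10 }
const largeIndex = suggestIvfflatParams(4_000_000); // { lists: 2000, probes: 45 }
```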
-- Create table with vector column
CREATE TABLE document_chunks (
  id BIGSERIAL PRIMARY KEY,
  tenant_id UUID NOT NULL,
  document_id UUID NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536), -- dimension must match embedding model
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index for production (requires pgvector 0.5.0+)
CREATE INDEX ON document_chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Index on tenant_id for efficient filtered queries
CREATE INDEX ON document_chunks (tenant_id);

-- Filtered similarity search
-- Note: when the planner uses the HNSW index, the WHERE clause is applied
-- after the index scan, so a selective filter can return fewer than LIMIT rows.
-- pgvector 0.8.0+ adds iterative index scans to address this.
SELECT id, content, (1 - (embedding <=> $1)) AS similarity
FROM document_chunks
WHERE tenant_id = $2
  AND created_at >= '2025-01-01'
ORDER BY embedding <=> $1
LIMIT 10;
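From application code, the $1 parameter in a query like this is typically sent in pgvector's text format, for example [0.1,0.2,0.3]. A minimal sketch of preparing the parameters follows. The helper names are hypothetical, the { text, values } shape follows node-postgres query configs, and the dimension is a parameter so the sketch is not tied to one embedding model.

```typescript
// Serialize an embedding into pgvector's text input format: "[0.1,0.2,0.3]".
function toPgVectorLiteral(embedding: number[], expectedDimensions: number): string {
  if (embedding.length !== expectedDimensions) {
    throw new Error(
      `Expected ${expectedDimensions} dimensions, got ${embedding.length}`,
    );
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error('Embedding contains NaN or Infinity');
  }
  return `[${embedding.join(',')}]`;
}

const FILTERED_SEARCH_SQL = `
  SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
  FROM document_chunks
  WHERE tenant_id = $2
  ORDER BY embedding <=> $1::vector
  LIMIT $3`;

// Returns a query config that can be handed to a PostgreSQL client.
function buildFilteredSearch(
  embedding: number[],
  tenantId: string,
  limit: number,
  dimensions: number = 1536,
): { text: string; values: (string | number)[] } {
  return {
    text: FILTERED_SEARCH_SQL,
    values: [toPgVectorLiteral(embedding, dimensions), tenantId, limit],
  };
}

// Tiny 3-dimensional example, for illustration only.
const searchQuery = buildFilteredSearch([0.1, 0.2, 0.3], 'tenant-123', 10, 3);
// searchQuery.values[0] is "[0.1,0.2,0.3]"
```

Validating the dimension before the query is sent turns an embedding-model mismatch into a clear application error instead of a database error at query time.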
Performance characteristics
HNSW in pgvector is competitive with standalone vector databases for indexes under 1M vectors on properly resourced PostgreSQL instances. At 5M+ vectors, memory pressure becomes significant: HNSW queries stay fast only while the index, vectors included, fits in memory. A 1M-vector HNSW index with 1,536-dimensional float32 vectors requires approximately 6GB of memory just for the vectors (1M × 1,536 × 4 bytes), plus HNSW graph overhead.
When pgvector wins:
- Already running PostgreSQL and operational complexity is a primary concern
- Scale is under 1M vectors
- Strong filtering requirements that benefit from PostgreSQL's rich query planner
- Team has PostgreSQL expertise
- HIPAA or compliance requirements that make managed services with third-party BAAs complicated
Weaviate
Weaviate is a purpose-built vector database with a GraphQL query API and a rich feature set including multi-vector search, generative search (built-in OpenAI integration), and a schema-driven data model.
Schema-first design
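Weaviate is built around an explicit schema: you define your classes (equivalent to tables) before inserting data. Auto-schema can infer a schema from the first objects written, but an explicit definition is the safer choice in production: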
// Define a class with vector and properties
await weaviateClient.schema.classCreator().withClass({
  class: 'DocumentChunk',
  vectorizer: 'text2vec-openai',
  moduleConfig: {
    'text2vec-openai': {
      model: 'text-embedding-3-large',
      dimensions: 3072,
    },
  },
  properties: [
    { name: 'content', dataType: ['text'] },
    { name: 'tenantId', dataType: ['uuid'] },
    { name: 'documentId', dataType: ['uuid'] },
    { name: 'documentDate', dataType: ['date'] },
    { name: 'category', dataType: ['text'] },
  ],
}).do()
Multi-tenancy
Weaviate has first-class multi-tenancy support with explicit tenant isolation at the data layer — each tenant's data is stored in separate physical shards. This is the strongest built-in multi-tenancy implementation among the options compared here.
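Multi-tenancy is opt-in per class, through a multiTenancyConfig block in the class definition. The fragment below is a sketch of such a definition, not a complete setup: tenants still have to be created through the schema API before data is written for them, and reads and writes then name the tenant they target. Dropping the tenantId property is a design choice of this sketch, on the assumption that the tenant is already identified by its shard.

```typescript
// Class definition with multi-tenancy enabled.
// With per-tenant shards, a tenantId property is usually redundant,
// so it is omitted here (an assumption of this sketch).
const multiTenantChunkClass = {
  class: 'DocumentChunk',
  multiTenancyConfig: { enabled: true },
  properties: [
    { name: 'content', dataType: ['text'] },
    { name: 'documentId', dataType: ['uuid'] },
    { name: 'documentDate', dataType: ['date'] },
    { name: 'category', dataType: ['text'] },
  ],
};
```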
When Weaviate wins:
- Multi-tenancy with strong isolation is a first-class requirement
- Multi-vector search (image + text, or multiple text representations per document) is needed
- GraphQL API aligns with the team's existing tooling
- Built-in vectorization is attractive (Weaviate can call the embedding model for you)
Qdrant
Qdrant is a Rust-based vector database known for high performance and memory efficiency. It supports both disk-based storage (with quantization for memory reduction) and in-memory indexes.
Quantization for memory efficiency
Qdrant's scalar and product quantization modes compress vectors to a fraction of their original size (e.g., 4x-8x reduction) with a small accuracy tradeoff. For large indexes where memory is the binding constraint, quantization makes Qdrant a practical choice when pgvector or in-memory alternatives would run out of RAM.
// Configure a collection with scalar quantization (int8 instead of float32)
await qdrantClient.createCollection('document_chunks', {
  vectors: {
    size: 1536,
    distance: 'Cosine',
    on_disk: true, // store the original float32 vectors on disk
  },
  quantization_config: {
    scalar: {
      type: 'int8',
      quantile: 0.99,
      always_ram: true, // keep the quantized vectors in memory
    },
  },
});
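To make the memory saving concrete, here is a simplified sketch of scalar quantization: each float32 component (4 bytes) is mapped to a one-byte code, which gives the 4x reduction. This is an illustration of the idea, not Qdrant's implementation. It uses per-vector min/max bounds for simplicity, whereas a quantile setting such as 0.99 implies bounds that ignore outliers. All names are hypothetical.

```typescript
interface QuantizedVector {
  codes: Uint8Array; // one byte per component
  min: number; // lower bound used for the mapping
  scale: number; // width of one quantization step
}

// Map each component linearly onto the integer range 0..255.
function quantize(vector: Float32Array): QuantizedVector {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < vector.length; i++) {
    if (vector[i] < min) min = vector[i];
    if (vector[i] > max) max = vector[i];
  }
  const scale = max > min ? (max - min) / 255 : 1;
  const codes = new Uint8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    codes[i] = Math.min(255, Math.round((vector[i] - min) / scale));
  }
  return { codes, min, scale };
}

// Approximate reconstruction; the error per component is at most about scale / 2.
function dequantize(q: QuantizedVector): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < q.codes.length; i++) {
    out[i] = q.codes[i] * q.scale + q.min;
  }
  return out;
}

// Synthetic 1,536-dimensional vector, for illustration only.
const original = new Float32Array(1536);
for (let i = 0; i < original.length; i++) {
  original[i] = Math.sin(i);
}
const quantized = quantize(original);
const restored = dequantize(quantized);
const compressionRatio = original.byteLength / quantized.codes.byteLength; // 4
```

The accuracy tradeoff is the bounded reconstruction error. Keeping the original vectors on disk allows top candidates to be rescored at full precision.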
Filtered search performance
Qdrant's payload (metadata) filtering uses an adaptive strategy. For high-selectivity filters (narrow filters that match few points), it uses the payload index to find the matching points first and then scores only those. For low-selectivity filters (broad filters that match most points), it searches the HNSW graph and checks the filter condition during traversal, instead of filtering a fixed result set afterwards. This adaptive strategy maintains better performance across filter selectivity extremes than databases with a fixed filter strategy.
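The general shape of a selectivity-based planner can be sketched in a few lines. This is an illustration of the idea only, not Qdrant's planner: the strategy names, the function, and the 10,000-point threshold are all made up for the example.

```typescript
type FilterStrategy = 'filter-first' | 'vector-index-first';

// Pick a strategy from the estimated number of points that match the filter.
// exactScanThreshold is a hypothetical cutoff below which scoring every
// matching point exactly is cheaper than walking the vector index.
function chooseFilterStrategy(
  estimatedMatches: number,
  totalPoints: number,
  exactScanThreshold: number = 10_000,
): FilterStrategy {
  if (estimatedMatches < 0 || estimatedMatches > totalPoints) {
    throw new Error('estimatedMatches must be between 0 and totalPoints');
  }
  return estimatedMatches <= exactScanThreshold
    ? 'filter-first'
    : 'vector-index-first';
}

// Narrow filter: one small tenant inside a 5M-point collection.
const narrowFilter = chooseFilterStrategy(500, 5_000_000); // 'filter-first'

// Broad filter: matches 40% of the collection.
const broadFilter = chooseFilterStrategy(2_000_000, 5_000_000); // 'vector-index-first'
```

A database with a fixed strategy effectively hard-codes one branch of this function, which is why it tends to degrade at one end of the selectivity range.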
When Qdrant wins:
- Very large indexes (5M+ vectors) where memory efficiency matters
- Complex filtering requirements with variable filter selectivity
- Self-hosted with Rust performance characteristics is attractive
- Streaming ingestion of vectors (Qdrant handles concurrent writes well)
The decision matrix
| Criteria | Pinecone | pgvector | Weaviate | Qdrant |
|---|---|---|---|---|
| Operational complexity | Lowest (managed) | Low (if already on PG) | Medium | Medium |
| Scale (>5M vectors) | Good | Challenging | Good | Best |
| Multi-tenancy | Namespaces | Filter-based | First-class | Filter-based |
| Filtering performance | Good | Excellent (PG planner) | Good | Excellent (adaptive) |
| Managed option | Yes (fully managed) | Via PG providers | Yes (Weaviate Cloud) | Yes (Qdrant Cloud) |
| Memory efficiency | Good (managed) | Limited | Good | Best (quantization) |
What we use in production
For new healthcare and enterprise RAG projects where the team is small and operational simplicity is important: Pinecone serverless unless there is a specific reason not to. The managed model removes infrastructure work from the project scope, and the performance is strong enough for the 50k-1M vector range that most initial RAG deployments land in.
For projects where we are already running PostgreSQL and the vector count is under 1M: pgvector with HNSW. The operational simplicity of not adding a new system outweighs the performance advantages of a specialized vector database at this scale.
For multi-tenant B2B SaaS with strict per-tenant isolation requirements: Weaviate, because its first-class multi-tenancy model handles this requirement cleanly without workarounds.
For large-scale indexing projects (5M+ vectors) where memory efficiency is a concern: Qdrant with quantization. The scalar quantization mode makes very large indexes viable on reasonable infrastructure.
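The four rules above can be written down as a small function, which is useful mainly for making the thresholds explicit. This is a sketch of our defaults, not a general-purpose selector. The type names are illustrative, and the order in which the rules are checked (scale first, then isolation, then existing PostgreSQL) is an assumption for cases where more than one rule applies.

```typescript
type VectorStore =
  | 'pinecone-serverless'
  | 'pgvector-hnsw'
  | 'weaviate'
  | 'qdrant-quantized';

interface ProjectRequirements {
  expectedVectorCount: number;
  alreadyRunsPostgres: boolean;
  strictTenantIsolation: boolean;
}

function recommendVectorStore(req: ProjectRequirements): VectorStore {
  // Large-scale indexing where memory is the binding constraint.
  if (req.expectedVectorCount >= 5_000_000) return 'qdrant-quantized';
  // Multi-tenant SaaS with strict per-tenant isolation.
  if (req.strictTenantIsolation) return 'weaviate';
  // Existing PostgreSQL and a vector count under 1M.
  if (req.alreadyRunsPostgres && req.expectedVectorCount < 1_000_000) {
    return 'pgvector-hnsw';
  }
  // Default for small teams that want zero-ops.
  return 'pinecone-serverless';
}

const pilotProject = recommendVectorStore({
  expectedVectorCount: 200_000,
  alreadyRunsPostgres: true,
  strictTenantIsolation: false,
}); // 'pgvector-hnsw'
```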
The most important advice: start with Pinecone or pgvector, instrument your retrieval quality metrics from day one, and migrate only when a specific limitation of the current system is actually causing problems. Premature vector database optimization is a real category of wasted engineering time.
If you are designing a RAG system and want help working through the vector store decision for your specific scale, filtering requirements, and operational constraints, we have made this decision several times and can help you find the right fit.
Written by
Founder & CEO
Gaurang Ghinaiya is the Founder & CEO of Nexios Technologies. He is passionate about building innovative software solutions that drive business growth. With years of experience in technology leadership, he guides teams toward excellence.

