
Vector Database Selection for Production RAG: Pinecone vs pgvector vs Weaviate vs Qdrant

Choosing a vector database for a RAG system is a decision that is hard to reverse after you have indexed 500,000 chunks. This is the comparison framework we use, the benchmarks that matter in practice, and the decision we reached for different project types.

Gaurang Ghinaiya

Founder & CEO

March 10, 2026

The vector database market expanded dramatically between 2022 and 2025, from a handful of specialized options to over a dozen production-ready choices. For teams building RAG systems, this is good news and a source of decision paralysis at the same time. This post is the comparison framework we use to select a vector store for production RAG, based on what we have learned from deploying several of these systems.

The axes that matter

Before comparing specific databases, establish what actually matters for your use case. The axes:

Scale: How many vectors will you index? 10k is trivial. 100k is easy. 10M requires careful thought about index size, query latency, and memory requirements.
Query latency requirements: A customer-facing chatbot with a 2-second total response budget needs faster retrieval than an async batch processing pipeline where 500ms is fine.
Filtering requirements: Do you need to filter by metadata fields alongside the vector search? (Almost everyone does.) How many filter combinations do you need to support? Pre-filtering versus post-filtering performance differs significantly across databases.
Operational model: Managed cloud service versus self-hosted. What is the team's operational capacity? For teams without dedicated infra engineers, managed is almost always right.
Multi-tenancy: Does each of your customers need isolated vector stores, or is shared-store-with-filtering acceptable?
Cost: Storage cost is proportional to vector count × dimension × bytes per float. Compute cost is proportional to query volume × index size. At 10M vectors with 1,536 dimensions using float32, storage alone is ~61GB before replication.
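
The storage arithmetic in the last item is easy to reproduce for your own numbers. The sketch below is a back-of-the-envelope estimate only: the function names are ours, it counts raw vector bytes in decimal gigabytes, and it ignores index structures and metadata, which add to the total.

```typescript
// Rough raw-storage estimate: vector count x dimension x bytes per component.
// Ignores index overhead and metadata; replicas multiplies the total.
function estimateVectorStorageBytes(
  vectorCount: number,
  dimensions: number,
  bytesPerComponent: number = 4, // float32
  replicas: number = 1,
): number {
  return vectorCount * dimensions * bytesPerComponent * replicas;
}

// Decimal gigabytes (1 GB = 1e9 bytes), matching the figure quoted above.
function toGigabytes(bytes: number): number {
  return bytes / 1e9;
}

const rawBytes = estimateVectorStorageBytes(10_000_000, 1536);
console.log(`${toGigabytes(rawBytes).toFixed(2)} GB`); // 61.44 GB
```

Passing a replica count of 3 triples the figure, which is why replication belongs in the estimate before you compare pricing tiers.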

Pinecone

Pinecone is a fully managed vector database. You do not run infrastructure; you create an index, upsert vectors, and query. It handles replication, scaling, and updates transparently.

Architecture

Pinecone uses a proprietary indexing algorithm (reportedly based on HNSW with additional optimizations). Each index can be either "serverless" (usage-based pricing, no reserved capacity) or "pod-based" (dedicated infrastructure with predictable performance).

Filtering

Pinecone supports metadata filtering alongside vector search using a MongoDB-style filter syntax:

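// Assumes the official Node.js client; queryEmbedding is the embedded user query
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

const results = await pinecone.index('my-index').query({
  vector: queryEmbedding,
  topK: 10,
  filter: {
    tenantId: { $eq: 'tenant-123' },
    // Range operators compare numbers, so the date is stored as Unix epoch seconds
    documentDate: { $gte: 1735689600 }, // 2025-01-01T00:00:00Z
    category: { $in: ['clinical-guidelines', 'drug-information'] },
  },
  includeMetadata: true,
});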

Pinecone's filtering is generally efficient; it is done at the index level rather than post-retrieval. However, high-cardinality filters (many unique values in a filter field) can degrade performance on very large indexes.
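
In practice we build these filter objects in one place rather than inline at every call site. The sketch below is a hypothetical helper, not part of the Pinecone SDK. It assumes that `documentDate` was written as numeric Unix epoch seconds at upsert time, because Pinecone's range operators such as `$gte` compare numbers rather than date strings.

```typescript
// Hypothetical helper (not a Pinecone API): builds a MongoDB-style metadata filter.
type FilterValue = string | number | boolean;
type FilterCondition =
  | { $eq: FilterValue }
  | { $in: FilterValue[] }
  | { $gte: number };
type MetadataFilter = Record<string, FilterCondition>;

// Convert an ISO date to Unix epoch seconds so it can be range-filtered as a number.
// Date-only strings such as '2025-01-01' are interpreted as UTC midnight.
function toEpochSeconds(isoDate: string): number {
  const ms = Date.parse(isoDate);
  if (Number.isNaN(ms)) throw new Error(`Invalid date: ${isoDate}`);
  return Math.floor(ms / 1000);
}

function buildChunkFilter(opts: {
  tenantId: string;
  publishedOnOrAfter?: string;
  categories?: string[];
}): MetadataFilter {
  // The tenant condition is always present so a query can never cross tenants.
  const filter: MetadataFilter = { tenantId: { $eq: opts.tenantId } };
  if (opts.publishedOnOrAfter) {
    filter['documentDate'] = { $gte: toEpochSeconds(opts.publishedOnOrAfter) };
  }
  if (opts.categories && opts.categories.length > 0) {
    filter['category'] = { $in: opts.categories };
  }
  return filter;
}

console.log(buildChunkFilter({ tenantId: 'tenant-123', publishedOnOrAfter: '2025-01-01' }));
```

Making the tenant condition mandatory in the helper's signature is a deliberate choice: a forgotten tenant filter is a data leak, not a relevance bug.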

Multi-tenancy

Pinecone supports namespaces within an index, which provide logical isolation between tenants while sharing the same physical index. For strict tenant isolation (regulated industries), a separate index per tenant is safer but more expensive.

Cost at scale

Serverless pricing scales with query volume and storage. At high query volumes (millions per month), pod-based indexes with reserved capacity become more predictable. For our RAG systems in the 50k to 500k vector range, Pinecone serverless has been the most cost-effective managed option.

When Pinecone wins:

Managed, zero-ops vector store is a priority
Team is small and cannot afford dedicated infra work
Scale is in the 10k to 5M range
Standard filtering requirements

pgvector (PostgreSQL extension)

pgvector adds vector similarity search to PostgreSQL. If you are already running PostgreSQL, this is an attractive option because it adds vector search without adding a new system to operate.

Index types

pgvector supports two index types:

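IVFFlat: Inverted file index that stores vectors uncompressed ("flat") in clusters. Faster to build and lighter on memory than HNSW, but with lower recall at comparable query speed. Recall is tuned with the lists (build time) and probes (query time) parameters. Good for development and smaller indexes under 100k vectors.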
HNSW: Hierarchical Navigable Small World graph. Better query performance and higher recall, slower to build, higher memory usage. Recommended for production at any significant scale.
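
If you do start with IVFFlat, its two tuning knobs need a starting value. The sketch below encodes the starting points suggested in the pgvector README as we understand them: `lists` of rows / 1000 up to 1M rows and sqrt(rows) beyond that, and `probes` of sqrt(lists). The function name is ours, and the output is a first guess to benchmark against, not a tuned configuration.

```typescript
// Starting-point heuristic for IVFFlat parameters (assumption: pgvector README guidance).
// lists is set at index build time: CREATE INDEX ... USING ivfflat (...) WITH (lists = N)
// probes is set at query time:      SET ivfflat.probes = N
function suggestIvfflatParams(rowCount: number): { lists: number; probes: number } {
  if (!Number.isFinite(rowCount) || rowCount < 1) {
    throw new Error('rowCount must be a finite number >= 1');
  }
  const rawLists = rowCount <= 1_000_000 ? rowCount / 1000 : Math.sqrt(rowCount);
  const lists = Math.max(1, Math.round(rawLists));
  const probes = Math.max(1, Math.round(Math.sqrt(lists)));
  return { lists, probes };
}

console.log(suggestIvfflatParams(100_000)); // { lists: 100, probes: 10 }
```

Raising `probes` trades query speed for recall, so measure both before settling on a value.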

-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE document_chunks (
  id          BIGSERIAL PRIMARY KEY,
  tenant_id   UUID NOT NULL,
  document_id UUID NOT NULL,
  content     TEXT NOT NULL,
  embedding   vector(1536),  -- dimension must match embedding model
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index for production (requires pgvector 0.5.0+)
CREATE INDEX ON document_chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Index on tenant_id for efficient filtered queries
CREATE INDEX ON document_chunks (tenant_id);

-- Filtered similarity search ($1 = query embedding, $2 = tenant id)
-- <=> is cosine distance, so 1 - distance gives cosine similarity
SELECT id, content, (1 - (embedding <=> $1)) AS similarity
FROM document_chunks
WHERE tenant_id = $2
  AND created_at >= '2025-01-01'
ORDER BY embedding <=> $1
LIMIT 10;
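
From application code, the embedding has to reach PostgreSQL in pgvector's text format, a bracketed comma-separated list such as `[0.1,0.2,0.3]`. The sketch below shows one way to prepare the parameters for a driver such as node-postgres. The helper names are ours, and the explicit `$1::vector` cast is an assumption we add so the parameter type is unambiguous.

```typescript
// Serialize an embedding into pgvector's text input format, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0) throw new Error('embedding must not be empty');
  for (const x of embedding) {
    if (!Number.isFinite(x)) throw new Error('embedding contains a non-finite value');
  }
  return `[${embedding.join(',')}]`;
}

// Parameterized query text; values are passed separately, never interpolated.
const searchSql = `
  SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
  FROM document_chunks
  WHERE tenant_id = $2
  ORDER BY embedding <=> $1::vector
  LIMIT $3`;

// Values in the order the placeholders expect: embedding, tenant id, limit.
function buildSearchParams(
  embedding: number[],
  tenantId: string,
  limit: number = 10,
): [string, string, number] {
  return [toVectorLiteral(embedding), tenantId, limit];
}

console.log(buildSearchParams([0.5, 1, -2], 'tenant-123'));
```

With node-postgres this would typically be executed as a query with `searchSql` as the text and the returned array as the values.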

Performance characteristics

HNSW in pgvector is competitive with standalone vector databases for indexes under 1M vectors on properly resourced PostgreSQL instances. At 5M plus vectors, memory pressure becomes significant: HNSW traversal reads vectors in an unpredictable order, so query latency degrades sharply once the index no longer fits in memory. A 1M vector HNSW index with 1,536-dimensional float32 vectors requires approximately 6GB of memory (1M × 1,536 × 4 bytes) just for the vectors, plus HNSW graph overhead.

When pgvector wins:

Already running PostgreSQL and operational complexity is a primary concern
Scale is under 1M vectors
Strong filtering requirements that benefit from PostgreSQL's rich query planner
Team has PostgreSQL expertise
HIPAA or compliance requirements that make managed services with third-party BAAs complicated

Weaviate

Weaviate is a purpose-built vector database with a GraphQL query API and a rich feature set including multi-vector search, generative search (built-in OpenAI integration), and a schema-driven data model.

Schema-first design

Weaviate is designed around an explicit schema for your classes (equivalent to tables). It can infer a schema automatically on insert, but for production you should define it before inserting data:

// Define a class with vector and properties
await weaviateClient.schema.classCreator().withClass({
  class: 'DocumentChunk',
  vectorizer: 'text2vec-openai',
  moduleConfig: {
    'text2vec-openai': {
      model: 'text-embedding-3-large',
      dimensions: 3072,
    }
  },
  properties: [
    { name: 'content', dataType: ['text'] },
    { name: 'tenantId', dataType: ['uuid'] },
    { name: 'documentId', dataType: ['uuid'] },
    { name: 'documentDate', dataType: ['date'] },
    { name: 'category', dataType: ['text'] },
  ]
}).do()

Multi-tenancy

Weaviate has first-class multi-tenancy support with explicit tenant isolation at the data layer; each tenant's data is stored in separate physical shards. This is the strongest built-in multi-tenancy implementation among the options compared here.
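
Multi-tenancy is switched on per class through the class definition. The fragment below is a minimal sketch of that definition only. The `multiTenancyConfig` field is the setting as we understand it from Weaviate's schema format, and to our knowledge it has to be set when the class is created. The trimmed property list is illustrative.

```typescript
// Class definition with multi-tenancy enabled (sketch; property list is illustrative).
// With tenants handled by Weaviate, a tenantId property is no longer needed:
// the tenant is named on each read and write instead of being filtered on.
const multiTenantChunkClass = {
  class: 'DocumentChunk',
  multiTenancyConfig: { enabled: true },
  properties: [
    { name: 'content', dataType: ['text'] },
    { name: 'documentId', dataType: ['uuid'] },
    { name: 'documentDate', dataType: ['date'] },
  ],
};

console.log(JSON.stringify(multiTenantChunkClass, null, 2));
```

Each tenant then has to be registered on the class before data is written for it, and every subsequent request names the tenant it targets.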

When Weaviate wins:

Multi-tenancy with strong isolation is a first-class requirement
Multi-vector search (image plus text, or multiple text representations per document) is needed
GraphQL API aligns with the team's existing tooling
Built-in vectorization is attractive (Weaviate can call the embedding model for you)

Qdrant

Qdrant is a Rust-based vector database known for high performance and memory efficiency. It supports both disk-based storage (with quantization for memory reduction) and in-memory indexes.

Quantization for memory efficiency

Qdrant's scalar and product quantization modes compress vectors to a fraction of their original size (for example, a 4x to 8x reduction) with a small accuracy tradeoff. For large indexes where memory is the binding constraint, quantization makes Qdrant a practical choice when pgvector or in-memory alternatives would run out of RAM.

// Configure a collection with scalar quantization (int8 instead of float32)
await qdrantClient.createCollection('document_chunks', {
  vectors: {
    size: 1536,
    distance: 'Cosine',
    on_disk: true,  // store the original float32 vectors on disk
  },
  quantization_config: {
    scalar: {
      type: 'int8',
      quantile: 0.99,
      always_ram: true,  // keep the quantized vectors in memory
    }
  }
})
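
To make the tradeoff concrete, here is a toy version of scalar quantization. It is a simplified illustration of the general technique, not Qdrant's implementation: each float component is mapped linearly onto one of 256 int8 codes between a lower and upper bound, and components outside the bounds are clamped. Choosing those bounds from a quantile of the data, so that rare outliers do not stretch the range, is our reading of what a setting like `quantile: 0.99` is for.

```typescript
// Toy scalar quantization: 1 byte per component instead of 4 (float32).
interface QuantizedVector {
  codes: Int8Array;
  min: number;   // lower bound used for the linear mapping
  scale: number; // width of one quantization step
}

function quantize(vector: number[], min: number, max: number): QuantizedVector {
  if (max <= min) throw new Error('max must be greater than min');
  const scale = (max - min) / 255;
  const codes = new Int8Array(vector.length);
  vector.forEach((x, i) => {
    const clamped = Math.min(max, Math.max(min, x)); // outliers are clamped to the bounds
    codes[i] = Math.round((clamped - min) / scale) - 128; // shift 0..255 into -128..127
  });
  return { codes, min, scale };
}

// Approximate reconstruction; the error per component is at most half a step.
function dequantize(q: QuantizedVector): number[] {
  return Array.from(q.codes, (c) => (c + 128) * q.scale + q.min);
}

const original = Array.from({ length: 1536 }, (_, i) => Math.sin(i));
const quantized = quantize(original, -1, 1);
console.log(original.length * 4, quantized.codes.byteLength); // 6144 1536
```

The 4x reduction comes entirely from the narrower component type. The accuracy cost is the rounding error, which is why the original vectors are usually kept on disk for rescoring.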

Filtered search performance

Qdrant's payload (metadata) filtering uses an adaptive strategy. For high-selectivity filters (narrow filters that match few points), it uses the payload index to find the matching points and scores them directly, skipping the vector index. For low-selectivity filters (broad filters that match most points), it searches the HNSW graph and applies the filter during traversal rather than discarding results afterwards. This adaptive strategy maintains better performance across filter selectivity extremes than databases with a fixed filter strategy.
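
The idea of switching strategy on filter selectivity can be sketched in a few lines. This is an illustration of the general pattern, not Qdrant's query planner: the threshold, the type names, and the strategy labels are all ours. Only the narrow-filter branch is implemented, since the broad-filter branch needs a real approximate index.

```typescript
type Point = { id: number; vector: number[]; payload: Record<string, string> };
type Strategy = 'filter-then-exact-scan' | 'filtered-graph-search';

// Pick a strategy from the estimated number of points the filter matches.
// fullScanThreshold is a hypothetical tuning value.
function chooseStrategy(estimatedMatches: number, fullScanThreshold: number): Strategy {
  return estimatedMatches <= fullScanThreshold
    ? 'filter-then-exact-scan'
    : 'filtered-graph-search';
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

function cosine(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a) * dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Narrow-filter branch: apply the filter first, then score every survivor exactly.
function filterThenScan(
  points: Point[],
  predicate: (p: Point) => boolean,
  query: number[],
  topK: number,
): { id: number; score: number }[] {
  return points
    .filter(predicate)
    .map((p) => ({ id: p.id, score: cosine(query, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const points: Point[] = [
  { id: 1, vector: [1, 0], payload: { tenant: 'a' } },
  { id: 2, vector: [0, 1], payload: { tenant: 'a' } },
  { id: 3, vector: [1, 0], payload: { tenant: 'b' } },
];
console.log(chooseStrategy(2, 1000)); // filter-then-exact-scan
console.log(filterThenScan(points, (p) => p.payload['tenant'] === 'a', [1, 0], 1));
```

An exact scan over a few hundred filtered points is cheap and has perfect recall, which is why the narrow branch can afford to skip the vector index entirely.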

When Qdrant wins:

Very large indexes (5M plus vectors) where memory efficiency matters
Complex filtering requirements with variable filter selectivity
Self-hosted with Rust performance characteristics is attractive
Streaming ingestion of vectors (Qdrant handles concurrent writes well)

The decision matrix

Criteria	Pinecone	pgvector	Weaviate	Qdrant
Operational complexity	Lowest (managed)	Low (if already on PG)	Medium	Medium
Scale (>5M vectors)	Good	Challenging	Good	Best
Multi-tenancy	Namespaces	Filter-based	First-class	Filter-based
Filtering performance	Good	Excellent (PG planner)	Good	Excellent (adaptive)
Managed option	Yes (fully managed)	Via PG providers	Yes (Weaviate Cloud)	Yes (Qdrant Cloud)
Memory efficiency	Good (managed)	Limited	Good	Best (quantization)

What we use in production

For new healthcare and enterprise RAG projects where the team is small and operational simplicity is important: Pinecone serverless unless there is a specific reason not to. The managed model removes infrastructure work from the project scope, and the performance is strong enough for the 50k to 500k vector range that most initial RAG deployments land in.

For projects where we are already running PostgreSQL and the vector count is under 1M: pgvector with HNSW. The operational simplicity of not adding a new system outweighs the performance advantages of a specialized vector database at this scale.

For multi-tenant B2B SaaS with strict per-tenant isolation requirements: Weaviate, because its first-class multi-tenancy model handles this requirement cleanly without workarounds.

For large-scale indexing projects (5M plus vectors) where memory efficiency is a concern: Qdrant with quantization. The scalar quantization mode makes very large indexes viable on reasonable infrastructure.
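
The four recommendations above can be read as a small decision procedure. The sketch below writes it down explicitly. The thresholds come from the text, but the field names and the order in which the rules are checked are our assumptions, and a real decision involves more inputs than four.

```typescript
type VectorStore = 'pinecone-serverless' | 'pgvector-hnsw' | 'weaviate' | 'qdrant-quantized';

interface ProjectProfile {
  vectorCount: number;
  alreadyOnPostgres: boolean;
  strictTenantIsolation: boolean;
  memoryConstrained: boolean;
}

// Rule order is an assumption: hard requirements first, convenience last.
function recommendVectorStore(p: ProjectProfile): VectorStore {
  if (p.strictTenantIsolation) return 'weaviate';
  if (p.vectorCount >= 5_000_000 && p.memoryConstrained) return 'qdrant-quantized';
  if (p.alreadyOnPostgres && p.vectorCount < 1_000_000) return 'pgvector-hnsw';
  return 'pinecone-serverless'; // default for small teams that want zero ops
}

console.log(
  recommendVectorStore({
    vectorCount: 200_000,
    alreadyOnPostgres: true,
    strictTenantIsolation: false,
    memoryConstrained: false,
  }),
); // pgvector-hnsw
```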

The most important advice: start with Pinecone or pgvector, instrument your retrieval quality metrics from day one, and migrate only when a specific limitation of the current system is actually causing problems. Premature vector database optimization is a real category of wasted engineering time.
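
Instrumenting retrieval quality does not need much machinery to start. The sketch below computes two common metrics, recall@k and reciprocal rank, for a single query against a hand-labeled set of relevant chunk IDs. The function names and the labeled example are ours. Averaging these over a fixed evaluation set gives a number you can track across index or database changes.

```typescript
// Fraction of the relevant chunks that appear in the top k retrieved results.
function recallAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = retrieved.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

// 1 / rank of the first relevant result, or 0 if none was retrieved.
function reciprocalRank(retrieved: string[], relevant: Set<string>): number {
  const idx = retrieved.findIndex((id) => relevant.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

const retrieved = ['chunk-a', 'chunk-b', 'chunk-c'];
const relevant = new Set(['chunk-b', 'chunk-d']);
console.log(recallAtK(retrieved, relevant, 3)); // 0.5
console.log(reciprocalRank(retrieved, relevant)); // 0.5
```

Because both functions work on plain ID lists, the same evaluation set can be replayed against a second vector store before a migration to check that retrieval quality holds.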

If you are designing a RAG system and want help working through the vector store decision for your specific scale, filtering requirements, and operational constraints, we have made this decision several times and can help you find the right fit.


