Vector Database Explained — The Foundation of AI Search
A vector database stores numerical representations (embeddings) of data and enables similarity search over them. It is a core infrastructure layer behind RAG, semantic search, recommendation systems, and many other modern AI applications.
What is a Vector Database?
A vector database is a specialized database designed to store, index, and query high-dimensional vectors. These vectors — called embeddings — are numerical representations of data (text, images, audio) that capture semantic meaning in a way machines can understand.
To understand why vector databases exist, consider how traditional databases work. A SQL database stores structured data in rows and columns. When you search for "apple," it looks for exact matches of the string "apple." It cannot understand that "fruit from a tree" is semantically related, or that "Apple Inc." is a different entity entirely.
Vector databases solve this by operating in a completely different paradigm. Instead of storing text and matching keywords, they store numerical vectors — lists of hundreds or thousands of floating-point numbers — that represent the meaning of the original data. When you query a vector database, you search by meaning, not by exact text.
What Are Embeddings?
An embedding is a dense numerical vector generated by a machine learning model. The model takes input (a sentence, a paragraph, an image) and converts it into a fixed-size list of numbers. The key property: similar inputs produce vectors that are close together in the vector space.
For example, the sentences "The cat sat on the mat" and "A feline rested on the rug" would produce very similar embeddings, because they have the same meaning. But "The stock market crashed yesterday" would produce a very different embedding, because its meaning is unrelated.
This is the foundation of semantic search — searching by meaning rather than by keywords. For a deeper dive into how embeddings work, see our embedding visualization module.
How Similarity Search Works
When you query a vector database, the process is fundamentally different from a SQL query:
- Embed the query: Your text query is converted into a vector using the same embedding model that was used to embed the stored data
- Find nearest neighbors: The database finds the stored vectors that are closest to your query vector, usually approximately via an index rather than by comparing against every stored vector
- Return results: The original data associated with those vectors is returned, ranked by similarity
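The three steps can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular database works internally: `embed` is a hypothetical stand-in that counts words from a tiny fixed vocabulary, so unlike a real embedding model it captures only word overlap, not meaning (it would not relate "feline" to "cat"). The search also compares the query against every stored vector instead of using an index.

```python
import math

# Hypothetical stand-in for an embedding model: counts words from a tiny
# fixed vocabulary. It measures word overlap only, not meaning.
VOCAB = ["cat", "mat", "rug", "stock", "market", "crashed", "sat"]


def embed(text):
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat all-zero vectors as dissimilar


# "Stored" data: each record keeps its vector next to the original text.
documents = ["the cat sat on the mat", "the stock market crashed", "a rug for the cat"]
index = [(embed(doc), doc) for doc in documents]


def search(query, k=2):
    q = embed(query)  # 1. embed the query with the same function used for the data
    scored = [(cosine(q, vec), doc) for vec, doc in index]  # 2. score every stored vector
    scored.sort(key=lambda pair: pair[0], reverse=True)  # 3. rank by similarity
    return scored[:k]


for score, doc in search("cat on a mat"):
    print(f"{score:.3f}  {doc}")
# 0.816  the cat sat on the mat
# 0.500  a rug for the cat
```

Replacing `embed` with a real model changes the vectors but not the shape of the pipeline; what matters is that the same function embeds both the stored documents and the query.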
Distance Metrics
"Closeness" between vectors is measured using distance metrics. The most common ones are:
- Cosine Similarity: Measures the cosine of the angle between two vectors. Ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). This is the most common metric for text embeddings, as it captures directional similarity regardless of magnitude.
- Euclidean Distance (L2): Measures the straight-line distance between two points. Lower values mean more similar. Works well when vector magnitude matters.
- Dot Product: The sum of the element-wise products of two vectors. Higher values mean more similar. It is fast to compute but sensitive to magnitude; for unit-normalized vectors it equals cosine similarity.
For most text embedding use cases, cosine similarity is the recommended metric. It is invariant to vector length and focuses on the direction of the vector, which encodes meaning.
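A minimal sketch of the three metrics in plain Python, comparing two vectors that point in the same direction but differ in length, and then checking how the metrics relate once vectors are normalized to unit length:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


a, b = [3.0, 4.0], [6.0, 8.0]  # same direction, b is twice as long
print(cosine_similarity(a, b))   # 1.0: looks at direction only
print(euclidean_distance(a, b))  # 5.0: sees the length difference
print(dot(a, b))                 # 50.0: grows with magnitude

# After normalizing to unit length, the metrics agree with each other.
u, v = normalize([1.0, 2.0]), normalize([2.0, 1.0])
print(math.isclose(dot(u, v), cosine_similarity(u, v)))                # True
print(math.isclose(euclidean_distance(u, v) ** 2, 2 - 2 * dot(u, v)))  # True
```

The last check reflects the identity that, for unit vectors, squared Euclidean distance equals 2 minus twice the dot product, so dot product, cosine similarity, and Euclidean distance all produce the same ranking on normalized embeddings.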
The Curse of Dimensionality
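High-dimensional vectors (768, 1536, or even 3072 dimensions for modern embedding models) present a computational challenge. Brute-force search compares the query against every stored vector, so its cost grows with both the number of vectors and the number of dimensions, which is too slow for production use at millions of vectors. Classical spatial indexes such as k-d trees do not rescue it either, because they lose most of their pruning power in high dimensions. This is why vector databases use specialized approximate nearest neighbor (ANN) indexes, which trade a small amount of accuracy for large speedups.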
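A back-of-the-envelope estimate shows the scale, assuming N = 1,000,000 stored vectors of d = 1536 dimensions held as 32-bit floats (illustrative numbers, not a benchmark):

```latex
\begin{aligned}
\text{multiply-adds per query} &= N \cdot d = 10^{6} \times 1536 \approx 1.5 \times 10^{9} \\
\text{raw float32 storage} &= N \cdot d \cdot 4\ \text{bytes} = 6.144 \times 10^{9}\ \text{bytes} \approx 6.1\ \text{GB}
\end{aligned}
```

Brute force repeats the first line for every query, so even at modest query rates the comparisons alone dominate; an index aims to touch only a small fraction of the N vectors.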
Vector Indexing Methods
Vector databases use specialized data structures and algorithms to make similarity search fast, even with millions or billions of vectors. The main approaches are:
HNSW (Hierarchical Navigable Small World)
The most popular indexing method. HNSW builds a multi-layer graph where each node is connected to its nearest neighbors. Search starts at the top layer (coarse) and navigates down to finer layers, quickly converging on the nearest vectors. HNSW offers excellent recall (accuracy) and speed, but requires more memory because it stores the graph structure.
This is the default index in most vector databases, including Qdrant, Weaviate, and Chroma.
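A full HNSW implementation is too long to show here, but its core operation is a best-first search over a proximity graph, which HNSW runs on each layer from the top down. The sketch below is a simplified, single-layer version under stated assumptions: the graph is built by brute-force k-nearest-neighbor linking rather than HNSW's incremental insertion and neighbor-selection heuristics, there is no layer hierarchy, and the parameters `m` and `ef` only loosely mirror HNSW's M and ef settings.

```python
import heapq
import math


def build_knn_graph(points, m):
    """Link each point to its m nearest neighbors (brute force), then make links two-way."""
    graph = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in nearest[:m]:
            if j not in graph[i]:
                graph[i].append(j)
            if i not in graph[j]:
                graph[j].append(i)
    return graph


def search_layer(graph, points, query, entry, ef):
    """Best-first search that keeps the ef closest nodes found so far."""
    visited = {entry}
    d0 = math.dist(points[entry], query)
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # the closest candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = math.dist(points[nb], query)
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-neg_d, i) for neg_d, i in results)


points = [(float(x),) for x in range(10)]  # toy 1-D data: 0.0, 1.0, ..., 9.0
graph = build_knn_graph(points, m=2)
print(search_layer(graph, points, query=(7.2,), entry=0, ef=3))  # ids in order: 7, 8, 6
```

A larger `ef` keeps more candidates alive and trades speed for recall, which is the same kind of knob HNSW exposes at query time.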
IVF (Inverted File Index)
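IVF partitions the vector space into clusters (using k-means) and keeps, for each cluster centroid, a list of the vectors assigned to it. At query time, only the vectors in the few clusters whose centroids are closest to the query are searched (a tunable count, often called nprobe), dramatically reducing the number of comparisons. IVF is memory-efficient but can miss results if the query falls near cluster boundaries, since a true nearest neighbor may sit in a cluster that was not probed.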
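A toy IVF index in plain Python, assuming a basic k-means with a fixed number of iterations and Euclidean distance. The parameter names `nlist` (number of clusters) and `nprobe` (clusters scanned per query) follow FAISS's naming, but the `IVFIndex` class itself is a hypothetical illustration, not FAISS's API:

```python
import math
import random


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class IVFIndex:
    """Hypothetical toy IVF index: k-means centroids plus one inverted list per centroid."""

    def __init__(self, vectors, nlist):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist)
        self.lists = {c: [] for c in range(nlist)}  # centroid id -> vector ids
        for i, v in enumerate(vectors):
            self.lists[self._closest_centroids(v, 1)[0]].append(i)

    def _closest_centroids(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda c: math.dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, nprobe):
        # Scan only the vectors stored under the nprobe closest centroids.
        ids = [i for c in self._closest_centroids(query, nprobe) for i in self.lists[c]]
        scored = sorted((math.dist(query, self.vectors[i]), i) for i in ids)
        return [i for _, i in scored[:k]]


rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(1000)]
index = IVFIndex(data, nlist=16)
query = (0.5, 0.5)
exact = sorted(range(len(data)), key=lambda i: math.dist(query, data[i]))[:5]
approx = index.search(query, k=5, nprobe=2)
print("overlap with exact top-5:", len(set(approx) & set(exact)))
```

With `nprobe` equal to `nlist` every list is scanned and the result matches brute force; smaller values scan fewer vectors and may miss neighbors that sit just across a cluster boundary.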
PQ (Product Quantization)
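PQ compresses vectors by splitting each one into sub-vectors and replacing each sub-vector with the ID of its nearest centroid in a small codebook learned for that slice (for example 256 centroids, so each ID fits in one byte). This reduces memory usage by 10-100x at the cost of some accuracy, because distances are computed on the approximated vectors. It is often combined with IVF (IVF-PQ) for large-scale deployments.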
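A minimal product quantization sketch, assuming vectors whose dimension splits evenly into `m` sub-vectors and a small k-means codebook of `ksub` centroids per sub-vector; the function names are hypothetical. Each vector is stored as `m` small integers instead of its floats. For scale: a 128-dimensional float32 vector takes 512 bytes, while 8 codes drawn from 256-entry codebooks fit in 8 bytes, a 64x reduction.

```python
import math
import random


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train_pq(vectors, m, ksub):
    """Learn one codebook of ksub centroids for each of the m sub-vector slices."""
    dim = len(vectors[0])
    assert dim % m == 0, "dimension must split evenly into m sub-vectors"
    step = dim // m
    return [kmeans([v[j * step:(j + 1) * step] for v in vectors], ksub, seed=j) for j in range(m)]


def encode(v, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    step = len(codebooks[0][0])
    codes = []
    for j, book in enumerate(codebooks):
        sub = v[j * step:(j + 1) * step]
        codes.append(min(range(len(book)), key=lambda c: math.dist(sub, book[c])))
    return codes


def decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    return [x for j, c in enumerate(codes) for x in codebooks[j][c]]


rng = random.Random(1)
data = [tuple(rng.random() for _ in range(8)) for _ in range(300)]
codebooks = train_pq(data, m=4, ksub=16)  # 4 sub-vectors of 2 dims, 16 centroids each
codes = encode(data[0], codebooks)
print(codes)  # four small integers in the range 0..15
print(math.dist(data[0], decode(codes, codebooks)))  # reconstruction error
```

The decoded vector only approximates the original, which is where the accuracy cost comes from. Production implementations also compute query-to-code distances with precomputed lookup tables instead of decoding each stored vector.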
Flat (Brute Force)
The simplest approach: compare the query against every stored vector. Guarantees 100% recall but is slow for large datasets. Useful for small collections (under 100K vectors) or as a baseline for benchmarking other methods.
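Flat search is simple enough to write directly, and its output serves as the ground truth when measuring an approximate index. A minimal sketch, with recall@k defined as the fraction of the exact top-k neighbors that the approximate index also returned (the "approximate" IDs below are made up for illustration):

```python
import math


def flat_search(vectors, query, k):
    """Exact k-nearest-neighbor search: measure the distance to every stored vector."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that an approximate index also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
exact = flat_search(vectors, (0.2, 0.1), k=2)  # [0, 1]
approx = [0, 2]  # pretend output of an approximate index
print(recall_at_k(approx, exact))  # 0.5
```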
Popular Vector Databases
The vector database landscape is crowded and evolving rapidly. Here are the most important options:
| Database | Type | Best For | Key Strength |
|---|---|---|---|
| Pinecone | Managed cloud | Production, ease of use | Fully managed, zero ops |
| Weaviate | Open-source / cloud | Hybrid search | Vector + keyword search |
| Chroma | Open-source / local | Prototyping, small apps | Simple API, lightweight |
| Qdrant | Open-source / cloud | Performance-critical | Fast, Rust-based, rich filtering |
| Milvus | Open-source | Large-scale enterprise | Billions of vectors, GPU support |
| FAISS | Library (Meta) | In-memory, research | Blazing fast, GPU-accelerated |
| pgvector | PostgreSQL extension | Existing PostgreSQL users | Add vectors to your SQL database |
Pinecone
Pinecone is a fully managed vector database designed for production workloads. You do not need to manage infrastructure, handle scaling, or worry about index optimization. It offers low-latency queries, metadata filtering, and automatic scaling. The free tier is generous enough for prototyping. The trade-off is vendor lock-in and cost at scale.
Weaviate
Weaviate's standout feature is hybrid search — it combines vector (semantic) search with traditional keyword (BM25) search in a single query. This is powerful because some queries are better served by exact keyword matching (product codes, proper nouns) while others benefit from semantic understanding. Weaviate also has built-in vectorization modules, so you can generate embeddings directly within the database.
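Weaviate's hybrid queries take an `alpha` parameter that weights the two signals (1 means pure vector search, 0 means pure keyword search). The sketch below shows one common way to fuse two score lists: min-max normalize each and take a weighted sum. It is a generic illustration, not Weaviate's internal implementation, and the document IDs and scores are made-up placeholders:

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(vector_scores, keyword_scores, alpha=0.5):
    """Weighted fusion: alpha=1.0 keeps only the vector signal, alpha=0.0 only the keyword signal."""
    vec, kw = min_max(vector_scores), min_max(keyword_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(vec) | set(kw)  # a doc missing from one list scores 0 there
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: cosine similarities from a vector search, BM25 scores from a keyword search.
vector_scores = {"plumbing-guide": 0.82, "faucet-sku-A123": 0.40, "kitchen-tips": 0.75}
keyword_scores = {"faucet-sku-A123": 7.1, "plumbing-guide": 2.3}
for doc, score in hybrid_scores(vector_scores, keyword_scores, alpha=0.7):
    print(f"{score:.3f}  {doc}")
# 0.700  plumbing-guide
# 0.583  kitchen-tips
# 0.300  faucet-sku-A123
```

Lowering `alpha` shifts the ranking toward exact keyword hits such as the product code.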
Chroma
Chroma is the simplest vector database to get started with. It can run in-process (no server needed), has a clean Python API, and integrates seamlessly with LangChain. Data can be kept in memory or persisted to a local directory, and a client-server mode is available when you outgrow a single process. Chroma is well suited to prototyping and small applications, but it is not designed for large-scale production use with millions of vectors.
Qdrant
Qdrant is written in Rust and is optimized for performance. It offers advanced filtering on metadata (called payload in Qdrant) that is applied as part of the vector search rather than only as a post-processing step, along with a rich filter syntax. Qdrant can run as a single node or in a distributed cluster. It is a strong choice when you need both speed and flexibility.
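Why the timing of a filter matters: if it is applied only after the nearest-neighbor search, the top-k hits may mostly fail the filter and the query silently returns fewer than k results. The toy brute-force sketch below contrasts post-filtering with pre-filtering; it does not model how Qdrant itself combines filters with its index, and the `lang` payload field is a hypothetical example:

```python
import math

# Toy collection of (vector, payload) points; the "lang" field is a hypothetical example.
points = [
    ((0.1, 0.0), {"lang": "en"}),
    ((0.2, 0.0), {"lang": "de"}),
    ((0.3, 0.0), {"lang": "de"}),
    ((0.9, 0.0), {"lang": "en"}),
    ((1.5, 0.0), {"lang": "en"}),
]


def top_k(candidates, query, k):
    return sorted(candidates, key=lambda p: math.dist(p[0], query))[:k]


def post_filter(query, k, keep):
    # Search first, then drop non-matching hits: may return fewer than k points.
    return [p for p in top_k(points, query, k) if keep(p[1])]


def pre_filter(query, k, keep):
    # Restrict to matching points first, then search: returns k points if k match.
    return top_k([p for p in points if keep(p[1])], query, k)


def is_english(payload):
    return payload["lang"] == "en"


print(len(post_filter((0.0, 0.0), 2, is_english)))  # 1: the second-nearest point is "de"
print(len(pre_filter((0.0, 0.0), 2, is_english)))   # 2
```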
When to Use a Vector Database
Vector databases are not always the right choice. Here is when they shine and when they do not:
Use a Vector Database When
- Building RAG: You need to retrieve relevant document chunks for LLM context. This is the most common use case.
- Semantic search: Users search by meaning, not keywords. "How do I fix a leaky faucet" should find plumbing guides even if they do not contain those exact words.
- Recommendation systems: Find similar products, articles, or content based on embeddings.
- Duplicate detection: Find near-duplicate content (similar product listings, plagiarized text).
- Anomaly detection: Identify data points that are far from all known clusters.
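For duplicate detection, the core test is a similarity threshold between embeddings. A minimal sketch, assuming pre-computed embeddings; the three-dimensional vectors and the 0.95 threshold are illustrative placeholders, not recommended values. It compares every pair, which is quadratic; with a vector database you would instead run one nearest-neighbor query per item:

```python
import math
from itertools import combinations


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def near_duplicates(embeddings, threshold=0.95):
    """Return id pairs whose embeddings have cosine similarity >= threshold."""
    return [
        (id_a, id_b)
        for (id_a, a), (id_b, b) in combinations(embeddings.items(), 2)
        if cosine(a, b) >= threshold
    ]


# Placeholder 3-dimensional "embeddings"; real ones have hundreds of dimensions.
listings = {
    "listing-1": (0.90, 0.10, 0.00),
    "listing-2": (0.88, 0.12, 0.01),
    "listing-3": (0.00, 0.20, 0.95),
}
print(near_duplicates(listings))  # [('listing-1', 'listing-2')]
```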
Do Not Use a Vector Database When
- Exact keyword search: If users search for exact product IDs, email addresses, or names, a traditional search engine (Elasticsearch, Meilisearch) is better.
- Structured queries: If you need to filter, aggregate, and join data, use SQL or a document database.
- Small datasets: For fewer than 10,000 documents, a simple brute-force search (NumPy cosine similarity) is fast enough and simpler.
- Real-time transactions: Vector databases are optimized for read-heavy similarity search, not for high-frequency writes and ACID transactions.
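For the small-dataset case, brute-force cosine similarity in NumPy takes only a few lines. A minimal sketch, using random vectors with an assumed 384 dimensions as stand-ins for real embeddings:

```python
import numpy as np


def cosine_top_k(matrix, query, k=3):
    """Exact cosine-similarity search over an (n_docs, dim) array."""
    docs = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)  # unit-length rows
    q = query / np.linalg.norm(query)
    scores = docs @ q                 # one cosine score per document
    top = np.argsort(-scores)[:k]     # indices of the k highest scores
    return top, scores[top]


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 384)).astype(np.float32)  # random stand-ins
ids, sims = cosine_top_k(embeddings, embeddings[42], k=5)
print(ids[0])  # 42: a vector is most similar to itself
```

In a real application you would normalize the document matrix once at indexing time, so each query becomes a single matrix-vector product.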
Vector Databases and RAG
The most important application of vector databases today is RAG (Retrieval-Augmented Generation). RAG solves a fundamental LLM limitation: the model only knows what it was trained on. By retrieving relevant documents at query time and passing them as context, you can ground the LLM's response in accurate, up-to-date information.
The RAG pipeline with a vector database works like this:
- Indexing phase (offline): Documents are split into chunks, each chunk is embedded using an embedding model, and the embeddings are stored in the vector database along with the original text and metadata.
- Query phase (online): The user's question is embedded using the same model. The vector database retrieves the most similar chunks. These chunks are inserted into the LLM prompt as context. The LLM generates an answer grounded in the retrieved information.
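Both phases can be sketched end to end under loud assumptions: `chunk_words` is a deliberately naive fixed-size word chunker, `embed` is a hypothetical word-count stand-in for a real embedding model, the "vector database" is a Python list searched by brute force, and the final LLM call is left out because it depends on your provider. All names and documents are illustrative:

```python
import math


def chunk_words(text, size=8, overlap=2):
    """Naive chunker: windows of `size` words, consecutive windows sharing `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def tokens(text):
    return [w.strip(".,?!").lower() for w in text.split()]


def embed(text, vocab):
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    words = tokens(text)
    return [float(words.count(term)) for term in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Indexing phase (offline): chunk, embed, store each vector with its original text.
documents = [
    "Pinecone is a managed vector database. It handles scaling and index tuning for you.",
    "Chroma runs in-process and is popular for prototyping small retrieval applications.",
]
chunks = [c for doc in documents for c in chunk_words(doc)]
vocab = sorted({w for c in chunks for w in tokens(c)})
store = [(embed(c, vocab), c) for c in chunks]


# Query phase (online): embed the question with the same function, retrieve, build the prompt.
def retrieve(question, k=2):
    q = embed(question, vocab)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question, k=2):
    context = "\n".join(f"- {c}" for c in retrieve(question, k))
    return f"Answer using only the context below.\n{context}\n\nQuestion: {question}"


print(build_prompt("Which database is good for prototyping?"))
```

The string returned by `build_prompt` is what would be sent to the LLM in the final step.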
This approach has several advantages over fine-tuning: it works with any LLM, the knowledge can be updated instantly (just re-index), and you can cite specific sources. For a complete guide, see our RAG Explained article and our LangChain tutorial for building RAG pipelines.
How to Choose
Choosing a vector database depends on your stage and requirements:
- Prototyping / learning: Start with Chroma (local, simple) or FAISS (in-memory, fast). Zero infrastructure overhead.
- Small production app: Use Pinecone (managed, no ops) or Qdrant Cloud (flexible, good free tier).
- Hybrid search needed: Use Weaviate (built-in BM25 + vector).
- Existing PostgreSQL: Use pgvector (no new infrastructure).
- Large scale / enterprise: Use Milvus or Pinecone (designed for billions of vectors).
- Research / custom indexing: Use FAISS (maximum control over index types and parameters).
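The good news: LangChain and most LLM frameworks abstract the vector database behind a common VectorStore interface. You can start with Chroma and switch to Pinecone later, often by changing only the few lines that construct the store, although backend-specific features such as metadata filter syntax may still need adjusting.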
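Switching is cheap because application code depends only on the interface. A minimal sketch of that pattern using a Python `Protocol`: the method names `add_texts` and `similarity_search` follow the style of LangChain's `VectorStore`, but these classes are hypothetical illustrations, not LangChain's, and the toy backend ranks texts by shared words instead of embeddings:

```python
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical minimal interface (not LangChain's actual class)."""

    def add_texts(self, texts: list[str]) -> None: ...

    def similarity_search(self, query: str, k: int = 4) -> list[str]: ...


class KeywordOverlapStore:
    """Toy backend: ranks texts by words shared with the query (no embeddings)."""

    def __init__(self) -> None:
        self.texts: list[str] = []

    def add_texts(self, texts: list[str]) -> None:
        self.texts.extend(texts)

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = set(query.lower().split())
        return sorted(self.texts, key=lambda t: len(q & set(t.lower().split())), reverse=True)[:k]


def retrieve_context(store: VectorStore, question: str) -> list[str]:
    # Depends only on the interface: swapping backends changes the line that
    # constructs the store, not this function.
    return store.similarity_search(question, k=2)


store = KeywordOverlapStore()
store.add_texts(["cats purr", "dogs bark", "cats and dogs"])
print(retrieve_context(store, "why do cats purr"))  # ['cats purr', 'cats and dogs']
```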
Frequently Asked Questions
What is a vector database?
A vector database is a specialized database designed to store, index, and query high-dimensional vectors (embeddings). It enables similarity search — finding items that are semantically similar to a query, rather than matching exact keywords.
How is a vector database different from a regular database?
Regular databases (SQL, NoSQL) store structured data and match queries exactly. Vector databases store numerical embeddings and find items by semantic similarity. You query a vector database with a vector, not a keyword, and get back the most similar items ranked by distance.
Which vector database should I use?
For prototyping, use Chroma (local, lightweight) or FAISS (in-memory). For production cloud deployments, use Pinecone (managed, easy) or Qdrant (open-source, performant). For hybrid search (vector + keyword), use Weaviate. For large-scale enterprise, consider Milvus or Pinecone.
Do I need a vector database for RAG?
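Not strictly, but it is the most common choice. RAG (Retrieval-Augmented Generation) needs a retrieval step: a vector database stores document embeddings and retrieves the most relevant chunks at query time, which are then passed to the LLM as context. For small collections, an in-memory brute-force search or a library such as FAISS can fill the same role, and keyword search (BM25) can complement or replace vector retrieval. Without any retrieval step, you would need to pass your entire document library to the LLM, which is impractical due to context window limits and cost.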