Building HNSW Indexes for Large Vector Columns

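HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest-neighbor index and a common choice for production vector search in pgvector. Compared with IVFFlat, it offers a better speed-recall tradeoff at the cost of slower builds and higher memory use. Let us look at how to build and configure HNSW indexes on a high-dimensional vector column.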


1. HNSW Index Parameters

When creating an HNSW index, you can customize two parameters to balance build times against search accuracy:

  • m: The maximum number of connections per node in each layer of the graph (default is 16). Higher values can improve recall but increase index size and build time.
  • ef_construction: The size of the dynamic candidate list evaluated during index construction (default is 64). Higher values improve search accuracy but increase index build time. It must be at least 2 × m.

2. Creating the HNSW Index in SQL

To build an HNSW index for queries that rank by cosine distance (the <=> operator), use the vector_cosine_ops operator class:

-- Create HNSW index on the embedding column
CREATE INDEX ON document_sections 
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
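
On large tables the build itself can take a long time, so it may help to give it more memory and parallel workers first. The following is a minimal sketch, assuming a reasonably recent pgvector release (parallel HNSW builds are not available in older versions). The memory and worker values and the index name idx_document_sections_embedding are illustrative assumptions, not recommendations:

```sql
-- Assumed values: size these to your own hardware and table.
SET maintenance_work_mem = '2GB';          -- memory the index build may use
SET max_parallel_maintenance_workers = 4;  -- extra workers for a parallel build

-- idx_document_sections_embedding is a hypothetical index name.
CREATE INDEX idx_document_sections_embedding
ON document_sections
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Run from a second session while the build is in progress
-- to see the current phase and rough percentage done.
SELECT phase,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%"
FROM pg_stat_progress_create_index;
```

The build is usually much faster when the graph fits within maintenance_work_mem, so that setting is often the first one worth checking.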

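If your queries rank by L2 (Euclidean) distance instead, using the <-> operator, build the index with the vector_l2_ops operator class. The index is only used when the query operator matches the operator class of the index: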

CREATE INDEX ON document_sections 
USING hnsw (embedding vector_l2_ops);
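
To illustrate how the operator class and the query operator pair up, here is a small self-contained sketch. The demo_items table and its three-dimensional vectors are hypothetical and exist only to keep the example short:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical demo table with tiny 3-dimensional vectors.
CREATE TABLE demo_items (
    id bigserial PRIMARY KEY,
    embedding vector(3)
);

INSERT INTO demo_items (embedding) VALUES
    ('[1,2,3]'),
    ('[4,5,6]'),
    ('[1,2,4]');

-- Index built for L2 distance.
CREATE INDEX ON demo_items
USING hnsw (embedding vector_l2_ops);

-- <-> is the L2 distance operator, so it matches vector_l2_ops.
-- A query ordered by <=> (cosine distance) could not use this index.
SELECT id, embedding <-> '[1,2,3]' AS l2_distance
FROM demo_items
ORDER BY embedding <-> '[1,2,3]'
LIMIT 2;
```

On a table this small the planner may well choose a sequential scan; prefixing the query with EXPLAIN shows whether the index is used.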

3. Optimizing Search Execution (ef_search)

You can tune search accuracy at query time, without rebuilding the index, by adjusting the hnsw.ef_search setting (default is 40).

This parameter determines the candidate list size evaluated during query traversal:

-- Increase the candidate list size for the current session
SET hnsw.ef_search = 100;

-- Execute the similarity search query
SELECT id, content 
FROM document_sections
ORDER BY embedding <=> '[0.012, -0.003, ...]'
LIMIT 5;

  • Higher ef_search values: Increase nearest-neighbor search accuracy (recall) at the cost of slower queries.
  • Lower ef_search values: Speed up query execution at the cost of slight accuracy losses.
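
A plain SET lasts for the rest of the session. If only one query needs the higher value, the setting can be scoped to a single transaction with SET LOCAL. A minimal sketch, assuming document_sections contains a row with id = 1 whose embedding serves as the query vector:

```sql
BEGIN;

-- Applies only until the end of this transaction.
SET LOCAL hnsw.ef_search = 100;

-- Find the sections closest to an existing row (id = 1 is an assumption).
SELECT id, content
FROM document_sections
WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM document_sections WHERE id = 1)
LIMIT 5;

COMMIT;

-- Back to the value in effect before the transaction.
SHOW hnsw.ef_search;
```

Scoping the change this way keeps one expensive, high-recall query from slowing down every other query on the same connection.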