Back to roadmaps pgvector Course

SQL Similarity Search and Ordering in pgvector

Let us write SQL queries to compare user search vectors against stored embeddings inside our database tables.


1. Running a Similarity Search Query

To perform a basic semantic search, order your query using a distance operator (such as <=> for Cosine distance) and set a row limit:

-- Retrieve the top 3 most similar document sections
SELECT id, content, embedding <=> '[0.012, -0.003, 0.045, ...]' AS distance
FROM document_sections
ORDER BY distance ASC
LIMIT 3;

This query:

  1. Calculates the Cosine distance between each row and the search vector.
  2. Orders rows from closest distance to furthest distance.
  3. Limits the return payload to the top 3 closest items.

2. Converting Distance to Similarity Score

Cosine distance measures difference, meaning a distance of 0 indicates a perfect match. To convert this to a user-friendly similarity score (where 1 represents a perfect match), subtract the distance from 1:

SELECT 
  id, 
  content, 
  (1 - (embedding <=> '[0.012, -0.003, 0.045, ...]')) AS similarity_score
FROM document_sections
ORDER BY similarity_score DESC
LIMIT 5;

3. Filtering Results using Distance Thresholds

To filter out irrelevant search results, query using a distance threshold in the WHERE clause:

SELECT id, content, (1 - (embedding <=> '[0.012, -0.003, 0.045, ...]')) AS similarity
FROM document_sections
-- Only return rows with a similarity score higher than 75%
WHERE (1 - (embedding <=> '[0.012, -0.003, 0.045, ...]')) > 0.75
ORDER BY similarity DESC;
Published on Last updated: