SQL Similarity Search and Ordering in pgvector
Let us write SQL queries to compare user search vectors against stored embeddings inside our database tables.
1. Running a Similarity Search Query
To perform a basic semantic search, order your query using a distance operator (such as <=> for Cosine distance) and set a row limit:
-- Retrieve the top 3 most similar document sections
SELECT id, content, embedding <=> '[0.012, -0.003, 0.045, ...]' AS distance
FROM document_sections
ORDER BY distance ASC
LIMIT 3;This query:
- Calculates the Cosine distance between each row and the search vector.
- Orders rows from closest distance to furthest distance.
- Limits the return payload to the top 3 closest items.
2. Converting Distance to Similarity Score
Cosine distance measures difference, meaning a distance of 0 indicates a perfect match. To convert this to a user-friendly similarity score (where 1 represents a perfect match), subtract the distance from 1:
SELECT
id,
content,
(1 - (embedding <=> '[0.012, -0.003, 0.045, ...]')) AS similarity_score
FROM document_sections
ORDER BY similarity_score DESC
LIMIT 5;3. Filtering Results using Distance Thresholds
To filter out irrelevant search results, query using a distance threshold in the WHERE clause:
SELECT id, content, (1 - (embedding <=> '[0.012, -0.003, 0.045, ...]')) AS similarity
FROM document_sections
-- Only return rows with a similarity score higher than 75%
WHERE (1 - (embedding <=> '[0.012, -0.003, 0.045, ...]')) > 0.75
ORDER BY similarity DESC;Published on Last updated: