# Distance Metrics Explained: Cosine, L2, and Inner Product

To query vector embeddings by similarity, you must choose a distance metric. pgvector supports three primary distance calculations, each exposed as its own operator.

---

## 1. Cosine Distance (`<=>` Operator)

Cosine distance measures the difference in **direction or angle** between two vectors, ignoring their magnitude. It equals 1 minus the cosine similarity.

```sql
-- Cosine distance query syntax in pgvector
SELECT *
FROM document_sections
ORDER BY embedding <=> '[0.002, 0.015, ...]'
LIMIT 5;
```

* **Values**: Ranges from 0 (identical direction) to 2 (opposite directions).
* **Use Case**: A common default for text embeddings (such as OpenAI embeddings), because it compares direction only, so semantic similarity is measured regardless of text length.

---

## 2. L2 Euclidean Distance (`<->` Operator)

L2 (Euclidean) distance measures the **straight-line distance** between two points in a multi-dimensional space.

```sql
-- L2 Euclidean distance query syntax in pgvector
SELECT *
FROM document_sections
ORDER BY embedding <-> '[0.002, 0.015, ...]'
LIMIT 5;
```

* **Values**: Ranges from 0 (identical points) to infinity.
* **Use Case**: Suitable for image search or other cases where vector magnitude is an important feature.

---

## 3. Inner Product (`<#>` Operator)

The inner product (dot product) multiplies corresponding coordinates of two vectors and sums the results. A larger inner product means greater similarity, but PostgreSQL index scans on operators return rows in ascending order (smaller values first). For that reason, pgvector's `<#>` operator returns the **negative** inner product, so the most similar vectors sort first. Multiply the result by -1 to recover the actual inner product.

```sql
-- Inner product query syntax in pgvector
SELECT *
FROM document_sections
ORDER BY embedding <#> '[0.002, 0.015, ...]'
LIMIT 5;
```

* **Use Case**: Suitable for normalized vectors (magnitude equal to 1). For such vectors the inner product produces the same ranking as cosine distance, and it is cheaper to compute because it skips the magnitude calculation.
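The relationship between the three metrics can be checked outside the database. The following is a minimal sketch in plain Python, not pgvector's implementation: the helper names (`cosine_distance`, `l2_distance`, `negative_inner_product`, `rank`) and the sample vectors are hypothetical and chosen only for illustration. It assumes the operators behave as described above and shows that, once vectors are normalized to unit length, all three metrics order the documents identically.

```python
import math


def dot(a, b):
    """Sum of the products of corresponding coordinates."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(dot(a, a))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity described for <=>."""
    return 1.0 - dot(a, b) / (norm(a) * norm(b))


def l2_distance(a, b):
    """Straight-line distance, the quantity described for <->."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Negated dot product, the quantity described for <#>."""
    return -dot(a, b)


def rank(query, docs, metric):
    """Indices of docs sorted by ascending metric, like ORDER BY ... ASC."""
    return sorted(range(len(docs)), key=lambda i: metric(query, docs[i]))


# Hypothetical sample data, normalized to unit length.
query = normalize([1.0, 2.0, 3.0])
docs = [
    normalize(v)
    for v in ([3.0, 2.0, 1.0], [1.0, 2.0, 2.5], [-1.0, 0.0, 1.0])
]

for metric in (cosine_distance, l2_distance, negative_inner_product):
    print(metric.__name__, rank(query, docs, metric))
```

With unnormalized vectors the rankings can diverge: `cosine_distance([1, 2, 3], [2, 4, 6])` is effectively zero because the directions match, while `l2_distance` for the same pair is greater than zero because the magnitudes differ.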