Back to roadmaps pgvector Course

Writing Database Functions for Similarity Search

Because the Supabase JS Client cannot directly generate complex SQL operators like <=> inside a standard .select() query builder, the recommended approach is to wrap similarity logic inside a PostgreSQL Database Function (stored procedure).


1. Defining the SQL Function

A database function runs directly on your database server and can be triggered via an API Remote Procedure Call (RPC).

Open the Supabase Dashboard -> SQL Editor, and run this script to create a matching function named match_documents:

-- Stored procedure function to query matching document embeddings
CREATE OR REPLACE FUNCTION match_documents (
  query_embedding VECTOR(1536),
  match_threshold FLOAT,
  match_count INT
)
RETURNS TABLE (
  id BIGINT,
  content TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    document_sections.id,
    document_sections.content,
    -- Calculate similarity (1 - Cosine distance)
    (1 - (document_sections.embedding <=> query_embedding)) AS similarity
  FROM document_sections
  -- Filter out rows below the similarity threshold
  WHERE (1 - (document_sections.embedding <=> query_embedding)) > match_threshold
  -- Order from highest similarity score to lowest
  ORDER BY document_sections.embedding <=> query_embedding ASC
  LIMIT match_count;
END;
$$;

2. Reviewing Stored Procedure Parameters

  • query_embedding VECTOR(1536): Accepts the 1536-dimension search vector sent from the client application.
  • match_threshold FLOAT: Restricts results below a minimum similarity percentage (for example, only returning matches with scores above 0.70).
  • match_count INT: Restricts the maximum number of rows returned by the query.
  • RETURNS TABLE (...): Outlines the schema structure of the output rows returned to the API client.
Published on Last updated: