Project: RAG Knowledge Base Semantic Search
In this project, we will build the backend database layer for a Retrieval-Augmented Generation (RAG) knowledge base. The system allows semantic queries across document fragments, with support for filtering results by document categories (metadata filtering).
1. Table Schema Setup
First, create a table storing document chunk details, including a JSONB metadata column to support flexible filtering fields:
CREATE TABLE knowledge_chunks (
id BIGSERIAL PRIMARY KEY,
document_title TEXT NOT NULL,
category TEXT NOT NULL,
content TEXT NOT NULL,
embedding VECTOR(1536)
);2. Writing the Stored Procedure with Metadata Filtering
To filter results by category before running similarity searches, write a custom database function that accepts a filter_category string:
CREATE OR REPLACE FUNCTION match_knowledge_chunks (
query_embedding VECTOR(1536),
match_threshold FLOAT,
match_count INT,
filter_category TEXT
)
RETURNS TABLE (
id BIGINT,
document_title TEXT,
category TEXT,
content TEXT,
similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
SELECT
k.id,
k.document_title,
k.category,
k.content,
(1 - (k.embedding <=> query_embedding)) AS similarity
FROM knowledge_chunks k
WHERE
-- 1. Filter by category
k.category = filter_category
-- 2. Filter by similarity threshold
AND (1 - (k.embedding <=> query_embedding)) > match_threshold
ORDER BY k.embedding <=> query_embedding ASC
LIMIT match_count;
END;
$$;3. Implementing the Node.js Query Script
Here is the backend implementation script to execute the search:
// src/services/knowledgeBase.ts
import { supabase } from "../lib/supabase";
export interface SearchResult {
id: number;
document_title: string;
category: string;
content: string;
similarity: number;
}
export async function searchKnowledgeBase(
queryVector: number[],
targetCategory: string
): Promise<SearchResult[]> {
console.log(`Searching knowledge base inside category: ${targetCategory}`);
const { data, error } = await supabase.rpc("match_knowledge_chunks", {
query_embedding: queryVector,
match_threshold: 0.70, // Return results with 70%+ similarity
match_count: 3, // Fetch top 3 matches
filter_category: targetCategory,
});
if (error) {
console.error("RAG search failed:", error.message);
throw new Error("Failed to retrieve matching documents");
}
return data as SearchResult[];
}Published on Last updated: