
Indexing Vectors and Querying VectorStore Retrievers

Once text documents are split into chunks, each chunk must be converted into an embedding vector (an array of numbers) and saved in a database. LangChain integrates with vector stores to hold these vectors and provides retrievers to query them.


1. Vector Database Workflow

graph TD
    A[Text chunks] -->|OpenAIEmbeddings| B[Compute 1536-dim vector values]
    B -->|Index| C[Vector Database: MemoryVectorStore]
    D[User search query] -->|Embeddings| E[Query vector]
    E -->|Cosine similarity| C
    C -->|Return top K docs| F[Retrieved Context Docs]
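
To make the "Cosine similarity" and "Return top K docs" steps concrete, here is a minimal sketch of the ranking idea using toy 3-dimensional vectors in place of real 1536-dimensional embeddings. The names `IndexedChunk`, `cosineSimilarity`, and `topK` are hypothetical helpers written for this illustration; they are not LangChain APIs, and a real vector store may implement the search differently.

```typescript
// Hypothetical helper types and functions for illustration only.
type IndexedChunk = { text: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed chunk against the query vector and keep the k best.
function topK(queryVector: number[], index: IndexedChunk[], k: number): IndexedChunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Toy vectors standing in for real embeddings.
const toyIndex: IndexedChunk[] = [
  { text: "cats purr", vector: [1, 0, 0] },
  { text: "dogs bark", vector: [0, 1, 0] },
  { text: "kittens meow", vector: [0.9, 0.1, 0] },
];

const results = topK([1, 0, 0], toyIndex, 2);
console.log(results.map((r) => r.text)); // ["cats purr", "kittens meow"]
```

The query vector points in the same direction as "cats purr" and almost the same direction as "kittens meow", so those two chunks rank ahead of "dogs bark", which is orthogonal to the query.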

2. Setting Up MemoryVectorStore with OpenAI Embeddings

For fast local testing, use LangChain's built-in in-memory vector store:

// src/services/vectorService.ts
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

// Initialize the embeddings client.
// `model` is the current option name; older releases used `modelName`.
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

export async function indexAndRetrieveDocs(chunks: Document[], query: string) {
  // 1. Embed every chunk and index it in an in-memory vector store
  const vectorStore = await MemoryVectorStore.fromDocuments(chunks, embeddings);

  // 2. Wrap the vector store in a retriever
  const retriever = vectorStore.asRetriever({
    k: 2, // Limit search results: return top 2 matching chunks
  });

  // 3. Run the query and get back the matching documents
  const relevantDocs = await retriever.invoke(query);

  console.log("Top matching chunk content:", relevantDocs[0]?.pageContent);
  return relevantDocs;
}

3. Production Vector Stores

An in-memory vector store loses its contents whenever the server process restarts, including on every redeployment. In production, swap MemoryVectorStore for a persistent external vector store (such as Pinecone, Supabase with pgvector, or Chroma) so the index survives restarts.
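
The difference can be illustrated with a small self-contained simulation. Everything here is hypothetical: `ChunkStore`, `InMemoryChunkStore`, and `ExternalChunkStore` are invented for this sketch and are not LangChain classes, and the `externalDatabase` array merely stands in for a hosted database that outlives any single server process.

```typescript
// All names are hypothetical; this is a simulation, not LangChain's API.
interface ChunkStore {
  add(texts: string[]): void;
  count(): number;
}

// Keeps data in an instance field, so it disappears with the instance.
class InMemoryChunkStore implements ChunkStore {
  private texts: string[] = [];
  add(texts: string[]): void {
    this.texts.push(...texts);
  }
  count(): number {
    return this.texts.length;
  }
}

// Stand-in for an external database that lives outside the server process.
const externalDatabase: string[] = [];

// Reads and writes go to the external stand-in rather than instance state.
class ExternalChunkStore implements ChunkStore {
  add(texts: string[]): void {
    externalDatabase.push(...texts);
  }
  count(): number {
    return externalDatabase.length;
  }
}

// Simulates one server lifetime: index two chunks, then discard the instance.
function indexOnce(store: ChunkStore): void {
  store.add(["chunk A", "chunk B"]);
}

indexOnce(new InMemoryChunkStore());
indexOnce(new ExternalChunkStore());

// Simulated redeploy: the application creates fresh store instances.
const memoryAfterRestart = new InMemoryChunkStore().count();
const externalAfterRestart = new ExternalChunkStore().count();
console.log(memoryAfterRestart, externalAfterRestart); // 0 2
```

Because the calling code depends only on the `ChunkStore` interface, switching from the in-memory version to the external one changes a single constructor call, which is the same kind of swap the paragraph above describes for real vector store integrations.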
