
Loading Documents from Files using Document Loaders

To feed custom business files (such as sales sheets or Markdown blog posts) into an LLM prompt as context, you must first load them into memory. LangChain provides Document Loaders for this: each loader reads a file and returns its contents as an array of Document objects.
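
To make the "documents into prompt context" step concrete, here is a minimal, dependency-free sketch. The `LoadedDoc` type, the `buildContext` helper, and the sample data are all hypothetical and only mirror the pageContent/metadata shape described in this lesson; they are not part of LangChain.

```typescript
// Hypothetical minimal shape; LangChain's own Document class is richer.
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Join loaded documents into one context string, labelling each with its source.
function buildContext(docs: LoadedDoc[]): string {
  return docs
    .map((doc) => {
      const source = doc.metadata["source"];
      const label = typeof source === "string" ? source : "unknown";
      return `[source: ${label}]\n${doc.pageContent}`;
    })
    .join("\n\n");
}

// Made-up sample documents standing in for loader output.
const contextText = buildContext([
  { pageContent: "Q1 revenue grew 12%.", metadata: { source: "sales.csv" } },
  { pageContent: "Our new blog is live.", metadata: { source: "blog.md" } },
]);

// The context is then interpolated into whatever prompt template you use.
const promptText = `Answer using only this context:\n\n${contextText}\n\nQuestion: How did Q1 go?`;
```

Labelling each chunk with its source is a design choice, not a requirement; it simply makes it easier to trace an answer back to a file.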


1. Using TextLoader for Raw Texts

Import TextLoader to read plain-text or Markdown files. It loads the whole file as a single Document:

// src/services/loadText.ts
import { TextLoader } from "langchain/document_loaders/fs/text";

export async function parseLocalText(filePath: string) {
  // 1. Instantiate loader pointing to a local file
  const loader = new TextLoader(filePath);

  // 2. Read the file
  const docs = await loader.load();

  // docs is an array of Document objects containing pageContent and metadata
  console.log("Characters loaded:", docs[0].pageContent.length);
  console.log("Source filepath metadata:", docs[0].metadata.source);
  
  return docs;
}
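
To build intuition for what the loader returns, the sketch below approximates the same behaviour with Node's built-in modules only. It is an illustration under assumptions, not LangChain's implementation: `SketchDocument`, `loadTextSketch`, and `demoLoadText` are hypothetical names, and the file content is made up.

```typescript
import { mkdtemp, readFile, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for LangChain's Document; not the library's own type.
interface SketchDocument {
  pageContent: string;
  metadata: { source: string };
}

// Approximates the behaviour described above: one file in, one document out,
// with the file path recorded under metadata.source.
async function loadTextSketch(filePath: string): Promise<SketchDocument[]> {
  const text = await readFile(filePath, "utf8");
  return [{ pageContent: text, metadata: { source: filePath } }];
}

// Demo: write a small Markdown file to a temp directory, then load it.
async function demoLoadText(): Promise<SketchDocument[]> {
  const dir = await mkdtemp(join(tmpdir(), "loader-sketch-"));
  const filePath = join(dir, "post.md");
  await writeFile(filePath, "# Hello\n\nFirst post.", "utf8");
  return loadTextSketch(filePath);
}
```

Calling `demoLoadText()` resolves to a one-element array, which is why the example above reads `docs[0]` directly.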

2. Using CSVLoader for Tabular Data

To parse spreadsheet-style tables, install the CSV parsing dependency that CSVLoader relies on, then instantiate CSVLoader:

# Install the parsing dependency used by CSVLoader
npm install d3-dsv@2

Load the CSV file; each row becomes one Document:

// src/services/loadCsv.ts
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";

export async function parseLocalCsv(filePath: string) {
  const loader = new CSVLoader(filePath, {
    // Use this column's value as pageContent.
    // Omit the option to include every column as "name: value" lines.
    column: "description",
  });

  const docs = await loader.load();
  console.log("Total rows parsed:", docs.length);
  return docs;
}
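
The row-to-document mapping can be sketched without any dependencies. This is an approximation for illustration, assuming the rows are already parsed into objects; `CsvRow`, `RowDocument`, `rowsToDocuments`, and the sample rows are hypothetical and not the library's code.

```typescript
// Hypothetical types: one parsed CSV row and the document it turns into.
type CsvRow = Record<string, string>;

interface RowDocument {
  pageContent: string;
  metadata: { source: string; line: number };
}

// Map each row to one document. With `column`, only that cell becomes the
// content; without it, every cell is rendered as a "name: value" line.
function rowsToDocuments(rows: CsvRow[], source: string, column?: string): RowDocument[] {
  return rows.map((row, index) => {
    let pageContent: string;
    if (column !== undefined) {
      const value = row[column];
      if (value === undefined) {
        throw new Error(`Column "${column}" not found in row ${index + 1}`);
      }
      pageContent = value;
    } else {
      pageContent = Object.entries(row)
        .map(([key, value]) => `${key}: ${value}`)
        .join("\n");
    }
    // Row numbers are 1-based in this sketch.
    return { pageContent, metadata: { source, line: index + 1 } };
  });
}

// Made-up sample rows.
const sampleRows: CsvRow[] = [
  { name: "Widget", description: "A small widget" },
  { name: "Gadget", description: "A handy gadget" },
];
const byColumn = rowsToDocuments(sampleRows, "products.csv", "description");
const allColumns = rowsToDocuments(sampleRows, "products.csv");
```

Picking a single column keeps each document short and focused, while the all-columns form preserves the full row at the cost of longer content.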

3. Standard Document Object Layout

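The file loaders shown above produce Document objects with this simplified layout (in the library itself, metadata is a loosely typed key-value record, so other loaders may attach different fields):

interface Document {
  pageContent: string; // The text extracted from the file
  metadata: {
    source: string; // Path of the file the document was loaded from
    line?: number;  // Row or line number, set when one file yields several documents
  };
}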
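
Because `line` is optional, code that consumes documents may want to read metadata defensively. The helper below is a small sketch of that idea; `AnyDocument` and `describeOrigin` are hypothetical names introduced only for this example.

```typescript
// Hypothetical loose document shape for illustration.
interface AnyDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Build a human-readable origin label such as "data.csv:3",
// falling back gracefully when fields are missing or mistyped.
function describeOrigin(doc: AnyDocument): string {
  const source = doc.metadata["source"];
  const line = doc.metadata["line"];
  const where = typeof source === "string" ? source : "unknown source";
  return typeof line === "number" ? `${where}:${line}` : where;
}
```

For a CSV row this yields a label with a row number, and for a whole-file text document it yields just the path.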