Loading Documents from Files using Document Loaders
To feed your own business files (such as sales sheets or markdown blog posts) into an LLM prompt as context, you must first load them into memory. LangChain uses Document Loaders to read files and return their contents as an array of standard Document objects.
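As a rough illustration of where loaded documents end up, the sketch below joins an array of document-like objects into a single context string for a prompt. The `PromptDoc` type and the `formatDocsForPrompt` helper are hypothetical names introduced here for illustration; they are not part of LangChain, and the sample data is made up.

```typescript
// Hypothetical local type mirroring the two fields this article relies on.
// It is NOT imported from LangChain.
type PromptDoc = {
  pageContent: string;
  metadata: Record<string, unknown>;
};

// Join documents into one context string, labelling each chunk with its
// source so the text can be traced back to the file it came from.
function formatDocsForPrompt(docs: PromptDoc[]): string {
  return docs
    .map((doc, i) => {
      const source = String(doc.metadata["source"] ?? "unknown");
      return `[${i + 1}] (${source})\n${doc.pageContent.trim()}`;
    })
    .join("\n\n");
}

// Usage with hand-written sample documents (made-up data).
const sampleDocs: PromptDoc[] = [
  { pageContent: "Q1 revenue grew 12%.\n", metadata: { source: "sales.md" } },
  { pageContent: "Churn fell in March.", metadata: {} },
];
console.log(formatDocsForPrompt(sampleDocs));
```

Labelling each chunk with its source is a design choice rather than a requirement; it simply makes it easier to see which file a given passage came from when inspecting the prompt.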
1. Using TextLoader for Raw Texts
Import TextLoader to read plain text or markdown files:
// src/services/loadText.ts
import { TextLoader } from "langchain/document_loaders/fs/text";

export async function parseLocalText(filePath: string) {
  // 1. Instantiate loader pointing to a local file
  const loader = new TextLoader(filePath);

  // 2. Read the file
  const docs = await loader.load();

  // docs is an array of Document objects containing pageContent and metadata
  console.log("Characters loaded:", docs[0].pageContent.length);
  console.log("Source filepath metadata:", docs[0].metadata.source);
  return docs;
}

2. Using CSVLoader for Tabular Data
To parse spreadsheet-style tables, install the community package together with the CSV parser that CSVLoader depends on (d3-dsv), then instantiate CSVLoader:
# Install the loader package and its parsing dependency
npm install @langchain/community d3-dsv@2

Read structured CSV datasets:
// src/services/loadCsv.ts
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";

export async function parseLocalCsv(filePath: string) {
  const loader = new CSVLoader(filePath, {
    column: "description", // Specify target column to serve as pageContent
  });

  const docs = await loader.load();
  console.log("Total rows parsed:", docs.length);
  return docs;
}

3. Standard Document Object Layout
A loaded Document object follows this basic layout. The metadata keys shown are the ones set by the file loaders above; other loaders may attach different keys:
interface Document {
  pageContent: string; // The raw text extracted from the file
  metadata: {
    source: string; // The path of the file the text came from
    line?: number; // Optional line (row) number within the source file
  };
}
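To show how the fields in this layout might be filled for tabular data, here is a dependency-free sketch that turns already-parsed rows into one document per row. The `RowDoc` type and the `rowsToDocs` helper are hypothetical and only imitate the shape described above; this is not CSVLoader's implementation, and the 1-based `line` numbering is an assumption made for the example.

```typescript
// Hypothetical local type; it mirrors the layout above but is not LangChain's class.
type RowDoc = {
  pageContent: string;
  metadata: { source: string; line?: number };
};

// Turn already-parsed CSV rows into one document per row.
// `column` picks the field that becomes pageContent; rows lacking it are skipped.
function rowsToDocs(
  rows: Array<Record<string, string>>,
  column: string,
  source: string
): RowDoc[] {
  const docs: RowDoc[] = [];
  rows.forEach((row, i) => {
    const value = row[column];
    if (value === undefined) return; // skip rows without the target column
    // Assumption: line numbers are 1-based positions in the input array.
    docs.push({ pageContent: value, metadata: { source, line: i + 1 } });
  });
  return docs;
}

// Usage with made-up rows; the second row has no "description" field.
const sampleRows: Array<Record<string, string>> = [
  { sku: "A1", description: "Blue mug" },
  { sku: "B2" },
  { sku: "C3", description: "Red cap" },
];
const rowDocs = rowsToDocs(sampleRows, "description", "products.csv");
console.log(rowDocs.length, rowDocs.map((d) => d.metadata.line));
```

Keeping `line` in the metadata lets a downstream step point back to the exact row a passage came from, which is useful when the answer needs to be checked against the original sheet.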