
Project: Automated Structured Content Summarizer

In this project, we will build a content processing pipeline that takes raw article text and uses OpenAI Structured Outputs to return JSON that conforms to a fixed schema: a title, a summary, a category, and a list of keywords.


1. Schema Definition Design

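The output JSON Schema enforces four required fields:

  • title: A cleanly formatted headline.
  • summary: A concise paragraph (the schema asks for three sentences covering the key takeaways).
  • category: A string restricted to a fixed set of values: Engineering, Product Design, Marketing, or Security.
  • keywords: An array of keyword tags (the schema asks for at most five lowercase strings).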
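As a hedged sketch, the four fields can be mirrored as a TypeScript type plus a runtime guard. This is useful for data that did not pass through a strict structured-output call, such as rows read back from storage or replies from a non-strict request. The names CATEGORIES, ArticleSummary, and isArticleSummary are illustrative, not part of any library.

```typescript
// Allowed category values, copied from the enum in the JSON Schema.
const CATEGORIES = ["Engineering", "Product Design", "Marketing", "Security"] as const;
type Category = (typeof CATEGORIES)[number];

// Compile-time mirror of the schema's four required fields.
interface ArticleSummary {
  title: string;
  summary: string;
  category: Category;
  keywords: string[];
}

// Runtime check: true only if `value` has exactly the four fields with the right types.
function isArticleSummary(value: unknown): value is ArticleSummary {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const obj = value as Record<string, unknown>;

  // required + additionalProperties: false -> exactly these four keys, no others.
  const allowed = new Set(["title", "summary", "category", "keywords"]);
  const keys = Object.keys(obj);
  if (keys.length !== allowed.size || !keys.every((k) => allowed.has(k))) return false;

  const { title, summary, category, keywords } = obj;
  return (
    typeof title === "string" &&
    typeof summary === "string" &&
    (CATEGORIES as readonly unknown[]).includes(category) && // enum check
    Array.isArray(keywords) &&
    keywords.every((k) => typeof k === "string")
  );
}

console.log(isArticleSummary({ title: "T", summary: "S", category: "Security", keywords: ["tls"] })); // true
console.log(isArticleSummary({ title: "T", summary: "S", category: "Sales", keywords: [] })); // false
```

Note that this guard, like the schema itself, does not check keyword casing or count.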

2. Implementing the Summarizer Pipeline

Next, create the backend summarization function:

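// src/services/summarizer.ts
import { openai } from "../lib/openai"; // shared OpenAI client instance

// TypeScript mirror of the JSON Schema below.
export interface ArticleSummary {
  title: string;
  summary: string;
  category: "Engineering" | "Product Design" | "Marketing" | "Security";
  keywords: string[];
}

const SummaryOutputSchema = {
  name: "article_summary",
  strict: true, // the model's output must match the schema exactly
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      summary: { type: "string", description: "A three-sentence breakdown of key takeaways." },
      category: {
        type: "string",
        enum: ["Engineering", "Product Design", "Marketing", "Security"]
      },
      keywords: {
        type: "array",
        items: { type: "string" },
        // Descriptions guide the model but are not enforced by strict mode.
        description: "Max 5 lowercase keyword tags."
      }
    },
    required: ["title", "summary", "category", "keywords"],
    additionalProperties: false
  }
};

export async function summarizeText(articleText: string): Promise<ArticleSummary | null> {
  try {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [
        {
          role: "system",
          content: "You are a professional research compiler. Parse the raw text and structure it."
        },
        { role: "user", content: articleText }
      ],
      // Constrain the reply to the JSON Schema above (Structured Outputs)
      response_format: {
        type: "json_schema",
        json_schema: SummaryOutputSchema
      }
    });

    const choice = response.choices[0];

    // The model can decline a request; the refusal text replaces the JSON content.
    if (choice.message.refusal) {
      console.error("Model refused to summarize:", choice.message.refusal);
      return null;
    }

    // Output cut off at the token limit is incomplete JSON, so don't parse it.
    if (choice.finish_reason === "length" || !choice.message.content) {
      console.error("Summarization returned incomplete output.");
      return null;
    }

    // JSON.parse throws on malformed input; the catch block below handles that.
    return JSON.parse(choice.message.content) as ArticleSummary;
  } catch (err: unknown) {
    const reason = err instanceof Error ? err.message : String(err);
    console.error("Summarization pipeline failed:", reason);
    return null;
  }
}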
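Strict mode guarantees that keywords is an array of strings, but the "Max 5 lowercase" rule appears only in the schema's description, which guides the model without constraining it. A minimal post-processing sketch, assuming you want that rule enforced in code before saving; normalizeKeywords and MAX_KEYWORDS are hypothetical names, and the limit of five is taken from the description.

```typescript
// Assumed limit, taken from the schema description ("Max 5 lowercase keyword tags").
const MAX_KEYWORDS = 5;

// Lowercases, trims, drops empty and duplicate tags, and keeps at most `max`
// tags in the order the model returned them.
function normalizeKeywords(keywords: string[], max: number = MAX_KEYWORDS): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of keywords) {
    if (result.length >= max) break; // limit reached
    const tag = raw.trim().toLowerCase();
    if (tag.length === 0 || seen.has(tag)) continue; // skip empty and duplicate tags
    seen.add(tag);
    result.push(tag);
  }
  return result;
}

// returns ["typescript", "prisma", "sql", "apis", "json"]
const tags = normalizeKeywords(["TypeScript", " typescript ", "Prisma", "", "SQL", "APIs", "JSON", "Extra"]);
console.log(tags);
```

One place to apply it is inside summarizeText, on the parsed object's keywords, just before returning it.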

3. Database Sync Integration

When a user submits an article link:

  1. Fetch the raw text content.
  2. Call summarizeText to generate the structured metadata payload.
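  3. Save the result to PostgreSQL with Prisma. This assumes an Article model with String fields title, summary, and category, and a String[] field tagsList:

import { PrismaClient } from "@prisma/client";
import { summarizeText } from "../services/summarizer"; // path depends on where this code lives

const prisma = new PrismaClient();

// rawText is the article body fetched in step 1
const metadata = await summarizeText(rawText);

if (metadata) {
  await prisma.article.create({
    data: {
      title: metadata.title,
      summary: metadata.summary,
      category: metadata.category,
      // Strict mode guarantees a string array; lowercase and the five-tag limit are not enforced
      tagsList: metadata.keywords,
    }
  });
}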
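The three steps above can be sketched as a single function whose dependencies are passed in, which keeps the flow testable without network or database access. This is a hedged illustration: ingestArticle, IngestDeps, and the stub values are hypothetical, the fetch step is left abstract, and the save callback stands in for the prisma.article.create call.

```typescript
// Minimal shape of the summarizer output used by this sketch.
interface ArticleSummary {
  title: string;
  summary: string;
  category: string;
  keywords: string[];
}

// Hypothetical dependency contract: one function per workflow step.
interface IngestDeps {
  fetchText: (url: string) => Promise<string>; // step 1: fetch raw text
  summarize: (text: string) => Promise<ArticleSummary | null>; // step 2: e.g. summarizeText
  save: (record: { title: string; summary: string; category: string; tagsList: string[] }) => Promise<void>; // step 3
}

// Runs fetch -> summarize -> save for one submitted link.
// Returns true if a record was saved, false if summarization produced nothing.
async function ingestArticle(url: string, deps: IngestDeps): Promise<boolean> {
  const rawText = await deps.fetchText(url);
  const metadata = await deps.summarize(rawText);
  if (!metadata) return false; // summarizer returned null: skip the insert

  await deps.save({
    title: metadata.title,
    summary: metadata.summary,
    category: metadata.category,
    tagsList: metadata.keywords,
  });
  return true;
}

// Usage with in-memory stubs in place of the real fetcher, model call, and database.
const saved: unknown[] = [];
const stubDeps: IngestDeps = {
  fetchText: async () => "Raw article body...",
  summarize: async () => ({ title: "T", summary: "S", category: "Engineering", keywords: ["ts"] }),
  save: async (record) => {
    saved.push(record);
  },
};
ingestArticle("https://example.com/post", stubDeps).then((ok) => console.log(ok, saved.length)); // true 1
```

Injecting the steps this way also makes the null branch easy to exercise: a summarize stub that returns null should leave the database untouched.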