Tuning Model Parameters: Temperature, Top P, and Max Tokens

To control the creativity, length, and randomness of AI responses, configure these model parameters when requesting completions.

1. Temperature (Randomness vs Determinism)

Range: 0 to 2 (default is 1).
Behavior:
- Low Temperature (0.0 - 0.2): Output is more focused and repeatable, though identical results across calls are not guaranteed. Suited to code generation, data extraction, and factual summaries.
- High Temperature (0.8 - 1.5): Output is more varied and creative. Suited to copywriting, creative fiction, or brainstorming.
Important: OpenAI generally recommends adjusting either temperature or top_p, not both. Keep one at its default value when adjusting the other.
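
To see why a low temperature makes output more predictable, here is a minimal sketch of temperature-scaled softmax, the standard way sampling temperature is usually described. The function name and the example scores are hypothetical, and this is an illustration of the idea rather than OpenAI's actual implementation.

```javascript
// Convert raw token scores (logits) into probabilities, scaled by temperature.
// Assumption: temperature > 0. An API may treat temperature 0 as a special case.
function softmaxWithTemperature(logits, temperature) {
  if (temperature <= 0) {
    throw new RangeError("temperature must be greater than 0 in this sketch");
  }
  // Dividing by a small temperature widens the gaps between scores;
  // dividing by a large temperature narrows them.
  const scaled = logits.map((logit) => logit / temperature);
  const maxScaled = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((value) => Math.exp(value - maxScaled));
  const total = exps.reduce((sum, value) => sum + value, 0);
  return exps.map((value) => value / total);
}

// Hypothetical scores for three candidate tokens.
const exampleLogits = [2.0, 1.0, 0.1];

const sharp = softmaxWithTemperature(exampleLogits, 0.2); // top token gets roughly 0.99
const flat = softmaxWithTemperature(exampleLogits, 1.5); // top token gets roughly 0.56

console.log(sharp.map((p) => p.toFixed(3)));
console.log(flat.map((p) => p.toFixed(3)));
```

At temperature 0.2 almost all of the probability lands on the highest-scoring token, so repeated calls tend to pick the same one. At 1.5 the lower-ranked tokens get a realistic chance of being sampled.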

2. Top P (Nucleus Sampling)

Range: 0 to 1 (default is 1).
Behavior:
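- Instead of sampling from every possible token, the model samples only from the smallest set of most likely tokens whose combined probability reaches top_p.
- Setting top_p to 0.1 means the model only considers the tokens that make up the top 10 percent of probability mass. This is not the same as the top 10 percent of tokens by count; when one token is very likely, the set can contain just that token.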
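
The filtering step described above can be sketched as follows. This is a simplified illustration of nucleus sampling, not the API's internal code; nucleusFilter and the candidate list are hypothetical names and values chosen for the example.

```javascript
// Keep the smallest set of tokens whose cumulative probability reaches topP,
// then renormalize so the kept probabilities sum to 1.
// `candidates` is a hypothetical array of { token, probability } objects.
function nucleusFilter(candidates, topP) {
  // Sort a copy from most to least likely, leaving the input untouched.
  const sorted = [...candidates].sort((a, b) => b.probability - a.probability);

  const kept = [];
  let cumulative = 0;
  for (const candidate of sorted) {
    kept.push(candidate);
    cumulative += candidate.probability;
    if (cumulative >= topP) break; // the nucleus is complete
  }

  const total = kept.reduce((sum, c) => sum + c.probability, 0);
  return kept.map((c) => ({ token: c.token, probability: c.probability / total }));
}

// Hypothetical next-token distribution.
const nextTokenCandidates = [
  { token: "cat", probability: 0.5 },
  { token: "dog", probability: 0.3 },
  { token: "bird", probability: 0.15 },
  { token: "fish", probability: 0.05 },
];

// With topP = 0.1, "cat" alone already covers the required mass.
console.log(nucleusFilter(nextTokenCandidates, 0.1).map((c) => c.token));

// With topP = 0.7, "cat" and "dog" together are needed.
console.log(nucleusFilter(nextTokenCandidates, 0.7).map((c) => c.token));
```

Note that the number of tokens kept depends on the shape of the distribution, which is why top_p adapts to how confident the model is at each step.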

3. Token Limits and Control

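import OpenAI from "openai";

// The client reads the API key from the OPENAI_API_KEY environment variable by default.
// Top-level await requires an ES module (for example, a .mjs file).
const openai = new OpenAI();

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a joke." }],

  // 1. Set the maximum token count allowed in the response
  max_tokens: 150,

  // 2. Control creativity
  temperature: 0.7,

  // 3. Stop generating as soon as any of these sequences would be produced
  stop: ["\n", "User:"],
});

console.log(response.choices[0].message.content);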

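max_tokens: Caps the number of tokens generated in the response, which bounds the cost of each call. The model does not shorten its answer to fit; output that reaches the cap is cut off mid-thought and the choice's finish_reason is "length". OpenAI now marks max_tokens as deprecated in favor of max_completion_tokens, which reasoning models require.
stop: Tells the model to stop generating as soon as it would produce one of the listed sequences (up to four). The stop sequence itself is not included in the returned text.
presence_penalty: Range -2.0 to 2.0 (default is 0). Positive values penalize any token that has already appeared at least once, encouraging the model to move on to new topics.
frequency_penalty: Range -2.0 to 2.0 (default is 0). Positive values penalize tokens in proportion to how often they have already appeared, discouraging the model from repeating the same words.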
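
The sketch below illustrates two of these parameters. wasTruncated checks the finish_reason field of a Chat Completions response to detect output cut short by the token cap. applyPenalties shows how the two penalties differ, following the adjustment OpenAI has described in its parameter documentation (a flat deduction for presence, a per-occurrence deduction for frequency); the real implementation may differ in detail. Both function names and all example values are hypothetical.

```javascript
// Returns true when the first choice stopped because it hit the token cap.
function wasTruncated(response) {
  return response.choices[0].finish_reason === "length";
}

// Adjust raw token scores based on what has already been generated.
// `scores` maps token -> raw score; `counts` maps token -> times already used.
function applyPenalties(scores, counts, presencePenalty, frequencyPenalty) {
  const adjusted = {};
  for (const [token, score] of Object.entries(scores)) {
    const count = counts[token] ?? 0;
    // Presence: a one-time deduction if the token has appeared at all.
    const presenceTerm = count > 0 ? presencePenalty : 0;
    // Frequency: a deduction that grows with every prior occurrence.
    const frequencyTerm = count * frequencyPenalty;
    adjusted[token] = score - presenceTerm - frequencyTerm;
  }
  return adjusted;
}

// Hypothetical scores and usage counts.
const rawScores = { the: 2.0, cat: 1.5, moon: 1.0 };
const seenCounts = { the: 3, cat: 1 };

// "the" (used 3 times) drops the most, "cat" (used once) drops less,
// and "moon" (never used) is unchanged.
const penalized = applyPenalties(rawScores, seenCounts, 0.5, 0.4);
console.log(penalized);

// Minimal stand-in for a response object, for illustration only.
const cutOffResponse = { choices: [{ finish_reason: "length" }] };
console.log(wasTruncated(cutOffResponse));
```

In practice, a truncated response is a signal to raise the token cap or ask for a shorter answer in the prompt.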

Published on Jun 16, 2026 Last updated: Jun 16, 2026
