Tuning Model Parameters: Temperature, Top P, and Max Tokens
To control the creativity, length, and randomness of AI responses, configure these model parameters when requesting completions.
1. Temperature (Randomness vs Determinism)
- Range: 0 to 2 (default is 1).
- Behavior:
- Low Temperature (0.0 - 0.2): High predictability and consistency, because the model almost always picks the most likely token. Ideal for database lookups, code generation, and factual summaries.
- High Temperature (0.8 - 1.5): High creativity and variety. Ideal for copywriting, creative fiction, or brainstorming.
- Important: Avoid changing both temperature and top_p in the same request. Both reshape the same sampling distribution, so leave one at its default value while adjusting the other.
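To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax, the standard technique for turning raw token scores into sampling probabilities. The function name `softmaxWithTemperature` and the logit values are made up for illustration, and treating a temperature of 0 as "always pick the top token" is an assumption of this sketch, not a statement about how any particular API implements it.

```javascript
// Illustrative only: not the provider's actual implementation.
// Converts raw scores (logits) into probabilities, scaled by temperature.
function softmaxWithTemperature(logits, temperature) {
  if (temperature <= 0) {
    // Assumption: treat temperature 0 as greedy decoding (all mass on the top score).
    const maxIndex = logits.indexOf(Math.max(...logits));
    return logits.map((_, i) => (i === maxIndex ? 1 : 0));
  }
  const scaled = logits.map((x) => x / temperature);
  const maxScaled = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - maxScaled));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Hypothetical scores for three candidate tokens.
const logits = [2.0, 1.0, 0.1];
const sharp = softmaxWithTemperature(logits, 0.2); // low temperature
const flat = softmaxWithTemperature(logits, 1.5); // high temperature

console.log("T=0.2:", sharp.map((p) => p.toFixed(3)));
console.log("T=1.5:", flat.map((p) => p.toFixed(3)));
```

At the low setting nearly all of the probability lands on the top-scoring token, while the high setting spreads it across the alternatives, which is why higher temperatures produce more varied output.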
2. Top P (Nucleus Sampling)
- Range: 0 to 1 (default is 1).
- Behavior:
- Instead of sampling from every possible token, the model samples only from the smallest set of most likely tokens whose combined probability reaches top_p.
- Setting top_p to 0.1 means the model only considers the tokens that make up the top 10 percent of probability mass. This is a share of probability, not of the vocabulary, so it can be as few as one token.
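The nucleus sampling filter described above can be sketched in a few lines. This is an illustration of the general technique under simplified assumptions: `nucleusFilter` is a hypothetical helper, the probabilities are invented, and a real model applies this step over its full vocabulary at every token.

```javascript
// Illustrative only: keeps the smallest set of top tokens whose
// cumulative probability reaches topP, then renormalizes them.
function nucleusFilter(probs, topP) {
  // Pair each probability with its token index, then sort descending.
  const ranked = probs
    .map((p, index) => ({ index, p }))
    .sort((a, b) => b.p - a.p);

  const kept = [];
  let cumulative = 0;
  for (const entry of ranked) {
    kept.push(entry);
    cumulative += entry.p;
    if (cumulative >= topP) break; // threshold reached, drop the rest
  }

  // Renormalize so the surviving probabilities sum to 1.
  return kept.map(({ index, p }) => ({ index, p: p / cumulative }));
}

// Hypothetical distribution over four candidate tokens.
const probs = [0.5, 0.3, 0.15, 0.05];

console.log(nucleusFilter(probs, 0.1).map((t) => t.index)); // [ 0 ]
console.log(nucleusFilter(probs, 0.9).map((t) => t.index)); // [ 0, 1, 2 ]
```

With `topP` at 0.1 the single most likely token already covers the threshold, so it is the only candidate left. At 0.9 the three most likely tokens survive and only the unlikely tail is removed.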
3. Token Limits and Control
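import OpenAI from "openai";

// The client reads the API key from the OPENAI_API_KEY environment variable.
// Top-level await requires running this file as an ES module.
const openai = new OpenAI();

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a joke." }],
  // 1. Set the maximum token count allowed in the response
  max_tokens: 150,
  // 2. Control creativity
  temperature: 0.7,
  // 3. Stop token sequences
  stop: ["\n", "User:"],
});

console.log(response.choices[0].message.content);

- max_tokens: Caps the number of tokens the model may generate. A response that reaches the cap is cut off, so set it high enough for the answer you expect. The cap also helps control your API credit consumption.
- stop: Instructs the model to stop generating as soon as it would produce one of the listed sequences. The matching sequence itself is not included in the output.
- presence_penalty: Penalizes tokens that have already appeared at least once, which encourages the model to talk about new topics.
- frequency_penalty: Penalizes tokens in proportion to how often they have already appeared, which discourages the model from repeating identical words.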
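The difference between the two penalties is easier to see in numbers. The sketch below follows the adjustment formula given in OpenAI's documentation as best understood here: each token's score is reduced by its count times the frequency penalty, plus the presence penalty once if the token has appeared at all. The helper name `applyPenalties` and all values are hypothetical, so treat this as an approximation of the idea and not the service's exact behavior.

```javascript
// Illustrative only: adjusts token scores based on how often each token
// has already appeared in the generated text so far.
function applyPenalties(logits, counts, presencePenalty, frequencyPenalty) {
  return logits.map((logit, j) => {
    const count = counts[j] ?? 0;
    const seen = count > 0 ? 1 : 0; // presence is all-or-nothing
    return logit - count * frequencyPenalty - seen * presencePenalty;
  });
}

// Three hypothetical tokens with equal scores, seen 0, 1, and 4 times so far.
const logits = [1.0, 1.0, 1.0];
const counts = [0, 1, 4];

console.log(applyPenalties(logits, counts, 0.5, 0.0)); // [ 1, 0.5, 0.5 ]
console.log(applyPenalties(logits, counts, 0.0, 0.5)); // [ 1, 0.5, -1 ]
```

The presence penalty lowers every previously used token by the same amount, while the frequency penalty keeps growing with each repetition, so the token seen four times is pushed down the most.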