The InstructLab Training library supports structured reasoning traces through the reasoning_content field in message samples. This feature enables training models that can separate their thinking process from their final output.
The reasoning_content field is an optional addition to the standard message format that allows you to include the model's internal reasoning process alongside the final response. This is particularly useful for:
- Training reasoning-capable models that show their work
- Supporting models that need to generate step-by-step reasoning
- Enabling chain-of-thought style training data
- Separating internal thinking from user-facing responses
{
"role": "assistant",
"content": "The answer is 42."
}{
"role": "assistant",
"content": "The answer is 42.",
"reasoning_content": "Let me think about this step by step. The question asks for the meaning of life, and according to The Hitchhiker's Guide to the Galaxy, the answer is 42."
}When processing messages during training:
- Unmasking Rules: Both
contentandreasoning_contentfields follow the same unmasking rules based on the message role - Template Integration: Both fields are processed by the chat template and included in the tokenized output
- Token Wrapping: If a role is configured to be unmasked, both fields (when present) are wrapped with unmask tokens
- Independent Fields: Either field can exist independently - messages can have only
content, onlyreasoning_content, or both
{
"messages": [
{
"role": "user",
"content": "What is 15 * 23?"
},
{
"role": "assistant",
"reasoning_content": "I need to multiply 15 by 23. Let me break this down: 15 * 23 = 15 * (20 + 3) = 15 * 20 + 15 * 3 = 300 + 45 = 345",
"content": "15 * 23 = 345"
}
]
}{
"messages": [
{
"role": "user",
"content": "Solve this math problem step by step: 2x + 5 = 13"
},
{
"role": "assistant",
"reasoning_content": "I need to solve for x. First, I'll subtract 5 from both sides: 2x = 8. Then divide by 2: x = 4.",
"content": "To solve 2x + 5 = 13:\n1. Subtract 5 from both sides: 2x = 8\n2. Divide by 2: x = 4\n\nTherefore, x = 4."
}
]
}{
"messages": [
{
"role": "user",
"content": "Think about the implications of AI safety."
},
{
"role": "assistant",
"reasoning_content": "This is a complex topic that requires careful consideration of multiple factors including alignment, capability control, and social implications..."
}
]
}During data processing, the library:
- Wraps both
contentandreasoning_contentwith special unmask tokens (<|UNMASK_BEGIN|>,<|UNMASK_END|>,<|UNMASK_REASONING_BEGIN|>,<|UNMASK_REASONING_END|>) - Applies the chat template to the combined message content
- Processes the tokenized sequence to create appropriate labels for training
- Removes the special unmask tokens from the final training data
The library validates that:
- Both
contentandreasoning_contentmust be strings if present - Special unmask tokens are properly processed and removed
- The final training data contains no residual unmask tokens
Common errors and their meanings:
"unmasking non-string data types is currently unsupported": Thecontentfield contains non-string data"received an entry for reasoning_content which was not a string": Thereasoning_contentfield contains non-string data
The reasoning_content field respects all existing unmasking policies:
- When
unmask=trueis set on a sample, both fields are unmasked for non-system roles - When
unmask=false(default), only assistant role messages are unmasked - Custom unmask role configurations work with both fields
The reasoning_content is unsupported by the legacy chat templates and will not be rendered.
The feature is fully backward compatible:
- Existing datasets without
reasoning_contentcontinue to work unchanged - All existing training configurations and arguments remain valid
The library includes comprehensive tests for reasoning content functionality:
- Unit tests for message wrapping and processing
- Integration tests with real tokenizers
- Validation tests for error conditions
- Backward compatibility tests
-
Always processed when present: If
reasoning_contentexists in a message, it will always be processed and unmasked as long as the message role is targeted for unmasking. This ensures that reasoning traces are properly included in the training data without requiring additional configuration. -
DeepSeek R1 and Qwen3 compatibility: Models using the DeepSeek R1 thought processor (such as Qwen3) must supply their thinking traces in the
reasoning_contentfield to be processed correctly. Failure to do so may result in improper handling of reasoning tokens and suboptimal training performance. -
Separate token handling: The library uses distinct unmask tokens for reasoning content (
<|UNMASK_REASONING_BEGIN|>and<|UNMASK_REASONING_END|>) versus regular content (<|UNMASK_BEGIN|>and<|UNMASK_END|>), allowing for proper differentiation during training.
- Consistent Usage: When applicable, use
reasoning_contentconsistently within a dataset for best results - Clear Separation: Keep reasoning traces separate from final outputs for clarity
- Template Compatibility: Ensure your chat template properly handles both fields
- Validation: Test your data processing pipeline with small samples before full training
To add reasoning content support to existing datasets:
- Add
reasoning_contentfields to relevant messages - Ensure content is in string format
- Test with a small sample using the data processing pipeline
- Verify that unmask tokens are properly processed
No changes to training arguments or configuration are required.