jalateras

Summary

  • Added comprehensive documentation explaining how ChatCompletion training data is transformed internally
  • Created new notebook with step-by-step examples and visualizations
  • Addresses the knowledge gap about OpenAI's internal SFT processing pipeline

What Changed

Created a new Jupyter notebook Understanding_ChatCompletion_SFT_transformation.ipynb that explains:

  • How structured ChatCompletion messages are converted to linear sequences (see the sketch after this list)
  • The role of special tokens and message boundaries
  • Tokenization process and token ID conversion
  • Loss mask creation and why only assistant tokens contribute to training loss
  • Gradient flow during backpropagation
  • Practical implications for designing effective training data
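
The core idea can be sketched in a few lines of Python. The ChatML-style delimiters, the whitespace "tokenizer", and the example conversation below are illustrative stand-ins; OpenAI's actual special tokens and tokenizer are internal and not public:

```python
# Minimal sketch: flatten ChatCompletion messages into one token stream
# with a loss mask that covers only assistant tokens.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]

tokens, loss_mask = [], []
for msg in messages:
    # ChatML-style message boundaries (assumed format, for illustration only)
    rendered = f"<|im_start|>{msg['role']}\n{msg['content']}\n<|im_end|>"
    piece = rendered.split()  # toy whitespace "tokenization"
    tokens.extend(piece)
    # Only assistant tokens receive a loss; everything else is masked out.
    loss_mask.extend([int(msg["role"] == "assistant")] * len(piece))

for tok, m in zip(tokens, loss_mask):
    print(f"{m}  {tok}")
```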

Why This Matters

Users frequently ask what happens "under the hood" when they provide training data in ChatCompletion format for fine-tuning. This documentation fills that gap with a detailed technical explanation and code examples that demonstrate each transformation step.
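
As a concrete illustration of the token-ID conversion step, the open-source tiktoken library can stand in for the tokenizer (the exact tokenizer applied server-side during fine-tuning is not exposed):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Why is the sky blue?"
ids = enc.encode(text)
print(ids)                             # integer token IDs
print([enc.decode([i]) for i in ids])  # the text piece behind each ID
```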

Testing

The notebook includes runnable code examples that:

  • Demonstrate the transformation pipeline
  • Show tokenization in action
  • Explain loss calculation with concrete examples (see the PyTorch sketch after this list)
  • Provide visualization of the training process
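
For example, the loss-masking idea can be demonstrated in a few lines of PyTorch; the shapes, labels, and mask here are made up for illustration:

```python
import torch
import torch.nn.functional as F

vocab, seq_len = 100, 6
logits = torch.randn(1, seq_len, vocab)         # model outputs (batch, seq, vocab)
labels = torch.randint(0, vocab, (1, seq_len))  # next-token targets
mask = torch.tensor([[0, 0, 0, 1, 1, 1]])       # 1 = assistant token, 0 = masked

# Per-token cross-entropy, then zero out non-assistant positions.
per_token = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")
loss = (per_token * mask).sum() / mask.sum()
print(loss.item())
```

In practice the same masking is often expressed by setting masked label positions to the ignore_index of the loss function (-100 by default in PyTorch's cross_entropy) rather than multiplying by an explicit mask.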

Fixes #2075

Added a comprehensive notebook demonstrating how OpenAI's fine-tuning framework
internally transforms ChatCompletion-style training data into a model-ready
format for Supervised Fine-Tuning. Covers message concatenation, tokenization,
loss masking, and training-process visualization.