@rjarun8235 commented Aug 23, 2025

Add GPT-5 Model Support

Fixes #428 (GPT-5 in MODEL_TO_ENCODING)

Changes

  • Added GPT-5 model mapping: Maps gpt-5 to o200k_base encoding in MODEL_TO_ENCODING
  • Added test coverage: New test test_gpt5_encoding() verifies correct tokenization behavior
  • Verified token values: Confirmed tokens [24912, 2375] for "hello world"

Testing

  • ✅ All tests pass (34/34)
  • ✅ New GPT-5 test passes
  • ✅ Encoding behavior verified

Impact

Enables tiktoken to tokenize text for the GPT-5 model with the o200k_base encoding, as requested in #428.

@Sernight

Hi! Would it be possible to add gpt-5-chat-latest to this PR while it is still open? The 4o entry looks like 'chatgpt-4o-': 'o200k_base', so I think it can be added in the same way.

@rjarun8235 (Author)

> Hi! Would it be possible to add gpt-5-chat-latest to this PR while it is still open? The 4o entry looks like 'chatgpt-4o-': 'o200k_base', so I think it can be added in the same way.

It's already there: "gpt-5-": "o200k_base". Doesn't that work for gpt-5-chat-latest?
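
For readers following the thread, here is a minimal sketch of why a "gpt-5-" prefix entry would be expected to cover gpt-5-chat-latest. It is a simplified stand-in and not tiktoken's actual code: the table names `EXACT_MODELS` and `PREFIX_MODELS` and the function `resolve_encoding_name` are hypothetical, and the tables hold only the entries mentioned in this conversation. It assumes an exact-name lookup is tried first, followed by a prefix match.

```python
from typing import Optional

# Hypothetical, trimmed-down lookup tables containing only the entries
# discussed in this PR conversation (not the library's full tables).
EXACT_MODELS = {"gpt-5": "o200k_base"}
PREFIX_MODELS = {
    "gpt-5-": "o200k_base",
    "chatgpt-4o-": "o200k_base",
}


def resolve_encoding_name(model_name: str) -> Optional[str]:
    """Return the encoding name for a model, or None if nothing matches."""
    # Assumed step 1: an exact model name takes priority.
    if model_name in EXACT_MODELS:
        return EXACT_MODELS[model_name]
    # Assumed step 2: fall back to the first prefix the name starts with.
    for prefix, encoding_name in PREFIX_MODELS.items():
        if model_name.startswith(prefix):
            return encoding_name
    return None


if __name__ == "__main__":
    for name in ("gpt-5", "gpt-5-chat-latest", "chatgpt-4o-latest", "unknown-model"):
        print(name, "->", resolve_encoding_name(name))
```

Under these assumptions, gpt-5-chat-latest resolves through the "gpt-5-" prefix, so a separate entry would only be needed if that variant used a different encoding.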
