src/oss/python/integrations/chat/openai.mdx (107 additions, 0 deletions)
@@ -89,6 +89,14 @@ llm = ChatOpenAI(
See the @[`ChatOpenAI`] API Reference for the full set of available model parameters.

<Note>
**Token parameter deprecation**

OpenAI deprecated `max_tokens` in favor of `max_completion_tokens` in September 2024. While `max_tokens` is still supported for backwards compatibility, it's automatically converted to `max_completion_tokens` internally.
</Note>
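In practice this means existing code that sets `max_tokens` keeps working unchanged. A minimal sketch (the model name and limit are illustrative):

```python
from langchain_openai import ChatOpenAI

# `max_tokens` is accepted for backwards compatibility; it is sent to
# OpenAI as `max_completion_tokens` under the hood (see the note above).
llm = ChatOpenAI(model="gpt-4o", max_tokens=256)

print(llm.invoke("Say hello in French.").text)
```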
---

## Invocation

@@ -115,6 +123,8 @@ print(ai_msg.text)
```
J'adore la programmation.
```

---

## Streaming usage metadata
OpenAI's Chat Completions API does not stream token usage statistics by default (see API reference [here](https://platform.openai.com/docs/api-reference/completions/create#completions-create-stream_options)).
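A minimal sketch of one way to surface usage data while streaming, assuming the `stream_usage` flag available in recent `langchain-openai` releases:

```python
from langchain_openai import ChatOpenAI

# Ask OpenAI to include token counts in the final streamed chunk
# (assumption: `stream_usage` is supported by your installed version).
llm = ChatOpenAI(model="gpt-4o-mini", stream_usage=True)

aggregate = None
for chunk in llm.stream("Hello, how are you?"):
    aggregate = chunk if aggregate is None else aggregate + chunk

# usage_metadata is populated once the stream finishes.
print(aggregate.usage_metadata)
```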
@@ -127,6 +137,8 @@ from langchain_openai import ChatOpenAI
@@ -222,6 +234,8 @@ When using an async callable for the API key, you must use async methods (`ainvo
</Accordion>

---

## Tool calling

OpenAI has a [tool calling](https://platform.openai.com/docs/guides/function-calling) API (we use "tool calling" and "function calling" interchangeably here) that lets you describe tools and their arguments, and have the model return a JSON object with a tool to invoke and the inputs to that tool. Tool calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more generally.
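A minimal sketch of binding a tool defined as a Pydantic model (the `GetWeather` schema is illustrative):

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI


class GetWeather(BaseModel):
    """Get the current weather in a given location."""

    location: str = Field(..., description="City and state, e.g. San Francisco, CA")


llm = ChatOpenAI(model="gpt-4o-mini")
llm_with_tools = llm.bind_tools([GetWeather])

# The model returns structured tool calls instead of (or alongside) text.
ai_msg = llm_with_tools.invoke("What's the weather like in Boston?")
print(ai_msg.tool_calls)
```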
@@ -463,6 +477,8 @@ Name: do_math
</Accordion>

---

## Responses API

@@ -1066,6 +1082,16 @@ for block in response.content_blocks:
```
The user is asking about 3 raised to the power of 3. That's a pretty simple calculation! I know that 3^3 equals 27, so I can say, "3 to the power of 3 equals 27." I might also include a quick explanation that it's 3 multiplied by itself three times: 3 × 3 × 3 = 27. So, the answer is definitely 27.
```

<Tip>
**Troubleshooting: Empty responses from reasoning models**

If you're getting empty responses from reasoning models like `gpt-5-nano`, this is likely due to restrictive token limits. The model uses tokens for internal reasoning and may not have any left for the final output.

Ensure `max_tokens` is set to `None` or increase the token limit to allow sufficient tokens for both reasoning and output generation.
</Tip>
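A minimal sketch of the fix described above (the model name is taken from the tip; the prompt is illustrative):

```python
from langchain_openai import ChatOpenAI

# Leave the limit unset so tokens are available for both the hidden
# reasoning and the visible answer.
llm = ChatOpenAI(model="gpt-5-nano", max_tokens=None)

response = llm.invoke("What is 3 to the power of 3?")
print(response.text)
```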
---

## Fine-tuning
You can call fine-tuned OpenAI models by passing the fine-tuned model name in the `model` parameter.
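For example (the fine-tuned model ID below is a placeholder; substitute your own):

```python
from langchain_openai import ChatOpenAI

# Placeholder ID in the `ft:<base-model>:<org>::<id>` format OpenAI uses.
fine_tuned_llm = ChatOpenAI(model="ft:gpt-4o-mini-2024-07-18:my-org::example123")

print(fine_tuned_llm.invoke("Hello!").text)
```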
---

## Multimodal inputs

OpenAI has models that support multimodal inputs. You can pass in images, PDFs, or audio to these models. For more information on how to do this in LangChain, head to the [multimodal inputs](/oss/langchain/messages#multimodal) docs.
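A minimal sketch of an image input using an OpenAI-style content block (the image URL is a placeholder; see the multimodal docs for the full range of formats):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image in one sentence."},
        # OpenAI-native image block, passed through to the API as-is.
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
    ],
}

print(llm.invoke([message]).text)
```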
@@ -1196,6 +1224,8 @@ content_block = {
</Accordion>

---

## Predicted output
@@ -1268,6 +1298,7 @@ public class User
Note that currently predictions are billed as additional tokens and may increase your usage and costs in exchange for this reduced latency.
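A minimal sketch of supplying a prediction, assuming the `prediction` keyword is forwarded to OpenAI by your `langchain-openai` version (the code snippet is illustrative):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

existing_code = """
public class User
{
    public string Name { get; set; }
}
"""

response = llm.invoke(
    [
        {"role": "user", "content": "Rename the Name property to Username. Output only the code."},
        {"role": "user", "content": existing_code},
    ],
    # Most of the answer is already known, so pass it as the prediction.
    prediction={"type": "content", "content": existing_code},
)
print(response.text)
```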
---

## Audio Generation (Preview)
@@ -1326,6 +1357,82 @@ history = [
```python
second_output_message = llm.invoke(history)
```

---

## Prompt caching
OpenAI's [prompt caching](https://platform.openai.com/docs/guides/prompt-caching) feature automatically caches prompts longer than 1024 tokens to reduce costs and improve response times. This feature is enabled for all recent models (`gpt-4o` and newer).

### Manual caching

You can use the `prompt_cache_key` parameter to influence OpenAI's caching and optimize cache hit rates:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# Use a cache key for repeated prompts
messages = [
    {"role": "system", "content": "You are a helpful assistant that translates English to French."},
    {"role": "user", "content": "I love programming."},
]

# Illustrative cache key: requests that share a `prompt_cache_key` are more
# likely to reuse the same cached prompt prefix.
response = llm.invoke(messages, prompt_cache_key="translation-demo")
print(response.text)
```
---

## Flex processing

OpenAI offers a variety of [service tiers](https://platform.openai.com/docs/guides/flex-processing). The "flex" tier offers cheaper pricing for requests, with the trade-off that responses may take longer and resources might not always be available. This approach is best suited for non-critical tasks, including model testing, data enhancement, or jobs that can be run asynchronously.
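A minimal sketch of opting into the flex tier, assuming `service_tier` is accepted by your `langchain-openai` version (otherwise it can be passed through `model_kwargs`):

```python
from langchain_openai import ChatOpenAI

# Flex requests are cheaper but may respond more slowly or be unavailable,
# so reserve them for non-critical or asynchronous jobs.
llm = ChatOpenAI(model="o4-mini", service_tier="flex")

print(llm.invoke("Classify this ticket as bug or feature: 'App crashes on login.'").text)
```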