- -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
+ -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
ChatQnA/docker_compose/intel/cpu/aipc/README.md (1 addition, 1 deletion)
@@ -229,7 +229,7 @@ OLLAMA_HOST=${host_ip}:11434 ollama run $OLLAMA_MODEL
  ```bash
  curl http://${host_ip}:9000/v1/chat/completions\
  -X POST \
- -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+ -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
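Across these hunks the only client-facing change is the JSON key rename from "max_new_tokens" to "max_tokens". A minimal migration sketch for existing request bodies (assumes a POSIX shell and `sed`; the payload string here is illustrative, not taken from any one file):

```shell
# Rewrite the renamed sampling parameter in an existing request body.
# Only the key changes; its value and meaning stay the same.
old='{"query":"What is Deep Learning?","max_new_tokens":17,"streaming":true}'
new=$(printf '%s' "$old" | sed 's/"max_new_tokens"/"max_tokens"/')
echo "$new"
# → {"query":"What is Deep Learning?","max_tokens":17,"streaming":true}
```

A textual rename like this is fine for one-off payloads; for generated payloads a JSON-aware tool is the safer choice.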
ChatQnA/docker_compose/intel/cpu/xeon/README.md (17 additions, 4 deletions)
@@ -438,18 +438,31 @@ docker compose -f compose_vllm.yaml up -d
  This service depends on the LLM backend service above; on first startup it can take a long time for the backend to become ready.

  ```bash
+ # TGI service
  curl http://${host_ip}:9000/v1/chat/completions\
  -X POST \
- -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+ -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'
  ```

+ For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that "max_new_tokens" is renamed to "max_tokens").
+
+ ```bash
+ # vLLM Service
+ curl http://${your_ip}:9000/v1/chat/completions \
+ -X POST \
+ -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0,"streaming":false}' \
+ -H 'Content-Type: application/json'
+ ```
+
+ For parameters in vLLM mode, please refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).
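The TGI-mode and vLLM-mode examples above accept different sampling parameters. A small sketch that lists which keys differ between the two example payloads (assumes `python3` is on PATH; only the standard-library `json` module is used):

```shell
# Compare the parameter sets of the two example request bodies.
tgi='{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}'
vllm='{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0,"streaming":false}'
python3 - "$tgi" "$vllm" <<'EOF'
import json, sys
tgi, vllm = (set(json.loads(a)) for a in sys.argv[1:3])
print("TGI-only: ", sorted(tgi - vllm))   # HF-style sampling knobs
print("vLLM-only:", sorted(vllm - tgi))   # OpenAI-style penalties
print("shared:   ", sorted(tgi & vllm))
EOF
```

It reports top_k, typical_p, and repetition_penalty as TGI-only, frequency_penalty and presence_penalty as vLLM-only, and max_tokens, query, streaming, temperature, and top_p as common to both example payloads.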
ChatQnA/docker_compose/intel/cpu/xeon/README_qdrant.md (1 addition, 1 deletion)
@@ -304,7 +304,7 @@ docker compose -f compose_qdrant.yaml up -d
  ```bash
  curl http://${host_ip}:6047/v1/chat/completions\
  -X POST \
- -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+ -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
ChatQnA/docker_compose/intel/hpu/gaudi/README.md (26 additions, 3 deletions)
@@ -442,18 +442,41 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
  7. LLM Microservice

  ```bash
+ # TGI service
+ curl http://${host_ip}:9000/v1/chat/completions\
+ -X POST \
+ -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+ -H 'Content-Type: application/json'
+ ```
+
+ For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that "max_new_tokens" is renamed to "max_tokens").
+
+ ```bash
+ # vLLM Service
  curl http://${host_ip}:9000/v1/chat/completions \
+ -X POST \
+ -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0,"streaming":false}' \
+ -H 'Content-Type: application/json'
+ ```
+
+ For parameters in vLLM mode, please refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).
+
+ ```bash
+ # vLLM-on-Ray Service
+ curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
- -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+ -d '{"query":"What is Deep Learning?","max_tokens":17,"presence_penalty":1.03,"streaming":false}' \
  -H 'Content-Type: application/json'
  ```

+ For parameters in vLLM-on-Ray mode, please refer to the [LangChain ChatOpenAI API](https://python.langchain.com/v0.2/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html).
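Before curling a freshly started stack, it can help to confirm that each mode's request body is well-formed JSON. A local sanity-check sketch (assumes `python3` is available; it is used only as a JSON parser, and the bodies mirror the examples above):

```shell
# Validate each mode's example request body as JSON before sending it.
for mode in TGI vLLM vLLM-on-Ray; do
  case $mode in
    TGI)         body='{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' ;;
    vLLM)        body='{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0,"streaming":false}' ;;
    vLLM-on-Ray) body='{"query":"What is Deep Learning?","max_tokens":17,"presence_penalty":1.03,"streaming":false}' ;;
  esac
  printf '%s' "$body" | python3 -m json.tool >/dev/null \
    && echo "$mode payload: valid JSON" \
    || echo "$mode payload: INVALID"
done
```

A stray character (such as an unbalanced quote around a numeric value) makes the service reject the request, so this kind of check catches copy-paste damage early.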