
Conversation

SherronBurtint

Running `python3 tools/end_to_end_test_llama.py`, an error was raised: `[400] HTTP end point doesn't support models with decoupled transaction policy`

```python
]

try:
    result = client.infer(model_name, inputs)
```


I see the llama model is decoupled, so shouldn't the call be `async_stream_infer` instead of `infer`?


I corrected it to use `start_stream(callback)` and `async_stream_infer()`, but kept the old input type (`HttpInferInput`) and got `TypeError: Not a cmessage` from `tritonclient/grpc/_utils.py`.
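For what it's worth, the `Not a cmessage` error is consistent with passing the HTTP client's `InferInput` into the gRPC stream, which expects protobuf messages; the inputs would need to be built with `tritonclient.grpc.InferInput` instead. A minimal sketch of how a decoupled call might look, assuming `tritonclient[grpc]` is installed and the model takes an INT32 tensor (the tensor name `"input_ids"` here is a placeholder, not the model's actual config):

```python
import queue


class StreamCollector:
    """Collects streamed responses from a decoupled model via the
    tritonclient stream callback signature: callback(result, error)."""

    def __init__(self):
        self._q = queue.Queue()

    def __call__(self, result, error):
        # Store the error if one occurred, otherwise the result.
        self._q.put(error if error is not None else result)

    def drain(self):
        # Return everything received so far, in arrival order.
        items = []
        while not self._q.empty():
            items.append(self._q.get())
        return items


def stream_infer(url, model_name, input_ids):
    """Hypothetical usage sketch; requires tritonclient[grpc], numpy,
    and a running Triton server with a decoupled model."""
    import numpy as np
    import tritonclient.grpc as grpcclient  # NOT tritonclient.http

    client = grpcclient.InferenceServerClient(url)
    # Must be the gRPC InferInput; the HTTP one raises "Not a cmessage".
    inp = grpcclient.InferInput("input_ids", input_ids.shape, "INT32")
    inp.set_data_from_numpy(input_ids.astype(np.int32))

    collector = StreamCollector()
    client.start_stream(callback=collector)
    client.async_stream_infer(model_name, [inp])
    client.stop_stream()  # closes the stream; callbacks have fired by now
    return collector.drain()
```

The collector is just one way to bridge the asynchronous callbacks back to the caller; any thread-safe container would do.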

SamuraiBUPT and others added 19 commits June 26, 2023 18:09
fix the int8_mode and decoupled mode backend support
When I followed llama_guide.md to build this lib, this error occurred:

```bash
/workspace/build/fastertransformer_backend/src/libfastertransformer.cc: In member function 'std::shared_ptr<AbstractTransformerModel> triton::backend::fastertransformer_backend::ModelState::ModelFactory(triton::common::TritonJson::Value&, const string&)':
/workspace/build/fastertransformer_backend/src/libfastertransformer.cc:340:98: error: 'int8_mode' was not declared in this scope
  340 |       ft_model = std::make_shared<LlamaTritonModel<__nv_bfloat16>>(tp, pp, custom_ar, model_dir, int8_mode);
      |                                                                                                  ^~~~~~~~~
[100%] Linking CXX executable ../../../../../bin/multi_gpu_gpt_interactive_example
[100%] Built target gptneox_example
[100%] Built target multi_gpu_gpt_triton_example
[100%] Built target llama_example
/workspace/build/fastertransformer_backend/src/libfastertransformer.cc:343:90: error: 'int8_mode' was not declared in this scope
  343 |       ft_model = std::make_shared<LlamaTritonModel<float>>(tp, pp, custom_ar, model_dir, int8_mode);
```
I think this variable declaration should be fixed. After I moved it, the build succeeded.
Update libfastertransformer.cc
