Commits (43)
6983a7a
Main Readme updating for latest news
abukhoy May 21, 2025
84c7a43
Main Readme updating for latest news
abukhoy May 21, 2025
b172f89
Merge branch 'main' into docs-update
abukhoy May 30, 2025
bd1000c
docs modified
abukhoy May 30, 2025
195740e
Merge branch 'main' into docs-update
abukhoy Jun 9, 2025
de93706
Merge branch 'main' into docs-update
abukhoy Jun 19, 2025
dc7ae55
Readme update and validate
abukhoy Jun 19, 2025
aa5878b
Merge branch 'main' into docs-update
abukhoy Jun 23, 2025
4cbc841
Merge branch 'main' into docs-update
abukhoy Jun 24, 2025
50302ab
supported features updated
abukhoy Jun 24, 2025
627f7a2
Merge branch 'main' into docs-update
abukhoy Jun 25, 2025
c2280ba
Merge branch 'main' into docs-update
abukhoy Jun 27, 2025
0ca718e
CB, single and dual qpc column added in validation doc
abukhoy Jun 27, 2025
2353a76
CB, single and dual qpc column added in validation doc
abukhoy Jun 27, 2025
2c42d36
source/introduction modified
abukhoy Jun 30, 2025
8b3c362
source/validate modified
abukhoy Jun 30, 2025
dfda020
Merge branch 'main' into docs-update
abukhoy Jul 2, 2025
3e3656e
Comments are addressed
abukhoy Jul 2, 2025
d86b836
Comments are addressed
abukhoy Jul 2, 2025
56f56a9
comments are adressed
abukhoy Jul 2, 2025
8352e14
Merge branch 'quic:main' into docs-update
abukhoy Jul 8, 2025
b88d970
release docs added and granite MOE removed from validate list
abukhoy Jul 8, 2025
7e46180
release dcos modified
abukhoy Jul 8, 2025
50db4bc
release docs added for 1.20
abukhoy Jul 8, 2025
d16eeb3
Merge branch 'main' into docs-update
abukhoy Jul 10, 2025
640a61a
comments are adrressed
abukhoy Jul 10, 2025
03ccbb8
Merge branch 'main' into docs-update
abukhoy Jul 10, 2025
cb566e8
granite vision removed from docs
abukhoy Jul 11, 2025
271e623
granite vision removed from docs
abukhoy Jul 11, 2025
effac64
Comments Addressed
abukhoy Jul 14, 2025
aa77cc8
Merge branch 'main' into docs-update
abukhoy Jul 14, 2025
01a07fa
Comments Addressed
abukhoy Jul 14, 2025
cba26d3
Comments Addressed
abukhoy Jul 14, 2025
2467cde
Comments Addressed
abukhoy Jul 14, 2025
9cd323c
Merge branch 'main' into docs-update
abukhoy Jul 14, 2025
fa848c8
Merge branch 'main' into docs-update
abukhoy Jul 14, 2025
d58d224
Merge branch 'quic:main' into docs-update
abukhoy Sep 3, 2025
82b7e5a
Docs are updated for Auto class
abukhoy Sep 3, 2025
81d2ae0
formatting quick start page
abukhoy Sep 5, 2025
e09606d
docs updating
abukhoy Sep 9, 2025
65272f2
formatting
abukhoy Sep 15, 2025
c118389
formatting
abukhoy Sep 17, 2025
3ff7eef
docstring updating
abukhoy Sep 19, 2025
63 changes: 48 additions & 15 deletions QEfficient/cloud/execute.py
@@ -25,24 +25,57 @@ def main(
full_batch_size: Optional[int] = None,
):
"""
Helper function used by execute CLI app to run the Model on ``Cloud AI 100`` Platform.

``Mandatory`` Args:
:model_name (str): Hugging Face Model Card name, Example: ``gpt2``.
:qpc_path (str): Path to the generated binary after compilation.
``Optional`` Args:
:device_group (List[int]): Device Ids to be used for compilation. if len(device_group) > 1. Multiple Card setup is enabled.``Defaults to None.``
:local_model_dir (str): Path to custom model weights and config files. ``Defaults to None.``
:prompt (str): Sample prompt for the model text generation. ``Defaults to None.``
:prompts_txt_file_path (str): Path to txt file for multiple input prompts. ``Defaults to None.``
:generation_len (int): Number of tokens to be generated. ``Defaults to None.``
:cache_dir (str): Cache dir where downloaded HuggingFace files are stored. ``Defaults to Constants.CACHE_DIR.``
:hf_token (str): HuggingFace login token to access private repos. ``Defaults to None.``
:full_batch_size (int): Set full batch size to enable continuous batching mode. ``Defaults to None.``
Main function for the QEfficient execution CLI application.

This function serves as the entry point for running a compiled model
(QPC package) on the Cloud AI 100 Platform. It loads the necessary
tokenizer and then orchestrates the text generation inference.

Parameters
----------
model_name : str
Hugging Face Model Card name (e.g., ``gpt2``) for loading the tokenizer.
qpc_path : str
Path to the generated binary (QPC package) after compilation.

Other Parameters
----------------
device_group : List[int], optional
List of device IDs to be used for inference. If `len(device_group) > 1`,
a multi-card setup is enabled. Default is None.
local_model_dir : str, optional
Path to custom model weights and config files, used if not loading tokenizer
from Hugging Face Hub. Default is None.
prompt : str, optional
Sample prompt(s) for the model text generation. For batch size > 1,
pass multiple prompts separated by a pipe (``|``) symbol. Default is None.
prompts_txt_file_path : str, optional
Path to a text file containing multiple input prompts, one per line. Default is None.
generation_len : int, optional
Maximum number of tokens to be generated during inference. Default is None.
cache_dir : str, optional
Cache directory where downloaded HuggingFace files (like tokenizer) are stored.
Default is None.
hf_token : str, optional
HuggingFace login token to access private repositories. Default is None.
full_batch_size : int, optional
Ignored at execution time, since continuous-batching behavior is fixed in the
compiled QPC; the flag is accepted so it can be passed through from CLI arguments. Default is None.

Example
-------
To execute a compiled model from the command line:

.. code-block:: bash

python -m QEfficient.cloud.execute [OPTIONS]
python -m QEfficient.cloud.execute --model-name gpt2 --qpc-path /path/to/qpc/binaries --prompt "Hello world"

For multi-device inference:

.. code-block:: bash

python -m QEfficient.cloud.execute --model-name gpt2 --qpc-path /path/to/qpc/binaries --device-group "[0,1]" --prompt "Hello | Hi"

"""
tokenizer = load_hf_tokenizer(
pretrained_model_name_or_path=(local_model_dir if local_model_dir else model_name),
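Below is a minimal Python sketch of driving this entry point programmatically instead of via the CLI. It is a sketch under assumptions: a QPC package has already been compiled, the QPC path and device IDs are placeholders, and the keyword arguments follow the parameter names documented in the docstring above.

```python
from QEfficient.cloud.execute import main as execute_main

# Run a previously compiled QPC package on Cloud AI 100.
# "/path/to/qpc/binaries" is a placeholder for your compiled artifacts.
execute_main(
    model_name="gpt2",                 # Hugging Face card name, used to load the tokenizer
    qpc_path="/path/to/qpc/binaries",  # compiled QPC package
    device_group=[0],                  # single device; e.g. [0, 1] enables a multi-card setup
    prompt="Hello world",              # pipe-separated prompts ("Hello | Hi") for batch size > 1
    generation_len=32,                 # maximum number of tokens to generate
)
```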
73 changes: 52 additions & 21 deletions QEfficient/cloud/export.py
@@ -25,16 +25,32 @@ def get_onnx_model_path(
local_model_dir: Optional[str] = None,
):
"""
exports the model to onnx if pre-exported file is not found and returns onnx_model_path

``Mandatory`` Args:
:model_name (str): Hugging Face Model Card name, Example: ``gpt2``.
``Optional`` Args:
:cache_dir (str): Cache dir where downloaded HuggingFace files are stored. ``Defaults to None.``
:tokenizer (Union[PreTrainedTokenizer, PreTrainedTokenizerFast]): Pass model tokenizer. ``Defaults to None.``
:hf_token (str): HuggingFace login token to access private repos. ``Defaults to None.``
:local_model_dir (str): Path to custom model weights and config files. ``Defaults to None.``
:full_batch_size (int): Set full batch size to enable continuous batching mode. ``Defaults to None.``
Exports the PyTorch model to ONNX format if a pre-exported file is not found,
and returns the path to the ONNX model.

This function loads a Hugging Face model via QEFFCommonLoader, then calls
its export method to generate the ONNX graph.

Parameters
----------
model_name : str
Hugging Face Model Card name (e.g., ``gpt2``).

Other Parameters
----------------
cache_dir : str, optional
Cache directory where downloaded HuggingFace files are stored. Default is None.
hf_token : str, optional
HuggingFace login token to access private repositories. Default is None.
full_batch_size : int, optional
Sets the full batch size to enable continuous batching mode. Default is None.
local_model_dir : str, optional
Path to custom model weights and config files. Default is None.

Returns
-------
str
Path of the generated ONNX graph file.
"""
logger.info(f"Exporting Pytorch {model_name} model to ONNX...")

@@ -58,20 +74,35 @@ def main(
full_batch_size: Optional[int] = None,
) -> None:
"""
Helper function used by export CLI app for exporting to ONNX Model.

``Mandatory`` Args:
:model_name (str): Hugging Face Model Card name, Example: ``gpt2``.

``Optional`` Args:
:cache_dir (str): Cache dir where downloaded HuggingFace files are stored. ``Defaults to None.``
:hf_token (str): HuggingFace login token to access private repos. ``Defaults to None.``
:local_model_dir (str): Path to custom model weights and config files. ``Defaults to None.``
:full_batch_size (int): Set full batch size to enable continuous batching mode. ``Defaults to None.``
Main function for the QEfficient ONNX export CLI application.

This function serves as the entry point for exporting a PyTorch model, loaded
via QEFFCommonLoader, to the ONNX format. It prepares the necessary
paths and calls `get_onnx_model_path`.

Parameters
----------
model_name : str
Hugging Face Model Card name (e.g., ``gpt2``).

Other Parameters
----------------
cache_dir : str, optional
Cache directory where downloaded HuggingFace files are stored. Default is None.
hf_token : str, optional
HuggingFace login token to access private repositories. Default is None.
local_model_dir : str, optional
Path to custom model weights and config files. Default is None.
full_batch_size : int, optional
Sets the full batch size to enable continuous batching mode. Default is None.

Example
-------
To export a model from the command line:

.. code-block:: bash

python -m QEfficient.cloud.export [OPTIONS]
python -m QEfficient.cloud.export --model-name gpt2 --cache-dir /path/to/cache

"""
cache_dir = check_and_assign_cache_dir(local_model_dir, cache_dir)
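Similarly, a short sketch of the Python-level export path; this assumes `get_onnx_model_path` is importable as shown in the diff, that ``gpt2`` stands in for any supported model card, and that the return value is the ONNX path string described in the docstring above.

```python
from QEfficient.cloud.export import get_onnx_model_path

# Export the PyTorch model to ONNX (or reuse a pre-exported file)
# and return the path to the generated graph.
onnx_model_path = get_onnx_model_path(model_name="gpt2")
print(f"ONNX graph available at: {onnx_model_path}")
```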