The request_handler.py module contains the RequestHandler class, which is responsible for preparing and processing requests to the GROQ API. It sits between the GroqClient and the QueueManager, validating requests and managing token counts.
The RequestHandler handles several key responsibilities:
- Validating request parameters
- Managing token counting for both prompt and completion using the TokenCounter
- Checking token limits against model context windows
- Preparing request payloads for the API
- Queueing requests with the appropriate priority through the QueueManager
- Providing request cancellation capabilities
For exception handling details, see the Exceptions Guide.
```python
class RequestHandler:
    def __init__(self,
                 queue_manager: QueueManager,
                 models_config: Dict[str, Dict[str, Any]]) -> None:
        ...
```

The RequestHandler class prepares and processes GROQ API requests, ensuring they meet all requirements before being queued.
```python
def __init__(self,
             queue_manager: QueueManager,
             models_config: Dict[str, Dict[str, Any]]) -> None:
```

Initializes the RequestHandler with a queue manager and model configurations.

Parameters:
- queue_manager (QueueManager): The queue manager to use for request processing
- models_config (Dict[str, Dict[str, Any]]): Dictionary of model configurations
```python
async def prepare_chat_request(self,
                               model_name: str,
                               messages: List[Dict[str, str]],
                               temperature: float = 0.7,
                               max_tokens: Optional[int] = None,
                               priority: str = "low",
                               callback: Optional[Callable[[Dict[str, Any]], Awaitable[None]]] = None) -> str:
```

Prepares and queues a chat completion request.

Parameters:
- model_name (str): Name of the model to use
- messages (List[Dict[str, str]]): List of message dictionaries with 'role' and 'content'
- temperature (float): Sampling temperature (0.0 to 1.0)
- max_tokens (Optional[int]): Maximum number of tokens to generate
- priority (str): Priority level for the request ("high", "normal", or "low")
- callback (Optional[Callable]): Optional async callback function to be called with the API response

Returns:
- str: Request ID that can be used to track or cancel the request

Raises:
- ModelNotFoundException: If the model is not found
- TokenLimitExceededException: If the token limit is exceeded
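The callback must match the documented signature `Callable[[Dict[str, Any]], Awaitable[None]]`. A minimal sketch of such a callback is shown below; the response payload shape used here is an assumption based on typical chat-completion responses, not confirmed by the source.

```python
import asyncio
from typing import Any, Dict

# Collected results, so the simulated invocation below is observable.
results = []

# Hypothetical callback matching Callable[[Dict[str, Any]], Awaitable[None]].
async def async_callback_function(response: Dict[str, Any]) -> None:
    # Extract the completion text from an assumed chat-style payload.
    content = response["choices"][0]["message"]["content"]
    results.append(content)

# Simulate the handler invoking the callback with an API response.
asyncio.run(async_callback_function(
    {"choices": [{"message": {"content": "Paris"}}]}
))
```

The handler awaits the callback when the queued request completes, so any async I/O (logging, persistence) can happen inside it.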
```python
async def prepare_completion_request(self,
                                     model_name: str,
                                     prompt: str,
                                     temperature: float = 0.7,
                                     max_tokens: Optional[int] = None,
                                     priority: str = "low",
                                     callback: Optional[Callable[[Dict[str, Any]], Awaitable[None]]] = None) -> str:
```

Prepares and queues a text completion request.

Parameters:
- model_name (str): Name of the model to use
- prompt (str): Text prompt
- temperature (float): Sampling temperature (0.0 to 1.0)
- max_tokens (Optional[int]): Maximum number of tokens to generate
- priority (str): Priority level for the request ("high", "normal", or "low")
- callback (Optional[Callable]): Optional async callback function to be called with the API response

Returns:
- str: Request ID that can be used to track or cancel the request

Raises:
- ModelNotFoundException: If the model is not found
- TokenLimitExceededException: If the token limit is exceeded
```python
async def cancel_request(self, request_id: str) -> bool:
```

Cancels a request by ID.

Parameters:
- request_id (str): The ID of the request to cancel

Returns:
- bool: True if the request was cancelled, False otherwise
```python
def get_available_models(self) -> List[str]:
```

Gets a list of available models.

Returns:
- List[str]: List of available model names
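Since models_config is a `Dict[str, Dict[str, Any]]` keyed by model name, the available models would plausibly just be the config keys. The sketch below illustrates that shape; the `context_window` field name is an assumption for illustration.

```python
from typing import Any, Dict, List

# Hypothetical models_config shape (field names are assumptions).
models_config: Dict[str, Dict[str, Any]] = {
    "llama-3.1-8b-instant": {"context_window": 131072},
    "llama-3.1-70b-versatile": {"context_window": 131072},
}

# A plausible implementation: expose the configuration's keys.
def get_available_models(config: Dict[str, Dict[str, Any]]) -> List[str]:
    return list(config.keys())
```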
```python
def _get_model_config(self, model_name: str) -> Dict[str, Any]:
```

Gets the configuration for a specific model.

Parameters:
- model_name (str): Name of the model

Returns:
- Dict[str, Any]: Model configuration

Raises:
- ModelNotFoundException: If the model is not found in the configuration
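A minimal sketch of this lookup, with the exception defined locally so the snippet is self-contained (the real exception lives in the package's exceptions module; see the Exceptions Guide):

```python
from typing import Any, Dict

# Local stand-in for the package's exception class.
class ModelNotFoundException(Exception):
    pass

# Illustrative lookup: unknown model names raise rather than
# returning None, so callers fail fast on typos.
def get_model_config(models_config: Dict[str, Dict[str, Any]],
                     model_name: str) -> Dict[str, Any]:
    if model_name not in models_config:
        raise ModelNotFoundException(f"Model '{model_name}' not found in configuration")
    return models_config[model_name]
```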
```python
def _validate_token_limits(self,
                           model_name: str,
                           token_counts: Dict[str, int],
                           max_tokens: Optional[int] = None) -> None:
```

Validates that the request is within token limits for the model.

Parameters:
- model_name (str): Name of the model
- token_counts (Dict[str, int]): Token count dictionary with prompt_tokens, completion_tokens, and total_tokens
- max_tokens (Optional[int]): Maximum number of tokens in completion (overrides max_tokens in request)

Raises:
- TokenLimitExceededException: If the token limit is exceeded
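The validation can be sketched as follows. This is a minimal sketch under assumptions: the `context_window` field name and the exact combination rule (prompt tokens plus the effective completion budget) are illustrative, not taken from the source.

```python
from typing import Dict, Optional

# Local stand-in for the package's exception class.
class TokenLimitExceededException(Exception):
    pass

def validate_token_limits(context_window: int,
                          token_counts: Dict[str, int],
                          max_tokens: Optional[int] = None) -> None:
    # An explicit max_tokens overrides the estimated completion count.
    completion = max_tokens if max_tokens is not None else token_counts["completion_tokens"]
    total = token_counts["prompt_tokens"] + completion
    if total > context_window:
        raise TokenLimitExceededException(
            f"Requested {total} tokens exceeds context window of {context_window}"
        )
```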
The RequestHandler uses several key components:
- TokenCounter: For estimating token usage of requests
- QueueManager: For queuing requests with appropriate priorities
- Model Configuration: For validating requests against model limits
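How priority-based queueing might look can be sketched with a small stand-in for the QueueManager. Everything here is hypothetical: the class name, the `enqueue`/`dequeue` methods, and the mapping of the documented priority strings ("high", "normal", "low") to numeric levels are assumptions for illustration only.

```python
import asyncio
import itertools
import uuid

# Hypothetical mapping: lower numbers dequeue first.
PRIORITY_LEVELS = {"high": 0, "normal": 1, "low": 2}

class SketchQueueManager:
    def __init__(self) -> None:
        self._queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        self._counter = itertools.count()  # tie-breaker preserving FIFO order

    async def enqueue(self, payload: dict, priority: str = "low") -> str:
        request_id = str(uuid.uuid4())
        await self._queue.put(
            (PRIORITY_LEVELS[priority], next(self._counter), request_id, payload)
        )
        return request_id

    async def dequeue(self) -> tuple:
        _, _, request_id, payload = await self._queue.get()
        return request_id, payload

async def _demo() -> dict:
    qm = SketchQueueManager()
    await qm.enqueue({"model": "a"}, priority="low")
    await qm.enqueue({"model": "b"}, priority="high")
    _, first_payload = await qm.dequeue()
    return first_payload

# The high-priority request dequeues first despite being enqueued second.
first_payload = asyncio.run(_demo())
```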
```python
# This is typically used internally by the GroqClient class
async def example_usage(queue_manager, models_config):
    # Initialize handler
    request_handler = RequestHandler(queue_manager, models_config)

    # Prepare a chat request
    request_id = await request_handler.prepare_chat_request(
        model_name="llama-3.1-8b-instant",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What's the capital of France?"}
        ],
        priority="high",
        callback=async_callback_function
    )

    # Later, if needed, cancel the request
    success = await request_handler.cancel_request(request_id)
```

The RequestHandler sits between the GroqClient and the QueueManager:
- It receives requests from the GroqClient
- It validates requests using model configurations
- It checks token limits using the TokenCounter
- It forwards valid requests to the QueueManager
- It provides an interface for cancelling requests
- GroqClient Documentation - Main client interface
- QueueManager Documentation - Request queue management
- TokenCounter Documentation - Token counting utilities
- Implementation Examples - Usage examples
- Package Exports - Complete list of package exports