Update the hostfile so that the IP addresses are correct. You can test this with:
uv run mlx.launch --hostfile hostfile.json test_bandwidth.py
The script should take 10 seconds or so to run.
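Before running the bandwidth test, it can help to catch mistyped addresses locally. The sketch below assumes the hostfile is a JSON list of objects with an "ssh" hostname and an "ips" list, which is the layout MLX documents for mlx.launch; if your hostfile.json uses different keys, adjust accordingly. The function name check_hostfile is hypothetical and not part of this project.

```python
import ipaddress
import json


def check_hostfile(text):
    """Return a list of problems found in the hostfile JSON text."""
    problems = []
    entries = json.loads(text)
    if not isinstance(entries, list):
        return ["top level should be a list of hosts"]
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"host {i}: expected an object")
            continue
        # "ssh" and "ips" are the assumed key names (see note above).
        if not entry.get("ssh"):
            problems.append(f"host {i}: missing 'ssh'")
        for ip in entry.get("ips", []):
            try:
                ipaddress.ip_address(ip)  # accepts IPv4 and IPv6 literals
            except ValueError:
                problems.append(f"host {i}: invalid IP {ip!r}")
    return problems
```

Passing the contents of hostfile.json to check_hostfile returns an empty list when every entry parses. This only checks syntax; it does not confirm that the devices are reachable, which is what the bandwidth test is for.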
The speculator is the host device and the verifier is the remote device by default.
time uv run ssd-mlx --draft mlx-community/Llama-3.2-1B-Instruct-4bit --target mlx-community/Meta-Llama-3.1-70B-Instruct-4bit --prompt "write a python function called filter_by_substring that takes a list of strings and a substring and returns only the strings that contain that substring. Example: filter_by_substring(['abc', 'bacd', 'cde', 'array'], 'a') returns ['abc', 'bacd', 'array']" --budget 8 --gamma 4
- ssd-mlx lets you pass in a prompt and see the result in the CLI. There is no support for streaming, so the response appears all at once after a few seconds.
- ssd-mlx-serve hosts a server at 0.0.0.0:8080.
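To confirm the server has come up, a plain TCP check is enough. This is a minimal sketch that only tests whether something is accepting connections on the port; it does not exercise the server's request format, which is not described here. The helper name port_is_open is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Because the server binds to 0.0.0.0, it listens on all interfaces, so port_is_open("127.0.0.1", 8080) should work on the host itself, and the host's LAN address should work from another machine.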