In this milestone, you will learn how to use Whisper, a speech recognition neural network, and eSpeak, a text-to-speech system, and run Meta's LLaMA large language model on the NVIDIA Jetson Xavier NX.
- Clone the repository:

  ```
  git clone https://github.com/CS7389K/Milestone-5.git
  cd Milestone-5
  ```

- ROS2 Foxy requires Ubuntu 20.04, so ensure that is what you are using. If you are on Windows, run the following to use WSL:

  ```
  wsl --install -d Ubuntu-20.04
  ```

  If you need to move WSL to a different drive (e.g. the F drive):

  ```
  wsl --manage Ubuntu-20.04 --move F:\WSL
  ```
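Before installing Foxy, it can help to confirm the release you are actually running. Below is a minimal, illustrative Python sketch that checks the contents of an `/etc/os-release`-style string for Ubuntu 20.04; the helper name and parsing are assumptions for illustration, not part of the milestone code.

```python
# Illustrative check: ROS2 Foxy targets Ubuntu 20.04 ("focal").
# On Ubuntu, the release is recorded in /etc/os-release as KEY=VALUE lines.
def foxy_compatible(os_release_text: str) -> bool:
    """Return True if the given os-release text reports Ubuntu 20.04."""
    fields = dict(
        line.split("=", 1)
        for line in os_release_text.splitlines()
        if "=" in line
    )
    return (
        fields.get("ID") == "ubuntu"
        and fields.get("VERSION_ID", "").strip('"') == "20.04"
    )

# Example with the content Ubuntu 20.04 would report:
print(foxy_compatible('ID=ubuntu\nVERSION_ID="20.04"'))  # True
```

On a real system you would pass in the contents of `/etc/os-release` (e.g. `open("/etc/os-release").read()`).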
- Install Foxy:

  ```
  sh install-ros2-foxy-desktop.sh
  ```

- Build the workspace. Ensure you are in the project's root directory, then run:

  ```
  colcon build --symlink-install
  . install/setup.sh
  ```

- Using eSpeak and Whisper only:

  ```
  ros2 run milestone5 server --ros-args -p use_espeak:=true -p use_whisper:=true
  ```

- Using LLaMA's instruct model (for the chat model, set `llama_instruct:=false`):

  ```
  ros2 run milestone5 server --ros-args -p use_espeak:=true -p use_llama:=true -p llama_instruct:=true
  ```
- It is not advised to use both Whisper and LLaMA at the same time due to GPU, CPU, and RAM limitations.

- If `llama_model_path` is specified, it will be used. Otherwise, the system falls back to one of the following defaults from `LlamaBackend`, depending on whether `llama_instruct` is `true` or `false`:

  ```
  _MODEL_CHAT_PATH = "/home/nvidia/llama.cpp/models/llama-2-7b-chat.Q4_K_M.gguf"
  _MODEL_INSTRUCT_PATH = "/home/nvidia/llama.cpp/models/llama-2-7b-instruct-q4_0.gguf"
  ```
- Run the client:

  ```
  ros2 run milestone5 client --ros-args -p use_espeak:=true -p use_llama:=false -p use_whisper:=true
  ```
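To clarify what the commands above encode: each `-p` after `--ros-args` introduces one `name:=value` parameter assignment. The helper below mimics that syntax purely for explanation; it is an assumption for illustration, and rclpy performs the real parsing inside the node.

```python
# Illustrative parser for ROS2-style '-p name:=value' arguments.
# Not rclpy itself; it only demonstrates the command-line syntax used above.
def parse_ros_params(argv):
    params = {}
    tokens = iter(argv)
    for tok in tokens:
        if tok == "-p":
            name, _, value = next(tokens).partition(":=")
            # Boolean literals like 'true'/'false' become Python bools.
            params[name] = (value == "true") if value in ("true", "false") else value
    return params

print(parse_ros_params(
    ["-p", "use_espeak:=true", "-p", "use_llama:=false", "-p", "use_whisper:=true"]
))  # {'use_espeak': True, 'use_llama': False, 'use_whisper': True}
```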