unity_python/Assets/StreamingAssets/python/pytorch_models/*.pth
- Execute Python from Unity
- TCP/IP between Unity and Python (a rough Python-side sketch follows this list)
- Deep Reinforcement Learning (DRL) without Unity ML-Agents
- Python version: 3.9.12 (64-bit)
- (Optional) Do not add Python 3.9 to PATH
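These notes do not show the socket code itself, so the following is only a minimal sketch of what the Python side of the Unity-to-Python TCP link could look like; the host, port, and newline-delimited CSV message format are assumptions, not the project's actual protocol.

# Hypothetical Python-side TCP server for Unity to connect to (sketch only).
import socket

HOST, PORT = "127.0.0.1", 50007  # assumed values; must match the Unity client

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind((HOST, PORT))
    server.listen(1)
    conn, _ = server.accept()
    with conn:
        buffer = b""
        while True:
            data = conn.recv(1024)
            if not data:
                break
            buffer += data
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                state = [float(v) for v in line.decode().split(",")]
                action = 4  # placeholder: NOTHING; the trained agent would decide here
                conn.sendall(f"{action}\n".encode())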
C:\Users\[username]\AppData\Local\Programs\Python\Python39\python.exe
File > Open Folder
C:\Users\[username]\AppData\Local\Programs\Python\Python39\python.exe -m venv .venv
Assets/StreamingAssets/python/.venv/
The .venv must live under the StreamingAssets folder so that it is included in the build path and the editor and the built player share the same Python environment.
In my experience it runs fine without activating the venv, but for those who need activation:
set VIRTUAL_ENV=[Absolutepath]\.venv
which can be written relative to the current directory as:
set VIRTUAL_ENV=%cd%\..\..\.venv
Install numpy, just as a module import test.
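A quick way to confirm that the venv interpreter and its site-packages are actually the ones in use (the file name check_env.py is only for illustration):

# check_env.py - run with .\.venv\Scripts\python.exe check_env.py
import sys
import numpy as np

print(sys.executable)   # should point into ...\StreamingAssets\python\.venv\Scripts
print(np.__version__)   # confirms numpy comes from the venv's site-packages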
CUDA Toolkit 12.1 Downloads | NVIDIA Developer
The CUDA version must match the CUDA version of the PyTorch build (as specified on pytorch.org). In this project, I installed CUDA 12.1.
[venv dir]\.venv\Scripts\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
or cd to the parent of .venv and run:
.\.venv\Scripts\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
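After the install, a quick check that the cu121 build actually sees the GPU:

import torch

print(torch.__version__)          # should end with +cu121
print(torch.version.cuda)         # should report 12.1
print(torch.cuda.is_available())  # True if the driver and toolkit are set up correctly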
input:
ball position (x, y, z), ball speed (x, y, z), target position (x, y, z), current plate Euler angles (rx, rz), plate rotation speed (rad/s?)
3 + 3 + 3 + 2 + 1 = 12
PythonProcess.StandardInput.WriteLine($"{Vector3ToString(BallPosition)},{Vector3ToString(BallSpeed)},{PlateRX},{PlateRZ},{Vector3ToString(TargetPosition)},{PlateAngularSpeed}");

output:
5 actions (RX+, RX-, RZ+, RZ-, NOTHING)
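On the Python side, the line written above can be read from stdin and turned into a 12-dimensional state tensor. This is only a sketch: the field order follows the C# WriteLine call, and replying with a single action index per line is an assumption, not necessarily the project's exact format.

import sys
import torch

ACTIONS = ["RX+", "RX-", "RZ+", "RZ-", "NOTHING"]

def parse_state(line: str) -> torch.Tensor:
    # 3 (ball pos) + 3 (ball speed) + 2 (plate rx, rz) + 3 (target pos) + 1 (angular speed) = 12
    values = [float(v) for v in line.strip().split(",")]
    assert len(values) == 12
    return torch.tensor(values, dtype=torch.float32).unsqueeze(0)  # shape (1, 12)

for line in sys.stdin:
    state = parse_state(line)
    action_index = ACTIONS.index("NOTHING")  # placeholder; the real agent queries the DQN here
    print(action_index, flush=True)          # Unity reads this from PythonProcess.StandardOutput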
DQN:
import torch.nn as nn

class DQN(nn.Module):  ## v5.1
    def __init__(self, state_size, action_size):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.relu = nn.ReLU()
        self.fc4 = nn.Linear(64, action_size)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        return self.fc4(x)

- trained 1000 episodes
- 50~70%
So far, the linear distance reward system has performed the best.
Reward += (TargetThreshold - dist) / 10f;

- Add guide walls (negative reward on collision)
- multi agent
condition:
self.optimizer = optim.Adam(self.net.parameters(), lr=0.001)
self.batch_size = 64
self.gamma = 0.99
self.local_memory = ReplayMemory(10000)
self.eps_start = 0.99
self.eps_end = 0.05
self.eps_decay = 50
self.steps_done = 0
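The eps_start / eps_end / eps_decay / steps_done values above suggest the usual exponentially decaying epsilon-greedy schedule. The project's actual select_action code is not shown in these notes, so the following is only a sketch of the standard pattern, written as a free function (net is the DQN instance, action_size=5 matches the five actions):

import math
import random
import torch

def select_action(net, state, steps_done, action_size=5,
                  eps_start=0.99, eps_end=0.05, eps_decay=50):
    # Decay epsilon exponentially from eps_start toward eps_end as steps_done grows.
    eps_threshold = eps_end + (eps_start - eps_end) * math.exp(-steps_done / eps_decay)
    if random.random() > eps_threshold:
        with torch.no_grad():
            return int(net(state).argmax(dim=1).item())  # exploit: action with the best Q-value
    return random.randrange(action_size)                 # explore: random action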
DQN:
class DQN(nn.Module):  ## v6
    def __init__(self, state_size, action_size):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.relu = nn.ReLU()
        self.fc4 = nn.Linear(64, action_size)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        return self.fc4(x)

So far, the linear distance reward system has performed the best.
Reward += (TargetThreshold - dist) / 10f;
PlateRX = this.PlateRX / 10f,
PlateRZ = this.PlateRZ / 10f,
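For reference, the linear distance reward and the /10 scaling of the plate angles can be sketched in Python as below; TARGET_THRESHOLD is a stand-in value, the real constant lives in the Unity project.

import numpy as np

TARGET_THRESHOLD = 1.0  # assumed value, not the project's actual threshold

def step_reward(ball_pos, target_pos, threshold=TARGET_THRESHOLD):
    # Python equivalent of: Reward += (TargetThreshold - dist) / 10f;
    dist = float(np.linalg.norm(np.asarray(ball_pos) - np.asarray(target_pos)))
    return (threshold - dist) / 10.0

def scale_plate_angles(plate_rx, plate_rz):
    # matches PlateRX = this.PlateRX / 10f, PlateRZ = this.PlateRZ / 10f
    return plate_rx / 10.0, plate_rz / 10.0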


