|
| 1 | +# Training with Imitation (Supervised Learning) |
| 2 | + |
| 3 | +This algorithm trains the neural network to reproduce the correct action for each state it observes. See [Unity's documentation](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Imitation-Learning.md) for more explanation. |
| 4 | + |
| 5 | +The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use supervised learning to train the neural network from your own gameplay. |
| 6 | + |
| 7 | +## Overall Steps |
| 8 | +1. Create an environment using the ML-Agents API. See the [instructions from Unity](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Create-New.md). |
| 9 | +2. Change the BrainType of your brain to `InternalTrainable` in the inspector. |
| 10 | +3. Create a Trainer |
| 11 | + 1. Attach a `TrainerMimic.cs` to any GameObject. |
| 12 | + 2. Create a `TrainerParamsMimic` scriptable object with proper parameters in your project and assign it to the Params field in `TrainerMimic.cs`. |
| 13 | + 3. Assign the Trainer to the `Trainer` field of your Brain. |
| 14 | +4. Create a Model |
| 15 | + 1. Attach a `SupervisedLearningModel.cs` to any GameObject. |
| 16 | + 2. Create a `SupervisedLearningNetwork` scriptable object in your project and assign it to the Network field in `SupervisedLearningModel.cs`. |
| 17 | + 3. Assign the created Model to the `modelRef` field in `TrainerMimic.cs`. |
| 18 | + |
| 19 | +5. Create a Decision |
| 20 | + 1. You can either use `PlayerDecision.cs` directly if you want the neural network to learn from a human playing the game, or inherit from [AgentDependentDecision](AgentDependentDeicision.md) if you want the agent to learn from another scripted AI. |
| 21 | + 2. Attach the decision script to the agent that you want to learn from and check `useDecision` in the inspector. |
| 22 | + |
| 23 | +6. Play! But some notes: |
| 24 | + * The trainer only collects data from agents that have a Decision attached. |
| 25 | + * Training starts only after enough data has been collected (set the threshold in the trainer parameters). |
| 26 | + * The `isCollectingData` field in the trainer needs to be true to collect training data. |
| 27 | + |
| 28 | +## Explanation of fields in the inspector |
| 29 | +### TrainerMimic.cs |
| 30 | +* `isTraining`: Toggle this to switch between training and inference mode. Note that if `isTraining` is false when the game starts, the training part of the model will not be initialized and you won't be able to train it in this run. |
| 31 | +* `parameters`: Assign a `TrainerParamsMimic` scriptable object to this field. |
| 32 | +* `continueFromCheckpoint`: If true, when the game starts, the trainer will try to load the saved checkpoint file to resume previous training. |
| 33 | +* `checkpointPath`: The path of the checkpoint, including the file name. |
| 34 | +* `steps`: Shows the current training step. |
| 35 | +* `isCollectingData`: Whether the trainer is collecting training data from agents that have a Decision attached. |
| 36 | +* `dataBufferCount`: Number of data samples collected so far. |
| 37 | + |
| 38 | +### TrainerParamsMimic |
| 39 | +* `learningRate`: Learning rate used to train the neural network. |
| 40 | +* `maxTotalSteps`: Maximum number of steps the trainer will train for. |
| 41 | +* `saveModelInterval`: The trained model is saved every time this many steps have passed. |
| 42 | +* `batchSize`: Mini-batch size used when training. |
| 43 | +* `numIterationPerTrain`: Number of batches to train on in each step (fixed update). |
| 44 | +* `requiredDataBeforeTraining`: Amount of collected data required before the trainer starts training the neural network. |
| 45 | +* `maxBufferSize`: Maximum size of the collected-data buffer. If the data buffer count exceeds this number, old data will be overwritten. Set this to 0 to remove the limit. |
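
How `requiredDataBeforeTraining`, `maxBufferSize`, `batchSize` and `numIterationPerTrain` might interact can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the plugin's actual C# code: the class `MimicBufferSketch` and its methods are hypothetical, and overwriting the oldest sample first is an assumed reading of "old data will be overwritten".

```python
import random


class MimicBufferSketch:
    """Hypothetical stand-in for the trainer's data buffer (not the plugin's code)."""

    def __init__(self, max_buffer_size, required_data_before_training,
                 batch_size, num_iteration_per_train):
        self.max_buffer_size = max_buffer_size  # 0 means "no limit"
        self.required_data_before_training = required_data_before_training
        self.batch_size = batch_size
        self.num_iteration_per_train = num_iteration_per_train
        self.data = []           # collected (state, action) pairs
        self.next_overwrite = 0  # index of the oldest sample once the buffer is full

    def add(self, state, action):
        """Store one demonstration sample, overwriting the oldest when full."""
        if self.max_buffer_size > 0 and len(self.data) >= self.max_buffer_size:
            # Assumed ring-buffer behaviour: replace the oldest sample.
            self.data[self.next_overwrite] = (state, action)
            self.next_overwrite = (self.next_overwrite + 1) % self.max_buffer_size
        else:
            self.data.append((state, action))

    def ready_to_train(self):
        """Training only starts once enough data has been collected."""
        return len(self.data) >= self.required_data_before_training

    def sample_batches(self, rng=random):
        """Return the mini-batches for one training step (one fixed update)."""
        if not self.ready_to_train():
            return []
        size = min(self.batch_size, len(self.data))
        return [rng.sample(self.data, size)
                for _ in range(self.num_iteration_per_train)]
```

With `max_buffer_size=0` the list simply keeps growing, which mirrors the "set this to 0 to remove the limit" note above.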
| 46 | + |
| 47 | +### SupervisedLearningModel.cs |
| 48 | +* `checkpointToLoad`: If you assign a model's saved checkpoint file to this field, it will be loaded when the model is initialized, regardless of whether the trainer loads a checkpoint. This can be useful when you are not using a trainer. |
| 49 | +* `Network`: Assign a scriptable object that implements `SupervisedLearningNetwork` to this field. |
| 50 | +* `optimizer`: The type of optimizer to use for this model when training. You can also set its parameters here. |
| 51 | + |
| 52 | +### SupervisedLearningNetworkSimple |
| 53 | +This is a simple implementation of `SupervisedLearningNetwork` that you can create and plug in as the neural network definition for any SupervisedLearningModel. |
| 54 | +- `hiddenLayers`: Hidden layers of the network. The array size is the number of hidden layers. Each element has four parameters that define the layer. They do not have default values, so you have to fill them in. |
| 55 | + - size: Size of this hidden layer. |
| 56 | + - initialScale: Initial scale of the weights. This might be important for training. Try something larger than 0 and smaller than 1. |
| 57 | + - useBias: Whether to use a bias. Usually true. |
| 58 | + - activationFunction: Which activation function to use. Usually ReLU. |
| 59 | +- `outputLayerInitialScale`/`visualEncoderInitialScale`: Initial scale of the weights of the output layer / visual encoder. |
| 60 | +- `outputLayerBias`/`visualEncoderBias`: Whether to use a bias in the output layer / visual encoder. |
| 61 | +- `useVarianceForCoutinuousAction`: Whether to also output a variance for the action when the action space is continuous. |
| 62 | +- `minStd`: If the network outputs a variance for the action, the standard deviation will always be larger than this value. |
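
One common way to keep a predicted standard deviation above a floor such as `minStd` is to add the floor to a non-negative transform of the raw network output. The sketch below uses softplus for that transform. This is an assumption made for illustration: the function names are hypothetical and the plugin may enforce the bound differently (for example by clamping).

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x)), never negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bounded_std(raw_output, min_std):
    """Map an unbounded network output to a standard deviation of at least min_std."""
    return min_std + softplus(raw_output)
```

Because softplus never goes below zero, the result cannot fall under `min_std` no matter how negative the raw output is, which prevents the action distribution from collapsing to zero variance during training.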
| 63 | + |
| 64 | +## Training using GAN |