Development Workflow
SSH into the dev-01 development VM with ssh [username]@34.105.63.9. If this command hangs for more than a couple of seconds, the development machine is probably off. See Turning on Development VM to turn it on.
If you set up the Google Cloud SDK locally, run gcloud compute instances start dev-01 --zone us-west1-b. The --zone flag is only necessary if your default zone is set to something other than us-west1-b.
If you haven't set up the Google Cloud SDK, you can turn on the dev machine by going to the VM instances page on the GCP web console, selecting the dev-01 instance, and clicking START at the top of that page.
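The check-then-start sequence above can be sketched as a small helper for your local machine. This is only a sketch: the function names are hypothetical, the 5-second timeout and the retry count are arbitrary choices, and it assumes the Google Cloud SDK is installed. ConnectTimeout is a standard OpenSSH option that makes ssh give up instead of hanging.

```shell
# Hypothetical helpers for your local machine; not part of the repository.
# GCLOUD can be overridden (e.g. GCLOUD=echo) to preview the command without running it.
DEV_HOST="34.105.63.9"

# Succeeds if dev-01 accepts an SSH connection within 5 seconds.
dev_is_up() {
  ssh -o ConnectTimeout=5 "$1@${DEV_HOST}" true >/dev/null 2>&1
}

# Starts the VM with the command given above.
dev_start() {
  "${GCLOUD:-gcloud}" compute instances start dev-01 --zone us-west1-b
}

# Connects to dev-01, starting it first if it does not answer.
dev_connect() {
  user="$1"
  if ! dev_is_up "$user"; then
    echo "dev-01 did not answer within 5 seconds; starting it" >&2
    dev_start || return 1
    # Give the VM time to boot: retry up to 12 times, 5 seconds apart.
    tries=0
    until dev_is_up "$user"; do
      tries=$((tries + 1))
      if [ "$tries" -ge 12 ]; then
        echo "dev-01 is still unreachable" >&2
        return 1
      fi
      sleep 5
    done
  fi
  ssh "${user}@${DEV_HOST}"
}
```

After sourcing the file, dev_connect [username] replaces the manual ssh step.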
Only turn on the Kubeflow node if you need to train on the cluster. To do so, run gcloud container clusters resize kf-train-6 --node-pool kf-train-6-cpu-pool-v1 --num-nodes 1 on the dev machine. Don't forget to turn the node off when you're done using it.
Activate the Python virtual environment with the following command:
conda activate mlpipeline
To test changes to the component locally on the development machine (dev-01), run ./run_local.sh, which just executes the component’s main Python script.
To test changes to the component remotely on the Kubernetes/Kubeflow cluster, first run ./build_image.sh. This will build and upload a Docker image containing your latest source code. Details on the next steps of deployment are listed below.
Note: Don’t set the image tag to “latest” when committing code or when deploying a real experiment. If you do, the pipeline will use the latest built image for the component, which may be a different version than you want to run.
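One way to avoid the “latest” tag is to derive a unique tag from the current commit. The sketch below is an assumption, not the project's convention: image_tag is a hypothetical name, and how ./build_image.sh receives the tag is not shown on this page, so wiring the value into that script is left out.

```shell
# Hypothetical helper: print a unique image tag such as "a1b2c3d-20240101120000".
# How ./build_image.sh consumes the tag is not shown on this page.
image_tag() {
  if sha=$(git rev-parse --short HEAD 2>/dev/null); then
    # Mark builds made from a working tree with uncommitted changes to tracked files.
    git diff --quiet HEAD 2>/dev/null || sha="${sha}-dirty"
  else
    # Not inside a git repository (or git is unavailable).
    sha="nogit"
  fi
  echo "${sha}-$(date +%Y%m%d%H%M%S)"
}
```

Because the tag includes the commit hash, a pipeline run can be traced back to the source that produced its image.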
To test a new pipeline on the cluster, you will need to update or upload a new pipeline using the GUI frontend.
Do this as follows:
- On the VM, run python baseline_repro_pipeline.py to generate the pipeline file baseline_repro_pipeline.py.tar.gz
- scp this file to your local machine (can we do this from the VM?)
- Go to the Kubeflow frontend
- Click Pipelines
- Click baseline_repro_pipeline
- Click + Upload Version
- Select Upload File and choose baseline_repro_pipeline.py.tar.gz
- Click Create
- Click + Create Run
- Set parameters for this run & click Start
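The copy step can be sketched as follows, run from your local machine. The function name fetch_pipeline and the remote directory argument are hypothetical; pass whichever directory on the VM you ran the pipeline script in.

```shell
# Hypothetical helper for your local machine: copy the generated pipeline file from dev-01
# into the current directory.
# SCP can be overridden (e.g. SCP=echo) to preview the arguments without copying.
fetch_pipeline() {
  user="$1"        # your username on dev-01
  remote_dir="$2"  # assumption: directory on the VM containing the .tar.gz
  "${SCP:-scp}" "${user}@34.105.63.9:${remote_dir}/baseline_repro_pipeline.py.tar.gz" .
}
```

The downloaded file is the one to choose under Upload File in the Kubeflow frontend.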
We should turn off the Kubeflow node and the dev machine when we’re not using them. The dev machine will turn off automatically (though you can preempt this), but the Kubeflow node will not.
The instance automatically ends your SSH session after 30 minutes of inactivity, then shuts itself down after another 10 minutes with no user logged in over SSH.
To manually turn off the dev machine, first run w to make sure no one else has an active SSH session. If no one does, run sudo shutdown -h now to shut down the dev machine immediately.
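The check-before-shutdown habit can be sketched as a guard function on the dev machine. The function names are hypothetical, and the sketch uses who rather than w because its output (one session per line, username first) is easier to parse.

```shell
# Hypothetical helpers for the dev machine; not part of the repository.

# Reads `who` output on stdin and prints the logged-in usernames other than $1.
other_users() {
  awk -v me="$1" '$1 != me { print $1 }' | sort -u
}

# Shuts the machine down only if nobody else is logged in.
safe_shutdown() {
  others=$(who | other_users "$(whoami)")
  if [ -n "$others" ]; then
    echo "Not shutting down; other users are logged in:" $others >&2
    return 1
  fi
  sudo shutdown -h now
}
```

Running safe_shutdown instead of the bare shutdown command refuses to proceed while a teammate is still connected.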
You can turn off the Kubeflow node by running the following from inside the dev machine:
gcloud container clusters resize kf-train-6 --node-pool kf-train-6-cpu-pool-v1 --num-nodes 0
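Since the same resize command is used to turn the node on and off, it can be wrapped in one function. This is a sketch with a hypothetical name (kf_scale); the cluster and node pool names are the ones used above.

```shell
# Hypothetical wrapper for the dev machine: resize the Kubeflow training node pool.
# GCLOUD can be overridden (e.g. GCLOUD=echo) to preview the command without running it.
kf_scale() {
  # Accept only a non-negative integer node count.
  case "$1" in
    ''|*[!0-9]*)
      echo "usage: kf_scale <num-nodes>" >&2
      return 2
      ;;
  esac
  "${GCLOUD:-gcloud}" container clusters resize kf-train-6 \
    --node-pool kf-train-6-cpu-pool-v1 --num-nodes "$1"
}
```

Run kf_scale 1 before training and kf_scale 0 when you are done.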