Development Workflow

Milo Webster edited this page Jul 7, 2020 · 5 revisions

Accessing Dev Machine

SSH into the dev-01 development VM with ssh [username]@34.105.63.9. If this command hangs for more than a couple of seconds, the development machine is probably off. See Development VM under Turning On Resources to turn it on.
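To make the hang described above fail fast, you can give ssh a connection timeout. This is only a sketch: the dev_ssh helper name and the 5-second timeout are assumptions, not part of the existing setup. ConnectTimeout itself is a standard OpenSSH client option.

```shell
# Hypothetical helper: connect to dev-01, giving up after 5 seconds
# instead of hanging when the VM is powered off.
dev_ssh() {
  user="${1:?usage: dev_ssh USERNAME}"
  ssh -o ConnectTimeout=5 "${user}@34.105.63.9"
}
```

Call it as dev_ssh [username]. If it times out, the VM is most likely off.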

Turning On Resources

Development VM

If you have set up the Google Cloud SDK locally, run gcloud compute instances start dev-01 --zone us-west1-b. The --zone flag is only necessary if your default zone is set to something other than us-west1-b.
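If you are not sure whether the VM is already on, the sketch below checks its status first and only starts it when needed. The start_dev_vm function name is hypothetical. It assumes the Cloud SDK is installed and authenticated locally, and it passes --zone explicitly so that it does not depend on your default zone.

```shell
# Hypothetical helper: start dev-01 only if it is not already running.
start_dev_vm() {
  # Compute Engine reports statuses such as RUNNING or TERMINATED.
  vm_status="$(gcloud compute instances describe dev-01 \
    --zone us-west1-b --format='value(status)')" || return 1
  if [ "$vm_status" = "RUNNING" ]; then
    echo "dev-01 is already running"
  else
    gcloud compute instances start dev-01 --zone us-west1-b
  fi
}
```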

If you haven't set up the Google Cloud SDK, you can turn on the dev machine by going to the VM instances page on the GCP web console, selecting the dev-01 instance, and clicking START at the top of that page.

Kubeflow GKE Node

Only do this if you need to train on the cluster. On the dev machine, run gcloud container clusters resize kf-train-6 --node-pool kf-train-6-cpu-pool-v1 --num-nodes 1. Don't forget to scale the node pool back to 0 nodes when you're done using it (see Turning Off Resources).
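Because the same resize command is used to turn the node on and off, it can be wrapped in a small function. This is a sketch only: the kf_resize name is hypothetical, and limiting the argument to 0 or 1 is an assumption based on the two values this page uses.

```shell
# Hypothetical helper: scale the Kubeflow CPU node pool to 0 or 1 nodes.
kf_resize() {
  nodes="${1:?usage: kf_resize 0|1}"
  case "$nodes" in
    0|1) ;;
    *) echo "expected 0 or 1 nodes, got: $nodes" >&2; return 2 ;;
  esac
  gcloud container clusters resize kf-train-6 \
    --node-pool kf-train-6-cpu-pool-v1 --num-nodes "$nodes"
}
```

Run kf_resize 1 before training and kf_resize 0 afterwards. gcloud asks for confirmation before resizing.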

Python Environment

Activate the Python conda environment with the following command:
conda activate mlpipeline

Deploying & Testing Changes

Local to the VM

To test changes to the component locally on the development machine (dev-01), run ./run_local.sh, which just executes the component’s main Python script.

Remote on cluster

To test changes to the component remotely on the Kubernetes/Kubeflow cluster, first run ./build_image.sh. This will build and upload a Docker image containing your latest source code. Details on the next steps of deployment are listed below.

Note: Don’t set the image tag to “latest” when committing code or when deploying a real experiment. If you do, the pipeline will use the latest built image for the component, which may be a different version than you want to run.
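One way to avoid “latest” is to derive the tag from the commit being built. The sketch below is an assumption, not how build_image.sh currently works: the image_tag function is hypothetical, and how the tag gets passed to the build script is not covered on this page.

```shell
# Hypothetical helper: print a tag that pins the image to the current commit.
image_tag() {
  tag="$(git rev-parse --short HEAD)" || return 1
  # Flag builds made from uncommitted changes so they are not
  # mistaken for a clean build of that commit.
  if [ -n "$(git status --porcelain)" ]; then
    tag="${tag}-dirty"
  fi
  echo "$tag"
}
```

A tag like this makes it possible to tell exactly which source version a pipeline run used.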

To test a new pipeline on the cluster, you will need to update the existing pipeline or upload a new one using the GUI frontend.
Do this as follows:

  1. On the VM run python baseline_repro_pipeline.py to generate the pipeline file baseline_repro_pipeline.py.tar.gz
  2. scp this file to your local machine (can we do this from the VM?)
  3. Go to the Kubeflow frontend
  4. Click Pipelines
  5. Click baseline_repro_pipeline
  6. Click + Upload Version
  7. Select Upload File and choose baseline_repro_pipeline.py.tar.gz
  8. Click Create
  9. Click + Create Run
  10. Set parameters for this run & click Start
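Step 2 above can be sketched as follows, run from your local machine. The fetch_pipeline name is hypothetical, and the remote directory is an argument because this page does not say where the file is generated on the VM.

```shell
# Hypothetical helper: copy the generated pipeline archive from dev-01
# into the current local directory.
fetch_pipeline() {
  user="${1:?usage: fetch_pipeline USERNAME REMOTE_DIR}"
  remote_dir="${2:?usage: fetch_pipeline USERNAME REMOTE_DIR}"
  scp "${user}@34.105.63.9:${remote_dir}/baseline_repro_pipeline.py.tar.gz" .
}
```

The downloaded file is the one to choose under Upload File in step 7.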

Turning Off Resources

We should turn off the Kubeflow node and the dev machine when we’re not using them. The dev machine will turn off automatically (though you can shut it down manually sooner), but the Kubeflow node will not.

Development VM

The instance automatically ends your SSH session after 30 minutes of inactivity. Once no user has been logged in over SSH for another 10 minutes, the instance turns itself off.

To manually turn off the dev machine, first run w to make sure no one else has an active SSH session. If no one does, run sudo shutdown -h now to shut down the dev machine immediately.
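The manual check can be scripted so that the shutdown only happens when nobody else is logged in. This is a rough sketch with a hypothetical function name. It uses who rather than w because who prints no header line, which makes the user column easier to filter.

```shell
# Hypothetical helper: shut down dev-01 only if no other user is logged in.
shutdown_if_alone() {
  # who prints one line per login session; keep user names other than ours.
  others="$(who | awk -v me="$(id -un)" '$1 != me {print $1}' | sort -u)"
  if [ -n "$others" ]; then
    # $others is left unquoted on purpose so the names join onto one line.
    echo "Other users still logged in:" $others >&2
    return 1
  fi
  sudo shutdown -h now
}
```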

Kubeflow GKE Node

You can turn off the Kubeflow node by running the following from inside the dev machine:

  • gcloud container clusters resize kf-train-6 --node-pool kf-train-6-cpu-pool-v1 --num-nodes 0