diff --git a/content/en/tutorials/integration-tutorials/pytorch.md b/content/en/tutorials/integration-tutorials/pytorch.md
index 1034ea1b3c..628b8b24dc 100644
--- a/content/en/tutorials/integration-tutorials/pytorch.md
+++ b/content/en/tutorials/integration-tutorials/pytorch.md
@@ -6,50 +6,73 @@ menu:
 title: PyTorch
 weight: 1
 ---
-{{< cta-button colabLink="https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch/Simple_PyTorch_Integration.ipynb" >}}
-Use [W&B](https://wandb.ai) for machine learning experiment tracking, dataset versioning, and project collaboration.
+# Integrate PyTorch with Weights & Biases
+
+Use [Weights & Biases (W&B)](https://wandb.ai) to track machine learning experiments, version datasets, and collaborate on projects.
+
+{{< cta-button colabLink="https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch/Simple_PyTorch_Integration.ipynb" >}}
 
 {{< img src="/images/tutorials/huggingface-why.png" alt="Benefits of using W&B" >}}
 
-## What this notebook covers
+## Overview
+
+This tutorial shows you how to integrate W&B with PyTorch. After you complete it, you'll be able to:
 
-We show you how to integrate W&B with your PyTorch code to add experiment tracking to your pipeline.
+- Log hyperparameters and metadata
+- Track model gradients, parameters, and metrics
+- Save models and artifacts
+- Automate hyperparameter optimization with W&B Sweeps
 
 {{< img src="/images/tutorials/pytorch.png" alt="PyTorch and W&B integration diagram" >}}
 
+## Before you begin
+
+You need the following; the sketch after this list shows one way to verify your setup:
+
+- Python 3.7 or later
+- PyTorch installed
+- A free [W&B account](https://wandb.ai)
+- GPU hardware (optional but recommended)
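+
+The following check is a sketch and not part of the tutorial pipeline; it simply confirms that the prerequisites above are met.
+
+```python
+# optional environment check (sketch)
+import sys
+import torch
+
+print(sys.version_info >= (3, 7))  # Python 3.7 or later
+print(torch.__version__)           # confirms PyTorch is installed
+print(torch.cuda.is_available())   # True if a GPU is usable (optional)
+```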
+
+## Quickstart
+
+The following example shows how to add W&B tracking to a training loop.
+
 ```python
 # import the library
 import wandb
+import torch
 
 # start a new experiment
 with wandb.init(project="new-sota-model") as run:
-
-    # capture a dictionary of hyperparameters with config
-    run.config = {"learning_rate": 0.001, "epochs": 100, "batch_size": 128}
+    # capture a dictionary of hyperparameters with config
+    # (`run.config` doesn't support direct assignment; use `update()`)
+    run.config.update({"learning_rate": 0.001, "epochs": 100, "batch_size": 128})
-    
+
     # set up model and data
     model, dataloader = get_model(), get_data()
-    
+
     # optional: track gradients
     run.watch(model)
-    
+
     for batch in dataloader:
-    metrics = model.training_step()
-    # log metrics inside your training loop to visualize model performance
-    run.log(metrics)
-    
+        metrics = model.training_step()
+        # log metrics inside your training loop to visualize model performance
+        run.log(metrics)
+
     # optional: save model at the end
-    model.to_onnx()
+    # (PyTorch models have no `.to_onnx()` method; export through `torch.onnx`,
+    # where `example_inputs` is a placeholder for a sample input batch)
+    torch.onnx.export(model, example_inputs, "model.onnx")
     run.save("model.onnx")
 ```
 
-Follow along with a [video tutorial](https://wandb.me/pytorch-video).
+For a walkthrough, see the [video tutorial](https://wandb.me/pytorch-video).
+
+> **Note:** Steps labeled *Step X* show only the minimal code needed for W&B integration. Other sections cover model and data setup.
 
-**Note**: Sections starting with _Step_ are all you need to integrate W&B in an existing pipeline. The rest just loads data and defines a model.
+---
 
-## Install, import, and log in
+## Configure PyTorch
 
+The following example shows how to prepare your PyTorch code for W&B integration.
 
 ```python
 import os
@@ -76,52 +99,33 @@
 device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
 
 torchvision.datasets.MNIST.mirrors = [mirror for mirror in torchvision.datasets.MNIST.mirrors if not mirror.startswith("http://yann.lecun.com")]
 ```
+
+To integrate PyTorch with W&B, complete the following steps.
 
-### Step 0: Install W&B
-
-To get started, we'll need to get the library.
-`wandb` is easily installed using `pip`.
+## Step 1. Install W&B
+
+Install the `wandb` and `onnx` libraries:
 
 ```python
 !pip install wandb onnx -Uq
 ```
 
-### Step 1: Import W&B and Login
-
-In order to log data to our web service,
-you'll need to log in.
+## Step 2. Import and log in
 
-If this is your first time using W&B,
-you'll need to sign up for a free account at the link that appears.
+Import W&B and log in:
 
-
-```
+```python
 import wandb
 wandb.login()
 ```
 
-## Define the Experiment and Pipeline
-
-### Track metadata and hyperparameters with `wandb.init`
-
-Programmatically, the first thing we do is define our experiment:
-what are the hyperparameters? what metadata is associated with this run?
-
-It's a pretty common workflow to store this information in a `config` dictionary
-(or similar object)
-and then access it as needed.
+If this is your first time using W&B, create a free account at the link that appears.
 
-For this example, we're only letting a few hyperparameters vary
-and hand-coding the rest.
-But any part of your model can be part of the `config`.
+---
 
-We also include some metadata: we're using the MNIST dataset and a convolutional
-architecture. If we later work with, say,
-fully connected architectures on CIFAR in the same project,
-this will help us separate our runs.
+## Step 3. Define the experiment
 
+Track hyperparameters and metadata with `wandb.init()`. Use a config dictionary for reproducibility:
 
 ```python
 config = dict(
@@ -134,19 +138,14 @@ config = dict(
     architecture="CNN")
 ```
 
-Now, let's define the overall pipeline,
-which is pretty typical for model-training:
-
-1. we first `make` a model, plus associated data and optimizer, then
-2. we `train` the model accordingly and finally
-3. `test` it to see how training went.
-
-We'll implement these functions below.
+A typical ML pipeline includes the following steps:
+
+1. Build the model, data, and optimizer
+2. Train the model
+3. Test its performance
 
 ```python
 def model_pipeline(hyperparameters):
-    # tell wandb to get started
     with wandb.init(project="pytorch-demo", config=hyperparameters) as run:
       # access all HPs through run.config, so logging matches execution.
@@ -158,45 +157,22 @@ def model_pipeline(hyperparameters):
       # and use them to train the model
       train(model, train_loader, criterion, optimizer, config)
 
-      # and test its final performance
      test(model, test_loader)
 
    return model
 ```
 
-The only difference here from a standard pipeline
-is that it all occurs inside the context of `wandb.init`.
-Calling this function sets up a line of communication
-between your code and our servers.
-
-Passing the `config` dictionary to `wandb.init`
-immediately logs all that information to us,
-so you'll always know what hyperparameter values
-you set your experiment to use.
-
-To ensure the values you chose and logged are always the ones that get used
-in your model, we recommend using the `run.config` copy of your object.
-Check the definition of `make` below to see some examples.
-
-> *Side Note*: We take care to run our code in separate processes,
-so that any issues on our end
-(such as if a giant sea monster attacks our data centers)
-don't crash your code.
-Once the issue is resolved, such as when the Kraken returns to the deep,
-you can log the data with `wandb sync`.
-
-
+The `make()` helper below builds everything the pipeline needs: the data loaders, the model, the loss, and the optimizer.
+
 ```python
 def make(config):
     # Make the data
     train, test = get_data(train=True), get_data(train=False)
     train_loader = make_loader(train, batch_size=config.batch_size)
     test_loader = make_loader(test, batch_size=config.batch_size)
-    
+
     # Make the model
     model = ConvNet(config.kernels, config.classes).to(device)
-    
+
     # Make the loss and optimizer
     criterion = nn.CrossEntropyLoss()
     optimizer = torch.optim.Adam(
@@ -205,28 +181,23 @@ def make(config):
 
     return model, train_loader, test_loader, criterion, optimizer
 ```
 
-### Define the Data Loading and Model
-
-Now, we need to specify how the data is loaded and what the model looks like.
+---
 
-This part is very important, but it's
-no different from what it would be without `wandb`,
-so we won't dwell on it.
+## Step 4. Load data and define the model
 
+Load the data:
 
 ```python
 def get_data(slice=5, train=True):
     full_dataset = torchvision.datasets.MNIST(root=".",
-                                              train=train,
-                                              transform=transforms.ToTensor(),
-                                              download=True)
+                                              train=train,
+                                              transform=transforms.ToTensor(),
+                                              download=True)
     # equiv to slicing with [::slice]
     sub_dataset = torch.utils.data.Subset(
         full_dataset, indices=range(0, len(full_dataset), slice))
-
     return sub_dataset
 
-
 def make_loader(dataset, batch_size):
     loader = torch.utils.data.DataLoader(dataset=dataset,
                                          batch_size=batch_size,
@@ -235,20 +206,10 @@ def make_loader(dataset, batch_size):
     return loader
 ```
 
-Defining the model is normally the fun part.
-
-But nothing changes with `wandb`,
-so we're gonna stick with a standard ConvNet architecture.
-
-Don't be afraid to mess around with this and try some experiments --
-all your results will be logged on [wandb.ai](https://wandb.ai).
-
+Define the model:
 
 ```python
 # Conventional and convolutional neural network
-
 class ConvNet(nn.Module):
     def __init__(self, kernels, classes=10):
         super(ConvNet, self).__init__()
@@ -262,7 +223,7 @@ class ConvNet(nn.Module):
             nn.ReLU(),
             nn.MaxPool2d(kernel_size=2, stride=2))
         self.fc = nn.Linear(7 * 7 * kernels[-1], classes)
-        
+
     def forward(self, x):
         out = self.layer1(x)
         out = self.layer2(out)
@@ -271,38 +232,24 @@ class ConvNet(nn.Module):
         return out
 ```
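+
+As a quick sanity check, you can confirm that the model maps a batch of MNIST-sized inputs to 10-class logits. This sketch assumes `kernels=[16, 32]`, matching the tutorial's `config`:
+
+```python
+# optional shape check (sketch)
+model = ConvNet(kernels=[16, 32], classes=10).to(device)
+x = torch.randn(8, 1, 28, 28, device=device)  # a fake batch of MNIST-sized images
+print(model(x).shape)  # expected: torch.Size([8, 10])
+```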
 
-### Define Training Logic
-
-Moving on in our `model_pipeline`, it's time to specify how we `train`.
-
-Two `wandb` functions come into play here: `watch` and `log`.
-
-## Track gradients with `run.watch()` and everything else with `run.log()`
-
-`run.watch` will log the gradients and the parameters of your model,
-every `log_freq` steps of training.
-
-All you need to do is call it before you start training.
+---
 
-The rest of the training code remains the same:
-we iterate over epochs and batches,
-running forward and backward passes
-and applying our `optimizer`.
+## Step 5. Train the model
 
+Log gradients with `run.watch()` and metrics with `run.log()`:
 
 ```python
 def train(model, loader, criterion, optimizer, config):
     # Tell wandb to watch what the model gets up to: gradients, weights, and more.
-    run = wandb.init(project="pytorch-demo", config=config)
+    # Reuse the run that `model_pipeline()` started instead of creating a second one.
+    run = wandb.run
     run.watch(model, criterion, log="all", log_freq=10)
-    
+
     # Run training and track with wandb
     total_batches = len(loader) * config.epochs
     example_ct = 0  # number of examples seen
     batch_ct = 0
     for epoch in tqdm(range(config.epochs)):
         for _, (images, labels) in enumerate(loader):
-
             loss = train_batch(images, labels, model, optimizer, criterion)
             example_ct += len(images)
             batch_ct += 1
 
             # Report metrics every 25th batch
@@ -311,7 +258,6 @@ def train(model, loader, criterion, optimizer, config):
             if ((batch_ct + 1) % 25) == 0:
                 train_log(loss, example_ct, epoch)
 
-
 def train_batch(images, labels, model, optimizer, criterion):
     images, labels = images.to(device), labels.to(device)
 
@@ -329,18 +275,7 @@ def train_batch(images, labels, model, optimizer, criterion):
     return loss
 ```
 
-The only difference is in the logging code:
-where previously you might have reported metrics by printing to the terminal,
-now you pass the same information to `run.log()`.
-
-`run.log()` expects a dictionary with strings as keys.
-These strings identify the objects being logged, which make up the values.
-You can also optionally log which `step` of training you're on.
-
-> *Side Note*: I like to use the number of examples the model has seen,
-since this makes for easier comparison across batch sizes,
-but you can use raw steps or batch count. For longer training runs, it can also make sense to log by `epoch`.
-
+Log metrics:
 
 ```python
 def train_log(loss, example_ct, epoch):
@@ -351,33 +286,16 @@ def train_log(loss, example_ct, epoch):
     print(f"Loss after {str(example_ct).zfill(5)} examples: {loss:.3f}")
 ```
 
-### Define Testing Logic
-
-Once the model is done training, we want to test it:
-run it against some fresh data from production, perhaps,
-or apply it to some hand-curated examples.
-
-## (Optional) Call `run.save()`
-
-This is also a great time to save the model's architecture
-and final parameters to disk.
-For maximum compatibility, we'll `export` our model in the
-[Open Neural Network eXchange (ONNX) format](https://onnx.ai/).
-
-Passing that filename to `run.save()` ensures that the model parameters
-are saved to W&B's servers: no more losing track of which `.h5` or `.pb`
-corresponds to which training runs.
+---
 
-For more advanced `wandb` features for storing, versioning, and distributing
-models, check out our [Artifacts tools](https://www.wandb.com/artifacts).
+## Step 6. Test and save the model
 
+Evaluate and save the trained model:
 
 ```python
 def test(model, test_loader):
     model.eval()
-    
-    with wandb.init(project="pytorch-demo") as run:
+
+    # Reuse the active run rather than starting a new one, so that
+    # `test_accuracy` lands in the same run as the training metrics.
+    run = wandb.run
+    if run is not None:
         # Run the model on some test examples
         with torch.no_grad():
@@ -390,7 +308,7 @@ def test(model, test_loader):
             correct += (predicted == labels).sum().item()
 
         print(f"Accuracy of the model on the {total} " +
-              f"test images: {correct / total:%}")
+              f"test images: {correct / total:%}")
 
         run.log({"test_accuracy": correct / total})
 
@@ -399,64 +317,54 @@ def test(model, test_loader):
         run.save("model.onnx")
 ```
 
-### Run training and watch your metrics live on wandb.ai
-
-Now that we've defined the whole pipeline and slipped in
-those few lines of W&B code,
-we're ready to run our fully tracked experiment.
-
-We'll report a few links to you:
-our documentation,
-the Project page, which organizes all the runs in a project, and
-the Run page, where this run's results will be stored.
-
-Navigate to the Run page and check out these tabs:
-
-1. **Charts**, where the model gradients, parameter values, and loss are logged throughout training
-2. **System**, which contains a variety of system metrics, including Disk I/O utilization, CPU and GPU metrics (watch that temperature soar), and more
-3. **Logs**, which has a copy of anything pushed to standard out during training
-4. **Files**, where, once training is complete, you can click on the `model.onnx` to view our network with the [Netron model viewer](https://github.com/lutzroeder/netron).
+---
 
-Once the run in finished, when the `with wandb.init` block exits,
-we'll also print a summary of the results in the cell output.
+## Step 7. Run and monitor the experiment
 
+Run the pipeline:
 
 ```python
 # Build, train and analyze the model with the pipeline
 model = model_pipeline(config)
 ```
 
-### Test Hyperparameters with Sweeps
+Monitor the run in W&B:
 
-We only looked at a single set of hyperparameters in this example.
-But an important part of most ML workflows is iterating over
-a number of hyperparameters.
+- **Charts**: Model gradients, parameter values, and loss curves
+- **System**: System metrics such as CPU, GPU, and memory utilization
+- **Logs**: A copy of anything pushed to standard output during training
+- **Files**: Artifacts such as `model.onnx`, which you can open with the [Netron model viewer](https://github.com/lutzroeder/netron)
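+
+Beyond the app UI, you can read results back programmatically with the W&B public API. This is a sketch, not part of the original tutorial; replace `my-entity` with your W&B username or team:
+
+```python
+import wandb
+
+api = wandb.Api()
+runs = api.runs("my-entity/pytorch-demo")  # most recent runs first
+print(runs[0].name, runs[0].summary.get("test_accuracy"))
+```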
 
-You can use W&B Sweeps to automate hyperparameter testing and explore the space of possible models and optimization strategies.
+---
 
-Check out a [Colab notebook demonstrating hyperparameter optimization using W&B Sweeps](https://wandb.me/sweeps-colab).
+## Step 8. Optimize hyperparameters with Sweeps
 
-Running a hyperparameter sweep with W&B is very easy. There are just 3 simple steps:
+Use W&B Sweeps to automate hyperparameter search:
 
-1. **Define the sweep:** We do this by creating a dictionary or a [YAML file]({{< relref "/guides/models/sweeps/define-sweep-configuration" >}}) that specifies the parameters to search through, the search strategy, the optimization metric et all.
+1. Define the sweep configuration and initialize the sweep (a sample `sweep_config` is sketched after this list):
+   ```python
+   sweep_id = wandb.sweep(sweep_config)
+   ```
-
-2. **Initialize the sweep:**
-`sweep_id = wandb.sweep(sweep_config)`
+2. Run the sweep agent:
+   ```python
+   wandb.agent(sweep_id, function=train)
+   ```
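+
+The `sweep_config` is a dictionary (or a [YAML file]({{< relref "/guides/models/sweeps/define-sweep-configuration" >}})) that specifies the parameters to search, the search strategy, and the metric to optimize. A minimal sketch:
+
+```python
+# a minimal example sweep configuration; see the Sweeps docs for all options
+sweep_config = {
+    "method": "random",  # or "grid", "bayes"
+    "metric": {"name": "loss", "goal": "minimize"},
+    "parameters": {
+        "learning_rate": {"min": 0.0001, "max": 0.1},
+        "batch_size": {"values": [32, 64, 128]},
+    },
+}
+```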
-
-3. **Run the sweep agent:**
-`wandb.agent(sweep_id, function=train)`
 
+For a complete example, see the [Colab sweep notebook](https://wandb.me/sweeps-colab).
 
-That's all there is to running a hyperparameter sweep.
+{{< img src="/images/tutorials/pytorch-2.png" alt="PyTorch training dashboard" >}}
 
-{{< img src="/images/tutorials/pytorch-2.png" alt="PyTorch training dashboard" >}}
+---
 
-## Example Gallery
+## Example gallery
 
-Explore examples of projects tracked and visualized with W&B in our [Gallery →](https://app.wandb.ai/gallery).
+See tracked projects in the [W&B Gallery](https://app.wandb.ai/gallery).
 
-## Advanced Setup
-1. [Environment variables]({{< relref "/guides/hosting/env-vars/" >}}): Set API keys in environment variables so you can run training on a managed cluster.
-2. [Offline mode]({{< relref "/support/kb-articles/run_wandb_offline.md" >}}): Use `dryrun` mode to train offline and sync results later.
-3. [On-prem]({{< relref "/guides/hosting/hosting-options/self-managed" >}}): Install W&B in a private cloud or air-gapped servers in your own infrastructure. We have local installations for everyone from academics to enterprise teams.
-4. [Sweeps]({{< relref "/guides/models/sweeps/" >}}): Set up hyperparameter search quickly with our lightweight tool for tuning.
+## Advanced configuration
+
+For advanced use cases, see the following guides; a short environment-variable sketch follows the list:
+
+- [Environment variables]({{< relref "/guides/hosting/env-vars/" >}}): set API keys in environment variables so you can run training on a managed cluster
+- [Offline mode]({{< relref "/support/kb-articles/run_wandb_offline.md" >}}): train offline and sync results later
+- [On-premises hosting]({{< relref "/guides/hosting/hosting-options/self-managed" >}}): install W&B in a private cloud or on air-gapped servers in your own infrastructure
+- [Sweeps]({{< relref "/guides/models/sweeps/" >}}): set up hyperparameter search quickly with W&B's lightweight tuning tool
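+
+For example, a minimal sketch of configuring W&B through environment variables (the values are placeholders):
+
+```python
+# set these before the first wandb.init() call
+import os
+
+os.environ["WANDB_API_KEY"] = "your-api-key"  # placeholder; keep real keys out of source control
+os.environ["WANDB_MODE"] = "offline"          # log locally, then upload later with `wandb sync`
+```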