Creating a Course

Ryan Holbrook edited this page Oct 21, 2020 · 15 revisions

TODO Automate this as much as possible.

Let's say we're going to create a new course on Kaggle Learn called Data Science.

Notebooks

  1. On the command line, navigate to the learntools/notebooks/ directory.
  2. Create a new branch from master with a name like ds-course. First check that there isn't already a branch with that name.
  3. Decide on a "track name" like data_science. This will be the name of the directory where your course files will exist. Check that there isn't already a directory with that name.
  4. There should be a Bash script called new_track.sh. Run ./new_track.sh data_science.
  5. Stage the new files: git add data_science.
  6. Commit the changes: git commit -m "Create track ds-course."
  7. Create a pull request on GitHub named [Data Science] New course.
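Steps 2 and 3 both ask you to check for name collisions first. A minimal sketch of those checks follows. The function names are hypothetical helpers, not part of learntools, and the remote is assumed to be called origin.

```shell
#!/bin/bash
# Hypothetical helpers for checking that a branch name and a track name are free.

# Succeeds (exit 0) if a local branch with this name already exists.
local_branch_exists() {
    git show-ref --verify --quiet "refs/heads/$1" 2>/dev/null
}

# Succeeds if the remote already has this branch.
# Assumption: the remote is named "origin".
remote_branch_exists() {
    git ls-remote --exit-code --heads origin "$1" >/dev/null 2>&1
}

# Succeeds if a file or directory with the track name exists in the current directory.
track_dir_exists() {
    [ -e "$1" ]
}

# Print a message for each name that is already taken.
# Returns non-zero if either the branch ($1) or the track directory ($2) is taken.
check_names() {
    taken=0
    if local_branch_exists "$1" || remote_branch_exists "$1"; then
        echo "branch '$1' already exists"
        taken=1
    fi
    if track_dir_exists "$2"; then
        echo "directory '$2' already exists"
        taken=1
    fi
    return "$taken"
}
```

From learntools/notebooks/, running check_names ds-course data_science should print nothing and succeed when both names are free, after which you can carry on with the remaining steps.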

Checking Code

  1. Navigate to the learntools root directory learntools/ (the directory containing setup.py). Do this from inside a Jupyter notebook, either with %cd or os.chdir. (A plain !cd runs in a subshell and does not change the notebook's working directory.)
  2. Install an editable version of learntools. Inside the Jupyter notebook, run !pip install --editable . (note the period). Installing the local copy of learntools from inside Jupyter helps ensure the Python kernel can find the installation. Depending on how your environments are set up, installing it from the command line may not work.
  3. Navigate to learntools/learntools.
  4. Create a directory for your course: mkdir data_science.
  5. Create an initialization file: touch data_science/__init__.py.
  6. Commit the changes.
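Steps 3 to 5 can be combined into one small helper. This is only a sketch: make_course_package is a hypothetical name, and it takes the repository root as an argument so that it does not depend on your current directory.

```shell
#!/bin/bash
# Hypothetical helper: create the package directory for a course's checking code.
#   $1: path to the learntools repository root (the directory containing setup.py)
#   $2: track name, e.g. data_science
make_course_package() {
    pkg_dir="$1/learntools/$2"
    # Create the directory (no error if it already exists), then the empty init file.
    mkdir -p "$pkg_dir" && touch "$pkg_dir/__init__.py"
}
```

For example, make_course_package . data_science run from the repository root creates learntools/data_science/__init__.py, which you can then stage and commit as in step 6.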

Datasets

Create a folder to contain local copies of the course data: mkdir learntools/notebooks/input. This folder will just be for your own use while developing and won't be committed to the repository (it's in notebooks/.gitignore).

Create a folder for a course dataset: mkdir input/ds-course-data. Put all of the data you plan to use in here. If you develop your notebooks in the raw folder (notebooks/data_science/raw/), then you can access your datasets just as you would on Kaggle, for example with '../input/ds-course-data/data.csv'.

Jenkins

Add the track name 'data_science' to TRACKS and TESTABLE_NOTEBOOK_TRACKS in learntools/notebooks/test.sh.

Create a new file setup_data.sh in learntools/notebooks/data_science/:

#!/bin/bash
# Download the datasets used in the course notebooks to the correct relative paths (../input/...)

mkdir -p input

DATASETS="ryanholbrook/ds-course-data ryanholbrook/some-other-data"

for slug in $DATASETS
do
    # The folder name is the part of the slug after the "/"
    name=$(echo "$slug" | cut -d '/' -f 2)
    dest="input/$name"
    mkdir -p "$dest"
    kaggle d download -p "$dest" --unzip "$slug"
done

You'll need to keep the list of datasets in DATASETS up to date with those you use in your course (that is, those defined in track_meta.py).
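To see where each entry in DATASETS ends up, the following sketch maps a slug of the form owner/name to its destination folder without downloading anything. dataset_dest is a hypothetical helper; it uses shell parameter expansion in place of cut, which gives the same result for slugs with a single slash.

```shell
#!/bin/bash
# Hypothetical helper: print the local folder for a Kaggle dataset slug (owner/name).
dataset_dest() {
    # ${1##*/} strips everything up to and including the last "/"
    echo "input/${1##*/}"
}

for slug in ryanholbrook/ds-course-data ryanholbrook/some-other-data
do
    echo "$slug -> $(dataset_dest "$slug")"
done
# prints:
#   ryanholbrook/ds-course-data -> input/ds-course-data
#   ryanholbrook/some-other-data -> input/some-other-data
```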