Creating a Course
TODO Automate this as much as possible.
Let's say we're going to create a new course on Kaggle Learn called Data Science.
- On the command line, navigate to the learntools/notebooks/ directory.
- Create a new branch off master with a name like ds-course. Be sure to check that there isn't already a branch with that name.
- Decide on a "track name" like data_science. This will be the name of the directory where your course files will live. Check that there isn't already a directory with that name.
- There should be a Bash script called new_track.sh. Run ./new_track.sh data_science.
- Stage the new files: git add data_science.
- Commit the changes: git commit -m "Create track ds-course."
- Create a pull request on GitHub named [Data Science] New course.
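
The steps above can be sketched as one shell function. This is only a rough sketch, not something that exists in the repository: the name create_track is hypothetical, it assumes you are in learntools/notebooks/ with a local master branch, and it assumes new_track.sh takes the track name as its only argument, as described above. The branch check only looks at local branches, so you would still need to check the remote yourself.

```shell
# Hypothetical helper wrapping the manual steps; run from learntools/notebooks/.
# Usage: create_track <track_name> <branch_name>
create_track() {
    track=$1
    branch=$2

    # Refuse to continue if something with the track name already exists.
    if [ -e "$track" ]; then
        echo "directory '$track' already exists" >&2
        return 1
    fi
    # Refuse to continue if a local branch with this name already exists.
    if git show-ref --verify --quiet "refs/heads/$branch"; then
        echo "branch '$branch' already exists" >&2
        return 1
    fi

    # Branch off master, generate the track, then stage and commit it.
    git checkout -b "$branch" master &&
        ./new_track.sh "$track" &&
        git add "$track" &&
        git commit -m "Create track $branch."
}
```

With this defined, create_track data_science ds-course would cover everything except opening the pull request, which still happens on GitHub.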
- Navigate to the learntools root directory learntools/ (the directory containing setup.py). Do this from inside a Jupyter notebook, with either %cd or os.chdir (a plain !cd runs in a subshell, so the change does not persist).
- Install an editable version of learntools. Inside a Jupyter notebook, run !pip install --editable . (note the period). Installing the local copy of learntools from inside Jupyter helps ensure the Python kernel can find the installation. Because the shell and the notebook kernel may use different Python environments, installing from the command line can leave the kernel unable to import it.
- Navigate to learntools/learntools.
- Create a directory for your course: mkdir data_science.
- Create an initialization file: touch data_science/__init__.py.
- Commit the changes.
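
The two file-creation steps above can be combined into a small function, sketched here under assumptions: the name make_track_package is hypothetical, and the first argument is whatever path leads to the inner learntools/learntools directory on your machine. The empty __init__.py is what lets Python import the directory as a package.

```shell
# Hypothetical helper: create the Python package directory for a track.
# $1 is the path to the inner learntools/learntools directory, $2 the track name.
make_track_package() {
    # -p avoids an error if the directory was already created by hand.
    mkdir -p "$1/$2" && touch "$1/$2/__init__.py"
}
```

Run from the repository root, make_track_package learntools data_science would create learntools/data_science/__init__.py, ready to be committed.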
Create a folder to contain local copies of the course data: mkdir learntools/notebooks/input. This folder will just be for your own use while developing and won't be committed to the repository (it's in notebooks/.gitignore).
Create a folder for a course dataset: mkdir input/ds-course-data. Put all of the data you plan to use in here. If you develop your notebooks in the raw folder (notebooks/data_science/raw/), then you can access your datasets just like you would on Kaggle, like '../input/ds-course-data/data.csv'.
Add track name 'data_science' to TRACKS and TESTABLE_NOTEBOOK_TRACKS in learntools/notebooks/test.sh.
Create a new file setup_data.sh in learntools/notebooks/data_science/:
#!/bin/bash
# Download the datasets used in the course notebooks to the expected relative paths (../input/...)
mkdir -p input
DATASETS="ryanholbrook/ds-course-data ryanholbrook/some-other-data"
for slug in $DATASETS
do
    name=$(echo "$slug" | cut -d '/' -f 2)
    dest="input/$name"
    mkdir -p "$dest"
    kaggle d download -p "$dest" --unzip "$slug"
done

You'll need to keep the list of datasets in DATASETS up to date with those you use in your course (that is, those defined in track_meta.py).
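
To see where each entry in DATASETS ends up on disk, the naming step of the loop can be tried in isolation. This is a minimal sketch: the function name dataset_dir is hypothetical and is not part of setup_data.sh; it only mirrors the script's cut call and downloads nothing.

```shell
# Hypothetical helper mirroring the script's `cut` call:
# print the destination directory for an "owner/dataset" slug.
dataset_dir() {
    # Field 2 of the slash-separated slug is the dataset name.
    echo "input/$(echo "$1" | cut -d '/' -f 2)"
}

dataset_dir "ryanholbrook/ds-course-data"   # prints input/ds-course-data
```

The owner part of the slug is dropped, so two datasets with the same name from different owners would be unzipped into the same directory.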