
DS701 Course Book

This is the repository for DS701 course notes and lecture slides. The slides are derived from the same source content.

The site and slides are created using Quarto, so you need to install Quarto to build them. You can install Quarto from the Quarto website. Follow the instructions in the Get Started section for VSCode. For VSCode you also need to install the Quarto extension, which lets you preview your content in VSCode.

Python environment

Tested on macOS Sonoma 14.5 with Python 3.12.4.

Executing the Python code used in the book requires several Python packages.

We recommend using venv to create a virtual environment with all the necessary packages. To set up this environment, use the following terminal commands:

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
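
As a quick sanity check after activation, the following minimal sketch reports whether the interpreter is running inside a virtual environment. The helper name `in_virtualenv` is ours for illustration and is not part of the repository; the check relies on `sys.prefix` differing from `sys.base_prefix`, which is how environments created by venv behave.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor}, "
    f"virtual environment active: {in_virtualenv()}"
)
```

Run it with the `python` on your path after `source .venv/bin/activate`; if it reports that no virtual environment is active, the packages from requirements.txt were likely installed elsewhere.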

Quarto Project Selection

We use Quarto Projects to manage the YAML configurations specific to the output type (e.g. website or lecture slides) using project profiles.

Per the Quarto project profiles documentation, we define a YAML configuration file for each project type:

  • _quarto-web.yml to create the website in output directory _site
  • _quarto-slides.yml to create slides in output directory _revealjs
  • _quarto-book.yml to create the HTML book in output directory _book (deprecated)

along with a base _quarto.yml configuration.
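
The relationship between profile names, configuration files, and output directories listed above can be summarized in a small lookup. This is only an illustrative sketch: the names `PROFILE_OUTPUT_DIRS` and `profile_config_file` are hypothetical and do not exist in the repository.

```python
# Output directories as listed in this README. The file-name pattern
# _quarto-<profile>.yml is Quarto's convention for profile-specific config.
PROFILE_OUTPUT_DIRS = {
    "web": "_site",
    "slides": "_revealjs",
    "book": "_book",  # deprecated
}


def profile_config_file(profile: str) -> str:
    """Return the profile-specific config file name for a known profile."""
    if profile not in PROFILE_OUTPUT_DIRS:
        raise ValueError(f"unknown profile: {profile!r}")
    return f"_quarto-{profile}.yml"


for name, out_dir in PROFILE_OUTPUT_DIRS.items():
    print(f"{name}: {profile_config_file(name)} -> {out_dir}")
```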

Building the Site Locally

Tested with Quarto 1.5.55 on macOS Sonoma 14.5.

To build the site you need to be in the ds701_book directory.

cd ds701_book

Since the site contains many lectures, you may need to set the following environment variable with this terminal command:

export QUARTO_DENO_V8_OPTIONS=--stack-size=8192

Once this environment variable has been set you can render the entire site using the terminal commands:

# From ds701_book/ dir
quarto render
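
If you prefer to drive the build from Python, for example in a helper script, the steps above can be sketched as follows. The function names (`render_env`, `render_command`, `render`) are hypothetical and not part of the repository; the sketch only assumes that `quarto` is on your path and uses the same environment variable and command shown above.

```python
import os
import subprocess
from typing import Dict, List, Optional


def render_env(stack_size: int = 8192) -> Dict[str, str]:
    """Copy the current environment and add the stack-size option."""
    env = dict(os.environ)
    env["QUARTO_DENO_V8_OPTIONS"] = f"--stack-size={stack_size}"
    return env


def render_command(profile: Optional[str] = None,
                   target: Optional[str] = None) -> List[str]:
    """Build a `quarto render` command, optionally for one file or profile."""
    cmd = ["quarto", "render"]
    if target is not None:
        cmd.append(target)
    if profile is not None:
        cmd += ["--profile", profile]
    return cmd


def render(book_dir: str = "ds701_book",
           profile: Optional[str] = None,
           target: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the render from the book directory; raises if quarto fails."""
    return subprocess.run(
        render_command(profile, target),
        cwd=book_dir,
        env=render_env(),
        check=True,
    )
```

Calling `render()` from the repository root corresponds to `cd ds701_book` followed by `quarto render` with the environment variable set.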

The HTML files are all located in the _site directory. The _site directory is not committed to the repository.

Previewing the Lecture Notes Website

To preview an individual chapter using VSCode, open that chapter's .qmd file in VSCode and press Shift-Command-K, or click the preview icon at the upper right of the editor window.

Alternatively, you can run quarto preview from a terminal prompt in the ds701_book directory. To exit the preview, press Ctrl-C in the same terminal window.

Rendering Slides

To render slides for each lecture run

quarto render --profile slides

from ds701_book/. The resulting slides are written to _revealjs, which is ignored by git.

An easy way to select slides to preview is to open the folder in a browser, using a URL such as file:///<path-to-project-parent-folder>/DS701-Course-Notes/ds701_book/_revealjs/. Then you can click on one of the .html files to view the slides.
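
As an alternative to browsing the folder, a minimal sketch like the following prints a file:// URL for each rendered deck so you can paste one into a browser. The function name `list_slide_decks` is hypothetical, and the sketch assumes the decks are the top-level .html files in _revealjs.

```python
from pathlib import Path
from typing import List


def list_slide_decks(revealjs_dir: str = "_revealjs") -> List[str]:
    """Return sorted file:// URLs for the .html files in revealjs_dir."""
    root = Path(revealjs_dir)
    # as_uri() needs an absolute path, so resolve each file first.
    return sorted(p.resolve().as_uri() for p in root.glob("*.html"))


for url in list_slide_decks():
    print(url)
```

Run it from ds701_book/ after rendering with the slides profile; it prints nothing if _revealjs does not exist yet.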

You can render the slides for just one lecture with a command like

quarto render 05-Distances-Timeseries.qmd --profile slides

Stripping Slides for Presentation

To strip the slides for presentation, run

./strip-tags-with-profile.py 11-Dimensionality-Reduction-SVD-II.qmd --profile slides

This will create a new file 11-Dimensionality-Reduction-SVD-II-stripped.qmd in the same directory.
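
The naming convention described above, where `-stripped` is inserted before the extension, can be expressed as a one-line helper. This is an illustration of the output name only, not the logic of strip-tags-with-profile.py, and `stripped_name` is a hypothetical name.

```python
from pathlib import Path


def stripped_name(qmd_file: str) -> str:
    """Return the file name the stripped copy is expected to have."""
    path = Path(qmd_file)
    # Keep the directory and extension; append "-stripped" to the stem.
    return str(path.with_name(f"{path.stem}-stripped{path.suffix}"))


print(stripped_name("11-Dimensionality-Reduction-SVD-II.qmd"))
```

The printed name is the one to add to _quarto-slides.yml.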

Then add the new file to the _quarto-slides.yml file and render the stripped slides for presentation.

TODO: The script for jupyter notebook creation could run this first so all the divs don't get put in the jupyter notebooks.

Creating PDFs from the Slides

To create PDFs from the reveal.js slides, you can use decktape.

For example, from ds701_book, run

decktape _revealjs/04-Linear-Algebra-Refresher.html 04-Linear-Algebra-Refresher.pdf
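
To convert every rendered deck rather than one, the command above can be generated for each .html file. The sketch below only builds the command lists and leaves running them to you; `decktape_commands` is a hypothetical helper, and it assumes every top-level .html file in _revealjs is a slide deck.

```python
from pathlib import Path
from typing import List


def decktape_commands(revealjs_dir: str = "_revealjs",
                      pdf_dir: str = ".") -> List[List[str]]:
    """Build one `decktape <html> <pdf>` command per rendered deck."""
    commands = []
    for html in sorted(Path(revealjs_dir).glob("*.html")):
        pdf = Path(pdf_dir) / f"{html.stem}.pdf"
        commands.append(["decktape", str(html), str(pdf)])
    return commands


for cmd in decktape_commands():
    print(" ".join(cmd))
```

Each printed line has the same shape as the example command and can be pasted into a terminal, or passed to `subprocess.run` from ds701_book.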

Creating Jupyter Notebooks

To create Jupyter notebook versions of each of the lecture notes, run ./cmd-cnvt-to-jupyter.sh from the ds701_book folder. It renders the Jupyter notebooks into the jupyter_notebooks folder if the associated .qmd file has been modified.
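
To see which notebooks would be out of date before running the script, a modification-time comparison like the one below can help. This is a sketch under stated assumptions, not the logic of cmd-cnvt-to-jupyter.sh: it assumes each `<name>.qmd` maps to `jupyter_notebooks/<name>.ipynb` and that "modified" means the .qmd file is newer than its notebook. `stale_notebooks` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def stale_notebooks(book_dir: str = ".",
                    notebook_dir: str = "jupyter_notebooks") -> List[str]:
    """List .qmd files whose notebook is missing or older than the source."""
    book = Path(book_dir)
    out = book / notebook_dir
    stale = []
    for qmd in sorted(book.glob("*.qmd")):
        notebook = out / f"{qmd.stem}.ipynb"  # assumed naming convention
        if not notebook.exists() or notebook.stat().st_mtime < qmd.stat().st_mtime:
            stale.append(qmd.name)
    return stale


for name in stale_notebooks():
    print(f"needs conversion: {name}")
```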

Whenever you change any .qmd file that contains Python code, re-run ./cmd-cnvt-to-jupyter.sh and commit the jupyter_notebooks folder.

(deprecated) Rendering the Book

Rendering a Quarto book has been deprecated and replaced by rendering to a website. The book profile might still be useful, for example to render to a PDF.

To render the book, run

quarto render --profile book

This will render an HTML book format to _book which is ignored by git.

Citations and Bibliography

In many cases citations are directly referenced in the text, but in some cases we use Quarto's support for citations in the BibTeX format.

For BibTeX citations, add entries to ds701_book/references.bib and cite them as described in the Quarto citations documentation.

As stated in the Quarto documentation, the list of works will be placed at the end of the web page or the last slide. You can control the location by including a div with id refs such as:

## References

::: {#refs}
:::

Bibliographies are not included by default in the configuration files. Instead, include the configuration in the .qmd file front matter, e.g.

---
title: document title
bibliography: references.bib
---
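
Because the bibliography has to be declared per file, it can be handy to check whether a given .qmd file already declares one. The following is a minimal sketch with a hypothetical helper name, `has_bibliography`; it scans the front matter line by line rather than parsing YAML, so it only recognizes a top-level `bibliography:` key.

```python
def has_bibliography(qmd_text: str) -> bool:
    """Return True if the front matter has a top-level bibliography key."""
    lines = qmd_text.splitlines()
    # Front matter must start on the first line with a --- delimiter.
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter: stop scanning
            break
        if line.startswith("bibliography:"):
            return True
    return False


example = "---\ntitle: document title\nbibliography: references.bib\n---\n\n## References\n"
print(has_bibliography(example))
```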

The WikiBook on LaTeX Bibliography Management is a good reference on BibTeX format.

Formatting Tips

Display

You can display text from a code cell in a much nicer format with the display and Markdown functions.

from IPython.display import display, Markdown

# Assumes df is a pandas DataFrame of reviews with "UserId" and "ProductId" columns.
n_users = df["UserId"].unique().shape[0]
n_movies = df["ProductId"].unique().shape[0]
n_reviews = len(df)
display(Markdown('There are:\n'))
display(Markdown(f'* {n_reviews:,} reviews\n* {n_movies:,} movies\n* {n_users:,} users'))

# The percentage is the fraction of all possible (user, movie) pairs that have a review.
display(Markdown(f'There are {n_users * n_movies:,} potential reviews, meaning sparsity of {(n_reviews/(n_users * n_movies)):0.4%}'))
