Stat 133: Concepts in Computing with Data

Calendar

Instructor: Gaston Sanchez
Lecture: MWF 9:00-10:00am, 245 Li Ka Shing
OH: MWF 10:30am - 11:30am, 309 Evans
Tentative topics and dates, subject to change depending on the pace of the course.
Notes (📝) cover material discussed in lecture.
Readings (📖) include material that expands on lecture topics, as well as coding examples that you should review and practice outside of class.

0. Course Introduction

📇 Dates: Aug-28
💬 Topics: Welcome to Stat 133. We begin with the usual review of the course policies, logistics, overall expectations, topics in a nutshell, etc.
📝 Notes:
- Welcome to Stat 133 (talk and chalk)
📖 Reading:
- Course policies
- FAQs
🔬 Lab: No lab
🔈 To Do:
- Install R
- Install RStudio Desktop (open source version, free)

1. The Big Picture

📇 Dates: Aug-30, Sep-04
💬 Topics: Let's start with an unconventional introduction to computing with data using my favorite analogy: "Data Analysis is a lot like Cooking". At the conceptual level, we'll identify the main stages of the data analysis cycle, keeping in mind that data analysis projects usually start with a Research Question. In addition, we'll describe how Data can be seen from a triangular perspective (i.e., my "3 Views of Data").
📝 Notes:
📖 Reading:
- What is Data Science?

2. R Survival Skills

📇 Dates: Sep 04-05, ⚠️ mostly in lab
💬 Topics: In addition to the "Big Picture" concepts, you'll begin learning basic survival skills for R. The main goal is to get a first look at the RStudio workspace and the Markdown syntax.
📖 Reading:
- First contact with R (tutorial)
- Intro to Rmd files (tutorial)
- Introduction to R Markdown (by RStudio)
💡 Cheatsheet:

3. Intro to Data Technologies: Data Types and Data Objects

📇 Dates: Sep 06-11
💬 Topics: How do programming languages and computing environments handle data? To answer this question we'll discuss a couple of fundamental topics: data types and their implementation in R through vectors and arrays. More specifically, we'll focus on concepts like atomicity, vectorization, recycling, and subsetting. We will also describe more generic data objects such as lists.
📝 Notes:
📖 Reading:
- Intro to vectors (tutorial)
- chapter 20: Vectors (R for Data Science by Grolemund and Wickham)
💡 Cheatsheet:
- Base R
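The vector concepts listed for this section (atomicity, vectorization, recycling, subsetting) and lists can be previewed in a few lines of base R. This is an illustrative sketch with made-up values; the object names are arbitrary.

```r
# Atomicity: a vector holds a single data type, so mixing types coerces them
x <- c(1, 2, 3, 4)
mixed <- c(1, "a", TRUE)   # all elements become character strings
typeof(mixed)              # "character"

# Vectorization: arithmetic applies element-wise, no loop needed
x * 2                      # 2 4 6 8

# Recycling: the shorter vector is reused to match the longer one
y <- x + c(10, 20)         # 11 22 13 24

# Subsetting: by position, by exclusion, and by logical condition
x[2]                       # 2
x[-1]                      # 2 3 4
x[x > 2]                   # 3 4

# Lists are generic vectors: each element can have a different type
lst <- list(num = 1:3, chr = "a", lgl = FALSE)
lst$chr                    # $ extracts a single element by name
lst[["num"]]               # [[ ]] does the same, by name or position
```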

4. Intro to Data Technologies (cont'd): Data Frames

📇 Dates: Sep 11-12, ⚠️ mostly in lab
💬 Topics: Besides atomic data objects, we also need to talk about R data frames, which provide a convenient structure for handling tabular data. You will learn how to manipulate data frames with two approaches: 1) classic bracket notation, and 2) the more modern, verb-based syntax of the "grammar of data manipulation" provided by the package "dplyr".
📝 Notes:
- Data Frames part 1
- Data Frames part 2
- "dplyr" tutorial slides (by Hadley Wickham)
📖 Reading:
- Introduction to dplyr (by Hadley Wickham)
- tibbles vignette
💡 Cheatsheet:
- Data transformation cheat sheet
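The two approaches can be compared on the same query. This is a minimal sketch on a made-up data frame; the dplyr part uses `filter()` and `select()` and only runs if the package is installed.

```r
# A small made-up data frame
df <- data.frame(
  player = c("A", "B", "C", "D"),
  team   = c("X", "Y", "X", "Y"),
  points = c(10, 25, 17, 8)
)

# 1) Bracket notation: rows where points > 9, columns player and points
high_bracket <- df[df$points > 9, c("player", "points")]
high_bracket

# 2) The same query with dplyr verbs (skipped if dplyr is not installed)
if (requireNamespace("dplyr", quietly = TRUE)) {
  high_dplyr <- dplyr::select(dplyr::filter(df, points > 9), player, points)
  print(high_dplyr)
}
```

The bracket version indexes rows and columns explicitly, while the dplyr version names each operation as a verb, which tends to read more easily as queries grow.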

5. Housekeeping: Filesystem and Bash Commands

📇 Dates: Sep 13-16
💬 Topics: At the lowest level, Data Analysis Projects (DAPs) are essentially made of files and directories. Therefore, we need to review some fundamental concepts such as the filesystem, the command line interface, and some basic shell commands.
📝 Notes:
📖 Reading:
- Linux Tutorial lessons 1-5 (by Ryan Chadwick)
- The Unix Shell lessons 1-3 (by Software Carpentry)
- Linux Command Line tutorial (by Guru99)
💡 Cheat sheet:
- command line cheat sheet

6. Data Tables: Storage, Organization, Importing, and Unix filters

📇 Dates: Sep 18-25
💬 Topics: We continue with a fundamental topic of data technologies: Data Tables, the most common form in which data is stored, handled, and manipulated. Because datasets in tabular format are so ubiquitous, we need to talk about how tables are typically stored, learn good principles of data organization, and discuss the notion of "tidy data". You will also learn how to perform basic manipulation of data-table files with some Unix filters. Finally, we'll examine the relationship between tables and R data frames, as well as some considerations when importing (and exporting) tables in R.
📝 Notes:
- Data Tables (introduction)
- Spreadsheets
- Unix command line: Redirection and Pipes
- Unix filters: cut, sort, uniq
- Importing tables part 1 and part 2
📖 Reading:
- Organizing data in spreadsheets (by Karl Broman)
- Intro to Data Technologies (preface, chapter 1, and chapter 5) (by Paul Murrell)
- Tidy Data (by Hadley Wickham)
💡 Cheat sheet:
- command line cheat sheet
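On the R side, importing a table turns each column into a vector of a data frame. The following is a minimal sketch using base R's `read.csv()` and `write.csv()`; the table contents are made up, and the file is written to a temporary location.

```r
# A tiny comma-separated table, stored as a string for illustration
csv_text <- "name,height,weight
Alice,165,60
Bob,180,75
Carol,172,68"

# Importing: each column becomes a vector (column) of a data frame
tbl <- read.csv(text = csv_text)
str(tbl)

# Exporting to a temporary file, then importing it back
path <- tempfile(fileext = ".csv")
write.csv(tbl, path, row.names = FALSE)
tbl2 <- read.csv(path)

# Check that the round trip preserved the table
all.equal(tbl, tbl2)
```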

7. Housekeeping: Version Control with Git and GitHub

📇 Dates: Oct 02-03, ⚠️ mostly in lab
💬 Topics: We continue with file-structure topics and introduce basic notions of version control systems (VCS) using Git and its companion hosting platform, GitHub.
📝 Notes:
- Git Basics
- Git Workflow
📖 Reading:
- Read sections 4 to 9 in Part I Installation (Happy Git and GitHub for the useR by Jenny Bryan et al.)
💡 Cheat sheet:
- Data import cheat sheet
- git cheat sheet

8. Data Visualization

📇 Dates: Sep 30, Oct-09
💬 Topics: Paraphrasing the old saying: "a graphic is worth a thousand numbers". No other means of data representation helps us understand data as well as visual displays do. But in order to make good graphics, we need to learn the fundamental concepts of data visualization.
📝 Notes:
📖 Reading:
- "ggplot2" lecture (by Karthik Ram)
💡 Cheat sheet:
- Data visualization with ggplot2
🎓 MIDTERM 1: Friday Oct-11

9. Transition to Programming Basics for Data Analysis (part 1)

📇 Dates: Oct 14-18
💬 Topics: You don’t need to be an expert programmer to be a data scientist, but learning more about programming allows you to automate common tasks, and solve new problems with greater ease. We'll discuss how to write basic functions, the notion of R expressions, and an introduction to conditionals.
📝 Notes:
- Creating functions (tutorial)
- Introduction to functions (tutorial)
- Introduction to R expressions and conditionals (tutorial)
📖 Reading:
- chapter 19: Functions (R for Data Science by Grolemund and Wickham)
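Functions, expressions, and conditionals come together in even a short helper. This sketch uses a hypothetical `to_celsius()` function: it has a default argument, uses `if`/`else if`/`else` to choose a branch, and returns the value of the last evaluated expression.

```r
# Hypothetical helper: convert a temperature to degrees Celsius
to_celsius <- function(temp, from = "fahrenheit") {
  # Validate input before computing anything
  if (!is.numeric(temp)) {
    stop("'temp' must be numeric")
  }
  # The value of the branch that runs is the function's return value
  if (from == "fahrenheit") {
    (temp - 32) * 5 / 9
  } else if (from == "kelvin") {
    temp - 273.15
  } else {
    stop("unknown scale: ", from)
  }
}

to_celsius(212)                    # 100
to_celsius(300, from = "kelvin")   # 26.85
```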

10. Programming Basics for Data Analysis (part 2)

📇 Dates: Oct 21-25
💬 Topics: In addition to writing functions to reduce duplication in your code, you also need to learn about iteration, which helps when you need to perform the same operation several times. Specifically, we'll review control flow structures such as for loops, while loops, repeat loops, and the apply family of functions.
📝 Notes:
- Introduction to loops (tutorial)
- More about functions (tutorial)
- Functions (Advanced R by H. Wickham)
📖 Reading:
- Environments (Advanced R by H. Wickham)
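The same task can be written with different iteration tools. This minimal sketch computes the mean of each column of a made-up data frame with a `for` loop, a `while` loop, and `sapply()` from the apply family.

```r
# Goal: the mean of each column of a numeric data frame
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# for loop: preallocate the output, then fill it one element at a time
means_for <- numeric(ncol(df))
for (j in seq_along(df)) {
  means_for[j] <- mean(df[[j]])
}

# while loop: same result, with the counter managed by hand
means_while <- numeric(ncol(df))
j <- 1
while (j <= ncol(df)) {
  means_while[j] <- mean(df[[j]])
  j <- j + 1
}

# apply family: sapply() iterates over the columns and returns a named vector
means_apply <- sapply(df, mean)

means_for     # 2 5 8
means_apply   # a = 2, b = 5, c = 8
```

Preallocating `means_for` avoids growing a vector inside the loop; `sapply()` hides the bookkeeping entirely.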

11. Testing Functions

📇 Dates: Oct 28-Nov 01
💬 Topics: We begin with an introduction to the package "testthat", which provides a nice framework for testing functions. In addition, we will discuss Shiny apps, which provide an interesting companion to R by making it quick and simple to deliver interactive analyses and graphics in any web browser. In lab, you'll learn how to perform basic manipulation of strings.
📝 Notes:
- Intro to testing functions (tutorial)
- shiny tutorial (by Grolemund)
📖 Reading:
- testthat: Get started with testing (by Wickham)
- Character strings in R (r4strings by Sanchez)
- Basic string manipulations (r4strings by Sanchez)
- chapter 14: Strings (R for Data Science by Grolemund and Wickham)
💡 Cheat sheet:
- Stringr cheat sheet
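A first test might look like the following minimal sketch. `add_one()` is a hypothetical function; `test_that()`, `expect_equal()`, and `expect_error()` come from testthat. The tests are wrapped in `requireNamespace()` so the block still runs without the package installed.

```r
# Hypothetical function under test
add_one <- function(x) x + 1

if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("add_one() works on numbers and vectors", {
    testthat::expect_equal(add_one(1), 2)
    testthat::expect_equal(add_one(c(1, 2)), c(2, 3))
    # Adding a number to a string is an error in R
    testthat::expect_error(add_one("a"))
  })
}
```

In a package, tests like these usually live in their own files under a tests directory rather than next to the function.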

12. Shiny Apps

📇 Dates: Oct 28-Nov 01
💬 Topics: We will discuss Shiny apps, which provide an interesting companion to R by making it quick and simple to deliver interactive analyses and graphics in any web browser.
📝 Notes:
- shiny tutorial (by Grolemund)
📖 Reading:
- testthat: Get started with testing (by Wickham)
- Character strings in R (r4strings by Sanchez)
- Basic string manipulations (r4strings by Sanchez)
- chapter 14: Strings (R for Data Science by Grolemund and Wickham)
💡 Cheat sheet:
- Stringr cheat sheet

13. More Shiny Apps and Introduction to Regular Expressions

📇 Dates: Nov 04-08
💬 Topics: Random numbers have many applications in science and computer programming, especially when there are significant uncertainties in a phenomenon of interest. In this part of the course we'll look at some basic problems involving working with random numbers and creating simulations. Additionally, we continue the discussion about character strings with a first look at Regular Expressions.
📝 Notes:
- Introduction to random numbers
- Coin toss shiny app
- Regexpal tester tool
📖 Reading:
- Part 1 - How to build a Shiny app (video)
- Part 2 - How to customize reactions (video)
- Part 3 - How to customize appearance (video)
💡 Cheat sheet:
- shiny cheat sheet
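The coin-toss idea behind the notes can be sketched without Shiny. This is not the course app, just a minimal simulation assuming a fair coin; `set.seed()` makes the draws reproducible, and the seed value is arbitrary.

```r
# Simulate tossing a fair coin 100 times
set.seed(133)                       # fix the random stream for reproducibility
sides <- c("heads", "tails")
tosses <- sample(sides, size = 100, replace = TRUE)

# Running proportion of heads after each toss
prop_heads <- cumsum(tosses == "heads") / seq_along(tosses)

# Final proportion; it tends toward 0.5 as the number of tosses grows
tail(prop_heads, 1)
```

Plotting `prop_heads` against the toss number is a typical way to show the running frequency settling down, which is the kind of display a Shiny app can make interactive.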

14. More Regular Expressions

📇 Dates: Nov 11-15
💬 Topics: At its heart, computing involves working with numbers. However, a considerable amount of information and data comes in the form of text. To unleash the power of string manipulation, we need to take things to the next level and learn about Regular Expressions: a tool for describing sets of strings, called "patterns", in a compact way. We'll describe the basic concepts of regex and the common operations used to match text patterns.
📝 Notes:
- Long Jump World Record example
- Log file example
📖 Reading:
- Handling Strings in R (by Sanchez)
💡 Cheat sheet:
- Regular Expressions cheat sheet
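The common regex operations (detect, extract, replace) can be tried with base R functions. This sketch uses made-up log lines in the spirit of the log file example; it is not the example's actual data.

```r
# Made-up log lines, for illustration only
logs <- c(
  "2019-11-11 10:02:13 ERROR disk full",
  "2019-11-11 10:05:40 INFO backup done",
  "2019-11-12 08:15:02 ERROR timeout"
)

# Detect: which lines contain "ERROR"?
grepl("ERROR", logs)                 # TRUE FALSE TRUE

# Extract: the date, as 4 digits, dash, 2 digits, dash, 2 digits
dates <- regmatches(logs, regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", logs))
dates                                # "2019-11-11" "2019-11-11" "2019-11-12"

# Replace: drop the leading date and time, keep the rest of each line
sub("^[0-9-]+ [0-9:]+ ", "", logs)   # "ERROR disk full" "INFO backup done" "ERROR timeout"
```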

15. R packaging (part 1)

📇 Dates: Nov 18-22
💬 Topics: Packages are the fundamental units of reproducible R code. They include reusable functions, the documentation that describes how to use them, and sample data. In this part we'll start describing how to turn your code into an R package.
📝 Notes:
- Programming S3 Classes
- Methods (by Sanchez)
📖 Reading:
- Package Structure (R packages by Wickham)
- See package components: http://r-pkgs.had.co.nz/ (R packages by Wickham)
💡 Cheat sheet:
- Package Development cheat sheet
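The S3 notes for this part deal with classes and methods, the kind of code a package typically bundles. This is a minimal sketch of a hypothetical `"coin"` class: a constructor, a `print()` method, and a new generic `toss()` with its method, all using base R's S3 dispatch via `UseMethod()`.

```r
# Constructor for a hypothetical "coin" class
coin <- function(sides = c("heads", "tails")) {
  if (length(sides) != 2) stop("a coin needs exactly 2 sides")
  structure(list(sides = sides), class = "coin")
}

# print() dispatches to print.coin() for objects of class "coin"
print.coin <- function(x, ...) {
  cat('object "coin"\n')
  cat("sides:", paste(x$sides, collapse = ", "), "\n")
  invisible(x)
}

# A new generic function and its method for "coin" objects
toss <- function(x, ...) UseMethod("toss")
toss.coin <- function(x, times = 1, ...) {
  sample(x$sides, size = times, replace = TRUE)
}

c1 <- coin()
c1                    # auto-printing calls print.coin()
toss(c1, times = 5)
```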

16. R Packaging (part 2)

📇 Dates: Dec 02-06
💬 Topics: Creating an R package can seem overwhelming at first. So we'll keep working on the creation of a relatively basic package. This will give you the opportunity to apply most of the concepts seen in the course.
📝 Notes:
- Pack YouR Code (by Sanchez)
📖 Reading:
- See package components: http://r-pkgs.had.co.nz (R packages by Wickham)
💡 Cheat sheet:
- Package Development cheat sheet

17. RRR Week and Final Exam

📇 Dates: Dec 09-13
💬 Topics: Prepare for the final examination
📝 Notes:
- No lecture. Instructor will hold OH (in 309 Evans)
🎓 FINAL: Dec-19th, 7-10 pm, room TBD
- More details about the final will be posted on bCourses
