NOTE: This is a live document and is subject to change throughout the semester.
Data is everywhere, and a database is often a convenient way to store and process it. But is a relational database always the best choice? In this class we will explore several advanced database models, computational paradigms for processing large data sets, and searching (indexing) techniques. Database models include spatial, key-value, columnar, document, and graph; computational paradigms for large data sets include MapReduce and streaming; and searching techniques include approximate nearest neighbor (approx-NN) search, locality-sensitive hashing (LSH), and inverted indices.
Mon, Wed, Fri 09:00-09:50, 332 Reid Hall
David L. Millman, Ph.D.
Email: david.millman@montana.edu
Office hours: Mon 15:00-15:50, Thurs 13:00-13:50, or by appointment
Office: Barnard Hall 359
Github: dlm
Bitbucket: david_millman
After successfully completing this course, students will be able to:
- Identify and explain why a database or collection of databases is appropriate for a task
- Build a system using polyglot persistence
- Design and implement algorithms for searching and processing massive data sets
There is no required textbook, but the following optional texts are highly recommended:
- Database System Concepts by Abraham Silberschatz, Henry F. Korth and S. Sudarshan
- Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson (7DB in reading below) (DO NOT USE 1st edition)
- Probabilistic Data Structures and Algorithms for Big Data Applications by Andrii Gakhov (PDS in reading below)
- Lecture Notes from Modern Algorithmic Toolbox by Tim Roughgarden and Greg Valiant (MAT below)
- Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman (MMD below)
- SE-Radio:
Others will be added as relevant.
Prerequisites:
- CSCI 440 (Database Systems): DBMS architecture; major database models; relational algebra fundamentals; SQL query language; index file structures; data modeling and management; entity-relationship diagrams.
- Comfort with a Unix-based operating system.
- Willingness to get your hands dirty installing and working with multiple databases.
The lecture schedule is subject to change throughout the semester, but here is the current plan. Assignments and due dates will be updated as they're assigned in class.
| Date | Description | Quiz | Assigned | Due | Recommended Reading |
|---|---|---|---|---|---|
| 08/26 | Intro | | | | |
| 08/28 | Env setup | | | | |
| 08/30 | Relational | Quiz 1 (Solution) | Homework 0 | | 7DB-Relational Day 1 |
| 09/02 | NO CLASS (LABOR DAY) | | | | |
| 09/04 | Relational | | | | 7DB-Relational Day 2 |
| 09/06 | Relational | | | Homework 0 | 7DB-Relational Day 3 |
| 09/09 | Relational | | Homework 1 | | 7DB-Relational Day 2 |
| 09/11 | Column | | | | 7DB-Hbase Day 1 |
| 09/13 | Column | | Homework 2 | Homework 1 | 7DB-Hbase Day 2 |
| 09/16 | Document | | | | 7DB-Mongo Day 1 |
| 09/18 | Document | | | | 7DB-Mongo Day 2 |
| 09/20 | Document | Quiz 2 | Homework 3 | Homework 2 | 7DB-Mongo Day 2 |
| 09/23 | Graph | | | | 7DB-Neo4j Day 1 |
| 09/25 | Graph | Quiz 3 | | | 7DB-Neo4j Day 2 |
| 09/27 | Graph | | Homework 4 | Homework 3 | 7DB-Neo4j Day 2 |
| 09/30 | Redis | | | | 7DB-Redis Day 1 |
| 10/02 | Redis | | | | 7DB-Redis Day 2 |
| 10/04 | Redis | | Homework 5 | Homework 4 | 7DB-Redis Day 3 |
| 10/07 | Hashing | Quiz 4 | | | PDS-Ch 1 |
| 10/09 | Hashing | | | | PDS-Ch 2 |
| 10/11 | Set Membership | | | Homework 5 | PDS-Ch 2 |
| 10/14 | Cardinality | | | | PDS-Ch 3 |
| 10/16 | NO CLASS (DAVE SICK) | | | | |
| 10/18 | NO CLASS (DAVE SICK) | | | | |
| 10/21 | Cardinality | | Presentation | | PDS-Ch 3 |
| 10/23 | Cardinality | Quiz 5 | | | PDS-Ch 3 |
| 10/25 | NO CLASS - Frequency - Video | Quiz 6 | Homework 6 | Presentation | PDS-Ch 4 |
| 10/28 | Frequency | | | | PDS-Ch 4 |
| 10/30 | Frequency | | | | PDS-Ch 4 |
| 11/01 | MapReduce | Quiz 7 | Homework 7 | Homework 6 | MMD-Ch 2 |
| 11/04 | MapReduce | Quiz 8 | Proj Proposal | | MMD-Ch 2 |
| 11/06 | Similarity | | | | MMD-Ch 3 / PDS-Ch 6 |
| 11/08 | Similarity | | | Homework 7 | MMD-Ch 3 / PDS-Ch 6 |
| 11/11 | NO CLASS (VETERANS DAY) | | | | |
| 11/13 | Similarity | | | | MMD-Ch 3 / PDS-Ch 6 |
| 11/15 | Realtime DBs (Saha, Rahman) | Exam | | Proj Proposal | Setup, Overview Of RealtimeDBs |
| 11/18 | DB Security (Kelly, Turksonmez) | | | Proj Discussion | Setup |
| 11/20 | Rainbow Tables (Johnson) | | | | password hashing & salt |
| 11/22 | Blockchain DBs (Nelson) | | | Proj Discussion | Subspace, BigchainDB, Blockchains |
| 11/25 | Multi-Obj Query Plan (Harris, Zou) | | | Proj | |
| 11/27 | NO CLASS (THANKSGIVING BREAK) | | | | Multi-Obj PQO |
| 11/29 | NO CLASS (THANKSGIVING BREAK) | | | | |
| 12/02 | Community Detection (Gibbs, Hewitt) | | | | Graph Algos Ch 6 |
| 12/04 | Streaming Clustering (Folkman, Whitman) | | | | clustering MMD-7.6 |
| 12/06 | CouchDB versioning & Conflict Resolution (Hoy, Watson) | | | | about couch |
| 12/09 | (Finals week) 08:00-09:50 | | | Proj Writeup & Presentation | |
- Journalling/write-ahead logging
- Compression
Your grade for this class will be determined by:
- 10% Quizzes (lowest quiz is dropped)
- 35% Homework (lowest homework is dropped)
- 25% Exam
- 10% Group Presentation
- 20% Group Project
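To make the weights and drop rules above concrete, here is a minimal sketch of how they might combine. It assumes (none of this is stated in the syllabus) that every quiz, homework, exam, presentation, and project score is a fraction between 0 and 1, and that all items within a category count equally. The function names `category_score` and `course_grade` and the example scores are hypothetical, for illustration only; this is not the official grading tool.

```python
from statistics import mean

# Category weights from the grading breakdown above.
WEIGHTS = {
    "quizzes": 0.10,
    "homework": 0.35,
    "exam": 0.25,
    "presentation": 0.10,
    "project": 0.20,
}

# Categories in which the single lowest score is dropped.
DROP_LOWEST = {"quizzes", "homework"}


def category_score(name: str, scores: list[float]) -> float:
    """Average one category's scores, dropping the lowest where the policy says so.

    ASSUMPTION: scores are fractions in [0, 1] and items in a category are weighted equally.
    """
    if name in DROP_LOWEST and len(scores) > 1:
        scores = sorted(scores)[1:]  # discard the single lowest score
    return mean(scores)


def course_grade(scores: dict[str, list[float]]) -> float:
    """Weighted sum of the category averages, returned as a fraction in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * category_score(name, s) for name, s in scores.items())


# Hypothetical scores for one student.
example = {
    "quizzes": [1.0, 0.0, 0.8],   # 0.0 for a missed quiz; it is the one dropped
    "homework": [0.9, 0.7, 1.0],  # the 0.7 is dropped
    "exam": [0.85],
    "presentation": [0.9],
    "project": [0.95],
}
print(round(course_grade(example), 4))  # prints 0.915
```

Because the lowest quiz is dropped, a single missed quiz (which earns no credit) does not lower the quiz average in this sketch; a second missed quiz would.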
Attendance will not be taken, but students are responsible for all material covered in class. If you are not in class, you cannot receive credit for quizzes. Attendance is strongly recommended.
There will be regular homework assignments (roughly every one to two weeks, depending on the difficulty of the assignment) consisting of written problems and coding exercises. Homework assignments will be posted in the schedule. Unless otherwise specified, solutions should be submitted as a PDF on Brightspace. (The tool that I use for grading only works with PDFs, so any file format other than PDF will receive a 0.) Homework is due at 23:59 on the due date. Late homework will not be accepted.
You do NOT need to write up your solutions with LaTeX, but I highly encourage you to do so. You can find some resources for getting started with LaTeX (and for making figures, and keeping all those files safe with git) in the student resources repo.
I encourage collaboration; see the collaboration section for details.
Group discussions, questions, and announcements will take place on the Brightspace message board. It is okay to send me a direct message or email if you have a question that you feel is not appropriate to share with the class. If, however, you send me a message with a question whose answer would be useful to the rest of the class, I will likely ask you to post it publicly.
Collaboration IS encouraged; however, all submitted individual work must be your own, and you must acknowledge your collaborators at the beginning of the submission.
On any group project, every team member is expected to make a substantial contribution. The distribution of the work, however, is up to the team.
A few specifics for the assignments. You may:
- Work with anyone in the course.
- Share ideas with others in the course
- Help other teams debug their code or proofs.
You may NOT:
- Submit a proof or code that you did not write.
- Modify another's proof or code and claim it as your own.
Using resources in addition to the course materials is encouraged, but be sure to properly cite them. Remember, it is NEVER acceptable to pass off others' work as your own.
Paraphrasing or quoting another's work without citing the source is a form of academic misconduct. Even inadvertent or unintentional misuse or appropriation of another's work (such as relying heavily on source material that is not acknowledged) is considered plagiarism. If you have any questions about using and citing sources, you are expected to ask for clarification. My rule of thumb: if in doubt, cite.
By participating in this class, you agree to abide by the student code of conduct. Please review the policy.
Except for note taking and coding, please keep electronic devices off during class; they can distract other students. Disruptions to the class will result in you being asked to leave the lecture and will negatively impact your grade.
If you have a documented disability for which you are or may be requesting an accommodation(s), you are encouraged to contact me and Disabled Student Services as soon as possible.