Homeworks

Ken Anderson edited this page Mar 23, 2015 · 11 revisions

I will be keeping track of Homework assignments (big and small) for the seminar on this page. Homework assignments will typically be small programming assignments or tasks that need to be completed by all members of the class. I will make it clear for each assignment how it should be submitted.

Homework 5: Semester Project

Your last assignment for our seminar, spanning the final five weeks of the semester, is to build a complete prototype that makes use of some of the software frameworks that we have reviewed or discussed in class this semester. You should work on this assignment in teams of 2 to 4 students.

Your main tasks are the following:

  1. Identify a data set of interest. You can choose to collect a large Twitter data set (with our Twitter framework if you want) and base your project on that. Or, as mentioned in class, you can take a look at a number of open data websites (such as DATA.GOV) and base your project on one of their data sets.

  2. Identify a set of questions you would like to answer using this data set or identify the types of activities you would like to support over this data set. How would you like your users to browse the data set? What sort of metrics would you generate using this data set? What sort of visualizations can be built on top of this data set?

  3. Identify a NoSQL data store that you will use to store the data set. Import your data set into that data store. Keep track of any scripts you write to clean up the data set or to import the data set into your selected data store.

  4. Write a web service that allows you to perform CRUD (create, read, update, delete) and query operations on the elements in the data set. You can use any of the technologies that we've seen in class this semester for doing that (nodejs, express, Sinatra, Ruby on Rails, Flask, etc.).

  5. Write a client-side web application that makes use of the web service to display the data set and to answer the questions and/or implement the activities you identified in step 2 above.

  6. Create a public repository on GitHub for your project. Document your application in your README.md; place supplemental information about your project in the repository's wiki. Make use of issues and pull requests to document the development history of your project.

Item 6 in the list above means that one member of your team should create an official version of the repository, perhaps with just a README file to start, and then each member of your team should fork that repository and do day-to-day development in their own fork.

Develop a process for creating pull requests on the official repository and merging in changes after they have been reviewed. Make sure that your team members are good about syncing the latest changes from the upstream repository into their forked repositories as you make progress on the project.

The goal by the end of the semester is to have each team demonstrate mastery of the data life cycle for a system that works with a large data set and the software engineering tools and techniques that are needed to manage such a project.

You should send Prof. Anderson an e-mail message with the following details as soon as possible:

  • your team members
  • the data set you will be using
  • the questions/activities you plan to implement on top of that data set
  • the technology stack you will use to implement your project (NoSQL database plus the frameworks you will use to create your web service and web application)
  • the URL of your official GitHub repo for this project

This project is due by the last day of the semester: Friday, May 1, 2015. Stay tuned for additional details; Prof. Anderson plans to establish deadlines to make sure that each team is making adequate progress each week.

Homework 4: Schedule your Presentations

Homework 4 asks you to get serious about making a 20-30 minute presentation at an upcoming lecture on a topic related to data engineering. Take a look at the suggested topics (or add your own as long as Prof. Anderson approves your choice). Then sign up to give the presentation.

Every student should aim to make two presentations in this class. Please sign up for your presentations by Friday, March 13, 2015. Note: I need six students to sign up for talks for the week of March 16th (not this week but next week)!

Homework 3: Make a Contribution to the Twitter Data Collection Framework

Homework 3 will provide you with experience in adding functionality to an existing software framework (in this case, the Twitter Data Collection Framework presented in Lecture 11) and with making pull requests on GitHub.

Forking the Repository

The first thing you must do for this assignment is to fork the Twitter Data Collection Framework. Use the instructions presented in this article: Fork a Repo; each time that article says octocat/Spoon-Knife, substitute the Twitter Data Collection Framework instead. (That said, you should definitely try forking the octocat/Spoon-Knife repository itself first to get some practice with the concept.) Be sure to follow step 3 of these instructions, so that you can keep your copy of the Twitter Data Collection Framework in sync with the original.
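Step 3 of those instructions amounts to registering the original repository as a second remote in your clone. Here is a minimal sketch, assuming the conventional remote name upstream; the function name add_upstream is made up for illustration, and you must supply the real clone URL of the original framework yourself:

```shell
# Register the original repository as a remote named "upstream".
# ("upstream" is the name GitHub's guide uses by convention; git itself
# does not require it. add_upstream is a hypothetical helper name.)
add_upstream() {
  local url="$1"                               # clone URL of the ORIGINAL repo
  git remote add upstream "$url" || return 1   # fails if "upstream" already exists
  git remote -v                                # list remotes so you can check the URL
}
```

Run it once inside your clone of your fork, passing the original repository's clone URL as the only argument. After that, origin refers to your fork and upstream refers to the original.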

To do that, you will follow the instructions contained in this article: Syncing a fork. You will need to do this several times during the course of this homework assignment, since your fellow students will be submitting changes to the original framework via pull requests and you will need to ensure you have the latest changes before submitting your own pull request. In addition, I might continue to make changes to the framework in my role as the original author. Note: you will be working in a topic branch on your forked repository. The instructions for syncing a fork demonstrate merging changes into the master branch. You will instead merge changes into your topic branch. You can do that by replacing the step on that page that says git checkout master with the command git checkout <topic_branch_name>; that will ensure that your topic branch stays up-to-date.
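The adapted sync sequence can be sketched as follows. This assumes that a remote named upstream already points at the original repository and that the original's main branch is called master; the function name sync_topic_branch is made up for illustration:

```shell
# Merge the latest upstream changes into a topic branch instead of master.
# Assumes a remote named "upstream" is configured and that the original
# repository's main branch is "master".
sync_topic_branch() {
  local branch="$1"                      # e.g. anderson_ken
  git fetch upstream || return 1         # download new upstream commits; local files are untouched
  git checkout "$branch" || return 1     # switch to the topic branch (not master)
  git merge upstream/master              # bring upstream's master into the topic branch
}
```

For example, sync_topic_branch anderson_ken. If the merge reports conflicts, resolve them and commit before continuing with your own changes.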

Creating your Topic Branch

Once you have your copy of the Twitter Data Collection Framework properly configured, you need to create a topic branch. You will use your name to generate the topic branch. If your name is Ken Anderson, your topic branch will be named anderson_ken. More generally, your topic branch should be named <lastname>_<firstname> or some variation of that format that makes sense for your name. Once you have determined the name for your branch, the command to create it is simple:

git checkout -b <branch_name>

This command simultaneously creates the branch and checks it out. You're now ready to work on modifying the software framework.
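If you want to generate the branch name rather than type it, the naming convention can be sketched as a small helper. The function name topic_branch_name is made up for illustration, and it assumes a simple first-name/last-name pair; adjust the result by hand if that does not fit your name:

```shell
# Build a topic branch name of the form <lastname>_<firstname>.
# Assumes a plain "First Last" name; topic_branch_name is a hypothetical helper.
topic_branch_name() {
  local first="$1" last="$2"
  # Join as last_first, then lowercase the whole string.
  printf '%s_%s\n' "$last" "$first" | tr '[:upper:]' '[:lower:]'
}

# Usage (inside your clone): create the branch and switch to it in one step.
# git checkout -b "$(topic_branch_name Ken Anderson)"
```

Calling topic_branch_name Ken Anderson prints anderson_ken, which matches the example above.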

Make Your Changes

Now, you must work on the framework and make a change to it. The most straightforward change is to look for a new Twitter endpoint to support. You can find the list of endpoints at the documentation page for Twitter's REST API. Scan this list, read the documentation, and make your selection. Once you have identified an endpoint that you would like to support, send it to Prof. Anderson to claim it. Prof. Anderson will create a page on the wiki to keep track of the currently claimed endpoints. You will need to consult that list to make sure you pick an endpoint that has not yet been claimed. Your endpoint will exist as a new class in the requests directory of the framework. You will show that the endpoint has been implemented successfully by creating a new command line application in the apps directory of the framework that makes use of your endpoint.

As you work on your changes, let this be your motto: Commit early, commit often! Do not be afraid of making changes to the framework. You have your own copy of the repo to work with, so there is no need to worry about breaking anything, and git lets you explore freely while always being able to roll back to a stable commit. As you get something to work, commit. As you add a new method, commit. Get in the habit of making small changes, testing them, and then committing them. The freedom this style of development gives you as a developer is transformative.
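One way to picture this habit is as a pair of small shell helpers: one records a tested step, the other abandons an experiment that did not work out. This is only a sketch; the names commit_step and discard_experiment are made up for illustration, and the second helper is destructive by design:

```shell
# Record a small, tested change as its own commit.
commit_step() {
  local message="$1"
  git add -A || return 1          # stage every change in the working tree, new files included
  git commit -q -m "$message"     # commit with a short message describing this step
}

# Abandon an experiment: reset tracked files to the last commit.
# Destructive: uncommitted edits to tracked files are lost. Untracked
# files are left alone.
discard_experiment() {
  git reset -q --hard HEAD
}
```

A typical cycle is: edit, run your app, then commit_step "Add request class for new endpoint" if it works, or discard_experiment if you want to return to the last commit and try a different approach.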

Note: adding support for a new endpoint is just one type of change you can make for this assignment. If you are interested in modifying the framework in some other way, please send a message to Prof. Anderson to describe your change and receive approval. If you have any questions about this section of Homework 3, contact Prof. Anderson early to get them answered!

Create a Pull Request

Once you have a) finished making your changes to the framework (with at least one additional request type and one additional command-line app implemented), b) synced your repository one last time with the original, and c) pushed your changes to your fork on GitHub (i.e. git push), you're ready to make a pull request. To do this, follow the instructions in this document: Using pull requests. The base branch of your pull request will be the master branch of the original repo. The head branch will be your topic branch in your repo.
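Depending on your git version and push.default setting, a plain git push may be refused the first time you push a new topic branch, because the branch has no counterpart on GitHub yet. Here is a minimal sketch of the first push, assuming origin is the remote for your fork (the name git assigns when you clone); the function name push_topic_branch is made up for illustration:

```shell
# Publish a topic branch to your fork and remember where it was pushed.
# Assumes the remote named "origin" is your fork on GitHub.
push_topic_branch() {
  local branch="$1"                # e.g. anderson_ken
  # -u records origin/<branch> as the upstream of the local branch, so
  # later plain "git push" and "git pull" calls know which branch to use.
  git push -u origin "$branch"
}
```

After the first push_topic_branch anderson_ken, a plain git push from that branch is enough to send follow-up commits to your fork.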

Once you have created your pull request, Prof. Anderson will be notified and he will review your changes. He may ask you to make a few changes which will require you to update your topic branch and push those changes to your repo on GitHub (which will then automatically update your pull request). When Prof. Anderson is satisfied with your changes, he will then pull them into the official repository and your work on Homework 3 will be done.

Logistics

You must initiate your pull request by 11:59 PM on Wednesday, March 4th. If you have ANY questions on the existing framework or on how to do something with the Ruby programming language, send them to Prof. Anderson right away. He will answer them as quickly as possible (typically by posting code you can use in Gists). This assignment is NOT about your knowledge of the Ruby programming language; instead it is meant to give you experience implementing client-side code for professionally developed web services (in this case, Twitter's REST API) and working with pull requests on GitHub.

Homework 2: Review Four Web Services

Homework 2 asks you to go and review four of the RESTful web services that you and your classmates produced for Homework 1.

To do this, you need to learn about issues on GitHub. Issues allow you to organize tasks and bug reports with respect to a repository. You can learn more about issues via GitHub's on-line documentation. In particular, take a look at how to create an issue. Once created, the issue can act as a forum to discuss a particular subject, highlight problems, make requests for enhancements, keep track of progress, etc.

In our case, we will use issues to leave an evaluation of the service. What did you like about it? How might the service be improved? Were you able to run the service? Did you learn something new by looking at the code?

Homework 2 will consist of several stages.

Scan Repositories

Scan the available repos and identify four that you want to review. (Note: You’re not allowed to evaluate your own web service! 😃) Look carefully at the issues associated with those repos. If a repo already has at least four evaluation issues, look for another repo that has fewer than four evaluation issues associated with it.

Reserve Repositories

To reserve a repo for your use in this homework, create an issue with the title “[FIRST_NAME LAST_NAME]’s Evaluation”; for example, “Ken Anderson’s Evaluation”.

In the initial text of that issue, simply state that you will be creating an evaluation of this repository.

Download Repositories

Next, download each of the repositories using the Download ZIP option that GitHub provides to download the most recent version of the repo’s files as a .zip file. Unpack the .zip file and start reviewing the files. Read the README. Examine the source code and the tests. Try to run the web service if possible.

Write your Evaluation

Return to the issues you created before and fill them out with your evaluation of each service. You can write the evaluation in one big post or split it across multiple posts (all in the same issue of course), whatever feels best to you. If you want to praise something the authors did, then do so! If you want to criticize something, you can do that too but be constructive in your criticism and make the critique in the knowledge that many people here are writing RESTful services for the very first time.

(Optional) Make a Pull Request

If you found a bug in one of the repositories that you reviewed or you have a simple enhancement you would like to suggest, you can optionally fork the repository, create a branch, make your changes in the branch, commit the changes (on the branch in your forked repository) and then create a pull request to report your suggestion/changes back to the original team.

If the team likes your suggestions, they may decide to merge them into their original repo.

Logistics

Your four evaluations need to be completed (as issues on four separate repositories) by 11:59 PM on Friday, February 13th. If you have questions, send them to Prof. Anderson.

Homework 1: Simple RESTful service

Write a simple RESTful service like the one that Prof. Anderson presented in lecture on Thursday. You can use the source code from the Contacts repo as inspiration or you can write a service in a programming language / framework that you know better.

Your service should have one model object (what REST calls a resource) and respond to GET (display all objects and display one object), POST (create new object), PUT (update an existing object), and DELETE (delete an existing object) requests for that resource. Your simple service should have a set of tests that test each of the endpoints that it supports and a client library that can be used to access the service. Finally, your simple service should make use of a database of some sort to store its data. It should be possible to launch your service in "test mode" or "production mode", with each mode using its own database.

You are encouraged to work in teams on this assignment. You will submit this assignment by creating a public repo on GitHub (commit early! commit often!) and adding Prof. Anderson as a collaborator on it (as well as your team members). Your service is due by 11:59 PM on Friday, January 30th.

Homework 0: GitHub User Name

Send Prof. Anderson your GitHub user name, so he can add you to the Data Engineering GitHub Organization that will, in turn, allow you to edit this wiki and provide access to the repos that we create for the class throughout the semester.
