CS 194-16 Introduction to Data Science, Spring 2014 – Final Projects

The final projects for this class are to be done in groups of 3 students (or 2 with special permission). The goal is to carry out an end-to-end data science project of your choosing, one that exercises the entire data science lifecycle.

PROJECTS - OVERVIEW

In the project you will identify two or more data sets you would like to study. Write the code to collect and integrate those data sets, then build two or three visualizations of the data. Once you’ve got a decent feeling for the data, perform an analysis of the data to identify insights, answer questions, examine hypotheses, etc.
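As a rough illustration of the "collect and integrate" step, the sketch below joins two small tables on a shared key and counts the missing values that result. Everything in it is a made-up assumption for illustration: the sample data, the column names (station_id, city, temp_c), and the helper functions read_rows, left_join, and count_missing. A real project would load its own data sets and would likely use a library suited to their size and format.

```python
import csv
import io

# Hypothetical sample data standing in for two real data sets.
STATIONS_CSV = """station_id,city
S1,Berkeley
S2,Oakland
S3,Richmond
"""

READINGS_CSV = """station_id,temp_c
S1,18.5
S2,
S4,21.0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left, right, key):
    """Attach the matching right-hand row (if any) to each left-hand row.

    Assumes `key` is unique in `right`; with duplicates, the last row wins.
    Left rows with no match get empty strings for the right-hand fields.
    """
    index = {row[key]: row for row in right}
    right_fields = [f for f in (right[0] if right else {}) if f != key]
    joined = []
    for row in left:
        match = index.get(row[key], {})
        merged = dict(row)
        for field in right_fields:
            merged[field] = match.get(field, "")
        joined.append(merged)
    return joined


def count_missing(rows, field):
    """Count rows whose value for `field` is empty or whitespace."""
    return sum(1 for row in rows if not row[field].strip())


stations = read_rows(STATIONS_CSV)
readings = read_rows(READINGS_CSV)
combined = left_join(stations, readings, "station_id")
missing_temps = count_missing(combined, "temp_c")
print(f"{len(combined)} rows, {missing_temps} missing temp_c")
```

Counting missing values right after the join is one simple way to notice integration problems early, such as keys that appear in one data set but not the other.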

You should produce some interesting visualizations of the data, and develop a prototype of a data product that uses the data and the analyses.

We recommend that you keep a diary of your successes and failures throughout the project. Did you run into problems fetching the data? Coding it? Were there a lot of missing values? Were your visualizations insightful? What are your concerns about the quality of the inferences you can draw? The final submission will consist of a written report documenting your project and experiences, plus a presentation/demo (details to be provided).

INITIAL PROJECT PROPOSALS (due Tuesday 3/11 at Midnight)

For the first stage, we would like you to produce a 1-2 page Initial Project Proposal. This proposal should outline:

  1. the problem you intend to address,
  2. the data you intend to use,
  3. how you plan to obtain that data,
  4. the analyses you hope to do,
  5. an idea of the software/tools you may use,
  6. the data product(s) you plan to build (e.g., recommendation system, web site, application, etc.).

We understand that these proposals are preliminary. We will meet with the project groups to discuss the proposals so that we can agree on direction and scope and try to identify any gotchas that may arise.

RESOURCES

For some inspiration, you can have a look at the slides from the presentations of the 2011 offering of 194-16 (note that the requirements were somewhat different that year).

There are lots of places collecting interesting data sets or pointers - here are a few:

Stanford 224w Page

Quandl - Find, Use and Share numerical data

Amazon Public Data Sets

If you find other good sources of available data, please post them to the Piazza group.