Data Sciences

Adapted from the Fall 2016 Data8 course at Berkeley (http://data8.org/)

Resources

Original course resources on GitHub: https://github.com/data-8/data8assets
Fork with additional material: https://github.com/lars1050/data8assets
Jupyter Notebook: http://jupyter.org/
Python datascience library on GitHub: https://github.com/data-8/datascience
 

General Approach

The course was designed as a multidisciplinary course for non-computer-science majors. The first several weeks introduce programming concepts and the Python packages created for the course, with data science and classification concepts interspersed among the programming topics. The goal over the 3 weeks of the REU DataSciences BootCamp is to complete the three course projects, through which you will learn classification techniques. The last week of BootCamp will explore Weka, MATLAB, and processing.org. The pace at which you move through this material will depend on your experience.

The textbook and lecture slides present the same material, which is designed to prepare you for the labs. Much of the material is presented in Jupyter Notebook, but we will implement everything in native Python. Outside of lab time, you are expected to read the textbook and/or watch the corresponding video. In class, you can work through the labs to build skills or work directly on a project.

Getting Prepared

If you are using your own laptop, you will need to install both Python and Jupyter Notebook. Installation instructions: http://jupyter.org/install.html

If you need help getting started with Python, try Dr. Larson's tutorial on the language: http://www-users.cs.umn.edu/~larson/repo-website-resources/website/examples/csresources/python.html

Download or clone the repo data8assets from GitHub. If you don't know git, start with Dr. Larson's tutorial: http://www-users.cs.umn.edu/~larson/repo-website-resources/website/examples/csresources/git.html

Download or clone the repo datascience from GitHub: https://github.com/data-8/datascience
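
As a quick check that the installation steps above worked, the sketch below reports which packages Python cannot find. It is only an illustration: the import names "notebook" (for Jupyter Notebook) and "datascience" (for the course library) are assumptions about how the packages were installed, and missing_packages is a hypothetical helper, not part of the course material.

```python
import importlib.util
import sys

# Import names are assumptions: "notebook" for Jupyter Notebook and
# "datascience" for the course library from the data-8/datascience repo.
REQUIRED = ["notebook", "datascience"]


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If a package is reported as not installed, revisit the installation instructions before starting the labs.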

Readings, Lectures, and Labs

The Berkeley course schedule can be found at http://data8.org/fa16/. Lab material is in the data8assets repo under materials/fa16/lab. See the readme in the forked repo (https://github.com/lars1050/data8assets/tree/gh-pages/materials/fa16) for directions on how to open the labs in Jupyter Notebook.
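
To see which labs are available locally, a short script can list the notebook files under materials/fa16/lab. This is a minimal sketch under two assumptions: the repo was cloned into a folder named data8assets in the current directory, and the labs are stored as .ipynb files. The function find_lab_notebooks is a hypothetical helper written for this example.

```python
from pathlib import Path


def find_lab_notebooks(repo_root):
    """Return notebook paths under materials/fa16/lab, sorted by path."""
    lab_dir = Path(repo_root) / "materials" / "fa16" / "lab"
    if not lab_dir.is_dir():
        return []
    # Search subfolders too, and skip Jupyter's autosave checkpoint copies.
    return sorted(
        path
        for path in lab_dir.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


if __name__ == "__main__":
    # Assumed clone location; change this to wherever you cloned the repo.
    for notebook in find_lab_notebooks("data8assets"):
        print(notebook)
```

An empty result usually means the path to the cloned repo is wrong or the repo has not been downloaded yet.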

Berkeley Course Schedule (see Technical Training Schedule for REU assignments and deadlines)

 
# | LAB | READING | VIDEO
1 | lab01: Jupyter and Basic Python | Introduction, Data Science | 2016-08-24, starts at 5:15; NEW: 15:33, Data Science
2 | | Causality and Experiments | 2016-08-26, starts at 22:00
3 | | Programming in Python | 2016-08-29, starts at 11:30
4 | lab02: Modules, Arrays, Iterators | Data Types | 2016-08-31, starts at 8:55
5 | | Tables | 2016-09-02, starts at 16:15
6 | lab03: Tables module, data queries | Tables | 2016-09-07, starts at 17:50
7 | | Visualization | 2016-09-09, starts at 9:20
8 | | Visualization | 2016-09-12, starts at 8:05
9 | lab04: Defining Functions, Histograms | Functions and Tables | 2016-09-14, starts at 13:03 (Regression and Nearest Neighbor)
10 | | Functions and Tables | 2016-09-16, starts at 10:00
11 | | Functions and Tables | 2016-09-19, starts at 5:30 (This video is more supplemental to reading.)
12 | PROJECT 1: California Water Usage | Randomness Intro, Conditional Statements | 2016-09-21, starts at 11:00 (NEW)
13 | | Iteration, Monty Hall | 2016-09-23, starts at 4:30 (PROJECT)
14 | | Finding Probabilities, Sampling | 2016-09-26, starts at 7:30; 20:40 (Probabilities)
15 | | Empirical Distributions | 2016-09-28, starts at 10:40 (NEW: Sampling); 22:55 (Distributions)
16 | | Empirical Distributions | 2016-09-30
17 | | Testing Hypotheses | 2016-10-03
18 | | Testing Hypotheses | 2016-10-05
19 | lab05: Statistics | Testing Hypotheses | 2016-10-07
20 | | Testing Hypotheses | 2016-10-10
21 | | Estimation | 2016-10-17
22 | lab06: Resampling and the Bootstrap | Estimation | 2016-10-19
23 | | Estimation | 2016-10-21
24 | | Why the Mean Matters | 2016-10-24
25 | PROJECT 2: Inference and Capital Punishment | Why the Mean Matters | 2016-10-26
26 | | Why the Mean Matters | 2016-10-28
27 | | Prediction | 2016-10-31
28 | lab07: Regression | Prediction | 2016-11-02
29 | | Prediction | 2016-11-04
30 | | Inference for Regression | 2016-11-07
31 | lab08: Age of the Universe | Inference for Regression | 2016-11-09
32 | | Classification | 2016-11-14
33 | PROJECT 3: Classification | Classification | 2016-11-16
34 | lab09: classification discussion | Comparing Two Samples | 2016-11-18
35 | | Comparing Two Samples | 2016-11-21
36 | | Comparing Two Samples | 2016-11-28
37 | lab10: Conditional Probability | Updating Predictions | 2016-11-30