Adapted from the Fall 2016 Data8 course at Berkeley (http://data8.org/)
Resources
General Approach
The course was designed as a multidisciplinary course for non-computer-science majors. The first several weeks introduce programming concepts, along with Python packages created for the course. Concepts from data science and classification are interleaved with the programming material. The goal over the 3 weeks of the REU DataSciences BootCamp is to complete the three course projects, through which you will learn classification techniques. The last week of BootCamp will explore Weka, Matlab, and processing.org. How quickly you move through this material will depend on your experience.
The textbook and lecture slides present identical material designed to prepare you for the labs. Much of the material is presented within Jupyter Notebook, but we will implement everything in native Python. Outside of lab time, you are expected to read the textbook and/or watch the video. In class, you can work through the labs to build skills or work directly on a project.
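Because the labs ship as Jupyter notebooks while the BootCamp work is done in plain Python, it can help to pull the code cells out of a notebook into a script. The sketch below is not part of the course materials. It assumes the standard notebook file layout (JSON with a top-level "cells" list whose entries carry "cell_type" and "source"), and the names code_from_notebook and notebook_to_script are hypothetical helpers chosen for illustration.

```python
import json
from pathlib import Path


def code_from_notebook(notebook):
    """Join the code cells of a parsed notebook (a dict) into one script string."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # "source" may be one string or a list of line strings
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


def notebook_to_script(path):
    """Read a .ipynb file from disk and return its code cells as a script."""
    text = Path(path).read_text(encoding="utf-8")
    return code_from_notebook(json.loads(text))
```

Calling notebook_to_script on a lab file (for example a hypothetical lab01.ipynb) returns the script text, which you can write to a .py file. Lines that start with % are IPython magics and will need to be removed by hand before the script runs in native Python.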
Getting Prepared
If you are using your own laptop, you will need to install Python and Jupyter Notebook. Instructions can be found here: http://jupyter.org/install.html.
If you need help getting started with Python, try Dr. Larson's tutorial on the language: http://wwwusers.cs.umn.edu/~larson/repowebsiteresources/website/examples/csresources/python.html
Download or clone the data8assets repo from GitHub. If you are new to Git, start with Dr. Larson's tutorial: http://wwwusers.cs.umn.edu/~larson/repowebsiteresources/website/examples/csresources/git.html
Download or clone the datascience repo from GitHub (https://github.com/data8/datascience)
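Once the installs and clones above are done, a quick check from plain Python can confirm the setup. This is a minimal sketch rather than an official course script: it assumes Jupyter Notebook is importable as notebook and the course package as datascience, and check_environment is a hypothetical helper name.

```python
import importlib.util
import sys


def check_environment(packages=("notebook", "datascience")):
    """Report the Python version and which of the named packages can be found."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Printing check_environment() shows the interpreter version plus True or False for each package. A False entry means the package is not visible to the Python you are running, which often points to a second Python installation on the machine.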
Readings, Lectures, and Labs
The Berkeley course schedule can be found here: http://data8.org/fa16/ . Lab material can be found in the data8assets repo under materials/fa16/lab. See the readme in the forked repo (https://github.com/lars1050/data8assets/tree/ghpages/materials/fa16) for directions on how to open the labs in Jupyter Notebook.
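To see which labs a local clone contains, you can list the notebook files under the lab directory. The sketch below assumes the labs are stored as .ipynb files and that you pass in the folder where you cloned the repo; lab_notebooks is a hypothetical helper, not something the repo provides.

```python
from pathlib import Path


def lab_notebooks(repo_root, subdir="materials/fa16/lab"):
    """Return sorted paths of the .ipynb files under the lab directory."""
    lab_dir = Path(repo_root) / subdir
    if not lab_dir.is_dir():
        return []  # wrong path, or the repo has not been cloned yet
    return sorted(
        path for path in lab_dir.rglob("*.ipynb")
        # skip Jupyter's autosave copies
        if ".ipynb_checkpoints" not in path.parts
    )
```

For example, lab_notebooks("data8assets") would search a clone located in a folder named data8assets in the current directory (an assumed location). An empty list usually means the path is wrong.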
Berkeley Course Schedule (see Technical Training Schedule for REU assignments and deadlines)
#  | LAB                                         | READING                                  | VIDEO
---+---------------------------------------------+------------------------------------------+------------------------------
1  | lab01: Jupyter and Basic Python             | Introduction, Data Science               | 20160824; starts at 5:15; NEW: 15:33, Data Science
2  |                                             | Causality and Experiments                | 20160826; starts at 22:00
3  |                                             | Programming in Python                    | 20160829; starts at 11:30
4  | lab02: Modules, Arrays, Iterators           | Data Types                               | 20160831; starts at 8:55
5  |                                             | Tables                                   | 20160902; starts at 16:15
6  | lab03: Tables module, data queries          | Tables                                   | 20160907; starts at 17:50
7  |                                             | Visualization                            | 20160909; starts at 9:20
8  |                                             | Visualization                            | 20160912; starts at 8:05
9  | lab04: Defining Functions, Histograms       | Functions and Tables                     | 20160914; starts at 13:03 (Regression and Nearest Neighbor)
10 |                                             | Functions and Tables                     | 20160916; starts at 10:00
11 |                                             | Functions and Tables                     | 20160919; starts at 5:30 (This video is more supplemental to reading.)
12 | PROJECT 1: California Water Usage           | Randomness Intro, Conditional Statements | 20160921; starts at 11:00 (NEW)
13 |                                             | Iteration, Monty Hall                    | 20160923; starts at 4:30 (PROJECT)
14 |                                             | Finding Probabilities, Sampling          | 20160926; starts at 7:30; 20:40 (Probabilities)
15 |                                             | Empirical Distributions                  | 20160928; starts at 10:40 (NEW: Sampling); 22:55 (Distributions)
16 |                                             | Empirical Distributions                  | 20160930
17 |                                             | Testing Hypotheses                       | 20161003
18 |                                             | Testing Hypotheses                       | 20161005
19 | lab05: Statistics                           | Testing Hypotheses                       | 20161007
20 |                                             | Testing Hypotheses                       | 20161010
21 |                                             | Estimation                               | 20161017
22 | lab06: Resampling and the Bootstrap         | Estimation                               | 20161019
23 |                                             | Estimation                               | 20161021
24 |                                             | Why the Mean Matters                     | 20161024
25 | PROJECT 2: Inference and Capital Punishment | Why the Mean Matters                     | 20161026
26 |                                             | Why the Mean Matters                     | 20161028
27 |                                             | Prediction                               | 20161031
28 | lab07: Regression                           | Prediction                               | 20161102
29 |                                             | Prediction                               | 20161104
30 |                                             | Inference for Regression                 | 20161107
31 | lab08: Age of the Universe                  | Inference for Regression                 | 20161109
32 |                                             | Classification                           | 20161114
33 | PROJECT 3: Classification                   | Classification                           | 20161116
34 | lab09: classification discussion            | Comparing Two Samples                    | 20161118
35 |                                             | Comparing Two Samples                    | 20161121
36 |                                             | Comparing Two Samples                    | 20161128
37 | lab10: Conditional Probability              | Updating Predictions                     | 20161130