## Course Information

Rutgers University - Newark

21:219:329 Statistics & Machine Learning

**Instructor** - Professor Chi-Ken Lu

**Teaching Assistant** - Jose Dominguez

###### Course Description

Machine learning is about machines making justifiable inferences in practical situations (e.g. autonomous driving, computer vision, playing Go) from previously explored data. The emphasis is on formulating and justifying those inferences on the basis of statistics. The course focuses on three major categories of machine learning: regression, classification, and unsupervised learning. Each class usually starts with simple motivating examples, then formulates the problem in a mathematically sensible way, and ends with a statistical justification of the answers. Implementation is very important for students who aim for a data science career. Programming in Python, as well as computational hardware, will be needed throughout the entire semester.

###### Learning objectives

The course will cover the following key concepts in statistical learning: supervised versus unsupervised learning, regression, classification, validation, decision trees and random forests, support vector machines, k-means, and the k-nearest-neighbor algorithm. Project topics such as natural language processing, neural networks, and dimensionality reduction will be introduced. Guest seminars on cutting-edge research topics can be expected as well.

###### Prerequisite(s)

Students should have taken Calculus II or be familiar with one-variable calculus (e.g. taking derivatives, integration, finding extreme values) before taking this course. Prior knowledge of probability and statistics is helpful but not necessary. Experience with Python programming is required.

###### Reference books and materials

An Introduction to Statistical Learning with Applications in R (James, Witten, Hastie, Tibshirani)

A high-bias, low-variance introduction to Machine Learning for physicists (Mehta, Bukov, Wang, Day, Richardson, Fisher, Schwab)

###### Course requirements & policies

- Reading (and watching) assignments should be completed before the associated lecture.
- Quizzes will be given at multiple points during the class to support learning. These are IMPORTANT.
- Grades will be based on in-class quizzes, homework, and the project.
- Hand in all assignments on time. Late assignments will receive a maximum of 50% credit. Assignments that are more than a week late will automatically be marked 0.
- You are expected to do your own work. Those caught cheating will receive a failing grade for the assignment and/or course. The university policy on academic integrity is available here: http://academicintegrity.rutgers.edu
- For many assignments, collaboration will be strongly encouraged. You should take advantage of these opportunities to learn from others!
- If you need accommodation for a learning disability, I require an official Letter of Accommodation from Disability Services (http://robeson.rutgers.edu/studentlife/disability.html/). This ensures that I know the best way to help you.

###### Grading

Quizzes & homework (60%). Preparation and participation are necessary to ensure that students learn well and that misconceptions and problems are identified and addressed properly. This component includes the in-class quizzes. Homework will be assigned throughout the course. Its goal is for students to use what they have learned in an integrative fashion. Discussion among students is allowed; however, sharing of materials is not (unless I explicitly say otherwise). You must turn in your own work.

Midterm (40%). A 25-question midterm will usually take place in the 10th week. It mainly covers the basic ideas in machine learning and includes some conceptual questions.

Project (optional). The final project will be of your own choosing (although we will give some guidelines). You may form groups of 3 to 5 for the final project. Details will be discussed in class.

###### The course will cover the following topics

- Python: NumPy, pandas, Jupyter Notebook
- Linear regression
- Logistic regression (linear classification)
- K-nearest neighbor regression and classification
- Decision-tree regression and classification
- Bootstrapping
- Bias-variance tradeoff
- Random forest and boosting
- Principal component analysis
- Hard and soft K-means
- Support vector machine

## Course Material

Welcome to the Spring class of Statistics and Machine Learning. Before the class starts, you should have a working laptop or personal computer for watching the lecture videos and programming in Python. Make sure Python is installed on your device. Examples are done in a Jupyter Notebook environment, so please also make sure you can run Jupyter Notebook on your device.

View more information on Getting Started With Jupyter Notebooks
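One quick way to confirm the setup described above is a short script like the following. This is only a sketch, not an official course tool: the function name `check_environment` is hypothetical, and the package list assumes that the module named `notebook` is what provides Jupyter Notebook on your installation (a different distribution, such as JupyterLab, would use a different module name).

```python
import importlib.util
import sys

# Packages taken from the course topic list. The name "notebook" is an
# assumption about which module provides Jupyter Notebook on your machine.
REQUIRED = ["numpy", "pandas", "notebook"]


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Report the Python version and the status of each package.
print("Python version:", sys.version.split()[0])
for name, found in check_environment().items():
    print(f"{name}: {'OK' if found else 'missing'}")
```

If a package is reported as missing, install it with whatever package manager you used to install Python, then run the script again.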

###### Tentative Course/Assignment schedule

Week 1: 01/19 -- 01/22

Week 2: 01/25 -- 01/29

Week 3: 02/01 -- 02/05

Week 4: 02/08 -- 02/12

Week 5: 02/15 -- 02/19

Week 6: 02/22 -- 02/26

Week 7: 03/01 -- 03/05

Week 8: 03/08 -- 03/12

Week 9: 03/15 -- 03/19

Week 10: 03/22 -- 03/26

Week 11: 03/29 -- 04/02

Week 12: 04/05 -- 04/09

Week 13: 04/12 -- 04/16

Week 14: 04/19 -- 04/23

Week 15: 04/26 -- 05/03