Lectures: Mondays and Wednesdays, 11:30 AM - 12:45 PM, White Hall 205
Instructor: Ruoxuan Xiong, Psychology and Interdisciplinary Sciences Building 581, ruoxuan.xiong@emory.edu
Office Hours: Mondays 3:00 PM - 4:00 PM, Psychology and Interdisciplinary Sciences Building 581
This course introduces students to the field of machine learning, a critical toolset for analyzing and interpreting complex data sets across diverse domains, including biology, finance, marketing, and astrophysics. Students will explore foundational modeling and prediction techniques widely used in machine learning, artificial intelligence, and data science, with a focus on both practical applications and the statistical principles underlying these methods.
Topics covered include:
Supervised Learning (Regression and Classification): Linear regression, logistic regression, K-nearest neighbors, linear and quadratic discriminant analysis, regularization methods (Ridge, LASSO, and Elastic Net), and tree-based methods.
Model Evaluation and Resampling: Bias-variance tradeoffs, cross-validation, and bootstrapping.
Unsupervised Learning: Principal components analysis and clustering, including the k-means algorithm.
Advanced Topics: Neural networks, transformers, diffusion models, and foundation models.
By the end of this course, students will gain both theoretical knowledge and practical skills to apply machine learning techniques to real-world problems.
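For a first flavor of what these methods look like in practice, here is a minimal sketch of a regularized regression (LASSO) and of PCA followed by k-means clustering. It assumes Python with NumPy and scikit-learn, which are illustrative choices rather than required course software.

    # Illustrative sketch only (not course-required code), assuming NumPy and scikit-learn.
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # Supervised example: a sparse linear signal with noise.
    X = rng.normal(size=(200, 20))
    beta = np.zeros(20)
    beta[:3] = [2.0, -1.0, 0.5]               # only the first three features matter
    y = X @ beta + rng.normal(scale=0.5, size=200)

    lasso = Lasso(alpha=0.1).fit(X, y)
    print("features kept by LASSO:", np.flatnonzero(lasso.coef_))

    # Unsupervised example: reduce to two principal components, then cluster.
    Z = PCA(n_components=2).fit_transform(X)
    clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
    print("cluster sizes:", np.bincount(clusters))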
Week 1, W Jan 15: Introduction
Logistics; introduction to supervised machine learning (regression, tree-based methods, and neural networks) and unsupervised machine learning (clustering and principal components analysis).
Week 2, W Jan 22: Preliminaries
Parametric and nonparametric methods, training and test mean-squared error.
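To make the distinction between training and test error concrete, here is a minimal sketch (assuming NumPy and scikit-learn, used here only for illustration) in which a very flexible model achieves a small training MSE but a larger test MSE:

    # Illustrative sketch: training MSE vs. test MSE as model flexibility grows.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(60, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=60)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

    for degree in (1, 15):
        fit = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
        print(f"degree {degree:2d}: "
              f"train MSE = {mean_squared_error(y_tr, fit.predict(X_tr)):.3f}, "
              f"test MSE = {mean_squared_error(y_te, fit.predict(X_te)):.3f}")
    # The flexible (degree-15) fit typically has the lower training MSE but the higher test MSE.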
Week 3, M Jan 27: Bias-Variance, W Jan 29: KNN
Bias-variance decomposition.
K-nearest neighbors regression and classification.
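For squared-error loss, the expected test error at a point x0 decomposes as E[(y0 - f̂(x0))^2] = Var(f̂(x0)) + [Bias(f̂(x0))]^2 + Var(ε), so more flexible models typically trade lower bias for higher variance. Below is a minimal sketch of K-nearest neighbors in both the regression and the classification setting (assuming NumPy and scikit-learn, for illustration only):

    # Illustrative sketch: KNN regression and KNN classification.
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier

    rng = np.random.default_rng(2)

    # Regression: predict by averaging the responses of the k nearest training points.
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)
    knn_reg = KNeighborsRegressor(n_neighbors=5).fit(X, y)
    print("KNN regression prediction at x = 0:", knn_reg.predict([[0.0]]))

    # Classification: predict the majority class among the k nearest training points.
    X2 = rng.normal(size=(200, 2))
    labels = (X2[:, 0] + X2[:, 1] > 0).astype(int)
    knn_clf = KNeighborsClassifier(n_neighbors=5).fit(X2, labels)
    print("KNN classification prediction at (1, 1):", knn_clf.predict([[1.0, 1.0]]))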
Week 4, M Feb 3: Classification, W Feb 5: LDA and QDA
The classification problem, logistic regression, and generative vs. discriminative methods.
Linear discriminant analysis and quadratic discriminant analysis.
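A minimal sketch contrasting a discriminative classifier (logistic regression) with generative classifiers (LDA and QDA); NumPy and scikit-learn are assumed here purely for illustration:

    # Illustrative sketch: logistic regression vs. LDA vs. QDA on synthetic Gaussian data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                               QuadraticDiscriminantAnalysis)
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)

    # Two Gaussian classes with different covariance matrices (QDA's assumption).
    n = 300
    X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], size=n)
    X1 = rng.multivariate_normal([1.5, 1.5], [[2.0, 0.6], [0.6, 0.5]], size=n)
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

    for name, clf in [("logistic regression", LogisticRegression()),
                      ("LDA", LinearDiscriminantAnalysis()),
                      ("QDA", QuadraticDiscriminantAnalysis())]:
        print(name, "test accuracy:", round(clf.fit(X_tr, y_tr).score(X_te, y_te), 2))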
Week 5, M Feb 10: Cross-Validation, W Feb 12: Bootstrap
Leave-one-out cross-validation and k-fold cross-validation.
The bootstrap.
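A minimal sketch of 5-fold cross-validation and of the bootstrap (assuming NumPy and scikit-learn, for illustration only):

    # Illustrative sketch: 5-fold cross-validation and a bootstrap standard error.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=4)

    # Cross-validation: estimate test MSE by averaging the held-out-fold errors.
    cv_mse = -cross_val_score(LinearRegression(), X, y,
                              cv=5, scoring="neg_mean_squared_error")
    print("5-fold CV estimate of test MSE:", cv_mse.mean())

    # Bootstrap: resample rows with replacement to estimate the standard error
    # of the first regression coefficient.
    rng = np.random.default_rng(4)
    coefs = []
    for _ in range(1000):
        idx = rng.integers(0, len(y), size=len(y))
        coefs.append(LinearRegression().fit(X[idx], y[idx]).coef_[0])
    print("bootstrap SE of the first coefficient:", np.std(coefs))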
See here for the syllabus.
The goal of the course project is to give you hands-on project experience in machine learning. By the end of the project, we hope that you will have gained practical experience applying ML to a real-world problem, or explored a current research frontier in machine learning. You have two options for completing the project. The first option is to pick a dataset that interests you and apply the methods we cover this semester to analyze it. The second option is to replicate a research paper and explore possible extensions or improvements of the paper.
There is a project proposal presentation on Mar 17, 2025. Each group needs to prepare a five-minute presentation that introduces all the group members (up to four students) and the group's topic.
There are final project presentations on Apr 23, 2025 and Apr 28, 2025. Each group needs to prepare a ten-minute presentation covering the motivation, setup, and results of the project. Before the final project presentation, we ask that you set up a publicly available GitHub repository for your work, with detailed documentation on how to use the code and a summary of your current findings.
We expect that when each group presents, the other groups will provide critical feedback, which counts toward the participation grade in this course.
Finally, by May 7, 2025, each group should finalize its GitHub repository and the accompanying documentation.
See here for more details about the course project instructions and the sample list of datasets.
You are responsible for keeping up with all announcements made in class and for all changes in the schedule that are posted on the Canvas website.
The grade will be based on the following:
Homeworks: 30%
Exam (take-home; choose any 24-hour window): 30%
Course project report (submitted on GitHub): 20%
Course project presentations (one proposal and one final presentation): 15%
Participation: 5%
An Introduction to Statistical Learning (ISL), by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
The Elements of Statistical Learning (ESL), by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.