AMSC 8o8N/CMSC828V

AMSC 808N/CMSC828V: Numerical Methods for Data Science and Machine Learning

Fall 2020

Instructor: Maria Cameron

A brief description: Optimization (fundamentals of constrained and unconstrained optimization, algorithms for large-scale problems, Tikhonov and lasso regularization). Matrix data and latent factor models (Ky-Fan norms, nonlinear matrix factorization, CUR decomposition, applications). Dimensionality reduction for data visualization and organization (PCA, MDS, isomap, LLE, t-SNE, diffusion maps). Graph data analysis (basic graph algorithms (DFS and BFS), random graph models, site and edge percolation, mining large graphs).

Expectations: The students are expected to have solid knowledge of linear algebra and multivariable calculus and be able to program.

Coursework:

Lectures
Homework (theoretical excercises) 30%
Group projects (programming, real data analysis, benchmark examples) 35%
Final exam 35%

1. Introduction (1 week)

What is data science?
Review of linear algebra

Slides: Introduction.pdf ReviewLinAlgebra.pdf

Lecture Notes: 1-Introduction.pdf

2. Optimization (4.5 weeks)

Classification problems. Basics of constrained optimization. The KKT optimality conditions. The active-set method for constrained minimization problem with linear constraints. SVM: soft margins, duality, implementation, numerical issues.
Unconstrained minimization problem for DNNs, an overview of methods for unconstrained optimization. Convergence and properties of gradient descend, gradient descend with errors, a motivation for stochastic gradient descend. Stochastic gradient descend: assumptions, lemmas, convergence theorem for fixed stepsize. Stochastic gradient descend with decreasing stepsizes, convergence theorem.
Subsampled inexact Newton’s method. Features and components of second-order methods: scale-invariance, conjugate gradient method for obtaining search direction, backtracking line search. BFGS, L-BFGS, stochastic L-BFGS.
Gauss-Newton and Levenberg-Marquardt methods for solving nonlinear least-squares problems. Solving a BVP for the Poisson PDE in 2D by means of NNs.
Geometry of linear least squares problems. Tikhonov and Lasso regularization, coordinate descend.

Slides: ClassificationIntro.pdf

Lecture Notes: 2-Optimization.pdf

3. Matrix Data and latent factor models (3 weeks)

Examples: Latent Semantic Analysis and k-means clustering algorithm, eigenfaces, collaborative filtering.
Ky-Fan norms. Eckart-Young-Mirsky theorem. NMF. Methods for computing NMF: projected gradient descend, Lee-Seung, coordinate descend (one entry at-a-time, hierarchical alternating least squares (HALS), alternative nonnegative least squares (ANLS)).
Collaborative filtering and matrix completion. Nuclear norm.
CUR matrix decomposition.

Slides: CUR.pdf

Lecture Notes: 3-MatrixFactorization.pdf

4. Nonlinear dimersionality reduction (3 weeks)

Linear and Nonlinear dimensionality reduction.
Linear methods: PCA, MDS.
Isomap.
Locally linear embedding (LLE).
Student’s t-distribution stochastic neighbor embedding (t-SNE).
Diffusion maps. Diffusion maps and Laplacian eigenmaps. Diffusion maps with renormalization parameter alpha.

Lecture Notes: 4-DimReduction.pdf

Homework: hw5.pdf, HW5.zip

5. Numerical methods for graph data analysis (3.5 weeks)

Basic graph definitions.
Breadth-first search. Depth-first search.
Poisson random graphs. Generating functions. Random graphs with specified degree distribution.
Scale-free random networks. Growth and preferential attachment. Vulnerability to random failures and targeted attacks.
Site percolation in random graphs with power-law degree distribution SIR model on random graphs. Power-law degree distribution case study.
Mining large graphs. Brin’s and Page’s PageRank, that ranks pages according to their importance.