Maria K. Cameron
University of Maryland, Department of Mathematics
AMSC 808N/CMSC828V: Numerical Methods for Data Science and Machine Learning
Instructor: Maria Cameron
A brief description: Optimization (fundamentals of constrained and unconstrained optimization, algorithms for large-scale problems, Tikhonov and lasso regularization). Matrix data and latent factor models (Ky-Fan norms, nonlinear matrix factorization, CUR decomposition, applications). Dimensionality reduction for data visualization and organization (PCA, MDS, isomap, LLE, t-SNE, diffusion maps). Graph data analysis (basic graph algorithms (DFS and BFS), random graph models, site and edge percolation, mining large graphs).
Expectations: The students are expected to have solid knowledge of linear algebra and multivariable calculus and be able to program.
- Homework (theoretical excercises) 30%
- Group projects (programming, real data analysis, benchmark examples) 35%
- Final exam 35%
- What is data science?
- Review of linear algebra
- 2. Optimization (4.5 weeks)
Classification problems. Basics of constrained optimization. The KKT optimality conditions.
The active-set method for constrained minimization problem with linear constraints.
SVM: soft margins, duality, implementation, numerical issues.
Unconstrained minimization problem for DNNs, an overview of methods for unconstrained optimization.
Convergence and properties of gradient descend, gradient descend with errors, a motivation for stochastic gradient descend.
Stochastic gradient descend: assumptions, lemmas, convergence theorem for fixed stepsize.
Stochastic gradient descend with decreasing stepsizes, convergence theorem.
Subsampled inexact Newton’s method.
Features and components of second-order methods: scale-invariance, conjugate gradient method for obtaining search direction, backtracking line search. BFGS, L-BFGS, stochastic L-BFGS.
Gauss-Newton and Levenberg-Marquardt methods for solving nonlinear least-squares problems.
Solving a BVP for the Poisson PDE in 2D by means of NNs.
Geometry of linear least squares problems.
Tikhonov and Lasso regularization, coordinate descend.
- 3. Matrix Data and latent factor models (3 weeks)
Examples: Latent Semantic Analysis and k-means clustering algorithm, eigenfaces, collaborative filtering.
Ky-Fan norms. Eckart-Young-Mirsky theorem. NMF.
Methods for computing NMF: projected gradient descend, Lee-Seung, coordinate descend (one entry at-a-time, hierarchical alternating least squares (HALS), alternative nonnegative least squares (ANLS)).
Collaborative filtering and matrix completion. Nuclear norm.
CUR matrix decomposition.
- 4. Nonlinear dimersionality reduction (3 weeks)
Linear and Nonlinear dimensionality reduction.
Linear methods: PCA, MDS.
Locally linear embedding (LLE).
Student’s t-distribution stochastic neighbor embedding (t-SNE).
Diffusion maps and Laplacian eigenmaps. Diffusion maps with renormalization parameter alpha.
- 5. Numerical methods for graph data analysis (3.5 weeks)
Basic graph definitions.
Breadth-first search. Depth-first search.
Poisson random graphs. Generating functions.
Random graphs with specified degree distribution.
Scale-free random networks.
Growth and preferential attachment. Vulnerability to random failures and targeted attacks.
Site percolation in random graphs with power-law degree distribution
SIR model on random graphs. Power-law degree distribution case study.
Mining large graphs. Brin’s and Page’s PageRank, that ranks pages according to their importance.
Some key references:
- David Bindel, Cornell University, http://www.cs.cornell.edu/~bindel/class/sjtu-summer19/index.html
- Austin, Benson, Cornell University, http://www.cs.cornell.edu/courses/cs6241/2019sp/
https://leon.bottou.org/publications/pdf/tr-optml-2016.pdf L. Bottow, F. Curtis, J,. Nocedal, Optimization methods for large-scale machine learning, SIAM Review, Vol. 60, #2, pp. 223—311
- https://pdfs.semanticscholar.org/e6c4/925fb114d13a8568f88957c167c928f0c9f1.pdf?_ga=2.161164477.1061462531.1568569080-1140510733.1567996867 M. E. Newman, The Structure and Function of Complex Networks, SIAM Review, Vol. 45, #2, pp. 167—256