- Lectures
- Homework (theoretical exercises) 30%
- Group projects (programming, real data analysis, benchmark examples) 35%
- Final exam 35%

- 1. Introduction (1 week)

- What is data science?
- Review of linear algebra

- 2. Optimization (4.5 weeks)

- Classification problems. Basics of constrained optimization. The KKT optimality conditions. The active-set method for constrained minimization problems with linear constraints. SVM: soft margins, duality, implementation, numerical issues.
- Unconstrained minimization problem for DNNs, an overview of methods for unconstrained optimization. Convergence and properties of gradient descent, gradient descent with errors, a motivation for stochastic gradient descent. Stochastic gradient descent: assumptions, lemmas, convergence theorem for a fixed stepsize. Stochastic gradient descent with decreasing stepsizes, convergence theorem.
- Subsampled inexact Newton’s method. Features and components of second-order methods: scale-invariance, conjugate gradient method for obtaining search direction, backtracking line search. BFGS, L-BFGS, stochastic L-BFGS.
- Gauss-Newton and Levenberg-Marquardt methods for solving nonlinear least-squares problems. Solving a BVP for the Poisson PDE in 2D by means of NNs.
- Geometry of linear least squares problems. Tikhonov and Lasso regularization, coordinate descent.
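A minimal sketch of the stochastic gradient descent topic above, applied to a synthetic linear least-squares problem with a fixed stepsize (the problem sizes and stepsize are illustrative choices, not values from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))      # data matrix
x_true = rng.standard_normal(5)        # ground-truth parameters
b = A @ x_true                         # noiseless observations

x = np.zeros(5)
stepsize = 0.01                        # fixed stepsize
for _ in range(20000):
    i = rng.integers(200)              # sample one row uniformly at random
    grad_i = (A[i] @ x - b[i]) * A[i]  # stochastic gradient estimate
    x -= stepsize * grad_i

print(np.linalg.norm(x - x_true))      # distance to the true solution
```

Because the data here are noiseless, a fixed small stepsize already drives the iterates to the solution; with noisy observations one would switch to the decreasing-stepsize schedule covered in the lectures.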

- 3. Matrix Data and latent factor models (3 weeks)

- Examples: Latent Semantic Analysis and k-means clustering algorithm, eigenfaces, collaborative filtering.
- Ky-Fan norms. Eckart-Young-Mirsky theorem. NMF. Methods for computing NMF: projected gradient descent, Lee-Seung, coordinate descent (one entry at a time, hierarchical alternating least squares (HALS), alternating nonnegative least squares (ANLS)).
- Collaborative filtering and matrix completion. Nuclear norm.
- CUR matrix decomposition.
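One of the NMF methods listed above, the Lee-Seung multiplicative updates, can be sketched as follows on synthetic exactly-low-rank nonnegative data (sizes, rank, and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 30, 20, 4
W_true = rng.random((m, r))
H_true = rng.random((r, n))
X = W_true @ H_true                    # nonnegative data of exact rank r

W = rng.random((m, r)) + 0.1           # positive initialization
H = rng.random((r, n)) + 0.1
eps = 1e-10                            # guards against division by zero
for _ in range(500):
    H *= (W.T @ X) / (W.T @ W @ H + eps)   # multiplicative update for H
    W *= (X @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for W

err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(err)                             # relative reconstruction error
```

The updates multiply the current factors by nonnegative ratios, so nonnegativity is preserved automatically and the Frobenius objective is nonincreasing.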

- 4. Nonlinear dimensionality reduction (3 weeks)

- Linear vs. nonlinear dimensionality reduction.
- Linear methods: PCA, MDS.
- Isomap.
- Locally linear embedding (LLE).
- t-distributed stochastic neighbor embedding (t-SNE).
- Diffusion maps. Diffusion maps and Laplacian eigenmaps. Diffusion maps with renormalization parameter alpha.
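For the linear baseline (PCA) in this section, a minimal sketch via the SVD of the centered data matrix, on synthetic points lying near a 2-D plane in R^5 (dimensions and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))          # latent 2-D coordinates
B = rng.standard_normal((2, 5))            # embedding into R^5
X = Z @ B + 0.01 * rng.standard_normal((100, 5))  # small noise

Xc = X - X.mean(axis=0)                    # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:2].T                          # project onto top-2 principal directions

explained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(explained)                           # fraction of variance explained
```

The nonlinear methods in this section (Isomap, LLE, t-SNE, diffusion maps) address exactly the cases where no single linear projection of this kind captures the data's geometry.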

- 5. Numerical methods for graph data analysis (3.5 weeks)

- Basic graph definitions.
- Breadth-first search. Depth-first search.
- Poisson random graphs. Generating functions. Random graphs with specified degree distribution.
- Scale-free random networks. Growth and preferential attachment. Vulnerability to random failures and targeted attacks.
- Site percolation in random graphs with power-law degree distributions. The SIR model on random graphs; a power-law degree distribution case study.
- Mining large graphs. Brin and Page's PageRank algorithm, which ranks pages according to their importance.
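The PageRank topic above can be sketched as a power iteration on a small hand-made link graph (the graph, damping factor, and iteration count are illustrative):

```python
import numpy as np

# Link structure: page j links to the pages in links[j]
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = 4
alpha = 0.85                           # damping factor

# Column-stochastic transition matrix: column j spreads j's rank evenly
P = np.zeros((n, n))
for j, outs in links.items():
    for i in outs:
        P[i, j] = 1.0 / len(outs)

r = np.full(n, 1.0 / n)                # start from the uniform distribution
for _ in range(100):                   # power iteration
    r = alpha * P @ r + (1 - alpha) / n
print(r)                               # stationary PageRank vector
```

Page 2, which receives links from three of the four pages, ends up with the largest score; the `(1 - alpha)/n` teleportation term keeps the iteration well defined even on graphs that are not strongly connected.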

- David Bindel, Cornell University, http://www.cs.cornell.edu/~bindel/class/sjtu-summer19/index.html
- Austin Benson, Cornell University, http://www.cs.cornell.edu/courses/cs6241/2019sp/
- https://epubs.siam.org/doi/abs/10.1137/16M1080173

- L. Bottou, F. Curtis, J. Nocedal, Optimization Methods for Large-Scale Machine Learning, SIAM Review, Vol. 60, No. 2, pp. 223–311, https://leon.bottou.org/publications/pdf/tr-optml-2016.pdf
- M. E. J. Newman, The Structure and Function of Complex Networks, SIAM Review, Vol. 45, No. 2, pp. 167–256, https://pdfs.semanticscholar.org/e6c4/925fb114d13a8568f88957c167c928f0c9f1.pdf