LECTURE KEYWORD SUMMARY, MULTIVARIATE STATISTICS, STAT 750, Spring 2022
=======================================================================
Lec.1, 1/24/22
Keywords to begin with:
Data structure       ** table n x (p+d), n = #obs, p = #outcomes Y_i, d = #predictors X_j
Data Display         ** summaries of outcomes and predictors by variable, scatterplots of Y's vs X's
Data Transformation  ** linear transformation, projection, centering and rescaling, subsetting by group, conversion to ranks, other nonlinear recoding
Subsetting X's -- Variable Selection
Model Selection      ** simultaneous subsetting of X's and Y's so that groups of X's are suitable for predicting subsets of Y's (examples: recommender systems, genomics)
Statistics           ** sampling distribution (theoretical)
                     ** exact calculation of density under a model versus Monte Carlo empirical distribution (use multivariate t or Wishart as examples)
                     ** reference distribution under the null hypothesis
Univariate models    ** single Y modeled conditionally given multiple X
Multivariate models  ** multiple outcomes Y modeled, maybe conditionally given X
================
Lec.2, 1/26/22
Data Display         ** correlation (Pearson or Spearman?) pairwise within Y's, within X's
Data Transformation  ** conversion to ranks, other nonlinear recoding
Classification/Discrimination ** groups g pre-defined via Y's, mapping to be defined as f(X); primarily "supervised" with true labels, sometimes "semi-supervised"
================
Lec.3, 1/28/22
Matrix algebra (see the Appendix with that title in Mardia, Kent and Bibby)
Definitions of column-space, row-space, rank, nonnegative-definite
Master result: Singular Value Decomposition, which contains the spectral representation of symmetric nonnegative-definite (covariance) matrices
Corollaries: projection matrices via the SVD; symmetric square root of a covariance matrix; verification of the formulas for trace and det respectively as sum and product of eigenvalues
Expression for the joint density f(x) as limiting probability per unit volume for small boxes decreasing to the point x
================
Lec.4, 1/31/22
Review of the Jacobian change-of-variable formula for the probability density of a smooth, smoothly invertible function Y = g(X) of a random vector X with density f(x)
Spherical symmetry (rotational invariance) for a random vector
Examples of spherically symmetric joint densities
Fact: for a rotationally symmetric random p-vector X, R = length(X) and X/R are independent random variables, with X/R uniformly distributed on the surface of the p-dimensional sphere (see pdf handout 2. on this topic).
================
Lec.5, 2/2/22
Conclusion of the rotational-symmetry topic; hints on Exercises
Equivalent definitions of the multivariate normal: via the density, via the ch.f., and as an affine transformation of a vector with iid N(0,1) entries.
=================
Lec.6, 2/4/22
Run-through of properties of the multivariate normal: mean, variance, independence equivalent to uncorrelatedness; generalized inverse of the covariance matrix in the singular case; density of the multivariate normal on an affine subspace in the singular-covariance case; maximum-probability (or minimum-volume for fixed probability) sets as ellipsoids.
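The matrix-algebra corollaries of Lec.3 (spectral representation, symmetric square root, trace and det as sum and product of eigenvalues) can be checked numerically. A minimal sketch in Python/NumPy (used here in place of the course's R; the matrix below is randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric positive-definite (covariance-type) matrix S
A = rng.standard_normal((5, 3))
S = A @ A.T + np.eye(5)

# Spectral representation S = V diag(lam) V'  (eigh is for symmetric matrices)
lam, V = np.linalg.eigh(S)

# Symmetric square root: S^{1/2} = V diag(sqrt(lam)) V'
S_half = V @ np.diag(np.sqrt(lam)) @ V.T
```

The same eigenpairs verify trace(S) = sum(lam) and det(S) = prod(lam), and S_half @ S_half recovers S.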
================
Lec.7, 2/7/22
Conditional density of Y given X when these random vectors are jointly multivariate normal
Multivariate CLT as justification for the multivariate normal
Mixtures of multivariate normal densities
Maximum likelihood estimation from iid multivariate normal samples
Sufficient statistics and likelihood ratio tests for the mean in the multivariate-normal setting
===============
Lec.8, 2/9/22
Conditional densities for one multivariate normal subvector given another
===============
Lec.9, 2/11/22
Xbar and S as MLE's
Formulation of the multivariate normal parameter space and hypotheses
===============
Lec.10, 2/14/22
Likelihood ratio test (LRT) and Wilks' Theorem
LRT for the null hypothesis of a specified multivariate normal mean (one-sample case) with unrestricted unknown covariance matrix
Wishart distribution, Mahalanobis distance
===============
Lec.11, 2/16/22
Hotelling T^2 distribution
Independence of Xbar and S based on a multivariate normal data matrix
Independence of weighted combinations of rows of an n x p multivariate normal data matrix based on n-dimensional orthonormal vectors of weights
===============
Lec.12, 2/18/22
Two-sample tests of same versus different means in sampled populations with unknown unrestricted variance matrix assumed to be the same across samples
R script and demonstration of one- and two-sample tests and simulation of p-values
Further distributions arising in multivariate normal hypothesis tests (end of Ch.3 MKB)
----------------
Lec.13, 2/21/22
Accuracy of Monte Carlo calculations of distributional percentage points and p-values
Proof that the Hotelling T^2(p,m) distribution is the same as (mp/(m-p+1)) * F_{p,m-p+1}
----------------
Lec.14, 2/23/22
Catalogue of hypothesis tests we obtain for multivariate normal means and variances using the Likelihood Ratio Test, and also using the Union Intersection Test idea
Template for obtaining new hypothesis tests based on differently constrained parameters
Relationship between UIT's and simultaneous confidence intervals.
------------------
Lec.15, 2/25/22
Two-sample LRT for equality of covariance matrices
More on UITs and simultaneous CIs: derivations in cases
------------------
Lec.16, 2/28/22
Introduction of the Multivariate Regression Model; motivation by comparison with univariate regression models, and derivation of the MLEs for the coefficient matrix B and outcome covariance matrix Sigma
------------------
Lec.17, 3/2/22
Demonstration that B-hat and the residual matrix U-hat are independent in the multivariate-normal regression model, and verification of the Wishart distribution for Sigma-hat
Computational demonstration of model fitting and hypothesis tests for correlation between outcome variables in multivariate regression, and of the relation between the conditional distribution of residuals (one column given the others) and the comprehensive univariate regression model for one column $Y^{(j)}$ in terms of X and of the other outcome columns $Y^{(-j)}$
------------------
Lec.18, 3/4/22
Completion of Ch.6 MKB: covered Sec 6.3 through 6.3.1
LRT hypothesis test for C1 B M1 = D in multivariate regression
plus: Multiple Correlation, Partial Correlation
------------------
Lec.19, 3/7/22
MANOVA as regression, LRT with Wilks' Lambda
def'n of Pillai's Trace as an alternative
-------------------
Lec.20, 3/9/22
MANOVA table demonstration in R
Discussion of Wilks' Lambda and its relationship to a product of independent Betas (Thm 3.7.3), and approximation in the cases k=2 or 3 by F's
--------------------
Lec.21, 3/11/22
Brief discussion of sample test review problems
Introduction to ideal principal components (i.e., the principal-component eigenspaces of the true variance matrix Sigma)
---------------------
Lec.22, 3/14/22
Discussion of HW problem (II): exact T^2(p-1,n-1) distribution using the alternate representation of H0: mu proportional to mu_0 as R mu = 0, where R ((p-1) x p) has rows forming an orthonormal basis for the orthogonal complement of {mu_0} (cf. MKB, pp.132-133)
Extended discussion/hints on problems of the Sample Test
---------------------
Lec.23, 3/16/22
Further discussion of the sample test & review for the in-class test
Further introduction to PCA: sample principal components, general properties, and principal component regression
------------------TEST ON 3/18/22
Lec.24, 3/28/22
Discussion of test solutions and further definitions concerning principal components.
------------------
Lec.25, 3/30/22
Illustration of PC software and R calculations "from scratch" on the Boston Housing data in the R script PrinCompBHous.RLog.
------------------
Lec.26, 4/1/22
Large-sample theory for estimates of PCs.
PC regression to reduce the dimensionality of an outcome dataset.
Use of PCs of one variable-set as predictive variables for a different outcome.
------------------
Lec.27, 4/4/22
Introduction of the Factor Analysis model.
Nonidentifiability due to orthogonal rotations of loadings.
Side conditions (several different versions) to restore identifiability.
Orthogonal-column loadings as one possible side condition for an identifiable loadings matrix.
------------------
Lec.28, 4/6/22
Illustration of Factor Analysis R functions in the 5-company stock-returns example (#9.4 in Johnson & Wichern), including interpretation of loadings, in the FactorExmp.RLog script.
------------------
Lec.29, 4/8/22
Principal Factor Method (3 versions, using the correlation matrix R in place of S):
(i) direct use of PCs with the top-k eigenvectors of S as loadings, then Psi as diag(matrix residual);
(ii) estimate communalities via max correlations (of the j'th variable on the others), then Psi, then Lambda via spectral decomposition of R - Psi;
(iii) same plan as (ii) but communalities estimated via the multiple correlation of the j'th variable on the others.
Contrasted these approximate "principal factor methods" with MLE Factor Model estimates, used in a formal goodness-of-fit test for the model.
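The "from scratch" sample-PC calculation of Lec.25 is done in R (PrinCompBHous.RLog, not reproduced here). A minimal Python/NumPy sketch of the same computation -- center, form the sample covariance, eigendecompose, project -- on simulated data standing in for a real dataset:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic n x p data matrix with correlated columns (stand-in for real data)
n, p = 200, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))

# Center, form the sample covariance S, and eigendecompose it
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (n - 1)
lam, V = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]        # eigenvalues in decreasing order
lam, V = lam[order], V[:, order]

# Principal component scores and proportion of variance explained
scores = Xc @ V
var_explained = lam / lam.sum()
```

The scores are uncorrelated with variances equal to the eigenvalues, which is the defining property of the sample principal components.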
------------------
Lec.30, 4/11/22
Introduction of the EM Algorithm & Woodbury Identity for the Factor Analysis MLE calculation via EM, following C. Bishop's book, Chapter 12, esp. Sec.12.4
------------------
Lec.31, 4/13/22
Completion of the EM Algorithm implementation for Factor Analysis (Rubin & Thayer 1982)
Computational illustration, on the 103x5 stock-returns dataset, of the LRT goodness-of-fit test for "Probabilistic PCA", which is the factor model with Psi = sigma^2 * I_{pxp}
------------------
Lec.32, 4/15/22
Canonical Correlation: motivation and linear-algebra solution, including a goodness-of-fit test (under normality) for independence of X, Y
------------------
Lec.33, 4/18/22
Introduction/overview of clustering from all 3 books
1. Model-based
   a. Mixture and label-identifier models
   b. Density estimation
2. Criterion/Algorithm-based
3. Hierarchical Agglomerative/Divisive
4. Other (particularly, Spectral Clustering)
Clustering ** grouping or rule-based subsetting, with the general objective (subsetting Y's) that Y observations within a group are more alike (homogeneous) than observations across groups; primarily "unsupervised" without labels, sometimes "semi-supervised"
------------
Lec.34, 4/20
Clustering, continued
Software (library cluster for hierarchical, kmeans, mclust for mixture models)
Dendrogram data representations
Illustration of clustering and "confusion" matrices for (a sample from) the iris data, where the true species-based clusters are known.
-------------
Lec.35, 4/22
More on clustering
Further discussion of the IrisCluster.RLog script showing the software implementation and interpretation of clusters from the methods kmeans, agnes, diana, mclust. The discussion is enriched by the model-based clustering analyses (with parametric mixture-of-normal models).
General question: how to assess clustering reliability or quality. Introduce the idea of clustering data-sample replicates ("bootstrapping clusters") to assess reliability.
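The kmeans method listed under Lec.34 is, at its core, Lloyd's algorithm: alternate assigning points to their nearest center and recomputing each center as the mean of its assigned points. A minimal Python/NumPy sketch on synthetic two-cluster data (not the R cluster-library implementation used in class):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

# Two well-separated synthetic clusters in the plane
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, 2)
```

With well-separated groups like these, the recovered labels match the true two-group partition, which is what the iris "confusion matrix" comparisons in Lec.34-35 quantify for real data.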
-----------
Lec.36, 4/25
Bootstrapping -- in general and in clustering
Nonparametric vs parametric bootstrapping. Intermediate case of bootstrapping from a "parametric density" defined from a kernel-density estimator fitted to the observed data.
Some illustration using the R script BootMultivar.RLog.
-----------
Lec.37, 4/27
More discussion of bootstrapping specifically related to clustering, using confusion matrices and metrics like Sensitivity and Positive Predictive Value.
Further R script illustration using the iris data, cf. BootClus.RLog.
-----------
Lec.38, 4/29
Illustration of the bootstrapping of clustering with the R script
-----------
Lec.39, 5/2
Kernel methods -- intro of kernels, basic theory
-----------
Lec.40, 5/4
Kernel clustering methods -- radial basis function (Gaussian) kernel and variants
-----------
Lec.41, 5/6
More on kernel clustering, including bootstrapping of the kernel-based clustering, using a script involving the iris data.
-----------
Lec.42, 5/9
Kernel PCA -- with script illustration, KernelMethods.RLog.
---------------
Eventually we left off the Sparse PCA topics, and several students did them for final projects: Sparse PCA, Simultaneous PCA, regularization of PCA in high dimensions
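The nonparametric bootstrap of Lec.36 amounts to: resample the rows of the data matrix with replacement, recompute the statistic on each resample, and summarize the resulting empirical distribution. A minimal Python/NumPy sketch (in place of the BootMultivar.RLog script), with a simulated bivariate sample and the sample correlation as the statistic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated bivariate sample with substantial true correlation
n = 300
z = rng.standard_normal(n)
X = np.column_stack([z + 0.4 * rng.standard_normal(n),
                     z + 0.4 * rng.standard_normal(n)])

def sample_corr(data):
    return np.corrcoef(data, rowvar=False)[0, 1]

# Nonparametric bootstrap: resample rows with replacement, recompute statistic
B = 1000
boot = np.array([sample_corr(X[rng.integers(0, n, size=n)]) for _ in range(B)])

# Percentile confidence interval for the correlation
lo, hi = np.quantile(boot, [0.025, 0.975])
```

The same resample-and-recompute template, applied with a clustering rule as the "statistic" and confusion matrices as the summary, gives the cluster-reliability assessments of Lec.37-38.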