Handling missing values in PCA

missMDA imputes the incomplete data set in such a way that the imputed values have no weight in the results of the PCA. It returns a completed data set that can then be analysed with the PCA function of FactoMineR.
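To see why imputed values can carry no weight, it may help to look at the iterative PCA idea behind this kind of imputation. The following is a minimal base R sketch, not missMDA's implementation: the function name `impute_pca_sketch` and its arguments are hypothetical, and it leaves out the scaling and the regularisation that imputePCA applies by default. Missing cells start at the column mean and are repeatedly replaced by their rank-`ncp` PCA reconstruction. Once the loop has converged, they lie essentially on the fitted values, so they contribute almost nothing to the residuals that PCA minimises.

```r
# Sketch of iterative PCA imputation (EM-PCA style); illustrative only.
impute_pca_sketch <- function(X, ncp = 2, max_iter = 500, tol = 1e-6) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Initialise missing cells with the mean of the observed values in each column.
  col_means <- colMeans(X, na.rm = TRUE)
  X_hat <- X
  X_hat[miss] <- col_means[col(X)[miss]]
  for (iter in seq_len(max_iter)) {
    # Centre the current completed matrix and keep the first `ncp` dimensions.
    mu <- colMeans(X_hat)
    Xc <- sweep(X_hat, 2, mu)
    s <- svd(Xc, nu = ncp, nv = ncp)
    fitted <- s$u %*% diag(s$d[seq_len(ncp)], ncp, ncp) %*% t(s$v)
    fitted <- sweep(fitted, 2, mu, "+")
    # Only the missing cells are overwritten; observed values never change.
    X_new <- X_hat
    X_new[miss] <- fitted[miss]
    change <- sum((X_new - X_hat)^2)
    X_hat <- X_new
    if (change < tol) break
  }
  X_hat
}

# Small simulated example (made-up data, for illustration only).
set.seed(1)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)
X[2, 1] <- NA
X[5, 3] <- NA
X_imp <- impute_pca_sketch(X, ncp = 2)
```

In practice imputePCA should be preferred, since the regularised version is designed to limit overfitting when many values are missing.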

Video to handle missing values in PCA

Here is the video, don't hesitate to put it in full screen:

Steps and lines of code

  1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
  2. impute the data set with the imputePCA function using the number of dimensions previously calculated (by default, 2 dimensions are chosen)
  3. perform the PCA on the completed data set using the PCA function of the FactoMineR package

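Example

library(missMDA)
library(FactoMineR)
data(orange)
# Estimate the number of dimensions (at most 5)
nb = estim_ncpPCA(orange, ncp.max = 5)
# Impute using the estimated number of dimensions
res.comp = imputePCA(orange, ncp = nb$ncp)
# Run the PCA on the completed data set
res.pca = PCA(res.comp$completeObs)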
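Before imputing, it can be useful to check how much is missing and how the number of dimensions was chosen. A short sketch, assuming the default `ncp.min = 0` of estim_ncpPCA, so that the criterion holds one value for each of 0 to 5 dimensions:

```r
library(missMDA)
data(orange)

# Proportion of missing cells in each variable.
colMeans(is.na(orange))

nb <- estim_ncpPCA(orange, ncp.max = 5)
nb$ncp        # number of dimensions retained
nb$criterion  # cross-validation criterion for each candidate number of dimensions

# The retained number of dimensions is the one that minimises the criterion.
plot(0:5, nb$criterion, type = "b",
     xlab = "number of dimensions", ylab = "criterion")
```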

Multiple imputation in PCA

missMDA generates multiple imputed data sets for continuous data using the PCA model.

See this video from 11'07 to the end.

Steps to generate 1000 imputed data sets with missMDA

  1. estimate the number of dimensions used in the reconstruction formula with the estim_ncpPCA function
  2. generate 1000 imputed data sets with the MIPCA function, using the number of dimensions estimated in step 1

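Example

library(missMDA)
data(orange)
# Estimate the number of dimensions (at most 5)
nb = estim_ncpPCA(orange, ncp.max = 5)
# Generate 1000 imputed data sets with the estimated number of dimensions
res.comp = MIPCA(orange, ncp = nb$ncp, nboot = 1000)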

Visualizing uncertainties due to imputation of missing values

missMDA allows you to visualize the uncertainties generated by the multiple imputation.

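Example

library(missMDA)
data(orange)
# Estimate the number of dimensions (at most 5)
nb = estim_ncpPCA(orange, ncp.max = 5)
# Generate 1000 imputed data sets with the estimated number of dimensions
res.comp = MIPCA(orange, ncp = nb$ncp, nboot = 1000)
# Plot the uncertainty due to the imputation
plot(res.comp)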

This post gives you more information on this visualisation.
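Alongside the plots, the variability between imputed data sets can also be summarised numerically. The sketch below uses base R only and works on any list of completed matrices of identical dimensions, such as the `res.MI` element returned by MIPCA. Here `imputed_list` is a simulated stand-in with made-up values, and all object names are hypothetical.

```r
# Stand-in for a list of completed data sets; values are simulated, not real output.
set.seed(42)
base <- matrix(rnorm(12), nrow = 4, ncol = 3)
was_missing <- matrix(FALSE, nrow = 4, ncol = 3)
was_missing[2, 1] <- TRUE
was_missing[4, 3] <- TRUE
imputed_list <- lapply(1:50, function(b) {
  m <- base
  # Only the cells that were missing differ from one imputed data set to the next.
  m[was_missing] <- m[was_missing] + rnorm(sum(was_missing), sd = 0.5)
  m
})

# Stack into a 4 x 3 x 50 array, then summarise each cell across imputations.
stacked <- simplify2array(imputed_list)
cell_mean <- apply(stacked, c(1, 2), mean)
cell_sd <- apply(stacked, c(1, 2), sd)
round(cell_sd, 2)
```

Cells that were observed have a standard deviation of zero, while a large value flags a cell whose imputation is uncertain.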